unicode entities, "beginner" questions...

From: suzume@mx82.tiki.ne.jp
Date: Sat Mar 12 2005 - 23:53:56 CST

    I apologize for the level of the questions. If this is not the right
    place, I'd appreciate pointers to lists where I can get information.

    I am a translator working in Japanese, English and French, and I use
    tools that work mostly with UTF-8 files, namely:
    - OpenOffice.org (or rather the OS X version, NeoOffice/J) as a file
    converter, and
    - OmegaT (a Java app) as a translation memory tool.

    I have had issues with both since I realized that, unlike
    Unicode-aware OS X apps (TextEdit, to give a simple example, but also
    most text editors on OS X), the above apps convert all the Japanese
    characters (and the non-ASCII French characters) to entities that are
    not human-readable, which makes direct editing of the output files
    almost impossible.

    I seem not to understand what Unicode really is, and I am thus stuck
    with files that I have no way to convert to human-readable output.

    So my questions are:

    Why do those tools favor a non-human-readable output form? Is there a
    valid technical reason to do so?
    What are the technical differences between human-readable Unicode
    output and entity-based Unicode output?
    Are there easy ways to convert from one to the other?
    Are there other forms a Unicode character can take?
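
    As an illustration of the conversion question above, here is a minimal
    sketch in Python 3. It assumes the "entities" in question are ordinary
    HTML/XML character references such as &#26085; or &eacute;. The helper
    names entities_to_text and text_to_entities are made up for this
    example and are not part of OpenOffice.org or OmegaT.

```python
import html


def entities_to_text(s):
    """Replace character references (&#26085;, &#x65E5;, &eacute;) with real characters."""
    # Caution: html.unescape also turns &amp; and &lt; into & and <,
    # so applying it to a whole HTML/XML file can break the markup.
    return html.unescape(s)


def text_to_entities(s):
    """Replace every non-ASCII character with a decimal character reference."""
    # The "xmlcharrefreplace" error handler emits &#NNNN; for each
    # character that the ASCII codec cannot represent.
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


readable = entities_to_text("&#26085;&#26412;&#35486; caf&eacute;")
escaped = text_to_entities("café 日本")
```

    Both forms describe the same characters. Only the spelling in the file
    differs, so a round trip through these two functions leaves the text
    unchanged.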

    I think I understand that fundamentally a character is just a number to
    the computer that points to a place in a list for display purposes and
    that is further modified or "encoded" for transmission purposes.
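
    The description above can be made concrete with a small sketch, again
    assuming Python 3. It shows one character as a number (its code point),
    as UTF-8 bytes, and as an ASCII-only numeric reference. The function
    name describe is hypothetical and only serves this example.

```python
# One character, three views: code point, UTF-8 bytes, numeric reference.
ch = "é"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)             # the character's number in the Unicode list
utf8_bytes = ch.encode("utf-8")  # that number encoded for storage or transmission
entity = f"&#{code_point};"      # ASCII-only spelling of the same number


def describe(text):
    """Return (character, U+XXXX, UTF-8 bytes as hex, decimal reference) per character."""
    return [
        (c, f"U+{ord(c):04X}", c.encode("utf-8").hex(), f"&#{ord(c)};")
        for c in text
    ]
```

    Under these assumptions the entity is just another way of writing the
    code point using only ASCII characters, whereas UTF-8 stores the same
    number directly as bytes.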

    But I don't see where entities, and the need for them, fit into this...
    When I started as an HTML writer, about 10 years ago, I used to convert
    my French accented letters to HTML entities to "make sure" that they'd
    be displayed properly. But with encoding/character set recognition this
    is no longer necessary, and I can write French or Japanese, save the
    text in the proper encoding, and declare that encoding in the file for
    interpretation purposes.

    It seems to me that using entities now is going back 10 years or so,
    especially when one works with applications that _expect_ UTF-8
    files... Entities may be necessary for rare characters, but for all the
    rest?

    Thanks in advance for the answers and/or clarifications & pointers...

    Sincerely,

    Jean-Christophe Helary


