Re: unicode entities, "beginner" questions...

From: suzume@mx82.tiki.ne.jp
Date: Sun Mar 13 2005 - 18:37:24 CST


    On 2005/03/14, at 0:57, Philippe VERDY wrote:

    > I understand your frustration, but most of these problems come from
    > the need, in application programming interfaces, to remain compatible
    > with legacy interfaces.
    > For example, I manage a set of translations for a Java app as sets
    > of .properties files. Unfortunately, the Java API for handling
    > resource bundles still does not know (even in Java 1.5) how to
    > recognize UTF-8 encoded files (even if we include a leading BOM), so
    > the Java resource bundle loader will only process files using the
    > legacy ISO-8859-1 or US-ASCII character sets. Any other Unicode
    > character must be encoded with so-called "Unicode escapes" (of the
    > form "\uXXXX", where XXXX is a hex-encoded UTF-16 code unit).

    You may be interested to know that OmegaT handles .properties files, so
    translators have access to the text without you having to prepare it
    for them.

    Take a look at: www.omegat.org if you are interested.

    JC Helary

    > This is frustrating, and when managing translations it is nearly
    > impossible to find translators who have the technical knowledge
    > required to work with this format. For this reason, I give the
    > translators templates written as UTF-8 encoded files (with a leading
    > BOM), which are much more user-friendly. I let them work on this
    > version, and I use the UTF-8 file as the reference file for all
    > translations.
    > The actual .properties files are generated automatically by a
    > home-made validation tool that checks the overall format, checks for
    > duplicate resource keys, reorders the keys, makes them properly
    > delimited with no extra spaces, and checks punctuation and the
    > presence of variable place-holders. It also creates the actual
    > .properties file in a way similar to the Java JDK tool "native2ascii
    > -encoding UTF-8" (except that my tool converts from UTF-8 to
    > ISO-8859-1, leaving all ISO-8859-1 characters unescaped, including
    > those that are not US-ASCII). The tool also works with an internal
    > CVS-based history tool, and it can generate comments to help
    > translators; these comments are preserved in the UTF-8 reference
    > source file but filtered out of the generated final .properties
    > files, where all comments and blank lines are removed.
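
    A rough sketch of just the escaping step described above (not the
    actual tool: the file names are invented, and the key reordering,
    duplicate checks and comment filtering are left out). It copies
    ISO-8859-1 characters through untouched, escapes everything else, and
    skips the leading BOM of the UTF-8 template:

        import java.io.*;

        public class EscapeNonLatin1 {
            public static void main(String[] args) throws IOException {
                try (Reader in = new InputStreamReader(
                         new FileInputStream("messages.utf8.properties"), "UTF-8");
                     Writer out = new OutputStreamWriter(
                         new FileOutputStream("messages.properties"), "ISO-8859-1")) {
                    int c = in.read();
                    if (c == 0xFEFF) {
                        c = in.read();          // skip the leading BOM
                    }
                    for (; c != -1; c = in.read()) {
                        if (c <= 0xFF) {
                            out.write(c);       // ISO-8859-1 characters stay as-is
                        } else {
                            // escape each UTF-16 code unit, as native2ascii does
                            out.write(String.format("\\u%04X", c));
                        }
                    }
                }
            }
        }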

    Thank you, Jukka and Philippe, for your answers. I think I got what I
    was looking for.

    Sincerely,

    Jean-Christophe


