unicode entities, "beginner" questions...

From: suzume@mx82.tiki.ne.jp
Date: Sat Mar 12 2005 - 23:53:56 CST

Next message: suzume@mx82.tiki.ne.jp: "Re: unicode entities, "beginner" questions..."

Previous message: fantasai: "Typographic Classification of Writing Systems"
Next in thread: suzume@mx82.tiki.ne.jp: "Re: unicode entities, "beginner" questions..."
Maybe reply: suzume@mx82.tiki.ne.jp: "Re: unicode entities, "beginner" questions..."
Reply: Jukka K. Korpela: "Re: unicode entities, "beginner" questions..."
Maybe reply: Philippe VERDY: "Re: unicode entities, "beginner" questions..."

I apologize for the level of these questions. If this is not the right
place, I'd appreciate pointers to lists where I can get information.

I am a translator working in Japanese, English and French, and I use
tools that work mostly with utf-8 files, namely:
- OpenOffice.org (or rather the OSX version, NeoOffice/J) as a file
converter, and
- OmegaT (a Java app) as a translation memory tool.

I have had issues with both since I realized that, unlike
Unicode-aware OSX apps (TextEdit, to give a simple example, but also
most text editors on OSX), the above apps convert all the Japanese
characters (and the non-ASCII French characters) to non-human-readable
entities that make direct editing of the output files almost impossible.

I seem not to understand what Unicode really is, and so I am stuck
with files and no way to convert them to human-readable output.

So my questions are:

Why do those tools favor a non-human-readable output form? Is there a
valid technical reason to do so?
What are the technical differences between human-readable Unicode
output and entity-based Unicode output?
Are there easy ways to convert from one to the other?
Are there other forms a Unicode character can take?
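
To make the conversion question concrete, here is a minimal Python sketch.
It assumes the "entities" in the output files are HTML/XML-style character
references such as &#233; or &eacute;, which is a guess about what the tools
write. The function names and the sample string are made up for
illustration. Note that html.unescape also decodes markup escapes such as
&amp; and &lt;, so running it over a whole HTML or XML file could change the
markup itself.

```python
import html

def entities_to_text(s: str) -> str:
    """Decode character references (&#233;, &#xE9;, &eacute;) into characters."""
    return html.unescape(s)

def text_to_entities(s: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Hypothetical sample: French "café" followed by the Japanese word for "Japanese".
sample = "caf&#233; &#26085;&#26412;&#35486;"

readable = entities_to_text(sample)
print(readable)                    # human-readable form
print(text_to_entities(readable))  # back to the entity form
```

Both forms name the same Unicode characters. The readable form stores them
directly in the file's encoding, while the entity form spells out their
numbers using ASCII characters only.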

I think I understand that fundamentally a character is just a number to
the computer that points to a place in a list for display purposes and
that is further modified or "encoded" for transmission purposes.
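
As a rough illustration of that distinction, the following Python sketch
shows one character, its number (code point), two byte encodings of that
number, and the entity-style notation for it. The variable names are
illustrative only.

```python
ch = "é"

# The character's number in the Unicode list (its code point).
code_point = ord(ch)

# The same number serialized as bytes in two different encodings.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")

# The same number written out in ASCII as a numeric character reference.
reference = f"&#{code_point};"

print(code_point, utf8_bytes, utf16_bytes, reference)
```

The entity is thus another way of writing the code point in plain ASCII,
independent of the byte encoding used for the file.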

But I don't see where the entities and their necessity fit into this...
When I started as an HTML writer, about 10 years ago, I used to convert
my French accented letters to HTML entities to "make sure" that they'd
be displayed properly. But with encoding/character set recognition this
is no longer necessary, and I can write French or Japanese, save the
text in the proper encoding and declare that encoding in the file for
interpretation purposes.

It seems to me that using entities now is going back 10 years or so,
especially when one works with applications that _expect_ utf-8
files... Entities may be necessary for rare characters, but for all the
rest?

Thanks in advance for the answers and/or clarifications and pointers...

Sincerely,

Jean-Christophe Helary

This archive was generated by hypermail 2.1.5 : Sun Mar 13 2005 - 01:00:49 CST