From: John H. Jenkins (firstname.lastname@example.org)
Date: Mon Jan 25 2010 - 14:47:46 CST
On Jan 25, 2010, at 11:39 AM, Mark Crispin wrote:
> Now, I admit to appreciating seeing the station name in hiragana at train stations rather than having to hunt around for the name in English.
I prefer seeing the names in kanji because I can read *those* better than kana. :-) Plus I get a kick out of seeing what the toponyms actually mean.
> But I've needed it less as my ability to read kanji has improved, as has my ability to understand the conductor's announcements (without having to translate them in my mind to English).
>>> Fortunately, it's invariably unambiguous. I follow kana when writing
>>> romanized text to a native Japanese, and shift to "wa" and "e" when
>>> writing romanized text to a non-Japanese.
>> Interesting. Unihan.txt is a text to a native Japanese,
>> or a text to a non-Japanese?
> I don't know. I think that everybody is trying to decide that.
The readings fields are intended to be primarily "what you would see if you looked this up in a dictionary," and secondarily "what you would type to input this character by itself."
The fields for Mandarin, Cantonese, Korean, and Vietnamese, however, all use the transcription system preferred by native speakers. (Well, except kCantonese, because most Cantonese speakers don't really know any transcription system. Jyutping, however, was developed by native speakers, unlike its main competitors.)
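For anyone who hasn't looked inside the file, here is a minimal sketch of how the readings fields might be read. It assumes the tab-separated layout of code point, field name, value, with space-separated alternatives in the value. The sample lines are illustrative only and the helper name parse_readings is hypothetical, not part of any Unicode tooling.

```python
from collections import defaultdict

# Illustrative sample lines in the assumed layout:
#   U+XXXX <tab> field name <tab> value
SAMPLE = """\
# comment lines start with '#'
U+4E00\tkCantonese\tjat1
U+4E00\tkJapaneseOn\tICHI ITSU
U+4E00\tkMandarin\tyī
"""

def parse_readings(text):
    """Map each character to {field name: list of readings}."""
    readings = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> the character itself
        readings[char][field] = value.split()  # alternatives are space-separated
    return dict(readings)

table = parse_readings(SAMPLE)
```

Here table["一"]["kJapaneseOn"] holds the romanized on readings as a list. A kana-based field would be parsed the same way, since only the value text differs.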
As I say, I feel the best way to move forward would be to get a set of readings with a known provenance and use kana the way that native speakers would.
(An aside: The main reason why the original readings fields were all in Latin romanization is that when Unicode 1.0 was under development, we had to restrict ourselves to ASCII for obvious practical reasons.)
John H. Jenkins
This archive was generated by hypermail 2.1.5 : Mon Jan 25 2010 - 14:49:43 CST