Re: Braille, CJK and unicode

From: George W Gerrity
Date: Wed Feb 11 2009 - 06:19:28 CST

    On 2009-02-03, at 05:12, John H. Jenkins wrote:

    > On Jan 31, 2009, at 8:59 PM, Samuel Thibault wrote:
    >> Talking a bit more with the user requesting the feature, he says that
    >> the english description is precisely what he would like, except that
    >> he'd want it in chinese. So my requests could be summed up as "are
    >> there translated versions of unihan?"
    > No, there aren't. If someone were to donate a set of Chinese glosses
    > (or Japanese, or Korean, or Vietnamese, or some other important
    > language), they would likely be added. All the data in the Unihan
    > database are donated, and so we are dependent on the generosity of
    > the donors.
    > There may very well be Chinese-Chinese dictionaries available on the
    > Web, somewhere.
    > In any event, I'm a bit confused by the requirement. Chinese
    > *speech* suffers from the same ambiguities. So long as one
    > encounters the characters in a context-free environment, one has
    > this problem. Heck, even *English* has the same ambiguities. I read
    > that somewhere, maybe on the Polish lead bow I lost in a slough.
    > (Exercise for the reader: Create a valid English sentence using
    > only words with multiple pronunciations and meanings.)

    The usual example, given in texts on Computer Languages, is “Time
    flies like an arrow”. While most English speakers will parse it in
    its allegorical sense with the second word in the sentence as the
    verb, there are at least two other reductions, depending on which word
    is chosen as the verb (and one of them, while surprising, has a
    perfectly plausible meaning).

    Using phonetics for any Chinese language is even more problematical.
    For instance, the word “ma” has 19 different character entries in
    my little “Mandarin Chinese-English Dictionary” (汉英词典),
    some of which have no meaning on their own (i.e., without context of
    other characters) and some of which even have alternate
    pronunciations, depending on context. If the Braille for “ma”, for
    instance, also had tone marks attached (e.g., “ma1”), then the most
    common meaning of ma with high tone is mom, mummy, mother, but there
    are four other characters with the same pronunciation, one of which
    means a nanna or wet nurse. How's that for confusion?
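
    To make the point concrete, here is a rough sketch in Python of how
    adding the tone narrows the candidates without removing the
    ambiguity. The table is a tiny hand-made sample, not my dictionary
    and not Unihan data: the ma1 readings for 妈 and 嬷 are the ones
    given above, the other three are ordinary textbook readings, and the
    names SAMPLE, strip_tone and build_index are invented for this
    illustration.

```python
from collections import defaultdict

# Hypothetical sample: (character, syllable with tone digit, English gloss).
# 妈 and 嬷 follow the readings given in the message; the rest are
# common textbook readings. A real dictionary lists many more entries.
SAMPLE = [
    ("妈", "ma1", "mom, mother"),
    ("嬷", "ma1", "nanna, wet nurse"),
    ("麻", "ma2", "hemp"),
    ("马", "ma3", "horse"),
    ("骂", "ma4", "to scold"),
]


def strip_tone(syllable):
    """Drop a trailing tone digit, e.g. 'ma3' becomes 'ma'."""
    return syllable.rstrip("12345")


def build_index(entries, keep_tone):
    """Group characters by syllable, with or without the tone digit."""
    index = defaultdict(list)
    for char, syllable, gloss in entries:
        key = syllable if keep_tone else strip_tone(syllable)
        index[key].append((char, gloss))
    return dict(index)


toneless = build_index(SAMPLE, keep_tone=False)
toned = build_index(SAMPLE, keep_tone=True)

# Without tones, every sample character collapses onto one key.
print(len(toneless["ma"]), "candidates for 'ma'")

# With tones, 'ma1' is narrower but still maps to more than one character.
for char, gloss in toned["ma1"]:
    print(char, gloss)
```

    Even in this small sample the toned key “ma1” still returns two
    characters, so a reader working from pronunciation alone has to fall
    back on the surrounding words.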

    > The best way for someone fluent in Chinese to understand what a
    > character means is to leave it in its context. Given that this is
    > for a screen reader, is that really insufficient context? I would
    > have expected that a screen reader would deal with complete words,
    > not just single ideographs, and words are generally much less
    > ambiguous. For example, if my software includes text-to-speech
    > capacity, the menu item to invoke it would likely say something like
    > "read text" or "read selection," and the ambiguous word "read" would
    > thereby get sufficient context to tell what pronunciation to use and
    > what it means.

    In the example given above, one might need quite a bit of context to
    distinguish 妈 from 嬷.


    > =====
    > John H. Jenkins
