RE: Level of Unicode support required for various languages

Date: Tue Oct 30 2007 - 10:46:31 CST


    Quoting Peter Constable <>:

    >> From: [] On
    >> Behalf Of Ben Monroe
    >> > [IDS] should be considered a better-than-nothing stop-gap
    >> > until something appropriate comes along (either an encoded character
    >> > or a registered variation sequence). I suppose that a text in say
    >> > Zhuang could use a custom font to hide the fact that most of it
    >> > consists of IDSs, but in such a case Unicode explicitly warns that no
    >> > operation other than display-related ones will likely work. Using an
    >> > IDS in running text is a hack.
    >> Considering the rejected characters, "until" does not seem appropriate.
    > Rejected characters? I don't see any characters on the rejected list
    > that have any connection to Zhuang, IDS or CJK.
    >> For such IDS is the only option. And not much of an option either
    >> since very few environments can actually render it.
    > Indeed, not much of an option at all for running text, as John
    > suggested. But not the only option: there is also PUA for private
    > use until public encodings are available. That would probably be a
    > better option for anybody wanting to work with Zhuang running text.

    Actually, though it may sound a little ridiculous, the PUA starts to
    look surprisingly small when one starts talking about CJKV, as the
    Wenlin Institute knows from experience. Assuming that the current
    project is successful (Zhuang, estimated at 10 thousand unencoded
    characters in publications over the past 30 years), the next stage is
    to extend to Yao CJKV (say about 5 thousand in various manuscripts), and
    possibly smaller sets such as Buyi (a few hundred), Dong (maybe
    1000), Miao (unknown), etc. CJKV, not to mention other huang
    documents etc. The above figures do not include variants or highly
    idiosyncratic characters, which would be in the database though
    not in any proposal, and which take the above figure of about 20
    thousand to several times that number.
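
    To put rough numbers on this, here is a small sketch that compares
    the figure of about 20 thousand against the Private Use ranges
    defined by the Unicode Standard. The multiples 1, 3 and 7 are
    hypothetical stand-ins for "several times that number"; they are
    not figures from the project.

```python
# Private Use ranges as defined by the Unicode Standard (inclusive bounds).
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Plane 15 (PUA-A)": (0xF0000, 0xFFFFD),
    "Plane 16 (PUA-B)": (0x100000, 0x10FFFD),
}


def pua_capacity(ranges=PUA_RANGES):
    """Return the number of code points in each Private Use range."""
    return {name: hi - lo + 1 for name, (lo, hi) in ranges.items()}


def fits(needed, available):
    """True if `needed` characters fit into `available` code points."""
    return needed <= available


capacity = pua_capacity()
bmp = capacity["BMP PUA"]
total = sum(capacity.values())

ESTIMATE = 20_000  # rough figure quoted in this message

# Hypothetical multiples standing in for "several times that number".
for multiple in (1, 3, 7):
    needed = ESTIMATE * multiple
    print(f"{needed:>7} characters: "
          f"fits BMP PUA ({bmp})? {fits(needed, bmp)}; "
          f"fits all PUA ({total})? {fits(needed, total)}")
```

    Even the base estimate is well beyond what the BMP PUA alone can
    hold, and the larger multiples approach or exceed all three ranges
    combined.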

    Some sort of space-saving mechanism is therefore desirable, be it VS,
    IDS/component-based, or simply using two PUA code points to reference
    one character.
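
    As an illustration of the last option only, here is a minimal
    sketch of one way two BMP PUA code points could jointly index a
    single character. The scheme and the names encode_pair and
    decode_pair are hypothetical; nothing here is a standard Unicode
    mechanism or something any existing software is known to do.

```python
PUA_START = 0xE000
PUA_SIZE = 0xF8FF - 0xE000 + 1  # 6400 code points in the BMP PUA


def encode_pair(index):
    """Map a character index to a string of two BMP PUA code points.

    Hypothetical scheme: the index is written as two base-6400 digits,
    each stored as an offset from U+E000.
    """
    if not 0 <= index < PUA_SIZE * PUA_SIZE:
        raise ValueError("index out of range for a two-code-point pair")
    high, low = divmod(index, PUA_SIZE)
    return chr(PUA_START + high) + chr(PUA_START + low)


def decode_pair(pair):
    """Recover the character index from a two-code-point pair."""
    if len(pair) != 2:
        raise ValueError("expected exactly two code points")
    high = ord(pair[0]) - PUA_START
    low = ord(pair[1]) - PUA_START
    if not (0 <= high < PUA_SIZE and 0 <= low < PUA_SIZE):
        raise ValueError("code point outside the BMP PUA")
    return high * PUA_SIZE + low


# Round trip for an index beyond what a single BMP PUA code point can hold.
pair = encode_pair(19_999)
print([f"U+{ord(c):04X}" for c in pair], decode_pair(pair))
```

    Pairs drawn from the same 6400 code points give 6400 * 6400
    possible indices, but since both halves come from one pool, a
    process that starts reading in the middle of a pair cannot tell
    which half it is looking at. A real design would probably want to
    split the range into distinct "first" and "second" halves, at the
    cost of fewer combinations.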

    > Peter


    This archive was generated by hypermail 2.1.5 : Tue Oct 30 2007 - 10:48:58 CST