Re: Language Tagging And Unicode

From: Peter Constable
Date: Wed Jan 19 2000 - 12:51:27 EST

>I just received the letter from Christopher John Fynn
       who pointed me to "UniScribe API in Win

>Guess what? The assumption of this API is that the run of
       Unicode characters is enough to determine if the script is
       "complex". This means again that the assumption of the author
       is that *no information except the Unicode characters themselves*
       is needed to properly render even *complex scripts*.

       That's what Uniscribe does now, but that doesn't necessarily
       mean that this is what's best, that it's what MS thinks is
       best, or that it's all that MS will ever do. I certainly hope
       they don't stop there, but that they go on to provide APIs that
       are sensitive to a language identifier. And I suspect that they
       will since (I believe) the same people that oversee the
       Uniscribe team also oversee the OpenType team, and the latter
       have provided support for language-specific rendering rules.

       True, what might happen in the future doesn't provide any
       solution today, but there's a lot that still can't be done
       today in terms of handling multilingual text because the
       technologies are still being developed. E.g. there are only a
       few apps that I know of that can handle Nastaliq, and I don't
       know of any non-proprietary system that can handle vertical

       As others have suggested, I'd say that the best road for the
       long term will be to encourage developers to adopt a
       text-handling infrastructure that provides all of the
       functionality that is needed for all of the world's writing
       systems, which includes labelling strings to indicate language.
       Adding a handful of additional characters to Unicode to solve
       today a problem with details of presentation that relate to a
       particular writing system is not a good basis for a long-term

>>In my opinion, whether Cyrillic needs to be a "complex script" is
       more than questionable, since by their definition:

>A complex script has at least one of the following attributes:

>Allows bidirectional rendering.
>Has contextual shaping.
>Has combining characters.
>Has specialized word-breaking and justification rules.
>Filters out illegal character combinations.

>Compared to all this, Cyrillic is as simple a script as Latin
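
To make the quoted definition concrete, here is a minimal sketch of a test that looks only at the characters in a run. It is not Uniscribe code and does not reflect how Uniscribe is implemented; the function name `complexity_hints` and the choice of properties are assumptions made for illustration. It covers just two of the listed attributes (bidirectional rendering and combining characters), using Python's standard `unicodedata` module.

```python
import unicodedata

# Bidi classes treated here as a sign of right-to-left content
# (R = right-to-left, AL = Arabic letter, AN = Arabic number).
# Which classes to include is an assumption of this sketch.
RTL_BIDI_CLASSES = {"R", "AL", "AN"}

# General categories for combining marks (nonspacing, spacing, enclosing).
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def complexity_hints(text):
    """Return the 'complex script' attributes visible from the characters alone.

    Hypothetical helper: it inspects character properties only and knows
    nothing about the language the text is written in.
    """
    hints = set()
    for ch in text:
        if unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES:
            hints.add("bidirectional")
        if unicodedata.category(ch) in MARK_CATEGORIES:
            hints.add("combining")
    return hints


if __name__ == "__main__":
    samples = {
        "Latin": "Latin",
        "Cyrillic": "\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430",
        "Latin + combining acute": "e\u0301",
        "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
    }
    for label, sample in samples.items():
        print(label, sorted(complexity_hints(sample)))
```

By this test, plain Latin and plain Cyrillic runs report no attributes at all, which matches the quoted claim. The sketch also shows the limit of the approach: a check built on character properties has no way to see anything that depends on the language of the text, since that information is not in the characters.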

       Again, I wouldn't take this as gospel. (In fact, I object to
       the last characteristic.) It's turning out that Latin ligatures
       aren't all that simple; so, as someone else has noted, the
       simple/complex distinction is somewhat artificial. All scripts
       have complexity; some are just more complex than others. (Or,
       "They're all equally complex; some are just more equal than

