From: Richard Wordingham (email@example.com)
Date: Sat Nov 19 2005 - 11:17:21 CST
Neil Harris wrote:
> Richard Wordingham wrote:
>> Neil Harris wrote:
>>> I think you might meet some opposition to including the following in
>>> ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for these
>>> being used only in contexts where they cause _visible_ and _unambiguous_
>>> rendering changes)
>> Well, that rules out about half the words in Burmese! I suppose there's
>> the work around of replacing the virama - U+1039 U+200C ('VIRAMA' ZWNJ) -
>> by U+1039 U+005F ( 'VIRAMA' LOW LINE) - extremely unnatural for a
>> language that doesn't have spaces between words.
> Well, that's a problem for IDN in its present form, because Nameprep (RFC
> 3491) uses table B.1 of Stringprep (RFC 3454), which maps ZWNJ to nothing.
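
To make the mapping concrete, here is a minimal sketch using Python's standard `stringprep` module, which exposes table B.1 of RFC 3454 as `in_table_b1`. The helper name `map_b1` and the sample strings are illustrative assumptions only. It performs just the "map to nothing" step, not the whole of Nameprep.

```python
import stringprep

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def map_b1(label: str) -> str:
    """Drop every character that Stringprep table B.1 maps to nothing.

    Hypothetical helper: it covers only the B.1 step, not case folding,
    normalization or the prohibited-character checks.
    """
    return "".join(ch for ch in label if not stringprep.in_table_b1(ch))


# Burmese KA + VIRAMA + ZWNJ + KA versus KA + VIRAMA + KA (sample data only).
with_zwnj = "\u1000\u1039" + ZWNJ + "\u1000"
without_zwnj = "\u1000\u1039\u1000"

# After the B.1 step the two spellings can no longer be told apart.
print(map_b1(with_zwnj) == without_zwnj)
```

On this reading the ZWNJ is lost at the mapping step, before any comparison takes place, so the two spellings would name the same label.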
At what point does the ZWNJ disappear? If it remains in what is entered and
displayed by the user, but is ignored when comparing names, then there is no
> ZWNJ also appears to be used for a similar purpose in Bengali. See
> From my perspective, it would seem that ZWNJ should be usable in
> identifiers, if, and only if, it is used in a context where it makes a
> visible difference to the rendered output. This begs some questions:
> * what to do if the rendering engine does not support the script in
Probably not an issue. The Uniscribe that comes with Windows XP supports
neither Burmese nor Khmer, but I can still interpret what it produces. A
more significant issue is the lack of font support - Uniscribe supports
kana, but I don't have a font for the Katakana Phonetic Extensions. In this
instance, can we be sure that font mixing will not be a problem? With my
mix of fonts, underdotted Latin letters often come from a font with a larger
x-height than the normal letters.
> * how to phrase the rules for acceptable use of ZWNJ in an unambiguous way
> that can be coded as an algorithm?
Some cases may just have to be unsupported. What stops Unicode viramas
spoofing one another? If you require that viramas be consistent with the
script-specific characters on either side, then you may allow certain
combinations of virama, ZWNJ and consonants of a specific script.
Devanagari virama + ZWNJ may be too unsafe to allow as distinct from plain
virama, while Burmese virama + ZWNJ + consonant is always distinct from just
virama + consonant.
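
A rough sketch of how such a rule might be coded, in Python. Everything in it is an assumption made for illustration: the table `SCRIPTS`, the consonant ranges (U+1000..U+1020 for Burmese, U+0915..U+0939 for Devanagari), the `zwnj_ok` flags and the function names are not taken from any specification. It accepts a ZWNJ only in the sequence consonant + virama + ZWNJ + consonant, with all three visible characters from one script that has been judged safe.

```python
ZWNJ = "\u200c"

# Hypothetical per-script table: virama, consonant range, and whether
# virama + ZWNJ is treated as safely distinct from plain virama.
SCRIPTS = {
    "Myanmar": {"virama": "\u1039", "consonants": (0x1000, 0x1020), "zwnj_ok": True},
    "Devanagari": {"virama": "\u094d", "consonants": (0x0915, 0x0939), "zwnj_ok": False},
}


def script_of_consonant(ch):
    """Return the script name if ch is a listed consonant, else None."""
    cp = ord(ch)
    for name, info in SCRIPTS.items():
        lo, hi = info["consonants"]
        if lo <= cp <= hi:
            return name
    return None


def zwnj_use_ok(label):
    """True if every ZWNJ sits in consonant + virama + ZWNJ + consonant,
    all from a single script whose zwnj_ok flag is set."""
    for i, ch in enumerate(label):
        if ch != ZWNJ:
            continue
        # Need two characters before the ZWNJ and one after it.
        if i < 2 or i + 1 >= len(label):
            return False
        before = script_of_consonant(label[i - 2])
        after = script_of_consonant(label[i + 1])
        if before is None or before != after:
            return False
        info = SCRIPTS[before]
        # The virama must belong to the same script as its neighbours.
        if label[i - 1] != info["virama"] or not info["zwnj_ok"]:
            return False
    return True


burmese = "\u1000\u1039" + ZWNJ + "\u1000"     # KA VIRAMA ZWNJ KA
devanagari = "\u0915\u094d" + ZWNJ + "\u0915"  # KA VIRAMA ZWNJ KA
mixed = "\u1000\u094d" + ZWNJ + "\u1000"       # Devanagari virama between Burmese letters

print(zwnj_use_ok(burmese), zwnj_use_ok(devanagari), zwnj_use_ok(mixed))
```

The mixed case is rejected because the virama does not match the script of the consonants around it, which is the consistency requirement suggested above. Labels without any ZWNJ pass unchanged.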