Fallback Display for COENG (was: Re: Combining latin small letters with diacritics)

From: Ken Whistler <kenw_at_sybase.com>
Date: Tue, 06 Mar 2012 15:40:56 -0800

On 3/6/2012 3:19 PM, Leo Broukhis wrote:
> On 3/6/12, Ken Whistler<kenw_at_sybase.com> wrote:
>> On 3/6/2012 2:34 PM, Leo Broukhis wrote:
>>> On 3/6/12, Doug Ewell<doug_at_ewellic.org> wrote:
>>>
>>>>> Speaking of U+17D2 KHMER SIGN COENG, what is a conforming renderer to
>>>>> do if someone writes A្B ? (U+0041 U+17D2 U+0042)
>>>> Roll its eyes?
>>> I guess :), but how should it look on the screen?
>>>
>> Just the way your email looks on my screen: A {blort} B.
> I see. I was under an impression that the renderer must avoid
> rendering such characters visibly if at all possible.
>
>

Ah, a teachable moment!

There is a distinction in the Unicode Standard between default ignorable
code points and other characters, regarding the recommendations of the
standard for fallback rendering.

For default ignorable code points, the recommendation is, indeed, to just
display nothing when your renderer cannot otherwise handle proper
rendering of the character's intended effect. That is what you do, for
example, with a ZWJ that is otherwise out of place or not supported for
rendering in a particular context. (The exception would be for a Show
Hidden mode, when you want to see *everything*.)

For other characters, *including* viramas as a class, the fallback
recommendation is to display something visible. Don't be fooled by the
fact that the Khmer COENG is shown in the code charts with a dotted box
and, unlike typical Indic viramas, has no visible display of its own as
a separate mark. It is still better, in general, to know that a virama
(or in this case a COENG) is present in the text, even if you cannot
display its intended effect properly because it has been stuck in the
wrong sequence.
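To make the two fallback cases concrete, here is a minimal Python sketch of how a renderer *might* decide what to draw for a character it cannot shape. Everything in it is an assumption for illustration: the helper names `fallback_glyph` and `render_fallback` are hypothetical, the ignorable set is a tiny hand-picked subset (not the real list), and drawing an unattached mark on a dotted circle (U+25CC) is a common renderer convention, not something the standard requires.

```python
import unicodedata
from typing import Optional

# ASSUMPTION: a small illustrative subset of Default_Ignorable_Code_Point,
# NOT the full list (the authoritative list is in DerivedCoreProperties.txt).
DEFAULT_IGNORABLE_SUBSET = frozenset({
    0x00AD,  # SOFT HYPHEN
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
})

def fallback_glyph(cp: int, show_hidden: bool = False) -> Optional[str]:
    """Hypothetical policy: what to draw for a code point that could not
    be shaped. None means 'draw nothing'."""
    ch = chr(cp)
    if cp in DEFAULT_IGNORABLE_SUBSET:
        # Default ignorable: invisible, unless a Show Hidden mode is on.
        return f"[U+{cp:04X}]" if show_hidden else None
    if unicodedata.category(ch) in ("Mn", "Me"):
        # Non-ignorable mark (e.g. a virama or the Khmer COENG) in a bad
        # sequence: keep it visible, here drawn on a dotted circle.
        return "\u25cc" + ch
    return ch

def render_fallback(text: str, show_hidden: bool = False) -> str:
    # Worst case for illustration: treat every character as unshapeable.
    glyphs = (fallback_glyph(ord(c), show_hidden) for c in text)
    return "".join(g for g in glyphs if g is not None)
```

With this policy, a stray ZWJ between A and B simply vanishes, while the A, COENG, B sequence from the original question keeps the COENG visible, which is the "A {blort} B" behavior described above.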

For background on this topic, see Section 5.21, Default Ignorable Code
Points, in the standard:

http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf

For a complete list of default ignorable code points (which do not
include U+17D2), see:

http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
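If you want to check a code point against that file programmatically, a small parser for its "code point or range ; property # comment" line format is enough. This is a sketch, not an official tool: `code_points_with_property` is a hypothetical helper, and `SAMPLE` is an abridged excerpt written in the file's format for testing, not a verbatim copy of any release.

```python
from typing import Iterable, Set

def code_points_with_property(lines: Iterable[str], prop: str) -> Set[int]:
    """Collect the code points listed under `prop` in a UCD-style file
    such as DerivedCoreProperties.txt (hypothetical helper)."""
    result: Set[int] = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        if ".." in fields[0]:  # a range such as 200B..200F
            lo, hi = fields[0].split("..")
            result.update(range(int(lo, 16), int(hi, 16) + 1))
        else:
            result.add(int(fields[0], 16))
    return result

# ASSUMPTION: abridged excerpt in the file's format, for illustration only.
SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
# Derived Property: Grapheme_Link
094D          ; Grapheme_Link # Mn       DEVANAGARI SIGN VIRAMA
17D2          ; Grapheme_Link # Mn       KHMER SIGN COENG
"""

ignorable = code_points_with_property(SAMPLE.splitlines(), "Default_Ignorable_Code_Point")
links = code_points_with_property(SAMPLE.splitlines(), "Grapheme_Link")
```

Against the real data you would pass the lines of a downloaded copy, e.g. `open("DerivedCoreProperties.txt", encoding="utf-8")`, instead of `SAMPLE.splitlines()`.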

Down towards the bottom of that data file, you will also find a list of
all the Grapheme_Link characters, which is identical to the set of
characters with ccc=Virama and constitutes the list of all the
characters that are *structural* viramas in the standard, whether or not
they are specifically termed a virama in a particular script. That list
*does* include U+17D2. And none of the viramas is a default ignorable
code point.
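Because Grapheme_Link is derived from ccc=Virama (Canonical_Combining_Class value 9), the same list can be reproduced from Python's standard `unicodedata` module, whose `combining()` function returns a character's canonical combining class. A minimal sketch, with the caveat that the result reflects whatever Unicode version your Python ships (`unicodedata.unidata_version`), which will be newer than 6.0; `structural_viramas` is a hypothetical helper name.

```python
import sys
import unicodedata

VIRAMA_CCC = 9  # the Canonical_Combining_Class value named "Virama"

def structural_viramas() -> list[int]:
    """All code points with ccc=Virama in this Python's Unicode data."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.combining(chr(cp)) == VIRAMA_CCC]

viramas = structural_viramas()
print(f"{len(viramas)} code points with ccc=Virama "
      f"(Unicode {unicodedata.unidata_version})")
```

The list includes U+17D2 KHMER SIGN COENG alongside conventionally named viramas such as U+094D DEVANAGARI SIGN VIRAMA, while ZWJ (ccc=0) is absent.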

--Ken
Received on Tue Mar 06 2012 - 17:43:50 CST
