From: Kenneth Whistler (email@example.com)
Date: Mon Jul 26 2010 - 14:23:22 CDT
> I have a question about VS characters and the default ignorable property.
> TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable.
> Ch 5.21 states that default ignorable characters are to be ignored in
> rendering (except in specialized modes which show hidden characters).
> The paragraph in p 171 on default ignorable characters under ch 5.3
> states that "these characters are also ignored except with respect to
> specific, defined processes; for example, zero width non-joiner is
> ignored by default in collation."
It is an unfortunate result of terminological history, but
in the Unicode Standard, "ignored by default in XXX" is not
the same as Default_Ignorable_Code_Point=True.
Also, the meaning of Default_Ignorable_Code_Point wavered around a
bit until it was finally nailed down, precisely because people were
trying to use it in somewhat different implementation contexts to
mean somewhat different things.
At this point, the standard has nailed down the meaning of
the character property Default_Ignorable_Code_Point to mean
essentially that *if* an implementation does not support rendering
of the code point in question, then it should be rendered invisibly
(i.e., no missing glyph boxes drawn). If a Default_Ignorable_Code_Point
*is* supported in rendering, it may have various effects, but typically
not as a regular character would display. The variation selectors
are a good example, because *if* you support their rendering, you
don't draw glyphs for them directly, but rather modify the display
of the preceding character whose variant glyph they are selecting.
If a character has the property Default_Ignorable_Code_Point=False,
then if an implementation does not support rendering of the
code point in question, it *should* display a missing glyph box,
to show that a character is there but cannot be drawn.
All of that is completely orthogonal to whether a particular
code point should be "ignored by default in" some other context,
such as searching or collation.
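The rendering fallback rule described above can be sketched in a few lines of Python. This is only an illustration, not how any particular rendering engine works: the range table is a small hand-picked subset of the Default_Ignorable_Code_Point ranges (the full list is in DerivedCoreProperties.txt), and the names `DEFAULT_IGNORABLE_SUBSET`, `is_default_ignorable`, and `fallback_display` are hypothetical.

```python
# Illustrative subset of Default_Ignorable_Code_Point ranges.
# The authoritative list is in DerivedCoreProperties.txt; this is NOT complete.
DEFAULT_IGNORABLE_SUBSET = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x034F, 0x034F),    # COMBINING GRAPHEME JOINER
    (0x200B, 0x200F),    # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1 .. VARIATION SELECTOR-16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17 .. VARIATION SELECTOR-256
]

MISSING_GLYPH = "\u25A1"  # WHITE SQUARE, standing in for a missing glyph box


def is_default_ignorable(cp):
    """True if cp falls in the illustrative subset above."""
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_SUBSET)


def fallback_display(text, supported):
    """Model what is shown when only some code points are supported.

    `supported` is a set of code points the (hypothetical) engine and font
    can handle.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in supported:
            # Supported: a real engine would shape it (for a variation
            # selector, by changing the preceding glyph). Here it passes through.
            out.append(ch)
        elif is_default_ignorable(cp):
            # Unsupported but default ignorable: render invisibly.
            continue
        else:
            # Unsupported and not default ignorable: show a missing glyph box.
            out.append(MISSING_GLYPH)
    return "".join(out)
```

With only U+0041 supported, `fallback_display("A\uFE00", {0x41})` drops the unsupported variation selector silently, while an unsupported code point outside the table is replaced by the box character.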
> This seems to suggest to me that despite ch 5.21 speaking only about
> rendering, the default ignorable property also has or at least can have
> a part in other processes such as collation. I would however like to
> have a confirmation on this:
> Are all default ignorable characters ignored not only in rendering but
> in other processes also?
They aren't actually ignored in rendering. See above. The issue is
whether they should be displayed visibly when not supported by
a rendering engine (and font), or not.
> Or is it that they are ignored by default in rendering and whether they
> are ignored in other processes or not is variable?
Yes, the latter.
> Specifically, are VS characters ignored in rendering only (i.e.
> rendering them, not the characters they apply to of course) or are they
> ignored even in other processes such as text search and collation?
Whether they would be ignored for text search and collation
depends on weighting in the Unicode Collation Algorithm.
For that, you look in allkeys.txt for the UCA, which shows
FE00 ; [.0000.0000.0000.0000] # [FE00] VARIATION SELECTOR-1
Since this variation selector (and in fact all of them) is
weighted with zeroes in all positions, yes, the answer is that
variation selectors are ignored by default in text search and
collation.
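The allkeys.txt check described above can be automated. The following is a minimal sketch that assumes entries look like the FE00 line quoted here (code points, a semicolon, then one or more bracketed collation elements); the helper names `parse_allkeys_line` and `is_completely_ignorable` are made up for illustration, and the regular expression has not been checked against every line of the real file.

```python
import re

# code point(s) ; one or more [.pppp.ssss.tttt.qqqq] or [*...] elements
ENTRY = re.compile(r"^\s*([0-9A-Fa-f ]+?)\s*;\s*((?:\[[.*][0-9A-Fa-f.]+\])+)")
ELEMENT = re.compile(r"\[[.*]([0-9A-Fa-f.]+)\]")


def parse_allkeys_line(line):
    """Return (code_points, collation_elements), or None for non-entry lines."""
    m = ENTRY.match(line)
    if m is None:
        return None  # comment, blank line, or @version header
    code_points = tuple(int(h, 16) for h in m.group(1).split())
    elements = [
        tuple(int(w, 16) for w in body.split("."))
        for body in ELEMENT.findall(m.group(2))
    ]
    return code_points, elements


def is_completely_ignorable(elements):
    """True if every weight at every level is zero."""
    return all(w == 0 for element in elements for w in element)


line = "FE00 ; [.0000.0000.0000.0000] # [FE00] VARIATION SELECTOR-1"
code_points, elements = parse_allkeys_line(line)
print(code_points == (0xFE00,), is_completely_ignorable(elements))
```

Running the same check over every variation selector entry in allkeys.txt is one way to confirm the "all of them" claim for a given UCA version.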
Of course, as for any other character, it is possible to set
up a tailoring that gives a variation selector (or all of
them) a non-ignorable collation weight, in which case they
*would* make a difference in searching and collation.
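The effect of such a tailoring on matching can be shown with a toy model. This is not the UCA and does not build real sort keys from collation weights; it only mimics the outcome for variation selectors by dropping them from the comparison key by default and keeping them when a hypothetical `tailor_vs` flag is set. It covers just the two blocks U+FE00..U+FE0F and U+E0100..U+E01EF, and all function names are invented for the example.

```python
def is_variation_selector(ch):
    """True for VARIATION SELECTOR-1 .. VARIATION SELECTOR-256."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def search_key(text, tailor_vs=False):
    """Toy comparison key.

    Default: variation selectors contribute nothing, as if weighted all zero.
    tailor_vs=True: they are kept, as if given a non-ignorable weight.
    """
    if tailor_vs:
        return text
    return "".join(ch for ch in text if not is_variation_selector(ch))


def matches(a, b, tailor_vs=False):
    """Compare two strings under the toy key."""
    return search_key(a, tailor_vs) == search_key(b, tailor_vs)


plain = "\u2268"           # LESS-THAN BUT NOT EQUAL TO
variant = "\u2268\uFE00"   # same character followed by VARIATION SELECTOR-1

print(matches(plain, variant))                  # default behaviour
print(matches(plain, variant, tailor_vs=True))  # tailored behaviour
```

Under the default key the two strings compare equal; with the tailoring flag they do not, which is the difference the tailoring is meant to introduce.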