Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Philippe Verdy <>
Date: Thu, 24 Apr 2014 17:11:23 +0200

2014-04-24 16:39 GMT+02:00 Eli Zaretskii <>:

> In addition, assuming that by "guillemets" Philippe means U+00AB and
> U+00BB,

"guillemet" is THE correct name, even in English. "guillemot" comes from an
old typo. If you don't want this term in English, you can still use
"double angle bracket", which is unnecessarily long.

> they cannot possibly form a bracketed pair, because their
> General Category is not Ps and Pe. For that reason, you will never
> find them in BidiBrackets.txt.

Forget the general category; we know that it does not solve any
internationalization issue correctly. All past versions of Unicode
algorithms that initially attempted to use the general categories now use
them only as informative rules (which are not stabilized) to help generate
new "derived" properties (which should be used verbatim from the content of
the UCD, because new exceptions to the rules are added rapidly).

The guillemets evidently form a pair, even if their use depends on the
language, which may swap their roles (and this is the main reason why they
are not assigned Ps and Pe: the Ps and Pe assignments would be swapped).
They are still a pair, which works even better than """ that can be paired
in 3 different ways and not just two (meaning that you don't know which one
to look for).
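
For reference, the General Category values under discussion can be checked
with a minimal Python sketch. It assumes the standard unicodedata module;
the helper name general_categories is only illustrative, and the values
come from whichever UCD version the Python build ships with:

```python
import unicodedata

def general_categories(chars):
    """Map each character to its General Category, keyed by code point."""
    return {f"U+{ord(c):04X}": unicodedata.category(c) for c in chars}

# Guillemets, ASCII parentheses, and the ASCII quotation mark.
cats = general_categories("\u00AB\u00BB()\"")
for code_point, category in cats.items():
    print(code_point, category)
```

The guillemets come out as Pi and Pf (initial and final quote punctuation),
the parentheses as Ps and Pe, and the ASCII quotation mark as Po.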

Also read my example for what it says explicitly: a demonstration of the
problem, just an example (there are many other similar examples of such
cases where nesting is not hierarchical but still maintains pairs).

So nothing (at least not the reason of the GC which is just an intermediate
but incomplete helper) forbids the guillemets to be listed in

Unicode mailing list
Received on Thu Apr 24 2014 - 10:13:04 CDT