Re: Questions about UAX #29

From: Mark Davis ☕ <>
Date: Wed, 6 Jul 2011 13:27:08 -0700

I wouldn't be averse to adding [:cn:][:cs:][:co:] to [:gcb:control:]. It
would make it align more with the current definition of Grapheme_Base.
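
For anyone who wants to see which code points such a change would touch, here is a minimal sketch in Python. It only inspects General_Category through the standard unicodedata module; it does not implement UAX #29 segmentation. The names PROPOSED_CONTROL_CATEGORIES and would_join_control are made up for illustration, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# General categories discussed in this thread:
# Cn = unassigned (also covers noncharacters), Cs = surrogate, Co = private use.
# The set name and helper below are hypothetical, for illustration only.
PROPOSED_CONTROL_CATEGORIES = {"Cn", "Cs", "Co"}


def would_join_control(cp: int) -> bool:
    """Return True if the code point's General_Category is Cn, Cs, or Co."""
    return unicodedata.category(chr(cp)) in PROPOSED_CONTROL_CATEGORIES


# U+D800 is a surrogate, U+E000 is private use, U+FFFF is a noncharacter
# (General_Category Cn), and U+0041 is an ordinary assigned letter.
for cp in (0xD800, 0xE000, 0xFFFF, 0x0041):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), would_join_control(cp))
```

An implementation that tailors private use characters differently, as the text quoted below permits, could simply drop "Co" from the set.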

As to how to handle private use characters, UAX #29 already allows

"This specification defines *default* mechanisms; more sophisticated
implementations can *and should* tailor them for particular locales or

I'll file an agenda item for the August UTC meeting to consider this; you
can also add your feedback to the UTC using the reporting form.

*— Il meglio è l’inimico del bene —*

On Tue, Jul 5, 2011 at 16:31, Karl Williamson <> wrote:

> On 07/05/2011 09:29 AM, Mark Davis ☕ wrote:
>> Ah, you're right; I wasn't looking carefully enough at what you wrote.
>> Yes, an unassigned code point (Cn) is treated as a base character.
>> Unassigned code points are peculiar beasts, since we don't know really
>> how they should behave until (and if) they are assigned. Their treatment
>> by the Unicode algorithms varies based on some factors:
>> * safety - don't have them behave in a way that causes problems
>> * foresight - have them behave like the most likely candidate for
>> future assignment
>> * simplicity - since they shouldn't occur normally in text, don't
>> spend too much time worrying about them.
>> These are not formalized principles, just my observations on how we've
>> operated over the years.
>> Mark
>> /— Il meglio è l’inimico del bene —/
> Thanks for the answer. It does seem weird to me to treat them as base
> characters.
> But, I'm wondering then about Cs, isolated surrogates. They also are
> treated as base characters. That seems wrong to me. Since UTS #18 is
> starting to mention the possibility of them in regexes, perhaps this should
> be addressed?
> Also, my understanding of UAX #44 is that private use code points may or
> may not be treated as base characters at the application's discretion. But
> this isn't mentioned in UAX #29.
Received on Wed Jul 06 2011 - 15:30:35 CDT
