Re: Canonical block names: spaces vs. underscores

From: Mathias Bynens <mathias_at_qiwi.be>
Date: Thu, 26 May 2016 19:05:05 +0200

> On 26 May 2016, at 17:47, Mark Davis ☕️ <mark_at_macchiato.com> wrote:
>
> The canonical property and property value formats are in the *Alias* files.

Thanks for confirming!

Any chance the canonical names can be used in `Blocks.txt` as well, for consistency? This would simplify scripts that parse the Unicode database text files.
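For illustration, here is the kind of extra step such a script needs today. It is a minimal Python sketch, assuming `Blocks.txt` data lines look like `0000..007F; Basic Latin` and that the alias-file form of a block name can be obtained by turning spaces and hyphens into underscores. That mapping is an assumption based on examples like `Basic_Latin` and `Latin_1_Supplement`, not a documented rule. `parse_block_line` and `to_alias_style` are hypothetical helper names.

```python
import re

# Data lines in Blocks.txt look like "0000..007F; Basic Latin".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+?)\s*$")

def parse_block_line(line):
    """Return (start, end, name) for a data line, or None for comment/blank lines."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    match = BLOCK_LINE.match(line)
    if match is None:
        raise ValueError(f"unexpected line: {line!r}")
    start, end, name = match.groups()
    return int(start, 16), int(end, 16), name

def to_alias_style(name):
    """Assumed mapping: spaces and hyphens become underscores, as in the alias files."""
    return re.sub(r"[ \-]", "_", name)
```

For example, `parse_block_line("0000..007F; Basic Latin")` yields `(0, 127, "Basic Latin")`, and `to_alias_style("Latin-1 Supplement")` yields `"Latin_1_Supplement"`. If `Blocks.txt` used the alias form directly, the second helper would be unnecessary.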

> On 26 May 2016, at 18:03, Ken Whistler <kenwhistler_at_att.net> wrote:
>
> […] "canonical block name" is not a defined term in the standard.

I didn’t mean to imply it was; I was using “canonical” as an ordinary English word, meaning “without loose matching applied”.

> See the matching rules in UAX #44:
>
> http://www.unicode.org/reports/tr44/#Matching_Rules
>
> and in particular, the matching rule for symbolic values, which applies in this case:
>
> http://www.unicode.org/reports/tr44/#UAX44-LM3

I know about loose matching, having recently implemented it (https://github.com/mathiasbynens/unicode-loose-match).
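The rule itself is small. Here is a rough Python sketch of UAX44-LM3 as stated in UAX #44 (ignore case, whitespace, underscores, hyphens, and any initial prefix "is"). It is a simplified illustration, not the code from the repository above, and `loose_key`/`loose_equal` are hypothetical names. Note that it strips a leading "is" unconditionally, which a real implementation may need to handle more carefully.

```python
import re

def loose_key(value):
    """Sketch of UAX44-LM3: ignore case, whitespace, underscores,
    hyphens, and any initial "is" prefix."""
    key = re.sub(r"[\s_\-]+", "", value.lower())
    if key.startswith("is"):  # naive: always drops a leading "is"
        key = key[2:]
    return key

def loose_equal(a, b):
    """True if two symbolic values match under the sketched rule."""
    return loose_key(a) == loose_key(b)
```

Under this rule `"Basic Latin"` (as in `Blocks.txt`) and `"Basic_Latin"` (as in the alias files) compare equal, which is why the mismatch only bites scripts that compare the raw strings.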

> For enumerated properties, and especially for catalog properties such as Block and Script,
> the value of the property may be multi-word, and the best form to use in one context might
> not be exactly (as in binary string equality exact) the same as in another.

That makes sense, but shouldn’t it be consistent throughout the Unicode database text files?

Received on Thu May 26 2016 - 12:05:52 CDT