Re: Canonical block names: spaces vs. underscores

From: Mathias Bynens <>
Date: Thu, 26 May 2016 20:48:48 +0200

> On 26 May 2016, at 20:07, Ken Whistler <> wrote:
> Well, let's take an example. The entry in Blocks.txt for the Arabic Presentation Forms-A block is:
> FB50..FDFF; Arabic Presentation Forms-A
> The entry for that block in PropertyValueAliases.txt is:
> blk; Arabic_PF_A ; Arabic_Presentation_Forms_A ; Arabic_Presentation_Forms-A
> So then which would it be? Should Blocks.txt be changed to the long preferred alias:
> FB50..FDFF; Arabic_Presentation_Forms_A
> or to the abbreviated preferred alias:
> FB50..FDFF; Arabic_PF_A
> which would be more consistent with the XML attribute and with most regex usage?

This sounds like a strawman argument (?). The long preferred alias definitely seems more suitable for a ‘canonical’ name.

> I suppose a proposal to the UTC to further modify the UCD handling of block names
> could change this situation. But I'm not convinced that we shouldn't just leave
> things as they stand -- for stability. And then live with the complications required
> for scripts or other parsing algorithms that actually need to deal with Blocks.txt to
> either parse out block ranges (its main function) or to get usable block names
> (its subsidiary function).

Perhaps the “Note:” in the commented header in `Blocks.txt` could be extended to point out that the ~~canonical block names~~, nay, ++preferred block aliases++ are listed in `PropertyValueAliases.txt`? That would’ve been enough to avoid the question that spawned this thread.
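
For scripts that need both the block ranges and usable block names, the lookup described above could be sketched roughly as follows. This is a minimal Python illustration, not part of the UCD or of any library: the function names (`loose_key`, `parse_blocks`, `parse_block_aliases`, `preferred_block_names`) are hypothetical, the two sample strings contain only the lines quoted in this thread, and `loose_key` is a simplified form of loose matching for property value names (ignoring case, whitespace, underscores, and hyphens) that is assumed to be sufficient for block names.

```python
import re

# Sample data: only the lines quoted in this thread.
# Real code would read the full Blocks.txt and PropertyValueAliases.txt.
BLOCKS_TXT = """\
# Blocks.txt (sample)
FB50..FDFF; Arabic Presentation Forms-A
"""

PROPERTY_VALUE_ALIASES_TXT = """\
# PropertyValueAliases.txt (sample)
blk; Arabic_PF_A ; Arabic_Presentation_Forms_A ; Arabic_Presentation_Forms-A
"""


def loose_key(name):
    """Simplified loose matching (assumption): drop case, whitespace, '_' and '-'."""
    return re.sub(r"[\s_-]", "", name).lower()


def strip_comment(line):
    """Remove a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_blocks(text):
    """Return (start, end, name) tuples from Blocks.txt-style lines."""
    blocks = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        range_part, name = (field.strip() for field in line.split(";", 1))
        start, end = (int(h, 16) for h in range_part.split(".."))
        blocks.append((start, end, name))
    return blocks


def parse_block_aliases(text):
    """Map the loose key of every 'blk' alias to (short alias, long alias)."""
    aliases = {}
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if fields[0] != "blk" or len(fields) < 3:
            continue
        short, long_ = fields[1], fields[2]
        for alias in fields[1:]:
            aliases[loose_key(alias)] = (short, long_)
    return aliases


def preferred_block_names(blocks_text, aliases_text):
    """Join each block range with its short and long preferred aliases."""
    aliases = parse_block_aliases(aliases_text)
    rows = []
    for start, end, name in parse_blocks(blocks_text):
        short, long_ = aliases.get(loose_key(name), (None, None))
        rows.append((start, end, name, short, long_))
    return rows


rows = preferred_block_names(BLOCKS_TXT, PROPERTY_VALUE_ALIASES_TXT)
for start, end, name, short, long_ in rows:
    print(f"{start:04X}..{end:04X}  {name!r} -> {long_} ({short})")
```

Keying the alias table on a loose form of the name means the spelling in `Blocks.txt` (with a space and a hyphen) and the spellings in `PropertyValueAliases.txt` (with underscores) resolve to the same entry, so neither file has to change for the lookup to work.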

Received on Thu May 26 2016 - 13:49:08 CDT
