Re: Question on U+33D7

From: Asmus Freytag
Date: Thu, 23 Feb 2012 17:22:30 -0800

On 2/23/2012 2:44 PM, António Martins-Tuválkin wrote:
> On 2012/2/23 Matt Ma wrote:
>> It is defined as
>> "33D7;SQUARE PH;So;0;L;<square> 0050 0048;;;;N;SQUARED PH;;;;"
>> in UnicodeData.txt, but it is shown as "pH" in code chart. Should it be
>> "0070 0048" or "PH"?
> It should certainly be "pH", i.e., "<square>0070 0048</square>",
> because that's the peculiar casing in widespread (universal, really)
> use for this basic Chemistry concept (AFAIK it means "power of
> hydrogen"). See<>.
> While there's no surprise at "PH" Unicode names being all caps, I’m
> surprised that the decomposition mapping is wrongly set to 0050 0048
> instead of to 0070 0048.

The early fonts and code tables showed this in all caps.

Unfortunately, mappings are frozen - including mistakes.

This is one of the many reasons not to use NF"K"D or NF"K"C for
transforming data. These transformations should be limited to dealing
with identifiers, where practically all of the problematic characters
are already disallowed.
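
The effect on U+33D7 is easy to reproduce. A minimal sketch, assuming
Python's standard unicodedata module, which carries the mappings from
UnicodeData.txt (the variable names are only illustrative):

```python
import unicodedata

SQUARE_PH = "\u33D7"  # SQUARE PH, drawn as "pH" in the code chart

# The frozen compatibility decomposition recorded in UnicodeData.txt.
decomposition = unicodedata.decomposition(SQUARE_PH)

# Compatibility normalization applies that mapping, capital P included.
nfkd = unicodedata.normalize("NFKD", SQUARE_PH)
nfkc = unicodedata.normalize("NFKC", SQUARE_PH)

# Canonical normalization leaves the character untouched.
nfc = unicodedata.normalize("NFC", SQUARE_PH)

print(decomposition, repr(nfkd), repr(nfkc), nfc == SQUARE_PH)
```

Both compatibility forms turn the character into the two letters
U+0050 U+0048, so the lowercase "p" seen in the chart glyph cannot be
recovered from the normalized text.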

If your intent is to sort or search a document using "fuzzy"
equivalences, then you are not required to limit yourself to the NF"K"
C/D transformations in any way, because you would not be claiming to be
"normalizing" the text in the sense of a Unicode Normalization Form.

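A rough sketch of such a fuzzy folding follows. The override table
OVERRIDES and the function fuzzy_key are hypothetical names introduced
only for illustration, and the result is a private search key, not a
Unicode Normalization Form:

```python
import unicodedata

# Hypothetical overrides for characters whose frozen compatibility
# mapping does not match the intended reading. Only U+33D7 is listed.
OVERRIDES = {"\u33D7": "pH"}

def fuzzy_key(text: str) -> str:
    """Build a private search key; not a Unicode Normalization Form."""
    # Apply the overrides first so NFKD never sees those characters.
    replaced = "".join(OVERRIDES.get(ch, ch) for ch in text)
    # Fold the remaining compatibility characters with the standard mapping.
    return unicodedata.normalize("NFKD", replaced)

print(fuzzy_key("\u33D7 7.4") == fuzzy_key("pH 7.4"))
```

With this key, a search for "pH 7.4" matches text that contains the
square character, which plain NF"K"D would have folded to "PH 7.4".
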
Received on Thu Feb 23 2012 - 19:28:36 CST
