From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Dec 29 2003 - 14:59:08 EST
Philippe Verdy wrote:
> I note that the UCD contains lines for PUAs like this:
> ...
> E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
> F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
> ...
> But why isn't there lines for the _assigned_ Private Local-Use characters in
1. No one saw a need to include them?
2. The documentation file points out that Cn entries are not included:
http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values
3. See DerivedAge.txt which I point out below.
> the Arabic compatibility block, like:
> ...
> FDD0;<Private Local-Use, First>;Cn;0;L;;;;;N;;;;;
> FDEF;<Private Local-Use, Last>;Cn;0;L;;;;;N;;;;;
> ...
> which seem related and used only for local processing of contextual forms,
> and not restricted to local rendering of Arabic ?
I think it is a legitimate question why the block boundaries were not adjusted to exclude this
noncharacter range from FB50..FDFF; Arabic Presentation Forms-A (see Blocks.txt).
However, the Unicode standard only designates these code points as generic noncharacters, not for
any particular purpose like "local processing of contextual forms".
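To illustrate what "generic" means here, the whole noncharacter set can be written as one small
predicate. This is only a sketch in Python, not anything taken from the standard's text: the name
is_noncharacter is made up for the example, and it assumes the set as designated since Unicode 3.1
(FDD0..FDEF plus the last two code points of each of the 17 planes).

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacters (assumed set, see above)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The contiguous range that sits inside the Arabic Presentation Forms-A block.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: xxFFFE and xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate the full set: 32 in FDD0..FDEF plus 2 per plane for 17 planes.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

# General categories that Python's bundled Unicode data reports for these code points.
CATEGORIES = {unicodedata.category(chr(cp)) for cp in NONCHARACTERS}
```

Nothing in the predicate depends on Arabic or on any other script, which is the point: the range
merely happens to lie inside that block.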
> For now, even if it's specified in the text of the standard, it does not
> clearly shows that these characters are assigned but invalid in all versions
> of Unicode, unlike other missing code-points which may be assigned later and
> should not be considered as invalid.
Unicode 3.1 (http://www.unicode.org/reports/tr27/) clarified their usage. See "3.1 Conformance
Requirements (revision)" and then the heading "Noncharacters" a page or so below, including the
definition D7b Noncharacter. See the equivalent parts of Unicode 4.
Noncharacters are not "invalid", but they are "designated" and therefore cannot be reassigned:
http://www.unicode.org/alloc/CurrentAllocation.html
I personally find the chart under [91-C31] Consensus useful:
http://www.unicode.org/consortium/utc-minutes/UTC-091-200205.html
> Other non-characters are also absent from the file (which does not contain
> in fact any "Cn" characters), and I wonder why they are not listed:
> ...
See my reference to UCD.html above.
> I think that, if these codepoints are effectively permanently assigned as
> invalid, these assignments should be listed.
>
> Another solution would be to list these non-characters in
> DerivedCoreProperties.txt
Well, they are listed in http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
If you search for "noncharacter" there, you will find which ones were designated in which Unicode
version. (Only two were designated in Unicode 1.)
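The search can also be scripted. Below is a rough Python sketch under some assumptions: the helper
name noncharacters_by_age is hypothetical, the regular expression reflects my understanding of the
"range ; version # comment" line layout of DerivedAge.txt, and SAMPLE is a hand-written excerpt in
that layout (the last line is made up as a non-matching control), not a copy of the real file.

```python
import re

# Hand-written sample in the assumed DerivedAge.txt layout; not the real file.
SAMPLE = """\
FFFE..FFFF    ; 1.1 #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
FDD0..FDEF    ; 3.1 #  [32] <noncharacter-FDD0>..<noncharacter-FDEF>
0041..005A    ; 1.1 #  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

# first[..last] ; version # comment
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([\d.]+)\s*#(.*)$"
)


def noncharacters_by_age(text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each version string to the noncharacter ranges listed under it."""
    result: dict[str, list[tuple[int, int]]] = {}
    for line in text.splitlines():
        # Same idea as searching the file by hand for "noncharacter".
        if "noncharacter" not in line:
            continue
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # comment or otherwise unparseable line
        first = int(match.group(1), 16)
        # A single code point has no "..last" part; treat it as a one-element range.
        last = int(match.group(2) or match.group(1), 16)
        result.setdefault(match.group(3), []).append((first, last))
    return result


BY_AGE = noncharacters_by_age(SAMPLE)
```

To run it against the real data, pass the contents of a downloaded DerivedAge.txt instead of
SAMPLE.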
Best regards,
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.