Non-characters in Unicode data files

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 29 2003 - 10:41:50 EST

    I note that the UCD contains lines for PUAs like this:

    ..
    E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
    F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
    ..
    DB80;<Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
    DBFF;<Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
    ..
    F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;
    FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;
    100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;
    10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;
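
    As an illustration of how such First/Last pairs can be read, here is a
    minimal Python sketch. It assumes only the semicolon-separated layout shown
    in the lines above; the names SAMPLE, parse_ranges and RANGES are
    hypothetical and not part of any Unicode tool.

```python
# Sample lines copied from the excerpt above (semicolon-separated fields:
# code point; name; general category; ...).
SAMPLE = """\
E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;
FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;
"""

FIRST_SUFFIX = ", First>"
LAST_SUFFIX = ", Last>"


def parse_ranges(text):
    """Pair each '<..., First>' line with the '<..., Last>' line that follows it.

    Returns a list of (start, end, label, category) tuples.
    """
    ranges = []
    pending = None  # (start, label, category) of a range that is still open
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(FIRST_SUFFIX):
            # Strip the leading '<' and the trailing ', First>' to get the label.
            pending = (cp, name[1:-len(FIRST_SUFFIX)], category)
        elif name.endswith(LAST_SUFFIX) and pending is not None:
            start, label, cat = pending
            ranges.append((start, cp, label, cat))
            pending = None
    return ranges


RANGES = parse_ranges(SAMPLE)
```

    With the four sample lines, RANGES holds two entries: E000..F8FF and
    F0000..FFFFD, both with general category Co.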

    But why aren't there lines for the _assigned_ Private Local-Use characters in
    the Arabic compatibility block, like:

    ..
    FDD0;<Private Local-Use, First>;Cn;0;L;;;;;N;;;;;
    FDEF;<Private Local-Use, Last>;Cn;0;L;;;;;N;;;;;
    ..

    which seem to be related, to be used only for local processing of
    contextual forms, and not to be restricted to local rendering of Arabic?

    For now, even if this is specified in the text of the standard, the file
    does not clearly show that these code points are assigned but invalid in
    all versions of Unicode, unlike other missing code points, which may be
    assigned later and should not be considered invalid.

    Other non-characters are also absent from the file (which in fact does not
    contain any "Cn" characters), and I wonder why they are not listed:

    ..
    FFFE;<Illegal, First>;Cn;0;L;;;;;N;;;;;
    FFFF;<Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    1FFFE;<Plane 1 Illegal, First>;Cn;0;L;;;;;N;;;;;
    1FFFF;<Plane 1 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    2FFFE;<Plane 2 Illegal, First>;Cn;0;L;;;;;N;;;;;
    2FFFF;<Plane 2 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    3FFFE;<Plane 3 Illegal, First>;Cn;0;L;;;;;N;;;;;
    3FFFF;<Plane 3 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    4FFFE;<Plane 4 Illegal, First>;Cn;0;L;;;;;N;;;;;
    4FFFF;<Plane 4 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    5FFFE;<Plane 5 Illegal, First>;Cn;0;L;;;;;N;;;;;
    5FFFF;<Plane 5 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    6FFFE;<Plane 6 Illegal, First>;Cn;0;L;;;;;N;;;;;
    6FFFF;<Plane 6 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    7FFFE;<Plane 7 Illegal, First>;Cn;0;L;;;;;N;;;;;
    7FFFF;<Plane 7 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    8FFFE;<Plane 8 Illegal, First>;Cn;0;L;;;;;N;;;;;
    8FFFF;<Plane 8 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    9FFFE;<Plane 9 Illegal, First>;Cn;0;L;;;;;N;;;;;
    9FFFF;<Plane 9 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    AFFFE;<Plane 10 Illegal, First>;Cn;0;L;;;;;N;;;;;
    AFFFF;<Plane 10 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    BFFFE;<Plane 11 Illegal, First>;Cn;0;L;;;;;N;;;;;
    BFFFF;<Plane 11 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    CFFFE;<Plane 12 Illegal, First>;Cn;0;L;;;;;N;;;;;
    CFFFF;<Plane 12 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    DFFFE;<Plane 13 Illegal, First>;Cn;0;L;;;;;N;;;;;
    DFFFF;<Plane 13 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    EFFFE;<Plane 14 Illegal, First>;Cn;0;L;;;;;N;;;;;
    EFFFF;<Plane 14 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    FFFFE;<Plane 15 Illegal, First>;Cn;0;L;;;;;N;;;;;
    FFFFF;<Plane 15 Illegal, Last>;Cn;0;L;;;;;N;;;;;
    ..
    10FFFE;<Plane 16 Illegal, First>;Cn;0;L;;;;;N;;;;;
    10FFFF;<Plane 16 Illegal, Last>;Cn;0;L;;;;;N;;;;;

    I think that, if these code points are indeed permanently assigned as
    invalid, these assignments should be listed.
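
    For reference, the code points in question follow a simple pattern:
    FDD0..FDEF, plus the last two code points of each of the 17 planes. The
    Python sketch below tests for that pattern and cross-checks it against the
    general category reported by the standard unicodedata module. The names
    is_noncharacter and NONCHARACTERS are hypothetical, chosen only for this
    example.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Return True for FDD0..FDEF and for xFFFE/xFFFF in planes 0 through 16."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # (cp & 0xFFFE) == 0xFFFE holds when the low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Enumerate the whole code space once; the list is small.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

# Python's unicodedata reports "Cn" for code points with no UnicodeData.txt
# entry, which includes every code point matched above.
ALL_CN = all(unicodedata.category(chr(cp)) == "Cn" for cp in NONCHARACTERS)
```

    The list has 32 + 17 * 2 = 66 entries. Note that "Cn" alone does not
    distinguish them from ordinary unassigned code points, which is the
    ambiguity raised above.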

    Another solution would be to list these non-characters in
    DerivedCoreProperties.txt.
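
    A rough sketch of what such a listing could look like, written in Python.
    The "start..end ; property" layout and the use of the property label
    Noncharacter_Code_Point are assumptions made for illustration; the helper
    names noncharacter_ranges and format_lines are hypothetical.

```python
def noncharacter_ranges():
    """Return the 18 non-character ranges as (start, end) pairs, in order."""
    ranges = [(0xFDD0, 0xFDEF)]
    for plane in range(17):  # planes 0 through 16
        base = plane << 16
        ranges.append((base | 0xFFFE, base | 0xFFFF))
    return ranges


def format_lines(prop="Noncharacter_Code_Point"):
    """Format each range as 'START..END ; property' (assumed layout)."""
    return [f"{lo:04X}..{hi:04X} ; {prop}" for lo, hi in noncharacter_ranges()]


LINES = format_lines()

if __name__ == "__main__":
    print("\n".join(LINES))
```

    This yields 18 lines, one for FDD0..FDEF and one for the final pair of
    code points in each plane.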


