Re: Non-characters in Unicode data files

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 29 2003 - 16:26:31 EST

    > Well, they are listed in
    > http://www.unicode.org/Public/UNIDATA/DerivedAge.txt
    > If you search for "noncharacter" there, you will find which ones were
    > designated in which Unicode version. (Only two were designated in
    > Unicode 1.)
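
    The lookup suggested in the quoted text can be sketched in Python. This is
    only a rough illustration: the function name and the sample lines are
    hypothetical, written in the usual "range ; version # comment" layout of
    the UCD data files, and are not copied from the real DerivedAge.txt.

```python
import re

# One data line looks roughly like:
#   FDD0..FDEF    ; 3.1 #  [32] <noncharacter-FDD0>..<noncharacter-FDEF>
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([0-9.]+)\s*#(.*)$"
)

def noncharacter_ages(lines):
    """Return (first, last, version) for every line mentioning noncharacters."""
    result = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip comment lines, blank lines, and ranges that are not noncharacters.
        if not m or "noncharacter" not in m.group(4):
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)  # single code point: no ".."
        result.append((first, last, m.group(3)))
    return result

# Illustrative excerpt only (an assumption, not the real file contents).
SAMPLE = [
    "# illustrative excerpt",
    "0000..001F    ; 1.1 #  [32] <control-0000>..<control-001F>",
    "FFFE..FFFF    ; 1.1 #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>",
    "FDD0..FDEF    ; 3.1 #  [32] <noncharacter-FDD0>..<noncharacter-FDEF>",
]

for first, last, version in noncharacter_ages(SAMPLE):
    print(f"U+{first:04X}..U+{last:04X} designated in {version}")
```

    To run it against the real file, pass the lines of a downloaded copy of
    DerivedAge.txt instead of SAMPLE.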

    Thanks, I forgot to check this file. It was introduced later so that
    applications certified to comply with a particular version can be used
    with later versions (for example, to emulate a previous version by
    applying default character properties, and no decompositions or case
    mappings, to non-compliant texts encoded with unassigned code points).

    As I don't need support for non-compliant texts, I simply dropped this
    reference. If this text file is really normative, then it means that texts
    using unassigned code points should be rejected by compliant applications
    rather than accepted silently, not even with default properties, which may
    break in a later version of Unicode...

    You also use the term "designated". Doesn't that also mean "assigned"?
    For example, code points are assigned to surrogates even though they are
    not characters, and the FAQ already states that not all assigned code
    points are characters.
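
    To make the distinction concrete, here is a minimal sketch that tests a
    code point against the surrogate range (U+D800..U+DFFF) and against the 66
    noncharacters (U+FDD0..U+FDEF, plus the last two code points of each of
    the 17 planes). The function names are mine, and the sketch says nothing
    about whether either set counts as "assigned" or "designated".

```python
MAX_CODE_POINT = 0x10FFFF

def is_surrogate(cp: int) -> bool:
    """True for the surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    # The low 16 bits are FFFE or FFFF exactly when (cp & 0xFFFE) == 0xFFFE.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Count every noncharacter in the code space: 32 + 17 * 2.
total = sum(is_noncharacter(cp) for cp in range(MAX_CODE_POINT + 1))
print(total)
```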


