From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 21 2004 - 09:48:46 CST
RE: Is it roundtripping or transfer-encoding
From: Lars Kristan
> OK, so it introduces a multiple representation of the same codepoints.
> Every escaping technique does that. And it is not a problem. All you need
> to do is define the normalization procedure. And use it where it applies.
> In many cases its use is not even necessary. Specifically, a Unicode
> system does not need to (and should not) normalize the escape codepoints.
> The need for normalization only needs to be determined for an application
> that uses the TES itself, and applies only in few cases.
Please don't use the term "normalize" in this context. Normalization in
Unicode is a transformation of the stream of *code points*, and is
independent of their encoding form or encoding scheme. It is defined in
terms of combining sequences, relying mostly on the "combining class"
property of characters and on the character composition mapping property
(plus some values of the "general category" property, to take control
characters into account when delimiting combining sequences).
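To illustrate the point, here is a minimal sketch using Python's standard
unicodedata module. The helper name nfc_after_roundtrip is hypothetical and
only there for the illustration; the sketch assumes the text survives each
encoding scheme unchanged, which holds for the UTF encodings used here.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (a decomposed sequence).
decomposed = "e\u0301"

def nfc_after_roundtrip(text, encoding):
    """Hypothetical helper: encode, decode, then normalize the code points."""
    code_points = text.encode(encoding).decode(encoding)
    return unicodedata.normalize("NFC", code_points)

# The encoding scheme changes the bytes, not the normalized code points.
results = {enc: nfc_after_roundtrip(decomposed, enc)
           for enc in ("utf-8", "utf-16-le", "utf-32-be")}

# Canonical reordering is driven by the combining class property:
# U+0323 (class 220) sorts before U+0301 (class 230).
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")

print(results)
print([unicodedata.combining(c) for c in reordered])
```

The same composed code point comes out whichever encoding scheme carried the
text, which is why the term should be kept for operations on code points.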
Unicode defines only 4 *standard* normalization forms (NFC, NFD, NFKC,
NFKD), but other *non-standard* normalization forms are possible:
normalization is a transformation of strings of abstract characters that
should be considered "equivalent" for text processing (notably for the input
text of such processes; it may optionally, and less importantly, be applied
to their output text).
Unicode defines two sets of equivalence classes for encoded texts, each with
a "composed" and a "decomposed" form. "Canonical" equivalence (NFC or NFD, or
the non-standard special decomposition form used on MacOS for HFS+ volumes)
matters for some other important standards that depend on Unicode.
"Compatibility" equivalence (NFKC, NFKD) matters only for fallback
mechanisms, and compatibility mappings can lose some information present in
the source text.
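As a rough sketch of the difference between the two equivalences, again
with Python's standard unicodedata module (the sample characters are my own
choice for the illustration):

```python
import unicodedata

angstrom_sign = "\u212b"   # ANGSTROM SIGN
a_ring = "\u00c5"          # LATIN CAPITAL LETTER A WITH RING ABOVE
ligature = "\ufb01"        # LATIN SMALL LIGATURE FI

# Canonical equivalence: both characters normalize to the same string.
canonically_equal = (unicodedata.normalize("NFC", angstrom_sign)
                     == unicodedata.normalize("NFC", a_ring))

# The ligature under each of the four standard forms.
forms = ("NFC", "NFD", "NFKC", "NFKD")
table = {form: unicodedata.normalize(form, ligature) for form in forms}

# Compatibility mapping is lossy: after NFKC the ligature can no longer
# be told apart from the plain two-letter sequence "fi".
lossy = (unicodedata.normalize("NFKC", ligature)
         == unicodedata.normalize("NFKC", "fi"))

print(canonically_equal, lossy)
for form in forms:
    print(form, [hex(ord(c)) for c in table[form]])
```

The canonical forms leave the ligature alone, while the compatibility forms
replace it with two separate letters, and nothing in the result records that
a ligature was there.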