From: Hans Aberg (haberg@math.su.se)
Date: Sun Jan 30 2005 - 13:14:10 CST
At 18:24 +0000 2005/01/29, Jon Hanna wrote:
>Hey, the surrogates aren't even the most illogical part of the model by a
>long shot, the characters with single character canonical decompositions are
>much worse.
The numbers 0xD800-0xDFFF and 0xFFFE-0xFFFF are not associated with any
character, but are included as placeholders, never to be used, because the
UTF-16 encoding was not given a proper design. So an unrelated problem, the
choice of character encoding, is allowed to influence the logical core, the
character set description.
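
To make the point about the reserved range concrete, here is a minimal
sketch in Python of how UTF-16 uses 0xD800-0xDFFF. The function names
is_surrogate and to_utf16 are my own, chosen for illustration only; the
arithmetic is the standard UTF-16 surrogate pair computation.

```python
def is_surrogate(cp: int) -> bool:
    """True if cp lies in the range reserved for UTF-16 surrogates."""
    return 0xD800 <= cp <= 0xDFFF


def to_utf16(cp: int) -> list[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if is_surrogate(cp):
        # These numbers are not characters; they only exist as code units.
        raise ValueError("surrogate code points cannot be encoded")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]
```

For example, to_utf16(0x1D11E) gives the pair 0xD834, 0xDD1E, while
to_utf16(0xD800) is rejected: the encoding needs that range for itself, so
the character set has to leave it empty.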
The other problem you mention is clearly a problem of describing character
properties. So, no matter how complicated, it belongs to the character set
description. Mathematically, though, one just defines an equivalence
relation on the set of character sequences, together with a preferred
representative of each equivalence class.
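
As a rough illustration of that view, the following Python sketch uses the
standard unicodedata module. Treating NFD equality as the equivalence
relation and the NFC form as the preferred representative is my assumption
for the example (either normalization form could play either role), and the
helper names are hypothetical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two sequences are equivalent if their full canonical
    decompositions (NFD) are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def representative(s: str) -> str:
    """Pick one member of the equivalence class; here, the NFC form."""
    return unicodedata.normalize("NFC", s)


# U+212B ANGSTROM SIGN has a single character canonical decomposition
# to U+00C5, which in turn decomposes to "A" + U+030A COMBINING RING ABOVE.
angstrom_sign = "\u212B"
a_with_ring = "\u00C5"
a_plus_ring = "A\u030A"
```

All three strings above fall in the same class under
canonically_equivalent, and representative maps each of them to U+00C5.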
Hans Aberg