From: Jon Hanna (firstname.lastname@example.org)
Date: Thu Feb 25 2010 - 05:26:17 CST
> I read somewhere, and some time ago(*), that the Unicode concept of character matches the common sense of "character" in computing. I find this assertion rather amazing,
It's not just true, it's tautologous. Unicode is a standard for dealing
with the concept of "character" in computing. As such, it inherently
matches the concept of "character" in computing. It is true, though, that
some things that have been done with characters in computing are not
allowed in Unicode, so what Unicode permits is a proper subset of how
"character" has been understood in computing, excluding those disallowed
techniques.
> For instance, in Unicode, the unit 'â' may be formed out of the bits 'a' and the composing variant of '^'.
It cannot, however, be formed as 'a' + backspace + '^'. That was how â was
produced on some ASCII-using systems. There are lots of ways one could
conceivably make â, but only two of them are allowed in Unicode: the
precomposed character U+00E2, and 'a' (U+0061) followed by the combining
circumflex (U+0302).
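To illustrate the point, here is a small Python sketch showing that the two permitted forms are distinct code point sequences that normalization maps onto each other, while the old backspace trick just yields three unrelated characters:

```python
import unicodedata

# The two Unicode-sanctioned ways of making â:
precomposed = "\u00e2"      # single code point U+00E2
decomposed = "a\u0302"      # 'a' followed by COMBINING CIRCUMFLEX ACCENT

# They are different sequences of code points...
print(len(precomposed), len(decomposed))  # 1 2

# ...but normalization converts between them.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

# The legacy ASCII trick 'a' + backspace + '^' is NOT a composed
# character in Unicode; it is just three separate characters.
legacy = "a\b^"
print(len(legacy))                                               # 3
print(unicodedata.normalize("NFC", legacy) == precomposed)       # False
```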
> It seems to me in legacy characters sets scripting bits simply do not exist, but I may be wrong on this.
You are. Windows-1258 has combining characters: â itself is precomposed
in that encoding, but it also contains five combining marks (the
Vietnamese tone diacritics).
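This is easy to check with Python's built-in cp1258 codec; the byte values below are taken from the Unicode Consortium's CP1258.TXT mapping table (â precomposed at 0xE2, combining grave at 0xCC):

```python
# Windows-1258 (Vietnamese) mixes precomposed and combining characters.
# 0xE2 decodes to precomposed U+00E2 (â), exactly as in Windows-1252...
print(b"\xe2".decode("cp1258"))          # â
print(b"\xe2".decode("cp1258") == "\u00e2")  # True

# ...while 0xCC decodes to U+0300 COMBINING GRAVE ACCENT, so a base
# letter plus a tone mark is encoded as two bytes:
print(b"a\xcc".decode("cp1258") == "a\u0300")  # True ('a' + combining grave)
```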
This archive was generated by hypermail 2.1.5 : Thu Feb 25 2010 - 05:30:41 CST