From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Tue Oct 07 2003 - 04:20:09 CST
Sigh! Things were a lot easier back in the old days of Unicode version
3, when default grapheme clusters were still called "glyphs". Okay, so
the general public still got it wrong, but that was just because they
were ignorant monkeys who didn't know any better, and it was up to the
likes of us to teach them the right words for things. :-) Now, instead,
we'll have to teach them to say "default grapheme cluster". How long do
you think it will be before it becomes acceptable to describe a console
or terminal emulator as "80 default grapheme clusters wide and 25
default grapheme clusters high"? If I had to guess, I'd say ... never.
Of course, a default grapheme cluster is exactly what Johann was trying
to represent in 64 bits in his Excessive Memory Usage Encoding. It's
unfortunate that 64 bits just isn't enough for this purpose: a single
cluster can carry an arbitrarily long run of combining codepoints, so no
fixed number of bits will ever hold one.
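(A quick back-of-the-envelope in Python, just to make the point; the
cluster below is illustrative, not Johann's actual test case. A base
letter plus a few combining marks is still one default grapheme
cluster, and even this modest one already needs 160 bits in UTF-32.)

    import unicodedata

    # One default grapheme cluster: 'e' plus four combining marks.
    cluster = "e\u0301\u0302\u0303\u0304"

    print(len(cluster))                          # 5 codepoints
    print(len(cluster.encode("utf-32-le")) * 8)  # 160 bits in UTF-32
    for cp in cluster:
        print(f"U+{ord(cp):04X}", unicodedata.name(cp))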
It would be a whole lot easier if Unicode types would only use the same
words for things as the rest of the world. I suggest:
(1) A codepoint is still called a codepoint. No problem there.
(2) The object currently called a "character" be renamed as something
like "mapped codepoint" or "encoded codepoint", or possibly (coming in
from the other end) something like "sub-character" or "character
component" or "characterette" (which can be shortened to "charette" and
pronounced "carrot". :-) )
(3) The object currently called a "default grapheme cluster" be renamed
(4) The object currently called a "tailored grapheme cluster" be renamed
as "tailored character"
This would make even /our/ conversations a lot less confusing.
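(For anyone who wants to see the distinction in the proposed terms,
here's a minimal Python sketch; NFC and NFD are just the standard
normalization forms:)

    import unicodedata

    # One "character" in the everyday sense (one default grapheme
    # cluster), spelled two different ways:
    precomposed = "\u00e9"                                  # one codepoint
    decomposed = unicodedata.normalize("NFD", precomposed)  # 'e' + U+0301

    print(len(precomposed))  # 1 codepoint  (one "charette")
    print(len(decomposed))   # 2 codepoints (two "charettes")
    # Both render identically, and each is exactly one default
    # grapheme cluster: one "character" to the rest of the world.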