(long) Re: Hexadecimal characters.

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Jun 21 2002 - 12:01:35 EDT


James Kass <jameskass at worldnet dot att dot net> wrote:

> <a bit tongue-in-cheek>
>
> Perhaps the Ewellic forms should be used rather than risk the
> possibility of being perceived as ASCII-centric?
>
> http://www.evertype.com/standards/csur/ewellic.html
>
> All we'd need to do is wait for Doug Ewell to provide the glyphs for
> hexadecimal digits ten through fifteen and wait for CSUR to assign
> code points other than the former Shavian block.

Great Scott, that's TWO people now (besides Michael and John) who've
read my ConScript proposal. When do the book offers begin rolling in?

John, where did you get those glyphs for A-F in your message Thursday?
Is there some "Indiana Jones"-like connection that I'm missing, or are
they original? I've got half a mind to adapt them for Ewellic. (Of
course, for a project like that, half a mind is plenty.)

BTW, the Ewellic block is in the process of being moved from U+E700 (the
former Shavian block) to U+E650. No word yet on whether Shavian will be
restored there in the interim, until Unicode 4.0 comes out.

> As for input, these could be entered the same way any other Unicode
> character is entered. Likewise for handling legacy conversions.
>
> </a bit tongue-in-cheek>

Watch all the Keyman devotees start designing keyboards for Ewellic now.

In case anyone is wondering why Tengwar and Cirth and Klingon (and
Ewellic) have any business being in a published PUA registry while hex
digits don't, the answer is not that Big Powerful Corporations have thus
ordained it, but that -- as several have said by now -- there is already
a perfectly good way to represent hex digits in Unicode. The characters
0-9, A-F, and a-f are so widely used to fill this need that to introduce
a new block would only cause problems with interoperability,
interchange, keyboard input, etc. (And don't forget the spoofing issue
when the "c" in <www.microsoft.com> turns out to be a hex digit for 12.)
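
To make the spoofing point concrete, here is a small Python sketch. No
dedicated hex-digit characters exist, so as a stand-in (my assumption,
purely for illustration) it uses CYRILLIC SMALL LETTER ES, U+0441, which
looks just like a Latin "c" in most fonts. The helper name `non_ascii`
is made up.

```python
import unicodedata

latin = "www.microsoft.com"

# Stand-in for a hypothetical "hex digit twelve": U+0441 renders like
# Latin "c" but is a different character. Replace only the first "c".
spoofed = latin.replace("c", "\u0441", 1)

def non_ascii(text):
    """List the code point and name of every non-ASCII character."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch))
            for ch in text if ord(ch) > 0x7F]

same_text = (spoofed == latin)      # False: the strings differ
hidden = non_ascii(spoofed)         # reveals the look-alike character
```

The two strings look the same on screen but compare unequal, which is
exactly the trouble a second set of look-alike digits would invite.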

Tom Finch and others might be interested in knowing that a proposal very
similar to the current hex-digit proposal came up in the early days of
Unicode/10646. The proposal is cited as an example in WG2 document
N2352R, "Principles and Procedures for Allocation of New Characters and
Scripts and handling of Defect Reports on Character Names." A lengthy
title, but I wish everyone on this list would read this document:

    http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2352r.pdf

<quote>
F.4 Some Examples of Precedents

Example 1:

Character: Generic Decimal Separator Mark

In 1991 the proposal was made to add a new punctuation character in the
General Punctuation block that would have the semantic property of
decimal separator, but could be imaged as period, comma, space or
apostrophe depending on the locale.

Asserted benefit: Solve the locale dependent display of numbers.

Costs: This new character would have disunified four widely used
characters. Mapping from existing character sets would have become
locale dependent. Users would have to turn on a special
show-invisible-character mode to distinguish the new character from
existing characters. Such modes exist, but are limited to word
processing software, where numbers usually occur embedded in text, which
in turn is 'frozen' into a given language. Database software, where
locale dependent numeric displays are much more of an issue, does not
normally need or support a show-invisible-character mode. Finally, in
1991 there were no keyboards supporting this new character, but it would
be needed in all languages and applications, and all software would have
to be specially adapted for it.

Alternatives: There already is an established technology to deal with
locale differences, and in a way that is not limited to decimal numbers.

Result: Rejected. The costs far outweigh the benefits.
</quote>

The hex issue is interesting, though, because of ISO 14755. This is a
standard which deals with methods of inputting arbitrary Unicode
characters on an ordinary keyboard. One of the methods mentioned is to
go into a special hex-entry mode (possibly by holding down two or more
control keys) and typing the Unicode code point in hexadecimal. Now,
obviously you would never want to type a whole lot of (e.g.) Chinese
text this way. It would make more sense to find a real Chinese IME and
learn to use it. However, if all you want to do is enter a single
Chinese character, the hex method does make it *possible* with an
ordinary U.S. English keyboard.
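
As a rough illustration of what such a hex-entry mode has to do once the
digits have been typed, here is a minimal Python sketch. The function
name `hex_entry` is hypothetical; ISO 14755 does not define an API, and
a real implementation would live in the keyboard driver or input method.

```python
import string

def hex_entry(typed):
    """Return the single character whose code point was typed in hex."""
    # Only the conventional digits are accepted here: 0-9, A-F, a-f.
    if not typed or any(ch not in string.hexdigits for ch in typed):
        raise ValueError("expected hexadecimal digits, got %r" % typed)
    code_point = int(typed, 16)
    if code_point > 0x10FFFF:
        raise ValueError("beyond the Unicode code space: %X" % code_point)
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    return chr(code_point)

# Four keystrokes on a U.S. keyboard yield one Chinese character.
zhong = hex_entry("4E2D")   # U+4E2D
```

Note that the sketch quietly assumes the Latin letters A-F are available
for the values 10 through 15.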

The problem comes when you try to apply this technique to non-Latin
keyboards. What does "hex" mean in non-Latin scripts? ISO 14755 says
the values 10 through 15 are to be represented using "the first 6
letters of the Latin alphabet if the Latin script is used, or the first
six letters of any other alphabet if a different script is used." But
what *are* the first six letters of other alphabets? Markus Kuhn
brought up this topic on the Unicode list a few years ago, and I
resurrected it a few months ago. The general opinion seemed to be that
this was a non-issue since "everybody" has access to Latin letters on
their keyboard, but we know this to be false. Microsoft's collection of
keyboard layouts at their Global Development site, for example, shows
lots of keyboards with no Latin-script allocations, such as Russian,
Arabic, and Hindi.

Of course you can't just use the first six characters of a given script
in Unicode code point order. That wouldn't even work for Greek or
Cyrillic. I personally think the characters to be used for 10 through
15 should be explicitly spelled out in ISO 14755 itself.
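
A quick Python sketch shows why code point order fails. It walks a block
from its first code point and collects the first six characters whose
general category is a letter. The helper name `first_letters` is made
up, and the result depends on the Unicode version your Python ships
with.

```python
import unicodedata

def first_letters(block_start, count=6):
    """Collect the first `count` letters at or after `block_start`."""
    letters = []
    cp = block_start
    while len(letters) < count and cp <= 0x10FFFF:
        ch = chr(cp)
        # General categories Lu, Ll, Lt, Lm, Lo all start with "L".
        if unicodedata.category(ch).startswith("L"):
            letters.append(ch)
        cp += 1
    return letters

cyrillic = first_letters(0x0400)   # Cyrillic block starts at U+0400
greek = first_letters(0x0370)      # Greek block starts at U+0370

for ch in cyrillic:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

For Cyrillic the walk starts at U+0400 CYRILLIC CAPITAL LETTER IE WITH
GRAVE, not at U+0410 CYRILLIC CAPITAL LETTER A, so the six letters it
finds are not the first six letters of any Cyrillic alphabet.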

So why shouldn't we solve this problem once and for all with a newly
assigned, unambiguous set of hexadecimal digits? After all, they could
be imaged in the user's native script, just like the proposed decimal
separator could be imaged according to the user's locale. But that
wouldn't change the fact that these new characters would not appear on
any existing keyboard, and no manufacturer would be likely to allocate
16 new keys for them. Ironically, if you wanted to use the dedicated
hex digits to enter a Unicode character with ISO 14755, you would have
to use ISO 14755 to enter the hex digits themselves! Hmmm, no, I don't
think this is going to work....

-Doug Ewell
 Fullerton, California


