RE: Invalid char display (was: Using hex numbers considered a geek attitude)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Sat Apr 28 2001 - 12:12:56 EDT


Frank,

I had forgotten about the H19. My son & I built one. When I no longer
needed it, I put it by my youngest son's crib to play with. Could have
something to do with his current technical bent.

The Fujitsu format would have been better for 0000 to FFFF but it requires a
sophisticated font engine to compose a single glyph from 4 hex characters.

Unicode 1234 is displayed as:

 +--+
 |12|
 |34|
 +--+

This worked very well for Japanese systems because it was a centerline glyph
with the same size and proportions as a Kanji character.
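As a rough illustration of the 2x2 layout described above, here is a minimal Python sketch. The function name `jef_style_box` is made up for this example and is not part of any Fujitsu or JEF software; it only reproduces the ASCII picture shown in this message.

```python
def jef_style_box(code_point: int) -> str:
    """Return a 2x2 hex box (as ASCII art) for a value in 0000..FFFF."""
    if not 0 <= code_point <= 0xFFFF:
        # The 2x2 layout only has room for four hex digits.
        raise ValueError("the 2x2 layout only holds four hex digits")
    digits = f"{code_point:04X}"
    # Top row holds the first two digits, bottom row the last two.
    return "\n".join(["+--+", f"|{digits[:2]}|", f"|{digits[2:]}|", "+--+"])


print(jef_style_box(0x1234))
```

Anything above FFFF is rejected here, which is the limitation that motivates the per-byte pictures discussed next.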

Your proposal is intriguing because with additional planes we need 4 to 6
hex digits. Also with the extended planes there is little excuse not to
give up 256 slots. It also makes it much easier to display without taxing
the font engine. One can use bitmap fonts.

 +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+--+
 |0 | |0 | |0 | ... |0 | |1 | |1 | |1 | ... |E | |F | ... |F |F |
 | 1| | 2| | 3|     | F| | 0| | 1| | 2|     | F| | 0|     | E| F|
 +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+--+

However, your proposal needs to be modified. You will have to insert at
least a thin space between characters. You will have a varying number of hex
digits, and the spaces will not only separate the characters but also make
the text more readable. If the hex string is too long you will lose count of
the starting character positions, making it very unreadable.

Thus 1234, 123456 and FEFF become:

 +--+--+ +--+--+--+ +--+--+
 |1 |3 | |1 |3 |5 | |F |F |
 | 2| 4| | 2| 4| 6| | E| F|
 +--+--+ +--+--+--+ +--+--+

Note the boxes are only for position orientation and will actually display
something like:

  1 3  1 3 5  F F
   2 4  2 4 6  E F

With kerning, the top row will only be slightly offset from the bottom row,
so it will be in between what is shown above and the representation
below:

  13 135 FF
  24 246 EF

The top digit will also be closer to the bottom digit (almost touching) so
that the hex digits can be as large as possible for the point size. If the
font can render \u00B9, \u00B2 & \u00B3 (superscript 1, 2 and 3) with any
kind of clarity, you should be able to render these glyphs as well. You can
also squeeze the digits slightly laterally without losing too much
readability, so that the hex grouping comes close to fitting 4 hex digits in
an em space (if the width were about 2/5 em). I am not sure that they can be
quite that narrow. This lateral squeeze would roughly correspond to the
lateral offset between the upper and lower digits.
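The grouping described above (hex digits taken in pairs, the first digit of each pair on the top row, the second on the bottom row, and a space between characters) can be sketched in Python as follows. The helper names are invented for this example, and a plain ASCII space stands in for the thin space; real kerning and glyph design are not modelled.

```python
def byte_pairs(code_point: int) -> list:
    """Split a value into two-digit hex pairs, using at least four digits."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    digits = f"{code_point:04X}"
    if len(digits) % 2:
        # Pad five-digit values such as 1F600 to an even digit count.
        digits = "0" + digits
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


def two_row_display(code_points) -> str:
    """Return the top and bottom digit rows, one group per character."""
    top, bottom = [], []
    for cp in code_points:
        pairs = byte_pairs(cp)
        top.append("".join(pair[0] for pair in pairs))
        bottom.append("".join(pair[1] for pair in pairs))
    return " ".join(top) + "\n" + " ".join(bottom)


print(two_row_display([0x1234, 0x123456, 0xFEFF]))
```

For the three sample values used in this message, the two rows come out as `13 135 FF` and `24 246 EF`, matching the compact representation shown earlier.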

I think that you should revise your proposal for Unicode 3.1 support and
resubmit. I suspect that it was rejected because 256 characters was a lot
in those days. That is no longer true.

If this is not approved then an alternate request is to add A-F to the
Superscript/Subscript table. Superscripts would be 2090 to 2095 and
subscripts would be 2096 to 209B.

The above encoding would be:

\u00B9\u2082\u00B3\u2084\u2009\u00B9\u2082\u00B3\u2084\u2075\u2086\u2009\u2095\u209A\u2095\u209B\u2009

This would not render as well, but it would take fewer characters.
Ironically, it would produce a string of the same size in UTF-16: you will
have one character per hex digit instead of a pair of surrogate characters
for each pair of hex digits.
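A small Python sketch of this alternate encoding and of the UTF-16 size comparison follows. It rests on several assumptions: the A-F values 2090 to 2095 and 2096 to 209B are only the values suggested in this message, not actual Unicode assignments; the byte pictures are assumed to live in a supplementary plane, so each costs one surrogate pair; and both schemes are assumed to end each group with a thin space. The existing superscript 1, 2 and 3 are U+00B9, U+00B2 and U+00B3 in Latin-1.

```python
# Existing Unicode superscript digits 0-9 (1, 2 and 3 live in Latin-1).
SUPERSCRIPT_DIGITS = [0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
                      0x2075, 0x2076, 0x2077, 0x2078, 0x2079]
SUBSCRIPT_ZERO = 0x2080    # existing subscript digits are U+2080..U+2089
PROPOSED_SUPER_A = 0x2090  # assumption: value suggested in this message
PROPOSED_SUB_A = 0x2096    # assumption: value suggested in this message
THIN_SPACE = 0x2009


def encode(hex_strings):
    """Alternate superscript and subscript digits, thin space after each group."""
    out = []
    for text in hex_strings:
        for index, char in enumerate(text):
            value = int(char, 16)
            if index % 2 == 0:  # first digit of each pair goes on top
                out.append(SUPERSCRIPT_DIGITS[value] if value < 10
                           else PROPOSED_SUPER_A + value - 10)
            else:               # second digit of each pair goes below
                out.append(SUBSCRIPT_ZERO + value if value < 10
                           else PROPOSED_SUB_A + value - 10)
        out.append(THIN_SPACE)
    return out


def escapes(code_points):
    """Format code points as \\uXXXX escapes."""
    return "".join(f"\\u{cp:04X}" for cp in code_points)


def utf16_units(code_points):
    """Count UTF-16 code units needed for the given code points."""
    return len("".join(map(chr, code_points)).encode("utf-16-be")) // 2


groups = ["1234", "123456", "FEFF"]
encoded = encode(groups)
print(escapes(encoded))
print("superscript/subscript code units:", utf16_units(encoded))

# Byte pictures: one surrogate pair (2 code units) per pair of hex digits,
# plus one thin space per group (supplementary-plane placement is assumed).
byte_picture_units = sum(2 * (len(g) // 2) + 1 for g in groups)
print("byte picture code units:", byte_picture_units)
```

Under these assumptions both counts come to 17 code units for the three sample values, which is the equality noted above.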

Carl

-----Original Message-----
From: Frank da Cruz [mailto:fdc@columbia.edu]
Sent: Friday, April 27, 2001 11:57 AM
To: Carl W. Brown
Cc: unicode@unicode.org
Subject: Re: Invalid char display (was: Using hex numbers considered a
geek attitude)

> There is a character set missing from Unicode. Unicode needs a special
> hex display font.
>
Unicode and fonts are two different things. However, I agree it would be
nice to have a repertoire of characters whose glyphs are hex values, and
proposed this a couple years ago:

  ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt

but after a round of discussion the proposal was rejected by the UTC.

> When I worked on Fujitsu systems they had a special way of displaying
> characters that were not in the font. They had an extended DBCS system
> called JEF and when you attempted to display a character that was not in
> the font the hex value was displayed:
>
> XX
> XX
>
> The hex digits were half size so that they would display in the same size
> as a kanji character. Although you could not see the character, you could
> send a screen shot to a tech and get real help. You could also look the
> character up in a book to see what the character was if it was really
> important.
>
I proposed a repertoire of 256 hex "byte pictures". You can't have one
for every character in Unicode without doubling the size of Unicode. The
argument about displaying unknown characters in hex is that it has nothing
to do with Unicode -- it's simply a display option at the application level.

Other possible reasons for having hex byte pictures are given in the
proposal.

- Frank


