Date: Mon Aug 11 2003 - 11:07:46 EDT
Hey thanks - I think I've got all that now.
Of course, I'm tempted to wonder whether or not it would have made more
sense to simply have introduced a few new combining characters in plane 0,
such as: "make bold", "make italic", "make script", "make fraktur", "make
double-struck", "make sans serif", "make monospace" and "make tag". This
would not only have achieved the same effect (and with the same space
requirements too, at least for things like "bold uppercase A" in UTF-16),
but with much greater flexibility (in that you could also make _other_
characters bold too, and you could create combinations of the attributes not
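
The space comparison above can be checked with a small sketch. It assumes U+0301 (combining acute accent) as a stand-in for the hypothetical "make bold" combining character, which does not exist in Unicode; the helper name utf16_code_units is likewise made up for illustration.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# U+1D400 MATHEMATICAL BOLD CAPITAL A is on plane 1, so UTF-16 stores it
# as a surrogate pair.
bold_a = "\U0001D400"

# Hypothetical alternative: plain "A" followed by one plane 0 combining
# character. U+0301 is only a stand-in; no "make bold" character exists.
combined_a = "A" + "\u0301"

print(utf16_code_units(bold_a), utf16_code_units(combined_a))
```

Both strings should come out at two code units, which is the "same space requirements" point made above.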
I still haven't figured out what "fullwidth" means though. I don't really
understand in what way a "full width full stop" (FF0E) is different from a
"full stop" (002E), etc. I _have_ downloaded, and read in its entirety, the code
chart document for FF00-FFEF, and nothing in that document explains to me
why these characters are necessary. Does anyone have any clues on that one?
From: John Cowan [mailto:firstname.lastname@example.org]
Sent: Monday, August 11, 2003 12:26 PM
Subject: Re: Newbie Question - what are all those duplicated characters
> Stefan has effectively dealt with SOME of my confusion, but questions
> remain. For example: between 1D49C (mathematical script capital A) and
> 1D49E (mathematical script capital C) we find 1D49D (<reserved>). What is it
> reserved for? I am aware that codepoint 212C is script capital B, but why
> does that justify leaving a "hole" in the codepoint space? Why not just
> "mathematical script capital B" without leaving a hole? (i.e. why not just
> go straight from A to C?).
Primarily for implementation simplicity. It's possible to convert between
any of the mathematical "fonts" and any other, or the corresponding "normal"
ones, with a simple offset plus a short table of exceptions.
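
A minimal sketch of the "offset plus a short table of exceptions" idea, limited to the script capitals discussed in this thread. The function and constant names are made up for illustration; the exception table lists the script capitals that live in the Letterlike Symbols block, and is worth checking against the current code charts before relying on it.

```python
import unicodedata

SCRIPT_CAPITAL_A = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A

# Script capitals that were already encoded in the Letterlike Symbols block.
# Their plane 1 positions are the reserved "holes".
SCRIPT_CAPITAL_EXCEPTIONS = {
    "B": 0x212C, "E": 0x2130, "F": 0x2131, "H": 0x210B,
    "I": 0x2110, "L": 0x2112, "M": 0x2133, "R": 0x211B,
}


def to_script_capital(letter: str) -> str:
    """Map an ASCII capital letter to its script-style counterpart."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single ASCII capital letter")
    if letter in SCRIPT_CAPITAL_EXCEPTIONS:
        # Exception table: the character lives outside the plane 1 run.
        return chr(SCRIPT_CAPITAL_EXCEPTIONS[letter])
    # Common case: a fixed offset from "A", which works because of the holes.
    return chr(SCRIPT_CAPITAL_A + ord(letter) - ord("A"))


for letter in "ABC":
    converted = to_script_capital(letter)
    print(f"{letter} -> U+{ord(converted):04X} {unicodedata.name(converted)}")
```

Because the holes keep every letter at its alphabetical offset, the common case is one addition and only eight letters need a table lookup.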
Code space on plane 1 just isn't that precious. Similar things have
been done throughout Unicode: for example, in the main Greek block,
there is a hole where "capital letter final sigma" would be, since there
is no such character: the final/non-final distinction is not made in
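
The Greek hole can be observed with a short sketch using Python's unicodedata module. The helper name unassigned_in_range is made up for illustration, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def unassigned_in_range(first: int, last: int) -> list[int]:
    """Return the code points in [first, last] with category Cn (unassigned)."""
    return [
        cp for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


# Greek capital letters run from U+0391 (Alpha) to U+03A9 (Omega).
holes = unassigned_in_range(0x0391, 0x03A9)
print([f"U+{cp:04X}" for cp in holes])

# The lowercase letters sit 0x20 above the capitals, so the hole lines up
# with the small final sigma.
print(unicodedata.name(chr(holes[0] + 0x20)))
```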
> More questions. From E0020 to E007E we have "tag space" through to "tag
> tilde". These are copies of the Basic Latin block at 0020. I still don't
> know what they are for.
The tag characters are used to embed tags, specifically language tags,
in contexts where markup is too heavyweight but it seems essential to
record the language of a text. One such application is in protocol
design, where it is occasionally necessary to pass around human-readable
strings within the protocol, and it is desirable to supply the correct
string for a given language. All other uses are strongly discouraged.
But if you have to do it, you can encode "en-us" (the language code for
U.S. English) using <E0001, E0065, E006E, E002D, E0075, E0073>. For
all purposes other than language identification, tag characters are
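
A minimal sketch of the "en-us" encoding above: the LANGUAGE TAG character U+E0001, followed by each ASCII character of the tag shifted up by E0000. The function and constant names are made up for illustration, and this is not a recommendation to use tag characters.

```python
LANGUAGE_TAG = 0xE0001  # introduces a language tag
TAG_OFFSET = 0xE0000    # tag character = ASCII character + this offset


def encode_language_tag(tag: str) -> str:
    """Encode an ASCII language tag such as "en-us" as tag characters."""
    for ch in tag:
        # Tag characters only mirror the printable range 0020..007E.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not representable as a tag character: {ch!r}")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in tag)


encoded = encode_language_tag("en-us")
print(", ".join(f"{ord(ch):X}" for ch in encoded))
```

The printed code points should match the sequence given above.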
-- 
John Cowan email@example.com www.ccil.org/~cowan www.reutershealth.com
"In computer science, we stand on each other's feet." --Brian K. Reid