From: Andrew C. West (email@example.com)
Date: Wed Jan 21 2004 - 08:36:54 EST
On Tue, 20 Jan 2004 10:32:06 -0700, John Jenkins wrote:
> 1) U+9CE6 is a traditional Chinese character (a kind of swallow)
> without a SC counterpart encoded. However, applying the usual rules
> for simplifications, it would be easy to derive a simplified form which
> one could conceivably see in a book printed in the PRC. Rather than
> encode the simplified form, the UTC would prefer to represent the SC
> form using U+9CE6 + a variation selector.
If a simplified form of a given CJK ideograph is used, then it deserves encoding
properly. There are newly-coined simplified forms in CJK-B and CJK-C, so why not
add newly used simplified forms to CJK-C or wherever if they are really needed
? To borrow Michael's term, this use of variation selectors is simply
If a Chinese publishing house were going to print a book in simplified
characters that included a simplified form of U+9CE6, would they go to the lengths
of applying to Unicode to define an appropriate standardised variant for U+9CE6,
and then trying to create a font that implemented variation selectors ? Or would
they simply use a font that mapped a simplified glyph form to U+9CE6 (or the
PUA) ? If it is so important to formally define the existence of a simplified
form of an existing character, then why not encode it properly ??
> 2) Your best friend has the last name of "turtle," but he doesn't use
> any of the encoded forms for the turtle character to represent it. He
> insists on writing it in yet another way and wants to be able to
> include his name as he writes it in the source code he edits. The UTC
> ends up accommodating him using U+2A6C9 (which is the closest turtle to
> his last name) + a variation selector.
1. Unicode Design Principle 3 : "The Unicode Standard encodes characters, not glyphs."
This is a simple glyph variant. I insist on writing the "A" in my name with two
cross-bars. Will the UTC kindly accommodate me by providing an appropriate
standardised variant for U+0041 ? (In fact, come to think of it I have
idiosyncratic ways of writing all of the letters in my name ...)
The plain fact of the matter is that the *character* turtle is already encoded,
and if someone wants to use a different glyph form for this character then he or
she should design their own font with the appropriate glyph mapped to U+9F9C or
2. Unicode does not encode private-use characters.
I can't find chapter and verse for it, but I was always under the impression
that Unicode did not encode private-use characters.
> 3) You're editing a critical edition of an ancient MS, and you find
> that your author, who talks a lot about handkerchiefs, uses U+5E28
> quite a bit, but varies between the "ears-in" form and the "ears-out"
> form almost at random. Rather than lose the distinction which *may* be
> meaningful, you (with the UTC's blessing) use U+5E28 for the ears-in
> form (as Unicode uses) and U+5E28 + a variation selector for the
> ears-out form.
This example actually opens up the biggest can of worms.
As someone who has a passion for transcribing ancient manuscripts, in Chinese
and other scripts, I fully appreciate the desire to be able to represent every
little idiosyncrasy of a manuscript or inscription in plain text Unicode. But
the simple fact of the matter is that you can't. My apologies for repeating
myself, but Unicode Design Principle 3 states that "The Unicode Standard encodes
characters, not glyphs." (and Section 2.2 of TUS elaborates on this statement).
Unless Unicode becomes a Glyph Encoding Standard instead of a Character Encoding
Standard, then how on earth can the UTC allow VSs to be used for simple glyph
variants ? And if it's OK for CJK ideographs, then why not for every other
Unicoded script ?
Glyph variations are of paramount interest to textual scholars and epigraphers
of all scripts, not just Chinese. To take a random example from the Celtic
Inscribed Stones Project (CISP), this is a palaeographic description of a cross
slab at Kirk Maughold in the Isle of Man, inscribed [--]I IN CHRISTI NOMINE
CRUCIS CHRISTI IMAGENEM :
Kermode/1907, 112: `we have here the diamond-shaped O, the N like an H, and the
M like a double H, all characteristics of the Hiberno-Saxon manuscripts and
sculptured stones of the period. Other characteristic forms are the
square-shaped C and the peculiar G, the like of which I have not seen elsewhere.
But some of the letters are minuscules, as p, d, b, r, and a; while in the
contraction for CHRISTI, in each case the R differs from the ordinary small R in
CRUCIS, representing, in fact, the Greek Rho!'.
If we go down the road of encoding epigraphic and palaeographic glyph variants
for CJK and other scripts I'm afraid that we'll soon find that 256 Variation
Selectors just aren't enough.
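
To make the figure of 256 concrete, here is a minimal Python sketch of what "U+5E28 + a variation selector" looks like in plain text. It assumes the two variation selector ranges U+FE00..U+FE0F (VS1 to VS16) and U+E0100..U+E01EF (VS17 to VS256). The helper names and the choice of VS17 for the "ears-out" form are hypothetical, picked only for illustration; no such variation sequence has been defined.

```python
# Variation selectors occupy two ranges: VS1-VS16 and VS17-VS256.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def variation_selectors():
    """Return all variation selector characters in VS1..VS256 order."""
    return [chr(cp) for lo, hi in VS_RANGES for cp in range(lo, hi + 1)]


def with_selector(base, n):
    """Return `base` followed by variation selector number n (1-based)."""
    selectors = variation_selectors()
    if not 1 <= n <= len(selectors):
        raise ValueError(f"no variation selector numbered {n}")
    return base + selectors[n - 1]


# Total number of selectors available for any one base character.
total = len(variation_selectors())

# Hypothetical: pretend VS17 marks the "ears-out" form of U+5E28.
seq = with_selector("\u5E28", 17)
codepoints = [f"U+{ord(c):04X}" for c in seq]

print(total)       # 256
print(codepoints)  # ['U+5E28', 'U+E0100']
```

Each base character can therefore distinguish at most 256 selector-marked variants (257 forms counting the bare character), which is the ceiling that palaeographic glyph variants would run into.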