From: John H. Jenkins (email@example.com)
Date: Fri Oct 26 2007 - 10:54:55 CDT
On Oct 25, 2007, at 11:20 PM, James Kass wrote:
> So, if a rare character has uncertain provenance and meaning, but
> it is unifiable, shouldn't it just be unified?
Ideally, yes. The problem is a reluctance to do a unification where
you don't *know* that it's acceptable to the author of the original
A good case in point came up when South Korea proposed a large set of
characters to be able to encode the Korean tripitaka. It includes a
large number of characters which were *probably* just variants of
other characters but which *may* have been intended to be distinct
characters. In the end, South Korea was convinced that the case for
encoding them as separate characters was weak and withdrew them from
> And, if that character
> is not unifiable, but it exists in texts (however obscure) that
> someone may wish to reproduce electronically (for posterity,
> perhaps), shouldn't it be encoded?
It should be representable, yes. But that representation need not
take the form of a distinct encoding.
> Is it really possible to speed up the process of encoding an
> open-ended set?
Yes, the IRG has taken some steps to do so. Extension D submissions
required IDSs so that a preliminary unification can be done by
computer, TrueType fonts for a better ability to see what the
characters look like, and better provenance information so that we can
have a better sense of whether or not the characters *ought* to be
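As a rough illustration of what a computer-assisted preliminary unification pass over IDSs could look like, here is a minimal sketch. It is not the IRG's actual tooling: the submission labels, the sample IDS strings, and the small component-equivalence table are all hypothetical, and a real pass would need a much fuller notion of which components count as the same.

```python
from collections import defaultdict

# Hypothetical table mapping radical-form components to one preferred form,
# so that trivially different IDS spellings compare equal.
COMPONENT_EQUIVALENTS = str.maketrans({
    "\u2ea1": "\u6c35",  # CJK RADICAL WATER ONE -> U+6C35
    "\u2ebe": "\u8279",  # CJK RADICAL GRASS ONE -> U+8279
})

# Hypothetical submissions: (source label, IDS string).
SUBMISSIONS = [
    ("A-001", "\u2ff0\u6c35\u6728"),  # left-right: water + tree
    ("B-017", "\u2ff0\u2ea1\u6728"),  # same shape, radical-form water
    ("B-018", "\u2ff1\u8279\u6728"),  # above-below: grass + tree
    ("C-104", "\u2ff0\u6c35\u6797"),  # left-right: water + forest
]


def normalize_ids(ids):
    """Replace known equivalent components with their preferred form."""
    return ids.translate(COMPONENT_EQUIVALENTS)


def unification_candidates(submissions):
    """Group labels whose normalized IDSs match; keep groups of 2 or more."""
    groups = defaultdict(list)
    for label, ids in submissions:
        groups[normalize_ids(ids)].append(label)
    return {ids: labels for ids, labels in groups.items() if len(labels) > 1}


candidates = unification_candidates(SUBMISSIONS)
```

Matching IDSs only flag candidates for a human reviewer; the sketch makes no unification decision by itself.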
Moreover, I have on my plate an action item to produce a second set of
variant glyphs. Variant glyphs have a number of advantages for many
of the beasties proposed for encoding. Registering variant glyphs is
a faster process, for one thing, than encoding distinct characters.
It also makes text analysis simpler and -- most importantly -- it
takes these entities off the IRG's plate so it can focus on what
actually *does* need to be encoded.
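To illustrate the text-analysis point, here is a minimal sketch which assumes that a registered variant glyph is represented as a base ideograph followed by a variation selector from the range U+E0100..U+E01EF. That representation is an assumption made for this example, the function name is made up, and the particular base-plus-selector pairing is only a sample.

```python
# Variation selectors VS17..VS256 occupy U+E0100..U+E01EF.
VS_FIRST = 0xE0100
VS_LAST = 0xE01EF


def strip_variation_selectors(text):
    """Drop selectors in U+E0100..U+E01EF, leaving only the base characters."""
    return "".join(ch for ch in text if not (VS_FIRST <= ord(ch) <= VS_LAST))


plain = "\u845b"              # a base ideograph
variant = "\u845b\U000E0100"  # same base plus VS17 (sample pairing)

# The two strings differ as stored, but compare equal once selectors are dropped.
same_base = strip_variation_selectors(variant) == plain
```

Under this assumption, searching or collating code can ignore the selector and treat both forms as the same character, which a separately encoded character would not allow.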
John H. Jenkins