Re: Level of Unicode support required for CJKV

From: James Kass ([email protected])
Date: Fri Oct 26 2007 - 22:57:09 CDT

John Knightley wrote,

>> The difference and similarity between radicals 72 and 73 are
>> reflected as Unification Pattern No. 68 on this beta page:
>> http://kanji-database.sourceforge.net/housetsu.html
>
>The page is a beta page and not mature. Flag/pattern No. 68 is one that
>is IMHO wrong; pattern 68 will probably be deprecated or removed in
>the future.

In addition to noting that this is a beta page, we should also note
that a flag/pattern isn't a rule. It's only a flag/marker/pattern.

(It is my understanding that) these flags are generated by
machine with the intent that anything flagged be checked
by a human being.

Because radicals 72 and 73 have the same essential shape and
are confusable, and because IDS accompanying proposed new
characters may come from various sources, I think it is a
good flag/pattern. Even though almost everything flagged
under pattern number 68 would not be unifiable, it might
catch a duplicate submission which would otherwise be missed
until it is too late.
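
To illustrate what such a flag might do, here is a minimal sketch. It is
not how the kanji-database page actually generates its patterns; it only
assumes that each submission arrives with an IDS string, and it folds
radical 73 (U+66F0) onto radical 72 (U+65E5) for comparison purposes. The
submission labels and the function name are hypothetical, and the second
IDS is made up for the example. Anything it reports still needs checking
by a human being.

```python
from collections import defaultdict

# Assumption: for flagging only, treat radical 73 (U+66F0) as if it were
# radical 72 (U+65E5). This is a review aid, not a unification rule.
FOLD = str.maketrans({"\u66f0": "\u65e5"})


def flag_possible_duplicates(ids_by_label):
    """Group submissions whose IDS match once U+66F0 is folded to U+65E5."""
    groups = defaultdict(list)
    for label, ids in ids_by_label.items():
        groups[ids.translate(FOLD)].append(label)
    # Only groups with more than one member are worth a human look.
    return sorted(sorted(labels) for labels in groups.values() if len(labels) > 1)


# Hypothetical submissions from different sources.
submissions = {
    "source-A-001": "\u2ff0\u65e5\u6708",  # left-right: U+65E5, U+6708
    "source-B-017": "\u2ff0\u66f0\u6708",  # left-right: U+66F0, U+6708
    "source-C-042": "\u2ff1\u65e5\u6708",  # above-below: U+65E5, U+6708
}

flagged = flag_possible_duplicates(submissions)
print(flagged)
```

Here the first two submissions are flagged together because they differ
only in radical 72 versus radical 73, while the third has a different
layout and is left alone.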

But, of course, you are right in saying that radical 72 and
radical 73 aren't unifiable.

I'm very much indebted for the help you (and Andrew West,
John H. Jenkins, and others) have given me toward
understanding CJK unification, both in this thread and
in the past.

Because of my approach, I'm inclined to think that where two
separate Unicode characters could be printed using the same
piece of metal type, those characters would be interchangeable.
If someone hands you a small piece of paper with a single CJK
character hand-written on it and asks you for the Unicode
number for that character, it should be possible to give an
unambiguous answer. When someone is using a radical/stroke
look-up utility to find a certain character, they would tend
to stop as soon as they found a character identical in appearance
with the one sought.

There's also the issue of optical character recognition software
which must deal with these confusables. If the O.C.R. software
finds an exact visual match and presents it for review to the
person operating the software, it's going to look on-screen
exactly like it looked on the scanned original. So how would
this person know whether the character selected by the
software was correct? A sophisticated O.C.R. system might
anticipate this and present all confusables in a fashion which
would enable the user to select the appropriate character,
I suppose.
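
As a rough sketch of that idea (not a description of any real O.C.R.
product), the review step might list every member of a confusable set
together with its code point, so the choice isn't made on appearance
alone. The table below is hypothetical and holds just the one pair
discussed in this thread; the function name is mine.

```python
import unicodedata

# Hypothetical confusable table: only the radical 72 / radical 73 pair.
CONFUSABLE_SETS = [
    {"\u65e5", "\u66f0"},
]


def review_candidates(recognized):
    """Return (code point, character name) for every look-alike candidate."""
    for group in CONFUSABLE_SETS:
        if recognized in group:
            candidates = sorted(group)
            break
    else:
        # No known look-alikes: the recognized character is the only candidate.
        candidates = [recognized]
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in candidates]


for code_point, name in review_candidates("\u66f0"):
    print(code_point, name)
```

With a list like this in front of them, the person reviewing would at
least know that a second candidate exists and could decide from context.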

Best regards,

James Kass

