RE: PUA

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Mon Oct 20 2003 - 10:05:48 CST


...
> For non-PUA characters, I already implemented this using Unicode's
> "General Category" property: I decided that all characters whose General
> Category is "L*" are "letters".

Nit: That isn't quite true (but I'm not doubting your choice). The
HANGUL * FILLER characters aren't letters, even though they are of
GC Lo. Indeed, they are even invisible (but the Jamo ones are needed
for representing isolated letters using Jamos in the adopted architecture
for Hangul in Unicode; the non-Jamo Hangul fillers are there just
for compatibility with an older standard, nothing lettery about them).
Nor are LAO ELLIPSIS and THAI CHARACTER PAIYANNOI letters,
though Lo. They are really punctuation.
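
A minimal sketch of the "GC L* means letter" rule with the exceptions above carved out, assuming Python's standard unicodedata module. The names NOT_REALLY_LETTERS and is_letter are hypothetical, and the exception set covers only the characters mentioned in this message, so it should not be read as a complete list.

```python
import unicodedata

# Hypothetical exception list: characters whose General Category is Lo
# but which, per the discussion above, do not behave like letters.
NOT_REALLY_LETTERS = {
    0x115F,  # HANGUL CHOSEONG FILLER
    0x1160,  # HANGUL JUNGSEONG FILLER
    0x3164,  # HANGUL FILLER
    0xFFA0,  # HALFWIDTH HANGUL FILLER
    0x0E2F,  # THAI CHARACTER PAIYANNOI
    0x0EAF,  # LAO ELLIPSIS
}

def is_letter(ch: str) -> bool:
    """Return True if ch has General Category L* and is not a known exception."""
    if ord(ch) in NOT_REALLY_LETTERS:
        return False
    return unicodedata.category(ch).startswith("L")
```

With this, is_letter("a") is true, while is_letter("\u3164") is false even though unicodedata.category("\u3164") reports "Lo".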

> My default assumption about PUA characters is that they are not letters.

Hmm. A common default seems to be to treat them as CJK. Non-PUA
CJK is Lo... (Except for radicals, which are So.) Granted, I'm not too
fond of that default myself. The situation is a bit similar for Braille,
where the "glyphs" are given, but nothing much else.
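
To make the two defaults concrete, here is a rough sketch, again assuming Python's unicodedata module. PUA characters have General Category Co, so the choice reduces to how Co is handled. The function name classify and the flag pua_as_letter are made up for illustration only.

```python
import unicodedata

def classify(ch: str, pua_as_letter: bool = False) -> str:
    """Classify ch as 'letter' or 'other' from its General Category.

    pua_as_letter is a hypothetical switch: False treats private-use
    characters (GC Co) as non-letters; True mimics the "treat PUA like
    CJK" default, since non-PUA CJK ideographs are Lo.
    """
    gc = unicodedata.category(ch)
    if gc == "Co":
        return "letter" if pua_as_letter else "other"
    return "letter" if gc.startswith("L") else "other"
```

For example, classify("\ue000") gives "other" and classify("\ue000", pua_as_letter=True) gives "letter". A CJK radical such as U+2F00 and a Braille pattern such as U+2800 are both So, so they come out as "other" either way.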

                /kent k




