From: Peter Jacobi (peter_jacobi@gmx.net)
Date: Sun Nov 09 2003 - 13:59:52 EST
Dear List Members,
I understand that characters of different scripts with
the same appearance are disunified and have different
Unicode codepoints; Latin E (U+0045) vs Greek U+0395 vs
Cyrillic U+0415 is a typical example.
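A minimal sketch of that point, assuming Python's standard
unicodedata module is at hand. The three codepoints are
Latin capital E (U+0045), Greek capital Epsilon (U+0395) and
Cyrillic capital IE (U+0415); the variable names are only
illustrative.

```python
import unicodedata

# Three capital letters that usually render alike but belong
# to three different scripts (illustrative selection).
lookalikes = ["\u0045", "\u0395", "\u0415"]

# Look up the formal Unicode character name of each one.
names = [unicodedata.name(ch) for ch in lookalikes]

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")
```

Each codepoint carries its own name and script, so the
three stay distinct however similar the glyphs are.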
I also understand that characters of one script which have
the same shape in some fonts only, e.g. 0 and O, are clearly
not candidates for sharing a Unicode codepoint.
Now I'm wondering about Tamil LLA (U+0BB3) and
Tamil AU Length Mark (U+0BD7). They not only happen to
have the same shape in the font used for preparing
the Unicode charts; they are also indistinguishable in
handwritten Tamil, typewriter Tamil, etc., I am told.
So for all practical purposes:
U+0B95 U+0BCC, which is canonically equivalent to
U+0B95 U+0BC6 U+0BD7,
looks exactly the same as
U+0B95 U+0BC6 U+0BB3
Isn't that a bit odd?
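The encoding side of this can be checked with a small
sketch, assuming Python's standard unicodedata module. It
takes U+0BC6 (Tamil vowel sign E) plus U+0BD7 as the
canonical decomposition of U+0BCC; the variable names and
the nfd helper are only illustrative.

```python
import unicodedata

ka = "\u0B95"                      # TAMIL LETTER KA

precomposed = ka + "\u0BCC"        # KA + VOWEL SIGN AU
decomposed = ka + "\u0BC6\u0BD7"   # KA + VOWEL SIGN E + AU LENGTH MARK
lookalike = ka + "\u0BC6\u0BB3"    # KA + VOWEL SIGN E + LETTER LLA


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Canonically equivalent strings have identical NFD forms.
same_au = nfd(precomposed) == nfd(decomposed)
same_lla = nfd(precomposed) == nfd(lookalike)

print("AU vs E + AU length mark:", same_au)
print("AU vs E + LLA:", same_lla)
```

Normalization treats the first two sequences as the same
text, while the third remains a different string even
though it may be drawn identically.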
To give an analogy using Latin script,
that would be the same as if Latin y (U+0079)
in vocalic and consonantal use were
mapped to two different Unicode
codepoints.
Regards,
Peter Jacobi