From: Gregg Reynolds (email@example.com)
Date: Mon Jul 11 2005 - 20:01:22 CDT
Asmus Freytag wrote:
> At 03:26 PM 7/11/2005, Peter Kirk wrote:
>> In fact I think Gregg started this thread with a bad example. The two
>> encodings for a with circumflex are canonically equivalent and so
>> different encodings of the same data. The cases Gregg really needs to
>> deal with are when the alternatives are not canonically equivalent but
>> semantically distinct.
It was a great example! I just didn't make myself clear. ;) I meant
it as a graphic design problem, not as a practical problem to be solved.
> I'm still waiting for an actual (or correctly contrived) example.
Ok, you asked for it. Here's an example taken from my own little
speculative semantic encoding design for Arabic. Soon to be inflicted
on an innocent world.
The letterform waw U+0648 has at least four distinct functions:
1. waw-rad. latin-1 translit: W; phono: consonant /w/; semantics:
radical; e.g. Wjd وجد /wajada/; shows up in the dictionary under the
root w-j-d.
2. waw-nonrad. latin-1 translit: w; phono: consonant /w/; semantics:
non-radical; e.g. bwâdr بوادر /bawâdir/; shows up under b-d-r, the waw
is ignored for (first-level) lexical lookup.
3. sister of damma. latin-1 translit: û; phono: long vowel /û/;
semantics: non-lexical (it can change meanings within a lexical
category, though, e.g. from active to passive voice, etc); e.g. mktûb,
مكتوب /maktoob/; like damma, does not affect lexical ordering (except
within subentries under the root k-t-b); mnemonic: called sister of
damma because it always comes after damma (which may not be written
explicitly) and denotes a lengthening of the vowel /u/.
4. lazy waw. latin-1 translit: o; phono: null; semantics: null; e.g. bo's
بؤس /bu's/ where ' is hamza; purely graphotactic; mnemonic: too lazy to
bear the burden of phonological or lexical meaning; too lazy to grow the
tail that would make it look like a real waw.
Ok, so now we have four different encoding elements. BTW, they don't
have to map to single codepoints. My scheme maps them to latin-1, for
the transliteration. They could be mapped to PUA points, or to XML
elements. In any case, they all have the same typographic denotation,
namely waw U+0648. But you probably would have a hard time writing
software that could automatically check spelling/encoding. So you need
a font with four almost but not quite identical waw glyphs. I think.
For example, lazy waw might use a small subfixed ring or null sign.
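The four-to-one relationship described above can be sketched in Python. This is only an illustration, not the actual encoding design: the names WawElement, ELEMENTS, render and candidates are hypothetical, and the table simply restates the four transliteration letters and properties listed in this message.

```python
from dataclasses import dataclass

WAW = "\u0648"  # ARABIC LETTER WAW, the shared typographic denotation


@dataclass(frozen=True)
class WawElement:
    """One semantic encoding element (hypothetical structure)."""
    name: str
    translit: str   # latin-1 transliteration letter
    phono: str      # phonological value as described in the message
    radical: bool   # True if it counts for first-level lexical lookup


# The four functions of waw, keyed by their latin-1 transliteration.
ELEMENTS = {
    "W": WawElement("waw-rad", "W", "consonant /w/", True),
    "w": WawElement("waw-nonrad", "w", "consonant /w/", False),
    "û": WawElement("sister of damma", "û", "lengthens /u/", False),
    "o": WawElement("lazy waw", "o", "null", False),
}


def render(translit_char: str) -> str:
    """Map a semantic element to its typographic form (always waw)."""
    if translit_char not in ELEMENTS:
        raise KeyError(f"not a waw element: {translit_char!r}")
    return WAW


def candidates(glyph: str) -> list:
    """Going the other way is ambiguous: one glyph, four possible elements."""
    if glyph != WAW:
        return []
    return [element.name for element in ELEMENTS.values()]


if __name__ == "__main__":
    for key, element in ELEMENTS.items():
        print(key, "->", render(key), element.name)
    print(WAW, "<-", candidates(WAW))
```

Rendering is a plain function, but the reverse direction returns all four candidates. That is the reason a checker working from the typographic form alone cannot tell which element was intended.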
This archive was generated by hypermail 2.1.5 : Mon Jul 11 2005 - 20:03:22 CDT