On Wed, 15 Dec 1999, Gregg Reynolds wrote:
> So a search for, e.g., kitAbu, should find kitAb(un), where
> (un) symbolizes U+064C; that is, 'u' modified by tanween (noonation).
> And a search for kitAb(un) should arguably find kitAbuN, where 'N'
> symbolizes the as-yet undefined tanween codepoint, as well as kitAbuu,
> where two consecutive damma marks make a damma+tanween. But unless
> I've misread the standard (entirely possible), there is nothing in
> Unicode that provides for this. Furthermore, it isn't clear how one
> should encode tanween in a text. Does e.g. U+064C suffice? Or should
> one inscribe the vowel followed by the tanween mark - U+064E, U+064B? I
> submit there should be a mapping from each tanween codepoint to the
> combination of vowel mark and proper tanween mark. Which means Unicode
> needs a new "tanween" codepoint, and the current composed tanween
> "characters" should be defined as compositions of vowel mark + tanween.
> One could also argue that a pair of (identical) vowel marks should be
> interpreted as vowel+tanween, since that is after all the semantics.
I'm not sure a "GENERAL" tanwin helps. The definition of a character
depends to some extent on the user. If you are an Englishman, you may
prefer writing German a" with two keystrokes and considering it two
characters, but Germans prefer it as one character, typed with one
keystroke.
There are legacy usages. Your addition will help sorting, but it will
increase the confusion. There is well-defined legacy usage of the tanwin
characters, and I think anyone who needs to match both "fatha" and
"fathatan" can search with regular expressions, or with something
equivalent to English "ignore case" searches. ADDING THOSE WILL REALLY
COMPLICATE IMPLEMENTATION. And remember that you are suggesting this out
of logic, which does not always point in the same direction as usage.
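
The "ignore case"-style search mentioned above might be sketched roughly as follows. This is only an illustration under stated assumptions, not anything defined by Unicode: the folding table and the names fold_tanween and find_ignoring_tanween are hypothetical, and the sketch simply treats each existing tanwin code point (U+064B, U+064C, U+064D) as equivalent to its plain vowel (U+064E, U+064F, U+0650) at search time, leaving the stored text untouched.

```python
import re

# Existing Unicode tanwin marks and the short vowels they correspond to.
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

# Hypothetical folding table (an assumption of this sketch, analogous to
# case folding): each tanwin mark is mapped to its plain vowel for matching.
TANWEEN_FOLD = str.maketrans({FATHATAN: FATHA, DAMMATAN: DAMMA, KASRATAN: KASRA})


def fold_tanween(text):
    """Return text with every tanwin mark replaced by its plain vowel."""
    return text.translate(TANWEEN_FOLD)


def find_ignoring_tanween(needle, haystack):
    """Return the start offsets of non-overlapping matches of needle in
    haystack, treating a vowel and its tanwin form as equal.

    The fold is one code point to one code point, so offsets in the
    folded text are also valid offsets in the original haystack.
    """
    pattern = re.escape(fold_tanween(needle))
    return [m.start() for m in re.finditer(pattern, fold_tanween(haystack))]


# kitAbu (final damma) and kitAb(un) (final dammatan), spelled by code point.
KITABU = "\u0643\u0650\u062A\u064E\u0627\u0628\u064F"
KITABUN = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
```

With this folding, a search for KITABU finds KITABUN and vice versa, which is the behavior asked for in the quoted message, obtained without any new code point. It does not address the "two consecutive damma marks" spelling; that would need an extra rule.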