Re: Arabic encoding model

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jul 03 2005 - 12:58:41 CDT

Next message: Peter Kirk: "Re: Greek curled beta in Unicode code chart"

Previous message: suzanne mccarthy: "Re: Measuring a writing system "economy"/"accuracy""
In reply to: asadek@st-elias.com: "Arabic encoding model (was Re: Arabic 16-bit encodings)"
Next in thread: asadek@st-elias.com: "Re: Arabic encoding model"
Maybe reply: asadek@st-elias.com: "Re: Arabic encoding model"

Asadek St. Elias asked:

> Why have combining marks and precomposed Arabic characters using these
> combining marks (see the SMALL V for instance)? Why have encoded any of
> the new precomposed Arabic characters? I thought this was contradictory
> with Unicode's policy of not encoding new precomposed characters (e.g. U+0756
> introduced in Unicode 4.1) when it may be composed from an already encoded
> base character and a sequence of one or more combining marks? Why encode,
> version of Unicode after version of Unicode, new Arabic characters which
> could be coded as a base and a combining mark (why no THREE DOTS ABOVE/TWO
> DOTS ABOVE)?

> Is there any reason to this apparent mess?

How about initial ignorance of just how many of these combinations there
were?

The Latin script is partly helped by the tradition that vowels might be
combined with any accent, though in many fonts the combinations are pretty
poor. The concept that a base Arabic form could be combined with any
combination of distinguishing dots (or other marks) wasn't formed, and would
now be stymied by the 'stability pact' that requires that anything that is
now in Normalization Form C (NFC) or Normalization Form D (NFD) remain so
for ever.
Also, should one allow 'dotless noon + combining two dots above' when its
initial and medial forms would clash with 'teh' (U+062A)? I say dotless
noon, but perhaps one would make normal 'noon' (U+0646) 'soft dotted'. That
concept, for Latin 'i' and 'j', is itself quite recent.

So far as I am aware, there is no way of composing such new consonants.
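
To illustrate the contrast described above, here is a minimal sketch using
Python's standard unicodedata module. The helper name canonical_decomposition
is hypothetical, introduced only for this example, and the results reflect
whatever Unicode data the Python build ships with. It compares a Latin
accented vowel, which decomposes into base plus combining mark, with dotted
Arabic consonants, which carry no canonical decomposition.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points.

    Returns an empty list when the character has no decomposition or only a
    compatibility decomposition (those are tagged, e.g. '<isolated> 0628').
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []
    return [int(cp, 16) for cp in mapping.split()]


SAMPLES = [
    "\u00E9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u0622",  # ARABIC LETTER ALEF WITH MADDA ABOVE
    "\u062A",  # ARABIC LETTER TEH
    "\u0756",  # ARABIC LETTER BEH WITH SMALL V (added in Unicode 4.1)
]

for ch in SAMPLES:
    parts = canonical_decomposition(ch)
    shown = " ".join(f"U+{cp:04X}" for cp in parts) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {shown}")
```

The Latin vowel and the alef with madda come back as a base plus a combining
mark, while teh and beh with small v are atomic: their dots and marks are part
of the letter, so normalization leaves them untouched.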

Richard.


This archive was generated by hypermail 2.1.5 : Sun Jul 03 2005 - 13:04:17 CDT