RE: Phoenician

From: Peter Constable (petercon@microsoft.com)
Date: Thu May 06 2004 - 15:00:23 CDT


Dean:

> Here are the polar choices for XML:
>
> TAGGED (but not encoded)...

> ENCODED (but not tagged)...

Note that tagging can be used as well as distinct encoding.

> The tagged version is not a "font minefield". On the contrary, it
> explicitly provides an international standard mechanism for a level of
> specification and refinement not possible via encoding. You can, for
> example, do things like: <Phn subscript="Punic" locus="Malta"
> font="Maltese Falcon">BT 'LM</Phn>. In fact, this is precisely the sort
> of thing for which XML was designed.

It *is* a minefield, because the correct interpretation of the text is
dependent on particular fonts being on the recipients' systems. That
fails the criterion of plain text legibility.

 
> The untagged, but differently encoded version, on the other hand, IS a
> search and text processing quagmire, especially when confronted by the
> possibility of having to deal with multiplied West Semitic encodings,
> e.g., for the various Aramaic "scripts" and Samaritan.

Again, I find I have to disagree. It is much easier in searching to
neutralize a distinction than to infer one. And, as has been stated, if
there are distinct encodings, a given researcher can still use common
indexing for their data if that suits their purpose.
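
To illustrate why neutralizing is the easy direction, here is a minimal
sketch of folding a distinct Phoenician encoding onto Hebrew before
searching or indexing. It rests on assumptions not settled in this thread:
that Phoenician letters sit at U+10900..U+10915 (the range Unicode later
assigned) in the same 22-letter order as the Hebrew non-final letters. The
names fold_to_hebrew and matches are hypothetical.

```python
# Sketch only: the Phoenician code point range below is an assumption
# (the block Unicode later assigned), not something defined in this thread.
PHOENICIAN_FIRST = 0x10900  # assumed position of the first letter (alf)

# Hebrew final forms; each one sits immediately before its base letter.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}

# The 22 Hebrew base letters, alef..tav, with final forms skipped.
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# Fold table: Phoenician letter -> Hebrew base letter, by position.
FOLD = {PHOENICIAN_FIRST + i: cp for i, cp in enumerate(HEBREW_BASE)}
# Also fold Hebrew final forms onto their base letters.
FOLD.update({cp: cp + 1 for cp in HEBREW_FINALS})


def fold_to_hebrew(text: str) -> str:
    """Neutralize the Phoenician/Hebrew distinction for comparison."""
    return text.translate(FOLD)


def matches(needle: str, haystack: str) -> bool:
    """Substring search that ignores which of the two encodings was used."""
    return fold_to_hebrew(needle) in fold_to_hebrew(haystack)
```

A researcher who wants common indexing applies the fold once at index
time. Going the other way, from unified text back to the distinction,
has no table to consult; it has to be inferred from markup or context.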

As has been stated, the distinct needs of two communities can be served
well with two encodings; it is much more difficult to serve the distinct
needs of a second group if the distinct things they want are merged into
what the first group uses.

 
> Obviously there is a need, in many cases, to maintain the distinction
> between the various diascripts; the question is where should that
> distinction be introduced - at the encoding level or higher? ...

> But, what I'm afraid of with this proposal, as I've stated before, is
> that its adoption will set a precedent that will result in a snowballing
> of West Semitic encodings,

All I have said is that I'm persuaded that something distinct should be
encoded -- at the character encoding level, not in markup. I have no
opinion on what or how many the new distinct things should be.

> * Separately encode Phoenician, Old Hebrew, Samaritan, Archaic Greek, Old
> Aramaic, Official Aramaic, Hatran, Nisan, Armazic, Elymaic, Palmyrene,
> Mandaic, Jewish Aramaic, Nabataean ...

I don't think anybody is looking for that many distinctions to be made.

Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division


