E0000 Language Tags for Archaic Greek Alphabets

From: UList@dfa-mail.com
Date: Sun Feb 27 2005 - 15:15:15 CST

Previous message: UList@dfa-mail.com: "Re: E0000 Language Tags for Some Obscure Languages"
In reply to: Peter Kirk: "Re: E0000 Language Tags for Some Obscure Languages"
Next in thread: Doug Ewell: "Re: E0000 Language Tags for Archaic Greek Alphabets"
Reply: Doug Ewell: "Re: E0000 Language Tags for Archaic Greek Alphabets"
Maybe reply: Peter Constable: "RE: E0000 Language Tags for Archaic Greek Alphabets"
Reply: Patrick Andries: "Re: E0000 Language Tags for Archaic Greek Alphabets"

Hello,

I've been informed a certain language is *not* to be mentioned!

I apologize. As you can see from the subject line of my post, I thought I was
dealing with an "obscure" subject there.

So let me change the subject to what I'm *actually* interested in: archaic
Greek alphabets.

> I'm not sure if you are suggesting a language tag at the start of a
> string of UNMENTIONABLE text or before each UNMENTIONABLE letter.

In that post I was backing off from my love of "single-codepoint tagging" and
trying out tagging the entire section of text instead.

> If the former, this simply doesn't work with current font technologies,

From what I've been told about OpenType, it should already be possible to
detect any arbitrary string of codepoints, like

DORIC

and then detect another arbitrary string of codepoints, like

/DORIC

and then do something (glyph swapping) to the codepoints in between them.

This is "context" detection (something is next to something else), not
"state" detection.

The person who told me that is someone very much in the know about such
things, but he may have misunderstood what I was asking. So I don't know for
sure that it can be done.
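
To make the idea concrete, here is a minimal sketch in plain Python of "do
something to whatever sits between two marker strings". It only models the
behaviour described above and says nothing about whether OpenType lookups can
actually do this. The function name swap_between and the demo mapping are
hypothetical, and uppercase Greek letters stand in for the alternate glyphs.

```python
import re


def swap_between(text, start, end, glyph_map):
    """Apply glyph_map only to characters found between start and end markers.

    The markers themselves are left in place, and text outside any
    start...end span is returned unchanged.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)

    def substitute(match):
        inner = "".join(glyph_map.get(ch, ch) for ch in match.group(1))
        return start + inner + end

    return pattern.sub(substitute, text)


# Hypothetical stand-in for "alternate glyphs": small alpha/beta -> capital forms.
DEMO_MAP = {"\u03b1": "\u0391", "\u03b2": "\u0392"}

sample = "\u03b1\u03b2 DORIC\u03b1\u03b2/DORIC \u03b1\u03b2"
result = swap_between(sample, "DORIC", "/DORIC", DEMO_MAP)
print(result)
```

Only the letters inside the marked span are swapped. The same letters before
and after it are untouched, which is the "context, not state" point.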

Given the choice of using absolutely any arbitrary string of codepoints for my
"start" and "end" markers, I thought it would be best received by Unicode if I
used the E0000 codepoints. There's actually a specific way defined for using
the E0000 tags as "custom language tags", where you start with the Tag
[Language], and then (I think) the Tag [x] and then the Tag [-] and then the
Tags that spell out your custom language name. Then there's another Tag that
means [End Language].

That's what I mean by my shorthand

DORIC ---> [LANGUAGE][x][-][D][O][R][I][C]

/DORIC ---> [END LANGUAGE]
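
Spelled out as codepoints, the shorthand could be built roughly like this. It
is a sketch in Python under some assumptions: the tag characters are taken to
be the printable ASCII codepoints shifted up by 0xE0000, [LANGUAGE] is U+E0001
LANGUAGE TAG, and [END LANGUAGE] is written as LANGUAGE TAG followed by
U+E007F CANCEL TAG, which is how I read the tag character description. The
helper names are hypothetical.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag character = printable ASCII codepoint + offset


def to_tag_chars(ascii_text):
    """Map printable ASCII (U+0020..U+007E) onto U+E0020..U+E007E."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(ord(ch) + TAG_OFFSET) for ch in ascii_text)


def open_language(name):
    """Hypothetical helper: [LANGUAGE][x][-] followed by the spelled-out name."""
    return LANGUAGE_TAG + to_tag_chars("x-" + name)


def close_language():
    """Hypothetical helper: cancel only the language tag (assumed sequence)."""
    return LANGUAGE_TAG + CANCEL_TAG


def wrap(name, text):
    """Surround text with the opening and closing tag sequences."""
    return open_language(name) + text + close_language()


tagged = wrap("DORIC", "\u03b1\u03b2\u03b3")
print([f"U+{ord(c):04X}" for c in tagged])
```

The Greek letters in the middle are ordinary codepoints. Only the invisible
tag characters around them carry the "DORIC" label.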

I'm using the specific approved E0000 way of doing "custom language tags"
currently, *solely* to go along with how Unicode says to do things, even
though any arbitrary string of codepoints could be used the same way by OT to
detect "context". OT cannot currently detect these E0000 "custom language
tags" *as* "language tags", but it seems conceivable it might be able to do so
in the future. So there is also the benefit of possible forward-compatibility
for documents (as well as keeping Unicode happy).

On a different subject, the OpenType font technology *can* (again I am told)
deal with real, normal language tags such as you would use in XML. The
language tags OT will respond to are created by Microsoft and maintained in a
list on the MS site. You can program the OT font to carry out a set of
instructions when it detects a particular XML language tag.

The Microsoft site says the list of recognized languages is being expanded.
For Ancient Greek though, I don't think this official list is a good way to go
-- even if Microsoft would accept something like "Old Cretan Doric".

The little secret I should let out is that I am really using the E0000
"language" tags for "scripts", although it can be defended that each Ancient
Greek dialect is different enough to be called a "language". As most people on
the list know, there is no scientific line between "dialect" and "language" --
the joke being "a language is a dialect with an army". And of course each
Greek city-state had its own army : )

But I want to go even a little further, in order to implement *all* the
possible variations of archaic Greek scripts in one smart font. I want to define

   OLD_CRETAN_DORIC_LTR (left to right)
   OLD_CRETAN_DORIC_RTL (right to left)
   OLD_CRETAN_DORIC_ALT_LTR (alternate, left to right)
   OLD_CRETAN_DORIC_ALT_RTL (alternate, right to left)

and this, or more, for perhaps 10 different archaic Greek scripts... err, I
mean dialects.

That probably isn't something that could/should end up on Microsoft's official
language list.

And perhaps I should actually ask Unicode for an E0000 Tag [Script] and Tag
[End Script] rather than continuing my fairly poor attempt to pass scripts off
as dialects. The E0000 documentation says more Tags like [Language] will be
added, and [Script] sounds like a good one.

Doug

Peter Kirk wrote:
>
> On 27/02/2005 17:04, UList@dfa-mail.com wrote:
>
> >...
> >UNMENTIONABLE: use Hebrew transliteration text plus a smart font to swap in
> >UNMENTIONABLE glyphs when the E0000 "UNMENTIONABLE" language tags are encountered.
> >
> >
> >
> Doug, I am infamous on this list for having suggested several different
> alternatives for representing UNMENTIONABLE in Unicode. See the list
> archives, especially for May 2004 - and please don't try to reopen those
> discussions! But this is one suggestion which I did not consider. Why
> not? Simply because it has nothing at all to commend it.
>
> I'm not sure if you are suggesting a language tag at the start of a
> string of UNMENTIONABLE text or before each UNMENTIONABLE letter.
>
> If the former, this simply doesn't work with current font technologies,
> which are not stateful in the way necessary to support this.
>
> If the latter, I suppose in principle current font technologies could
> support this if the language tag and the letter were treated as a
> multi-character ligature. But it would surely be ruled out by its
> extreme inefficiency, and because the rather similar alternative of
> using a variation selector after each UNMENTIONABLE letter is much more
> efficient but was ruled out for various reasons, including its lesser
> inefficiency, which apply all the more to your solution.
>
> If what you really mean is that you want to use higher level markup to
> distinguish UNMENTIONABLE from other languages by a change of font perhaps
> indicated by a different markup style, with language tags as one
> specific way of doing this markup: Well, that might work, but the UTC
> and WG2 have already rejected the argument that UNMENTIONABLE should be
> distinguished only by a font change signalled by markup.
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Mon Feb 28 2005 - 00:21:18 CST