Re: orthographic characters for glottal stop

From: Michael Everson (everson@indigo.ie)
Date: Wed Sep 08 1999 - 13:57:09 EDT


At 10:08 -0700 1999-09-07, Kenneth Whistler wrote:

>I think you should consider further. First of all, if you are moving
>to Unicode for these, any existing data is going to have to be
>converted.

True.

>Second, the use of 7 was an ugly hack in the first place,
>and it isn't doing anyone any favors to perpetuate it or the other
>digit hacks for letters from typewriter days.

Not always. The Popol Vuh and other early alphabetic documents (ca. 1700
C.E., ca. Long Count 0.0.0.12.19.6.5.0) in Quiché used something which was
(at least later) represented by 3 and 4; recent reforms by Mayanists
replace _4h_ with _ch'_, _4,_ [sic] with _tz'_, _4_ with _k'_, and _3_ with
_q'_ (amongst other changes). Now these manuscripts do need UCS
representation; I won't hazard a guess at this point whether LATIN LETTER
FOUR and LATIN LETTER THREE are the best solutions, but I wouldn't rule
them (or LATIN LETTER SEVEN) out ahead of time. Pullum & Ladusaw identify 4
as DIGIT FOUR (p. 15) and 7 as DIGIT SEVEN (p. 212), both with reference to
Mayanists, and don't mention 3.

>An orthography that
>mixes the digits with the letters is just going to continue to run
>into algorithmic problems as it moves into computerized form.

This is certainly true, and it, along with early manuscript evidence, was
one reason that LATIN LETTERs OU were added to the standard, though most of
the Canadian users are currently still using the digit 8. There are also
those Latin tone marks used for Zhuang; but not all of the required ones
were added, as two (tone 3 and tone 6, I think) were unified with CYRILLIC
LETTER ZE and CYRILLIC LETTER SOFT SIGN. I think that multilingual ordering
will be problematic for these as well, since Latin and Cyrillic do not
interfile. Same goes for the Q and W used in Russian Kurdish, which still
aren't represented by CYRILLIC LETTERs KU and WE but rather were
unified with their Latin letters. And there are those IPA Greek letters. I'd
prefer not to have these kinds of unifications in principle.

These kinds of questions are sticky, and the WG2 Procedures document does
include info on disunification criteria, costs, and benefits (thanks
Asmus); but we've yet to hash out how we feel about the examples given
above. Not that it's an easy issue. For Kurdish the case is very strong,
because even language-tagging doesn't help, since Kurdish is written in
both Cyrillic and Latin.
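
To make the interfiling point concrete, here is a rough Python sketch. It
assumes that the first word of a character's Unicode name is a good enough
stand-in for its script (Python's standard library has no script property),
and the letter sequence is made up for illustration, not a real Kurdish
word. The helper name script_guess is likewise hypothetical.

```python
import unicodedata

def script_guess(ch):
    """Heuristic only: take the first word of the character name as its script."""
    return unicodedata.name(ch).split()[0]

# Hypothetical letter sequence: Cyrillic a, Latin q, Cyrillic ie.
word = "\u0430q\u0435"
print([script_guess(c) for c in word])
# ['CYRILLIC', 'LATIN', 'CYRILLIC']

# A plain code point sort puts the unified Latin letters before every
# Cyrillic letter instead of interfiling them.
print(sorted(["\u0430", "q", "\u0431", "w"]))
# ['q', 'w', 'а', 'б']
```

A tailored collation can of course reorder these for a single language, but
the default ordering gives no help, and a script-based check sees a word
like this as mixed-script text.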

>This doesn't mean going back and burning all the material that was
>printed with the "7", but introduction of a proper glottal stop shouldn't
>be that difficult, if the benefits are explained. (It can even be
>made to look "sevenish" in your fonts, if users insist.)

Aye, there's the rub. It's easy for us to say, well, 7 is really just a
glottal stop. But we're talking natural orthographies here. User
sensitivities, legibility, recognizability, etc. do have to be taken into
account. (The case is stronger for ? being a glyph variant of the glottal
stop.) At the end of the day, 7 is _their_ letter, not ours.

>The difference from the "7" is that that was always known to just be
>a workaround for keyboards that had no glottal stop.

The linguist knows this; the schoolchildren do not, and if they've been
using 7 for years and years.... I mean, what if they regularly write DIGIT
7 with a stroke through it and LETTER 7 without? (I've no idea, it's just a
thought.) The IPA glottal stop isn't the most beautiful of creatures. Is it
just a glyph variant? Does that mean Mayans can't use Lucida off the shelf
since it has the ?-glyph? What if that means they keep typing DIGIT 7? The
algorithmic problems aren't solved for them.
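
The kind of algorithmic problem at issue can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the word
"ka7a" is invented, U+0294 LATIN LETTER GLOTTAL STOP stands in for whatever
letter might eventually be used, and "a word is a run of letters" is a
deliberately naive rule.

```python
import unicodedata
from itertools import groupby

def letter_runs(text):
    """Split text into maximal runs of characters with a letter (L*) category."""
    def is_letter(ch):
        return unicodedata.category(ch).startswith("L")
    return ["".join(group) for key, group in groupby(text, key=is_letter) if key]

with_digit = "ka7a"        # hypothetical word, glottal stop typed as DIGIT SEVEN
with_letter = "ka\u0294a"  # same hypothetical word, using U+0294

print(unicodedata.category("7"))  # Nd (decimal digit, not a letter)
print(letter_runs(with_digit))    # ['ka', 'a']
print(letter_runs(with_letter))   # ['kaʔa']
```

As long as users keep typing DIGIT SEVEN, any process that relies on
character properties will split the word in two, whatever the glyph looks
like on screen.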

>The same problem
>arises for orthographies that used "?" for a glottal stop. Neither should
>be perpetuated into the future because of the problems such overloading
>will cause.

Agreed that they shouldn't type DIGIT 7. We should think about LETTERs 3,
4, and 7, though, and look at various fonts and things to see if people
encoded them with separate code positions and/or unique shapes to
differentiate the numbers from the letters.

--
Michael Everson * Everson Gunn Teoranta * http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire


