On Fri, 10 Sep 1999, Michael Everson wrote:
> >IPA has got a LATIN SMALL LETTER SCRIPT G as well. The corresponding
> >character, LATIN SMALL LETTER NONSCRIPT G, (or similar) has been unified
> >with LATIN SMALL LETTER G, in a similar demonstration of brokenness:
> >where two glyphs that are used as variants of one grapheme, and used
> >contrastively in other situations, have been encoded as two code points,
> >rather than the needed three.
> By the same token we'd need three for I, DOTLESS I, and DOTTED I for
> Turkish. But no one wants to do this. (Probably because huge amounts of
> Turkish data do something else.)
Yes. That would be needed, but the case is not as strong, as <dotless-i>
is not a common glyph variant of <dotted-i>, and the other purpose, that
of caseshifting, has to be locale-dependent and nontrivial anyway...
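
To illustrate the case-shifting point, here is a minimal Python sketch.
It assumes only the built-in str.lower() and str.upper(), which are
locale-independent; the helper names turkish_lower and turkish_upper
are hypothetical, and the sketch ignores everything except the two
Turkish i pairs.

```python
# Hypothetical helpers, not part of any library: they patch the two
# Turkish i pairs around Python's locale-independent case mapping.

DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def turkish_lower(text):
    """Lowercase text using the Turkish pairing I -> dotless i."""
    # Handle the dotted capital first so it becomes a plain "i",
    # then send the remaining plain "I" to the dotless small letter.
    text = text.replace(DOTTED_CAPITAL_I, "i")
    text = text.replace("I", DOTLESS_SMALL_I)
    return text.lower()


def turkish_upper(text):
    """Uppercase text using the Turkish pairing i -> dotted capital I."""
    # The default mapping already sends dotless i to "I", so only the
    # dotted small "i" needs to be redirected before upper() runs.
    text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()
```

The default mapping sends "I" to "i" regardless of language, so the
caller has to know the text is Turkish before choosing these helpers.
That is the locale dependence: nothing in the plain text says which
mapping applies.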
> I support the last three because of the functionality issue of sorting
> Greek and IPA text (in Latin transcription Beta is not supposed to sort
> after z, which it does with the unification). I think the case for the
> first two is dodgy. What do you do about inputting?
Inputting for <hooked-a> would presumably be done in the same way that
inputting <round-a> is currently done. If people still insist on inputting
<a>, then there's nothing that can be done about that, but we should
preferably give people the ability to do things in a sensible way.
> Right now my IPA fonts have a on a and round-a on A.
Yes, this issue can be solved with tagging or markup, but it cannot be
solved that way in true plaintext. If I extend a font which uses the
<round-a> glyph to cover the IPA range, I have to either
a) make the <a> identical to the <round-a>. This is unacceptable.
b) distinguish between the <round-a> and <a> whilst leaving the <a>
unchanged. This will be confusing to users, who currently expect to
see a <hooked-a> glyph for <a> in IPA contexts. This is unacceptable.
c) make the <a> into a <hooked-a>, and make <round-a> a <round-a>. This
is not something I am happy about, especially if I am trying to keep
the original style of the font intact. It alters the appearance of
normal plaintext to fix IPA text. I am not comfortable with doing this.
I would like the following option to exist.
d) leave <a> as it is, and make <round-a> and the new <hooked-a>
character appropriate. This has no drawbacks, apart from the
conversion of legacy data.
However, I think it is better that it be fixed now, whilst there
is still not much legacy data. Waiting will only increase the
impracticality of this change.
A similar case can be made for Beta, Theta, and Chi. Another point,
that you did not mention, is that of approximate conversions...
If I were writing a filter to convert Unicode to <any-arbitrary
codeset>, with a 'best possible' representation, I would want to
convert Greek theta to "th". However, I would not want to convert IPA
theta to "th": I would want to convert it to "T". This is because "T"
is the ASCII-IPA sign for the voiceless interdental median fricative.
Ditto with "beta" which would want to be converted to either "b/v" or
"B", and "chi" which would want to be converted to "ch/kh" for Greek,
or "X" for ASCII-IPA. (This works, as IPA does not use fullsize
uppercase Latin letters.)
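
A minimal Python sketch of such a filter follows. The table names, the
to_ascii function and its ipa flag are all hypothetical, and "b" and
"ch" are arbitrary picks from the "b/v" and "ch/kh" alternatives above.
Because the Greek and IPA letters share one code point, the filter
cannot tell them apart from the text alone, so this sketch assumes the
caller supplies that information out of band.

```python
# Hypothetical fallback tables; the entries are the ones named in the
# message, with "b" and "ch" chosen arbitrarily from the alternatives.
GREEK_FALLBACK = {
    "\u03b8": "th",   # GREEK SMALL LETTER THETA
    "\u03b2": "b",    # GREEK SMALL LETTER BETA
    "\u03c7": "ch",   # GREEK SMALL LETTER CHI
}
IPA_FALLBACK = {
    "\u03b8": "T",    # same code point, read as an IPA symbol
    "\u03b2": "B",
    "\u03c7": "X",
}


def to_ascii(text, ipa=False):
    """Return a best-effort ASCII rendering of text.

    The ipa flag is the out-of-band hint: plain text alone cannot say
    whether a theta is Greek or IPA, since both use U+03B8.
    """
    table = IPA_FALLBACK if ipa else GREEK_FALLBACK
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                  # already ASCII, keep as is
        else:
            out.append(table.get(ch, "?"))  # unknown characters become "?"
    return "".join(out)
```

With separate code points for the IPA letters, the two tables could be
merged into one and the ipa flag would not be needed.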