Re: encoding phonetic tone letters

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Sep 10 1999 - 17:54:23 EDT


Peter,

> Since we're on the topic of phonetic symbols, I've got an idea
> to throw out for feedback. IPA provides for indicating tones
> using tone letters, but only five are defined in Unicode U+02E5
> - 02E9, which represent five levels. A lot more is needed for
> representing tones, however: if you look at the chart of IPA
> symbols at http://www.arts.gla.ac.uk/IPA/fullchart.html you'll
> see five contour letters followed by "etc.". I.e. these are
> only a pattern.
>
> In practice, 5 tone levels cover most situations, but I've
> heard of 7 being needed in some situations.

I am not convinced by the claim that 7 levels are needed. Maybe the
analysis has proceeded further since I was involved, but the tonal
experts among phoneticians I knew insisted that five levels (of
tonemes) were sufficient. Further discriminations are probably just at
the level of allotonic variation--predictable from context. And at
that level of detail, tone letter transcription is inappropriate
anyway; you need numerical values analyzed out of speech signals--and
at that point it gets hard to demonstrate reliable distinctions for
the variation that emerges.

> I have also
> encountered contours that are sequences of up to 3 tones long.

Of course. You need this just to represent Mandarin tone 3, for example,
which is a dipping/rising tone, often represented as a 214 sequence of
the 5 tone levels. There are arguably even rising/dipping/rising tones
in some languages--which would require a sequence of 4 tone letters to
represent.

But the 5 tone letters already in the Unicode Standard are explicitly
intended for use in such combinations. See pp. 6-12 to 6-13.
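
To make the combination idea concrete, here is a minimal sketch that
turns a digit transcription such as 214 into a sequence of the existing
tone letters. It assumes the usual convention that 1 is the lowest level
and 5 the highest, so that 5 maps to U+02E5 and 1 to U+02E9; the
function name digits_to_tone_letters is made up for illustration and is
not part of any standard or library.

```python
# Assumption: digits 1 (lowest) to 5 (highest) map onto the five tone
# letters U+02E9 (extra-low) up to U+02E5 (extra-high).
TONE_LETTERS = {str(d): chr(0x02EA - d) for d in range(1, 6)}


def digits_to_tone_letters(digits):
    """Return the tone letter sequence for a digit string such as '214'."""
    try:
        return "".join(TONE_LETTERS[d] for d in digits)
    except KeyError as exc:
        raise ValueError(f"not a tone level 1-5: {exc.args[0]!r}") from None


# Mandarin tone 3 written as 214: low, extra-low, high.
mandarin_tone_3 = digits_to_tone_letters("214")
print([f"U+{ord(c):04X}" for c in mandarin_tone_3])
```

Whether the result displays as a single contour glyph or as three
separate level bars depends on the font and rendering system, not on the
encoded text.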

> Supposing up to 7 levels and sequences of 1 - 3 in length, the
> possible combinations come to something like 400 in all. Now, I
> don't think any of us are enamoured with the thought of adding
> 400 tone letters to the standard. In place of that, though, 8
> characters for tone letters could be sufficient: 7 to indicate
> tone levels, and 1 to demarcate the beginnings of sequences.
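
The figure of "something like 400" can be checked with a little
arithmetic: with k levels and sequences of 1 to n letters, the count is
k + k^2 + ... + k^n. The sketch below is only that sum; the function
name contour_count is made up for illustration, and it counts every
possible sequence, whether or not any language actually uses it.

```python
def contour_count(levels, max_len):
    """Count all tone letter sequences of length 1..max_len over `levels` levels."""
    return sum(levels ** n for n in range(1, max_len + 1))


print(contour_count(7, 3))  # 7 + 49 + 343 = 399
print(contour_count(5, 3))  # 5 + 25 + 125 = 155
print(contour_count(5, 4))  # 5 + 25 + 125 + 625 = 780
```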

I don't see the need for such a demarcation. Any base letter would
serve to demarcate a sequence of tone letters. If you just want to
indicate a sequence of tones without base letters, pick any other
symbol as a delimiter. There is no need to invent a special-purpose
delimiter character just for parsing here.
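
As a rough illustration of why no dedicated delimiter is needed, the
sketch below splits text into runs of tone letters simply by treating
every other character (a base letter, a space, or any symbol chosen as a
separator) as a boundary. The function name tone_sequences is made up
for illustration, and the sketch assumes only the five tone letters
U+02E5 to U+02E9 count as tone letters.

```python
from itertools import groupby

# Assumption: only the five level tone letters U+02E5..U+02E9 are tone letters.
TONE_LETTERS = {chr(cp) for cp in range(0x02E5, 0x02EA)}


def tone_sequences(text):
    """Return each maximal run of tone letters; any other character ends a run."""
    return [
        "".join(run)
        for is_tone, run in groupby(text, key=TONE_LETTERS.__contains__)
        if is_tone
    ]


# Base letters delimit the runs: "ma" + 214, then "ma" + 5.
print(tone_sequences("ma\u02e8\u02e9\u02e6 ma\u02e5"))

# Without base letters, an ordinary symbol such as "|" does the same job.
print(tone_sequences("\u02e8\u02e9\u02e6|\u02e5\u02e5"))
```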

> (Where contours are not involved, the delimiter isn't needed.)
> Given that there are 5 characters already, we'd only be looking
> at 3 more (assuming 7 is really what's needed to cover all
> situations). It would be up to some smart font technology, such
> as AAT or OpenType, to substitute actual contour glyphs for the
> sequences; in the absence of smart fonts, a sequence of level
> tone letters is shown, and the delimiter either appears as a
> small visual delimiter or is zero-width.

Except for the delimiter, this is just what is already intended by
the tone letters that are encoded.

--Ken

>
> Is this an idea worth considering?
>
>
> Peter
>
>


