Re: Encoding of Teuthonista: Diacritics in parentheses

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Oct 30 2008 - 18:38:52 CST

Next message: vunzndi@vfemail.net: "Re: Proposal to change the script allocation rules for the BMP and SMP"

Previous message: Erkki I. Kolehmainen: "VS: Encoding of Teuthonista: Diacritics in parentheses"
Maybe in reply to: Karl Pentzlin: "Encoding of Teuthonista: Diacritics in parentheses"

Karl said:

> KP> When it comes to encoding on Teuthonista:
> KP>
> KP> http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf
> ...
>
> To be a little bit more about Teuthonista:
> Someone looking at the presented parts may have the impression that
> there is a "fancy" system which leaves the realm of plain text.

And I am one of those who has that impression.

>
> But in fact, Teuthonista *is* a plain text writing system

That is an assertion -- not a demonstration.

Whether or not a Teuthonista transcription can be represented
as Unicode plain text depends on decisions taken about encoding
of characters. Currently, Teuthonista clearly *could* be
represented as structured text using Unicode characters. The
question rather is whether it is advisable to try to add
additional complex characters to Unicode so as to make
it feasible to represent Teuthonista transcription
as Unicode plain text.
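
To make the structured-text alternative concrete, here is a minimal
sketch in Python. It assumes a hypothetical data model (the names Mark,
Segment and the <paren> markup are invented for illustration and come
from no standard or proposal): the diacritic is an already encoded
combining character, and "in parentheses" is carried as structure
rather than as a character.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mark:
    """One diacritic: an existing Unicode combining character plus structure."""
    char: str                    # e.g. U+0303 COMBINING TILDE
    parenthesized: bool = False  # hypothetical attribute, not a character


@dataclass
class Segment:
    """A base letter with its diacritics, in application order."""
    base: str
    marks: List[Mark] = field(default_factory=list)

    def plain(self) -> str:
        # Plain-text projection: only encoded characters survive,
        # so the parenthesis information is lost.
        return self.base + "".join(m.char for m in self.marks)

    def markup(self) -> str:
        # Structured-text serialization using the invented <paren> element.
        out = self.base
        for m in self.marks:
            out += f"<paren>{m.char}</paren>" if m.parenthesized else m.char
        return out


seg = Segment("a", [Mark("\u0303", parenthesized=True)])
plain_form = seg.plain()    # "a" followed by U+0303
markup_form = seg.markup()  # "a<paren>" + U+0303 + "</paren>"
```

The sketch only shows that nothing beyond currently encoded characters
is needed once the parentheses are treated as structure. It says
nothing about whether that trade-off is acceptable to Teuthonista users.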

> with a specific
> and clearly defined set of building blocks which compose to diacritics
> and letters.

Having a clearly defined set of building blocks does *not*
make a system, ipso facto, plain text.

What seems clear is that Teuthonista is intended as a single-tier
fine-grained phonetic transcription system. That is not enough,
in my opinion, to guarantee that it must be representable in
plain text without structured text conventions, given that the
*way* it builds diacritics formally departs significantly from
the intended scope of the Unicode model for Latin diacritics.
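
For reference, the existing model can be inspected with Python's
standard unicodedata module. This is only a sketch of how encoded
combining marks behave today; it uses U+0308 COMBINING DIAERESIS as an
arbitrary example and does not model any Teuthonista-specific
character.

```python
import unicodedata

# A base letter followed by a combining mark is a plain text sequence
# that normalization can compose into a single code point.
decomposed = "a\u0308"  # a + COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # U+00E4

# The diacritic is a nonspacing mark with a nonzero combining class.
mark_category = unicodedata.category("\u0308")  # "Mn"
mark_class = unicodedata.combining("\u0308")    # 230 (above)

# An ordinary parenthesis is open punctuation with combining class 0:
# it is a spacing character and cannot wrap a diacritic in plain text.
paren_category = unicodedata.category("(")  # "Ps"
paren_class = unicodedata.combining("(")    # 0
```

In this model a mark attaches to a base character. There is no
mechanism by which one character encloses or modifies another mark,
which is the kind of composition Teuthonista's parenthesized
diacritics would require.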

> The resulting set of diacritics is rather limited (about
> 30), as not every possible diacritic is to be put in parentheses.
>
> Teuthonista is to write down the exact pronunciation of German dialect
> words, e.g. to store them in databases (thus employing a typical plain
> text application).

I don't think these conclusions follow. Databases aren't
"typical plain text applications" in the first place. The
simplest designs for handling data corpora may well assume
that a text field contains precisely and only a plain text
representation of some bit of data, and that the data can be
displayed with a plain text renderer without any intervention
or transducing. But there is nothing at all *necessary* about
such a design for a linguistic corpus.
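
As an illustration of a corpus design that does not assume plain text
fields, here is a minimal sketch using Python's sqlite3 and json
modules. The table name, the JSON layout and the render function are
all hypothetical: the stored value is a structured record, and a small
transducer turns it into something displayable.

```python
import json
import sqlite3

DOTTED_CIRCLE = "\u25cc"  # conventional carrier for showing a mark in isolation


def render(record_json: str) -> str:
    """Transduce a stored structured record into a linear display string."""
    out = []
    for seg in json.loads(record_json):
        out.append(seg["base"])
        for mark in seg.get("marks", []):
            ch = mark["char"]
            # Crude fallback notation for a parenthesized diacritic; a real
            # renderer would draw the parentheses around the mark itself.
            out.append(f"({DOTTED_CIRCLE}{ch})" if mark.get("paren") else ch)
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcription (id INTEGER PRIMARY KEY, segments TEXT)")

# Hypothetical record: "a" with a parenthesized combining tilde, then "b".
record = [
    {"base": "a", "marks": [{"char": "\u0303", "paren": True}]},
    {"base": "b"},
]
conn.execute(
    "INSERT INTO transcription (segments) VALUES (?)", (json.dumps(record),)
)

(stored,) = conn.execute("SELECT segments FROM transcription").fetchone()
display = render(stored)
```

The design choice here is simply that the database stores structure and
the application supplies the transducing step, instead of requiring
every distinction to be carried by an encoded character.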

> I understand that this must be clearly pointed out in a proposal.
>
> Thus, I need the information I asked for in my previous mail under the
> assumption that the issue of plain were resolved as positive (even if
> this assumption may be unproven as yet).

If the question is only whether it is better to:

   a) Encode combining parenthetical pair characters, or
   b) Encode combining preformed parenthesis-accent-parenthesis characters

for Teuthonista, then I rather suspect that the UTC would
reject a) in favor of b). But frankly, I don't think b) is
advisable, either.

--Ken


This archive was generated by hypermail 2.1.5 : Thu Oct 30 2008 - 18:41:26 CST