Re: Encoding of Teuthonista: Diacritics in parentheses

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Oct 30 2008 - 18:38:52 CST

    Karl said:

    > KP> When it comes to encoding on Teuthonista:
    > KP>
    > KP> http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf
    > ...
    >
    > To be a little bit more about Teuthonista:
    > Someone looking at the presented parts may have the impression that
    > there is a "fancy" system which leaves the realm of plain text.

    And I am one of those who has that impression.

    >
    > But in fact, Teuthonista *is* a plain text writing system

    That is an assertion -- not a demonstration.

    Whether or not a Teuthonista transcription can be represented
    as Unicode plain text depends on decisions taken about encoding
    of characters. Currently, Teuthonista clearly *could* be
    represented as structured text using Unicode characters. The
    question, rather, is whether it is advisable to add further
    complex characters to Unicode so as to make it feasible to
    represent Teuthonista transcription as Unicode plain text.

    > with a specific
    > and clearly defined set of building blocks which compose to diacritics
    > and letters.

    Having a clearly defined set of building blocks does *not*
    make a system, ipso facto, plain text.

    What seems clear is that Teuthonista is intended as a single-tier
    fine-grained phonetic transcription system. That is not enough,
    in my opinion, to guarantee that it must be representable in
    plain text without structured text conventions, given that the
    *way* it builds diacritics formally departs significantly from
    the intended scope of the Unicode model for Latin diacritics.

    > The resulting set of diacritics is rather limited (about
    > 30), as not every possible diacritic is to be put in parentheses.
    >
    > Teuthonista is to write down the exact pronunciation of German dialect
    > words, e.g. to store them in databases (thus employing a typical plain
    > text application).

    I don't think these conclusions follow. Databases aren't
    "typical plain text applications" in the first place. The
    simplest design for handling a data corpus may well assume
    that a text field contains precisely and only a plain text
    representation of some bit of data, and that the data can be
    displayed with a plain text renderer without any intervention
    or transducing. But there is nothing at all *necessary* about
    such a design for a linguistic corpus.

    > I understand that this must be clearly pointed out in a proposal.
    >
    > Thus, I need the information I asked for in my previous mail under the
    > assumption that the issue of plain were resolved as positive (even if
    > this assumption may be unproven as yet).

    If the question is only whether it is better to:

       a) Encode combining parenthetical pair characters, or
       b) Encode combining preformed parenthesis-accent-parenthesis characters
       
    for Teuthonista, then I rather suspect that the UTC would
    reject a) in favor of b). But frankly, I don't think b) is
    advisable, either.

    --Ken


