Re: Greek characters in IPA usage

From: verdy_p (
Date: Tue Aug 18 2009 - 14:39:30 CDT


    "Andreas Stötzner" wrote:
    > Last but not least, this is not a question of typographic *geekness*.
    > It’s a systemic issue: are phonetic β, θ and χ the same characters as
    > the Greek β, θ and χ?
    > I think, beyond glyph shaping details, it all comes down to this simple
    > question.

    The question would not be complete without also asking yourself whether the phonetic d is the "same" as the Latin d.
    "Same" here does not mean that they have equal semantics or meaning, given that they are in fact used in distinct
    contexts (not really the same languages). From this discussion, it is now clear that the IPA designers really
    wanted their symbols to adopt a style that harmonizes very well with Roman letters. But they immediately created
    exceptions, including restrictions on the shapes "permitted" for Latin letters where these could be ambiguous.
    So yes, it seems that they created IPA with the intent that it be a subset of the Latin script, even if this meant
    that the script had to be extended to cover borderline cases.

    The question of sorting IPA characters is independent. It does not matter whether Greek and Latin symbols are
    sorted in separate segments, given that the default sort order in DUCET has absolutely no meaning in phonetic
    terms: IPA should ideally be sorted in another order anyway, which already requires tailoring in order to match
    the near phonetic realizations of words that are unified in language-specific phonologies.

    Once you start using collation tailoring, it absolutely does not matter whether the symbols belong to distinct
    scripts (given that all other characters without any defined meaning in IPA will not have to be sorted, or will be
    sorted completely separately, either as encoding errors/ambiguities to be corrected, or as text not related to IPA
    phonetics or phonology).
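
    A minimal sketch of this point, under stated assumptions: the PHONETIC_ORDER table below is an invented,
    illustrative ordering (roughly by place of articulation), not a standard one, and plain code point order stands
    in for an untailored default (it is not DUCET). Once a tailored key is used, Latin and Greek code points
    interleave freely.

```python
# Hypothetical tailoring table: an illustrative phonetic order, NOT a standard.
# Symbols from the Latin and Greek blocks are deliberately interleaved.
PHONETIC_ORDER = [
    "p", "b", "β",        # bilabial (β is U+03B2 GREEK SMALL LETTER BETA)
    "t", "d", "θ",        # dental/alveolar (θ is U+03B8)
    "k", "g", "x", "χ",   # velar/uvular (χ is U+03C7)
    "a", "e", "i", "o", "u",
]
WEIGHT = {sym: i for i, sym in enumerate(PHONETIC_ORDER)}
UNKNOWN = len(PHONETIC_ORDER)  # symbols with no IPA meaning here sort last


def tailored_key(transcription):
    """Build a sort key from per-symbol weights, ignoring script membership."""
    return [(WEIGHT.get(ch, UNKNOWN), ch) for ch in transcription]


words = ["χa", "ta", "βa", "pa", "θa", "ka"]

# Code point order: all Latin-initial items first, then all Greek-initial ones.
print(sorted(words))
# Tailored order: each Greek symbol sits next to its phonetic neighbours.
print(sorted(words, key=tailored_key))
```

    With the table above, the tailored sort yields pa, βa, ta, θa, ka, χa, whereas code point order keeps the
    Greek-initial items in a separate segment at the end.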

    So this is already a non-issue for collation: using new separate characters for IPA, or using variation selectors,
    will absolutely not prevent UCA collation tailoring from working correctly, including for phonological cases (where
    lots of realizations are possible and are unified language by language).
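
    As a rough illustration of the variation selector case, here is a sketch that simply skips variation selectors
    when building a sort key. The ORDER table is invented for the example, and the sequence "β" followed by U+FE00
    is purely hypothetical (it is not claimed to be a registered variation sequence).

```python
def is_variation_selector(ch):
    """True for U+FE00..U+FE0F and U+E0100..U+E01EF (the variation selector ranges)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


# Illustrative weights only; a real tailoring would cover the full IPA inventory.
ORDER = {sym: i for i, sym in enumerate(["b", "β", "v"])}


def key_ignoring_vs(transcription):
    """Sort key that treats variation selectors as ignorable."""
    return [
        ORDER.get(ch, len(ORDER))
        for ch in transcription
        if not is_variation_selector(ch)
    ]


plain = "β"
marked = "β\uFE00"  # hypothetical "phonetic beta" marked with a variation selector

# Both forms receive the same key, so they collate together.
print(key_ignoring_vs(plain) == key_ignoring_vs(marked))
print(sorted(["v", marked, "b"], key=key_ignoring_vs) == ["b", marked, "v"])
```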

    Collation, anyway, is not a problem for ISO 10646, and collation tables are not direct components of The Unicode
    Standard (only the UCA algorithm is standardized in both Unicode and a separate ISO standard, and the DUCET is
    partly standardized by reference only, but is not directly subject to the Unicode stability rules). Collation
    tables are localization issues, but they do not affect how texts must be effectively represented and encoded, or
    how they can be given semantics and handled in various transformation algorithms, or how they must be rendered.

    In addition, even within the same language, collation orders are not unique: they are adapted to each usage. I
    don't see why this would not be the case for IPA (between "pure" phonetic representations, extended academic
    notations, and language-specific phonologies, including those used in dictionaries).

    If you are still not convinced, you should see that there already exist tools built on top of Wiktionary that
    allow word searches to be performed phonetically or phonologically, including when searching for rhymes, possibly
    in multiple languages simultaneously. These tools work very well, even if "incorrect" codes are used for some IPA
    symbols, just because the phonetic or phonological notations found in articles are easily detected through the
    templates with which they are consistently inserted in articles, and are then collated in a distinct database for
    each language.
    This effectively avoids creating (and painfully maintaining by hand, with lots of corrections) a lot of specific
    pages for rhymes, and it can also be used to correct, almost automatically, most encoding errors (with the non-preferred

    For this tool made for Wiktionary, it absolutely does not matter whether you have used, for example, a regular
    Latin 'g' or the IPA-specific and really 'geeky' single-eyed IPA 'g', which has no ambiguity... So this clearly
    demonstrates that the need for disunification between Latin/Greek and IPA is not justified by practical reasons,
    at least not for searches.
    Note that you may argue that Wiktionary is not really plain text (even if the tool that indexes it is effectively
    loading and indexing articles created and maintained in plain text only), given that template syntaxes are used
    and detected to add extra semantics to these texts.
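
    The kind of indexing described above could be sketched roughly as follows. This is not the actual Wiktionary
    tool: the template shape {{IPA|lang|/transcription/}}, the sample pages, and the helper names are all
    assumptions made for illustration. Transcriptions are found through the template, the Latin 'g' (U+0067) is
    folded onto the IPA 'ɡ' (U+0261), and entries are collected per language.

```python
import re
from collections import defaultdict

# Assumed template shape: {{IPA|<lang>|<transcription>}} (illustrative only).
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^|}]+)\|([^|}]+)\}\}")

# Fold U+0067 LATIN SMALL LETTER G onto U+0261 LATIN SMALL LETTER SCRIPT G.
FOLD = str.maketrans({"g": "\u0261"})


def normalize(transcription):
    """Drop surrounding /.../ or [...] delimiters and fold equivalent symbols."""
    return transcription.strip("/[]").translate(FOLD)


def build_index(pages):
    """Map language -> normalized transcription -> set of page titles."""
    index = defaultdict(lambda: defaultdict(set))
    for title, wikitext in pages.items():
        for lang, ipa in IPA_TEMPLATE.findall(wikitext):
            index[lang][normalize(ipa)].add(title)
    return index


def find_rhymes(index, lang, ending):
    """Return titles whose normalized transcription ends with the given ending."""
    ending = normalize(ending)
    return sorted(
        title
        for ipa, titles in index[lang].items()
        if ipa.endswith(ending)
        for title in titles
    )


# Hypothetical sample pages: "dog" uses the plain Latin g, "fog" the IPA ɡ.
pages = {
    "dog": "{{IPA|en|/d\u0252g/}}",
    "fog": "{{IPA|en|/f\u0252\u0261/}}",
    "cat": "{{IPA|en|/k\u00e6t/}}",
}
index = build_index(pages)
print(find_rhymes(index, "en", "\u0252g"))  # query typed with the Latin g
```

    Both "dog" and "fog" are found whichever 'g' the query or the article uses, because the folding happens only
    inside text that the template has already identified as a phonetic notation.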

    This archive was generated by hypermail 2.1.5 : Tue Aug 18 2009 - 14:41:35 CDT