Re: Unicode Transliteration Guidelines released

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Jan 26 2008 - 14:31:50 CST


    Those are good comments, thanks.

    Some sources insist that "transliteration" requires reversibility, although
    that is commonly not the case. For example (since you cite Wikipedia --
    although I wouldn't take that as a primary source for definitions),
    http://en.wikipedia.org/wiki/Transliteration_of_Russian_into_English has a
    Transliteration Table, although many of the systems it lists are not reversible.
    And what we have found is that even if *theoretically* a transliteration
    system is supposed to be reversible, it is almost always not specified in
    sufficient detail in the edge cases to *actually* be reversible.
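To make "not reversible in the edge cases" concrete, here is a minimal round-trip check in Python. The table is a hypothetical toy subset, loosely in the style of common English-oriented Russian romanizations; it is not the CLDR rules or any particular published standard. Each letter has a unique Latin value, yet the sequence т+с and the single letter ц both come out as "ts", so a greedy decoder turns отсюда ("from here") into оцюда.

```python
# Hypothetical toy Russian -> Latin table (small subset; not CLDR, not any
# official standard). Every value is unique, so the inverse dict is well defined.
CYR_TO_LAT = {
    "а": "a", "д": "d", "и": "i", "о": "o", "с": "s", "т": "t",
    "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ю": "yu",
}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def to_latin(text: str) -> str:
    """Transliterate letter by letter using the toy table."""
    return "".join(CYR_TO_LAT[ch] for ch in text)


def to_cyrillic(text: str) -> str:
    """Greedy longest-match decoding of a Latin string back to Cyrillic."""
    keys = sorted(LAT_TO_CYR, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(LAT_TO_CYR[key])
                i += len(key)
                break
        else:
            raise ValueError(f"cannot decode {text[i:]!r}")
    return "".join(out)


def round_trip_failures(words: list) -> list:
    """Return the words for which Cyrillic -> Latin -> Cyrillic changes the text."""
    return [w for w in words if to_cyrillic(to_latin(w)) != w]


print(to_latin("отсюда"))                               # otsyuda
print(to_cyrillic("otsyuda"))                           # оцюда
print(round_trip_failures(["щи", "цитата", "отсюда"]))  # ['отсюда']
```

Each mapping looks fine in isolation; the failure only shows up when whole words are run through both directions, which is why a scheme has to spell out such sequences explicitly to be reversible in practice.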

    Personally, I've found it most useful to always use the term "reversible
    transliteration" for clarity.
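One way to state "reversible transliteration" precisely is sketched below. The symbols are introduced here for illustration only and are not taken from the guidelines: F maps strings over the native script N to Latin, and B maps Latin back to N.

```latex
F : \Sigma_N^* \to \Sigma_L^*, \qquad B : \Sigma_L^* \to \Sigma_N^*

\text{reversible from } N \iff B(F(s)) = s \quad \text{for all } s \in \Sigma_N^*

\text{not required:} \quad F(B(t)) = t \quad \text{for all } t \in \Sigma_L^*
```

The first condition forces F to be injective (no two native strings share a Latin form). The second condition is what Latin fallbacks give up: if B accepts several Latin spellings for the same native string, F can only return one of them.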

    CLDR does allow for non-reversible transliterations, although the goal is
    for all the script-script transliterations to be reversible. Note that
    reversibility generally holds in only one direction: native to Latin may
    be reversible while Latin to native is not. For example, Hangul is
    reversible in the sense that any Hangul converted to Latin and back to
    Hangul should yield the same Hangul as the input. Thus

    갗 => gach => 갗

    However, for completeness, many Latin characters have fallbacks, so more
    than one Latin spelling may map to the same Hangul. Thus, while we have

    gach => 갗 => gach

    we also have

    gac => 갗 => gach

    so Latin to Hangul to Latin does not always return the input.
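The one-way behaviour above can be reproduced with a toy model. This is an illustration under stated assumptions, not CLDR's actual rules: it uses the standard Unicode arithmetic for precomposed Hangul syllables (S = 0xAC00 + (L × 21 + V) × 28 + T), but a hypothetical romanization table that covers only ㄱ, ㅏ and final ㅊ, plus a hypothetical fallback that lets a bare "c" stand for final ㅊ.

```python
# Toy Hangul <-> Latin transliterator showing one-way reversibility.
# Hypothetical tables covering only ㄱ, ㅏ and final ㅊ; NOT the CLDR rules.

S_BASE = 0xAC00   # U+AC00, the first precomposed Hangul syllable
V_COUNT = 21      # number of medial vowels
T_COUNT = 28      # number of trailing consonants, counting "no trailing"

LEAD_LATIN = {0: "g"}            # lead index 0 = ㄱ
VOWEL_LATIN = {0: "a"}           # vowel index 0 = ㅏ
TAIL_LATIN = {0: "", 23: "ch"}   # tail index 0 = none, 23 = ㅊ

LATIN_LEAD = {lat: i for i, lat in LEAD_LATIN.items()}
LATIN_VOWEL = {lat: i for i, lat in VOWEL_LATIN.items()}
# Hypothetical fallback: a bare "c" is also accepted for final ㅊ.
LATIN_TAIL = {**{lat: i for i, lat in TAIL_LATIN.items()}, "c": 23}


def hangul_to_latin(syllable: str) -> str:
    """Decompose one precomposed syllable and romanize its jamo."""
    lead, rest = divmod(ord(syllable) - S_BASE, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    return LEAD_LATIN[lead] + VOWEL_LATIN[vowel] + TAIL_LATIN[tail]


def _take(text: str, table: dict) -> tuple:
    """Longest-prefix match of text against the keys of table."""
    for key in sorted(table, key=len, reverse=True):
        if text.startswith(key):
            return table[key], text[len(key):]
    raise ValueError(f"cannot match {text!r}")


def latin_to_hangul(word: str) -> str:
    """Parse lead + vowel + optional tail and compose one syllable."""
    lead, rest = _take(word, LATIN_LEAD)
    vowel, rest = _take(rest, LATIN_VOWEL)
    tail, rest = _take(rest, LATIN_TAIL)
    if rest:
        raise ValueError(f"unparsed input {rest!r}")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


print(latin_to_hangul(hangul_to_latin("갗")))  # 갗   (Hangul -> Latin -> Hangul)
print(hangul_to_latin(latin_to_hangul("gac")))  # gach (Latin -> Hangul -> Latin)
```

Because hangul_to_latin always emits the single preferred spelling "ch", the fallback "c" is accepted on input but never reproduced on output, which is exactly the asymmetry described above.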

    This whole topic does need clarification, so I have filed a bug so that
    the committee can discuss it.

    http://www.unicode.org/cldr/bugs/locale-bugs?findid=1596

    You can add any further replies to that bug to make sure the committee
    sees them.

    Also, you say:

    > The level of knowledge required to interpret such a transliteration
    > phonetically is way greater than for the examples you mention, and unless
    > you are quite skilled, you can't expect to be able to reliably work out the
    > phonetics of the actual text.

    If I see a non-English word "jaw", *even when it is not a transliteration*,
    I have no assurance of how it is to be pronounced. The j could be
    pronounced (for an English speaker) as in jump, or Junker, or jour; and so
    on. Transcriptions are only roughly phonetic, and then only when you know
    the rules: an Italian who didn't know the phonetic system involved would
    read "chi" from Japanese as English "key". There is a bit of text about
    that in the guidelines, but clearly it needs more explanation -- any
    suggestions would be welcome.

    Mark

    On Jan 21, 2008 7:33 AM, Richard Ishida <ishida@w3.org> wrote:
    > I was surprised to see that this munges together the terms and concepts of
    > transcription and transliteration[1][2]. As I read through the document I
    > kept changing my mind about whether this is about one or the other. The
    > guidelines expend some effort on discussing the needs of reversible text,
    > but don't appear to disallow non-reversible transcriptions, and some of the
    > systems you provide for (eg. Korean) provide transcriptions rather than
    > transliterations.
    >
    >
    > (I understand that in a technical sense, transliteration is distinguished
    > from transcription by being reversible, ie. allowing you to exactly
    > reconstruct the original sequence from the transliterated sequence.)
    >
    >
    > I think the document should start out with clear definitions that
    > distinguish these two approaches, since it is very useful to apply the term
    > transliteration in a very specific technical sense here.
    >
    > If you are describing guidelines for transcription and/or transliteration, I
    > would change the title and early on describe transliteration as a special
    > thing, and make it clearer where guidelines and commentary refer to one or
    > the other. I would also include stronger text about the benefits of each,
    > and how to decide which you want.
    >
    > If you are actually describing a mechanism intended for transliterations, in
    > the narrow sense, I suggest that after the para that starts "Transliteration
    > is not translation... " you add another para that starts "Transliteration
    > is not transcription..." and at least introduces the key difference (though
    > it could point down the page for more details and examples), and then
    > throughout the document be stricter about disallowing any approach that
    > would introduce non-reversibility.
    >
    >
    > RI
    >
    > See Wikipedia definitions:
    > [1] http://en.wikipedia.org/wiki/Transliteration
    > [2] http://en.wikipedia.org/wiki/Transcription_%28linguistics%29
    >
    > ============
    > Richard Ishida
    > Internationalization Lead
    > W3C (World Wide Web Consortium)
    >
    > http://www.w3.org/International/
    > http://rishida.net/blog/
    > http://rishida.net/
    >
    > > -----Original Message-----
    > > From: unicore-bounce@unicode.org [mailto:unicore-bounce@unicode.org] On
    > > Behalf Of Rick McGowan
    > > Sent: 19 January 2008 16:58
    > > To: unicode@unicode.org
    > > Subject: Unicode Transliteration Guidelines released
    > >
    > > The Unicode CLDR committee has released
    > > "Unicode Transliteration Guidelines":
    > > http://www.unicode.org/cldr/transliteration_guidelines.html
    > >
    > > Regards,
    > > Rick McGowan
    > > Unicode, Inc.
    >

    -- 
    Mark
    


    This archive was generated by hypermail 2.1.5 : Sat Jan 26 2008 - 14:33:19 CST