Re: transforms and language identifiers (was Re: Dozenal chars in music)

From: Michael Everson
Date: Wed May 27 2009 - 02:24:36 CDT


    On 27 May 2009, at 02:35, Mark Davis wrote:
    > The API does not actually do that. The API actually returns
    > precisely which one was chosen, so the user has a choice, as I said,
    > of discarding the transform, or using it. So you can ask for "en_GB-
    > ipa". We don't have one available currently, so you would get back
    > "en-ipa". According to the CLDR data, mechanically readable, that is
    > equivalent to *a* en_US-ipa transform. You can at that point simply
    > reject it, and tell your user that there is nothing available, if you
    > judge that it is better to fail than to return a different variant
    > of English than you want.

    It's en_US-fonipa, not en_US-ipa.
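
    The lookup-then-reject behaviour described above can be sketched
    roughly as follows. This is only an illustration, not the actual
    CLDR or ICU API: the AVAILABLE set and the names fallback_chain,
    resolve_transform and strict_lookup are all hypothetical, and the
    fallback rule (drop trailing subtags) is an assumption.

```python
# Hypothetical registry of available transforms (NOT real CLDR data).
AVAILABLE = {"en-fonipa"}


def fallback_chain(locale):
    """Yield the locale, then progressively shorter parents: en_GB, en."""
    parts = locale.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()


def resolve_transform(locale, variant):
    """Return the ID of the transform actually chosen, or None if none exists."""
    for candidate in fallback_chain(locale):
        transform_id = f"{candidate}-{variant}"
        if transform_id in AVAILABLE:
            return transform_id
    return None


def strict_lookup(locale, variant):
    """Reject any fallback: succeed only if the exact transform was found."""
    chosen = resolve_transform(locale, variant)
    if chosen != f"{locale}-{variant}":
        return None  # better to fail than to return a different variant
    return chosen


print(resolve_transform("en_GB", "fonipa"))  # the caller sees what was chosen
print(strict_lookup("en_GB", "fonipa"))      # the caller opted to reject it
```

    Because the chosen ID is returned rather than hidden, the caller
    can decide whether a fallback is acceptable.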

    >> Of course. But most en-UK speakers accept RP as a reference
    >> standard pronunciation, although they no longer consider it a
    >> normative standard. Likewise people accept GA as an American
    >> reference standard, not a normative standard.
    > I can't speak to the former, but as to the latter, I don't know that
    > the average non-GA American would necessarily consider it "the
    > reference standard".

    To the degree that he or she could understand the question, yes,
    Johnny Carson's Nebraskan "GA" would be considered so.

    >> I think it's not entirely clear whether UK or US English is viewed
    >> as the reference standard for English, if you're only interested in
    >> numbers. US clearly dominates the native-English-speaking world,
    >> but probably many of the L2 English speakers still think of UK
    >> English as a nearer reference standard than US, especially in those
    >> places where there are many L2 speakers.
    > If you have some hard figures on that it would be useful to consider
    > them.

    Julian has to provide figures but you can just make assertions? Hm.

    >> But a GA transcription has less information than an RP
    >> transcription, so can't be transformed to be right for RP.
    >> Similarly, such a transcription should include all the /r/s, even
    >> those that non-rhotic speakers (e.g. RP) don't pronounce, because
    >> non-rhotic speakers can remove the /r/s, but rhotic speakers can't
    >> insert /r/s that aren't there in the transcription.
    > I don't know that that is really the case.

    What, you think that rhotic speakers can insert /r/s that aren't in
    the transcription? Wrong. /bɑː/ may mean 'the sound a sheep makes' or
    'a place to get a drink'. There is no information there which will
    tell which word gets an /r/ and which does not. On the other hand, if
    /bɑː/ and /bɑːr/ are written, the non-rhotic speaker can delete the
    for-him-or-her superfluous /r/. Julian is right.
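
    The asymmetry can be shown mechanically. The following is a
    deliberately simplified sketch, assuming a toy two-word lexicon and
    a hypothetical drop_final_r rule that only strips a word-final /r/;
    real non-rhotic accents are more complicated than this.

```python
from collections import defaultdict

# Toy rhotic transcriptions (illustrative only).
RHOTIC = {"baa": "bɑː", "bar": "bɑːr"}


def drop_final_r(transcription):
    """Simplified non-rhotic rule: delete a word-final /r/."""
    if transcription.endswith("r"):
        return transcription[:-1]
    return transcription


# Rhotic -> non-rhotic is a function: every input has exactly one output.
non_rhotic = {word: drop_final_r(t) for word, t in RHOTIC.items()}

# Non-rhotic -> rhotic is not: collect every rhotic source for each output.
preimages = defaultdict(set)
for t in RHOTIC.values():
    preimages[drop_final_r(t)].add(t)

print(non_rhotic)
print(dict(preimages))
```

    Both words collapse to the same non-rhotic form, which then has two
    possible rhotic sources, so nothing in the non-rhotic transcription
    says where to put the /r/ back.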

    > And you can't reliably transform from RP to GA (or the reverse).
    > See, for example,
    > and
    > Wells has, for example,
    > ɑː
    > start, father
    > ɔː
    > thought, law, north, war
    > You couldn't map those reliably to GA, because some are rhotic and
    > some are not. And there are some cases that are even clearer, like
    > "privacy".

    Well, not so much, because even within GA you have /ˌɛkəˈnɔmɪk/
    alongside /ˌikəˈnɔmɪk/. The edge-case does not refute the argument.

    Anyway what Julian was saying was that rhotic transcriptions offer
    more information than non-rhotic. Did you know that there are many
    (many) dialects of European English which are rhotic?

    >> In fact, your system already does some of this: it transforms
    >> When will Merry Mary marry?
    >> to
    >> wɛn wɪl mɛri meri mæri?
    >> although most Americans don't make the three-way distinction.

    I always did and still do. (Eastern Pennsylvania.) Though my brother
    has lost the distinction between the first two (I blame the dialects
    he was exposed to in the Navy.)

    I have /ʍɛn/ in stressed position too. :-) That's gone throughout
    Britain apart from Scotland, though it is common enough in much if not
    most of Ireland.

    >> All this kind of stuff has of course been considered ad nauseam in
    >> the various proposals for "phonetic" orthographies for English.

    That's why the most sensible arguments for reform are the ones like
    Webster's (which was not very systematic) or Wijk's (whose "hoote" vs.
    "foot" makes very good sense indeed).

    Michael Everson
