Re: Unicode, SMS, PDA/cellphones

From: Donald Z. Osborn (dzo@bisharat.net)
Date: Thu Jun 01 2006 - 20:07:54 CDT

    This thread has had some interesting information and exchanges. Some of the
    issues with Romanian are certain to come into play with a number of African
    languages as mobiles become more available (at a rapid pace).

    I wanted to take a moment to summarize a few points as I understand them.
    First, we're again talking about two overlapping areas of concern: technical
    and linguistic.

    In the technical area, it seems there are several options in cellphone/mobile
    technology (who was it who said that "the nice thing about standards is that
    there are so many to choose from"?):

    GSM - dominant (?) technology, based on a 7-bit alphabet, but can accommodate
    Unicode (UTF-16) at a cost in message length.
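
    A rough sketch of that length cost, assuming the usual 140-octet SMS payload
    (160 septets in the 7-bit alphabet, 70 16-bit code units otherwise). The
    function name sms_units is made up for illustration, and the alphabet tables
    are typed in from memory of GSM 03.38, so treat them as an approximation
    rather than a reference:

```python
# GSM 7-bit default alphabet (basic table, minus the ESC code itself).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (ESC + code).
GSM7_EXTENSION = set("^{}\\[~]|€\f")

PAYLOAD_OCTETS = 140  # assumed user-data size of a single SMS


def sms_units(text):
    """Return (encoding, units_used, units_available_in_one_message)."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text):
        septets = sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)
        return "gsm7", septets, PAYLOAD_OCTETS * 8 // 7
    # Anything outside the 7-bit tables forces 16-bit code units.
    code_units = len(text.encode("utf-16-be")) // 2
    return "ucs2", code_units, PAYLOAD_OCTETS // 2


print(sms_units("Buna ziua"))       # plain ASCII workaround
print(sms_units("Bun\u0103 ziua"))  # with Romanian a-breve
```

    A single character outside the 7-bit tables switches the whole message to
    16-bit units, which is why the available length drops from 160 to 70.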

    GSM "optional compressed mode" - "which should work wonders for most
    language-specific alphabetic scripts. It's a bit clunky when having to switch
    between Unicode rows (i.e. high 8 bits of UTF-16), taking at least 9 bits for
    each switch" (Richard Wordingham)
    "Digital cellular telecommunications system (Phase 2+); Compression algorithm
    for text messaging services (GSM 03.42 version 7.1.1 Release 1998). Also known
    as 'ETSI TS 101 032 V7.1.1 (1999-07)'." (Richard Wordingham)
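
    To get a feel for the row-switching cost Richard mentions, here is a small
    sketch that only counts how often the high 8 bits of consecutive UTF-16 code
    units change and multiplies by his "at least 9 bits" figure. It is not an
    implementation of GSM 03.42; the function names are hypothetical and the
    starting row is not counted as a switch:

```python
def row_switches(text):
    """Count changes of Unicode row (high byte of each UTF-16 code unit)."""
    data = text.encode("utf-16-be")
    rows = data[0::2]  # every other byte is the high byte of a code unit
    return sum(1 for prev, cur in zip(rows, rows[1:]) if prev != cur)


def min_switch_overhead_bits(text, bits_per_switch=9):
    """Lower bound on switching overhead, using the quoted 9 bits per switch."""
    return row_switches(text) * bits_per_switch


print(row_switches("Bun\u0103 ziua"))              # a-breve sits in row 0x01
print(min_switch_overhead_bits("Bun\u0103 ziua"))
```

    Text that alternates between Basic Latin (row 0x00) and letters such as
    U+0103 or U+0219 pays this overhead twice per accented letter, once to leave
    the row and once to come back.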

    BOCU - "can usually encode codepoints above 256 in one byte per character, and
    it can represent every code point" (Theodore H. Smith)
    "Actually that's not the full story with BOCU-1, because it requires 2 bytes
    not only to encode a Latin character outside of ASCII but also 2 bytes to
    encode the next ASCII character (except space or controls). BOCU-1 works
    better on text that fits within a 128-byte block." (Richard Wordingham)

    SCSU - "allows access to the full Unicode repertoire and encodes most
    Latin-based orthographies ... much more efficiently than GSM" (Doug Ewell - he
    later modified the categorical statement re efficiency)
    (Originally Reuters Compression Scheme for Unicode (RCSU), adapted & expanded
    [Doug Ewell])

    And some technical issues re display and input:

    Keyboards - a familiar issue, but smaller and more complicated

    Display - Cristian Secară notes that one cannot be sure that an extended
    character sent on SMS will display correctly on another cellphone screen;
    Philippe Verdy notes that a "[country/language?] profile indicator set in the
    mobile phone would inform the GSM network about the capability of the mobile
    phone, and then these platforms could handle the conversion of characters to a
    more restricted set supported by the mobile phone"

    Capability detection - "I just wonder why mobile phones do not advertize to
    the mobile network their effective capabilities. There are other extensions
    proliferating that are supported only by a small set of mobile phones, and
    this absence of capability detection is a real *nuisance* for users, that
    must sometime pay to access to a mobile service that their mobile phone will
    finally not be able to render correctly. This is true for MMS (SMS with photo
    or video), and more generally for the web or iMode navigation functions."
    (Philippe Verdy)
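
    The kind of conversion Philippe describes (mapping characters to a more
    restricted set that the receiving phone supports) might look roughly like
    the sketch below. It assumes the network somehow knows the handset's
    repertoire as a set of characters, which is exactly the capability
    information that is not advertised today. The function name downgrade and
    the "?" placeholder are illustrative choices, not part of any GSM
    specification:

```python
import unicodedata


def downgrade(text, supported):
    """Replace characters outside `supported` with a base-letter fallback."""
    out = []
    for ch in text:
        if ch in supported:
            out.append(ch)
            continue
        # Decompose and drop combining marks, e.g. a-breve -> a.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        if base and all(c in supported for c in base):
            out.append(base)
        else:
            out.append("?")  # no reasonable fallback is known
    return "".join(out)


ascii_only = set(map(chr, range(0x20, 0x7F)))  # hypothetical handset profile
print(downgrade("Bun\u0103 ziua", ascii_only))
print(downgrade("\u025B", ascii_only))  # open e has no decomposition
```

    Letters with diacritics degrade to their base letters, but characters with
    no decomposition, such as the open e (U+025B) used in a number of African
    orthographies, are simply lost, so this is a stopgap and not a substitute
    for real support.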

    The linguistic, or sociolinguistic, aspects of user preferences and habits
    with regard to language and SMS (which of course are also related to the
    technical interface) are another area of discussion. What concerns me in this
    area is not the evolution of language or the (sometimes annoying) "SMS
    language," as Philippe puts it, but that some cellphone companies may not find
    it worth their effort to provide support for a fuller range of character &
    language options, based on a misinterpretation of the ASCII workarounds that
    people have adopted (for reasons of display & keyboard). The tool fits the
    practice but also shapes it, and with better (and better standardized) support
    for extended and non-Latin characters, who's to say how people now and in the
    near future will make use of these capabilities?

    Don Osborn
    Bisharat.net
    PanAfrican Localisation Project

    Quoting Philippe Verdy <verdy_p@wanadoo.fr>:

    > From: "Cristian Secară" <orice@secarica.ro>
    >> On Thu, 1 Jun 2006 23:52:53 +0200, Philippe Verdy wrote:
    >>> Given the average lifetime of mobile phones of about 3 years (which
    >>> depends mostly on its battery, one of the most expensive parts of it),
    >>> I don't think it's too late.
    >>
    >> It's too late because users have already become habituated to this
    >> particular situation.
    >
    > I really don't think so. This situation is born from deficiencies in
    > the early types of mobile phone. They are improving, and lots of
    > people really hate the "SMS language"; this situation disserves the
    > interests of mobile phone companies, which could extend their market by
    > offering a better alternative to this "old" usage. (Not very old, in
    > fact.)
    >
    > When more users use a more acceptable orthography, because of
    > improvements in the mobile phone input system, the "SMS language"
    > users will become a minority, and let's hope they will convert back
    > to normal language, and will want better support for their own
    > language.
    >
    > The SMS-language-mania should evolve into a more normal situation
    > where it will be used only in some areas where it doesn't matter
    > much, like for emails and online forums where normal orthographies
    > are now expected and the "SMS language" is banned.



    This archive was generated by hypermail 2.1.5 : Thu Jun 01 2006 - 20:10:30 CDT