Re: GSM and Unicode

From: Philippe Verdy (
Date: Tue Nov 04 2003 - 15:17:34 EST


    > I am looking at the GSM 03.38 specification
    > GSM 03.38 Version 5.3.0 July 1996
    > GSM 03.38 Version v7.0.0 July 1998-07
    > specification about GSM 03.38 default alphabet-
    > Any one know can tell me is there any place I can find
    > out all the details about which cell phone model does
    > support UCS2 SMS Data Coding Scheme in additional to
    > the "default" alphabet? And what is the character set they
    > support ? Latin1 ? MES-1 or MES-2 ? Thanks

    I have read that the European standard defined MES-1 as a simple extension
    of the basic LL8 set, in which one Icelandic letter and the euro symbol were
    added to the core set of characters drawn from the ISO 8859 sets, Windows
    ANSI codepages, PC/DOS OEM codepages and Macintosh 8-bit sets, including
    Greek and Latin, but excluding Hebrew and Arabic even though they were also
    used for some minority European languages.

    The intent was to create a subset that could easily be incorporated into
    small devices, and the GSM standard adopted this initial subset, at first
    encoded with an ITU technique (i.e. with diacritics coded before the base
    letter).
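
    To make the ordering difference concrete, here is a minimal sketch that
    takes Unicode text (where a combining mark follows its base letter) and
    reorders it so that each mark comes first, as in the ITU convention
    mentioned above. The function name diacritics_first is hypothetical, and
    the output only illustrates the ordering: it does not produce the actual
    byte codes of any ITU character set.

```python
import unicodedata


def diacritics_first(text):
    """Reorder text so each combining mark precedes its base letter.

    Illustration only: shows the ordering, not a real ITU byte encoding.
    """
    clusters = []  # each entry is [base_character, [combining_marks]]
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            # Nonzero combining class: attach the mark to the previous base.
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    return "".join("".join(marks) + base for base, marks in clusters)


# Unicode order for "e with acute" after decomposition is base, then mark.
unicode_order = unicodedata.normalize("NFD", "\u00e9")
mark_first = diacritics_first("\u00e9")
```

    With this sketch, unicode_order holds the letter e followed by U+0301,
    while mark_first holds U+0301 followed by the letter e.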

    GSM charsets are mostly drawn from MES-1, but GSM phones now prefer the
    Unicode-based UCS-2 scheme, with extensions added depending on the
    national markets where they are deployed (so support for basic Arabic or
    basic Hebrew is optional, but now frequently comes on phones from Nokia,
    Siemens, Alcatel, Sony and Motorola, and only with the Unicode scheme, not
    with the ITU encoding scheme).
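
    As an illustration of why Arabic or Hebrew text needs the Unicode scheme,
    here is a minimal sketch of how a sender might choose between the GSM 03.38
    default alphabet and the UCS2 data coding scheme named in the question
    quoted above. The names GSM_BASIC and pick_sms_coding are hypothetical, the
    table covers only the basic 128-code default alphabet (the later extension
    table is left out as a simplifying assumption), and real handsets may
    decide differently.

```python
# Basic GSM 03.38 default alphabet; the string index is the 7-bit code.
# Assumption: the extension table reached through ESC (0x1B) is ignored.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# ESC is a control code in the table, not a printable character.
GSM_BASIC_SET = set(GSM_BASIC) - {"\x1b"}


def pick_sms_coding(text):
    """Return (scheme, size) for a message body.

    "gsm7": every character is in the basic default alphabet;
            size is counted in septets (160 fit in one message).
    "ucs2": anything else; size is counted in octets
            (140 octets, i.e. 70 BMP characters, fit in one message).
    """
    if all(ch in GSM_BASIC_SET for ch in text):
        return "gsm7", len(text)
    # UCS-2 is written here as big-endian 16-bit units. Characters outside
    # the BMP would come out as surrogate pairs, which UCS-2 does not define.
    return "ucs2", len(text.encode("utf-16-be"))


latin = pick_sms_coding("Hello")
hebrew = pick_sms_coding("\u05e9\u05dc\u05d5\u05dd")  # four Hebrew letters
```

    Under these assumptions the Latin message stays in the default alphabet at
    one septet per character, while the Hebrew message falls back to UCS2 at
    two octets per character.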

    Almost all European GSM service providers now support the Unicode scheme, as
    it offers better international interoperability; however, the MES-1 subset
    is certainly the minimum level supported in both the GSM ITU encoding
    scheme and the Unicode UCS-2 scheme.

    Phone manufacturers all need to adapt their phones to national needs, but
    for Europe the MES-1 subset is the minimum set required. If a phone model is
    to be sold in Japan or the US, it has to be adapted to that market. As GSM
    900 and DCS 1800 are European standards, and as the US and Japan use other
    standards, additional subsets may be added, possibly by sacrificing some
    MES-1 characters from some scripts (notably Greek and Cyrillic) to make room
    for Japanese support. In any event, all phones should support the Latin1
    set, independently of the encoding scheme actually used. But the
    manufacturer's origin will matter.

    Phones are much less constrained now than in the past, with the slow SMS
    system. Almost all models now support MMS on faster data networks (GPRS now,
    and 3G networks later, once they start being deployed). For MMS, they need
    faster processors, more memory, and larger internal software, so the need
    to restrict the character set is less critical than it was. But the key
    factor will be interoperability between phones and markets. So the situation
    for now is a mix of MES-1 and basic Japanese (Katakana/Hiragana). I don't
    know what support these phones offer for Chinese. If such phones exist, they
    probably only work in China or Japan.
