Re: Unicode, SMS, PDA/cellphones

From: Doug Ewell (
Date: Mon May 29 2006 - 12:08:10 CDT


    Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

    > I see there's also an optional compressed mode in GSM, which should
    > work wonders for most language-specific alphabetic scripts.

    Can you provide a reference for that compressed mode? I couldn't find it
    on the page Cristian mentioned, and a very casual search for "GSM
    character set compressed" turned up several descriptions of the
    standard 7-bit ASCII-based encoding and a proposal paper that achieves
    5 bits per character by splitting uppercase and lowercase ASCII letters
    into "groups" (similar to PTTC of 50 years ago) and allowing only 5
    non-alphabetic characters.
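    The proposal's actual code table isn't given here, so what follows is
    only a hypothetical Python sketch of how such a 5-bit "group" scheme can
    work: 26 letter codes, one shift code that toggles between the uppercase
    and lowercase groups, and 5 punctuation codes, for 32 codes in all. The
    names encode5/decode5, the choice of punctuation, and starting in the
    uppercase group are all assumptions.

```python
# Hypothetical 5-bit "group" code (layout assumed, not the proposal's table):
#   codes 0..25  -> letters A..Z, or a..z when the lowercase group is active
#   code  26     -> SHIFT: toggle between the uppercase and lowercase groups
#   codes 27..31 -> the only 5 non-alphabetic characters allowed
import string

LETTERS = string.ascii_uppercase
SHIFT = 26
PUNCT = " .,?!"  # assumed choice of the 5 non-alphabetic characters


def encode5(text: str) -> list[int]:
    """Encode text as a list of 5-bit codes (each an int in 0..31)."""
    codes = []
    upper = True  # assumption: both sides start in the uppercase group
    for ch in text:
        if ch in PUNCT:
            codes.append(27 + PUNCT.index(ch))
        elif ch.isascii() and ch.isalpha():
            if ch.isupper() != upper:
                codes.append(SHIFT)  # switch groups before this letter
                upper = not upper
            codes.append(LETTERS.index(ch.upper()))
        else:
            raise ValueError(f"not representable in this 5-bit code: {ch!r}")
    return codes


def decode5(codes: list[int]) -> str:
    """Invert encode5, tracking the active letter group."""
    out = []
    upper = True
    for c in codes:
        if not 0 <= c <= 31:
            raise ValueError(f"not a 5-bit code: {c}")
        if c == SHIFT:
            upper = not upper
        elif c >= 27:
            out.append(PUNCT[c - 27])
        else:
            out.append(LETTERS[c] if upper else LETTERS[c].lower())
    return "".join(out)


if __name__ == "__main__":
    codes = encode5("Hello, world.")
    print(codes, len(codes) * 5, "bits")
    print(decode5(codes))
```

    With this layout, "Hello, world." (13 characters, or 91 bits as GSM
    7-bit septets) becomes 14 five-bit codes, i.e. 70 bits; the one extra
    code is the shift from "H" into the lowercase group.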

    > It's a bit clunky when having to switch between Unicode rows (i.e.
    > high 8 bits of UTF-16), taking at least 9 bits for each switch, which
    > might make it useless for Cree or possibly even for Vietnamese.
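    A rough way to see that cost is to count row changes in a string. The
    Python sketch below is a hypothetical cost model, not the GSM
    compression algorithm: it assumes each BMP character is sent as the low
    8 bits of its UTF-16 code unit, that every change of row (the high 8
    bits) costs the 9 bits quoted above, and that the encoder starts in row
    0x00. The name row_switch_cost is illustrative only.

```python
# Hypothetical bit-cost model for a row-switching scheme (NOT the actual
# GSM algorithm): each character costs CHAR_BITS for the low byte of its
# UTF-16 code unit, and each change of row (high byte) costs SWITCH_BITS.

CHAR_BITS = 8    # assumption: only the low byte is sent per character
SWITCH_BITS = 9  # the "at least 9 bits" per row switch quoted above


def row_switch_cost(text: str, start_row: int = 0x00) -> tuple[int, int]:
    """Return (row switches, total bits) for text made of BMP characters."""
    row = start_row
    switches = 0
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("this sketch only handles BMP characters")
        if cp >> 8 != row:  # high 8 bits of the UTF-16 code unit
            row = cp >> 8
            switches += 1
    total_bits = len(text) * CHAR_BITS + switches * SWITCH_BITS
    return switches, total_bits


if __name__ == "__main__":
    # "Viet Nam" with the precomposed U+1EC7 (e with circumflex and dot below)
    print(row_switch_cost("Vi\u1ec7t Nam"))
    print(row_switch_cost("Hello"))
```

    For "Việt Nam" the model reports 2 switches and 82 bits (64 bits of
    character data plus 18 bits of switching): the single "ệ" (U+1EC7, row
    0x1E) forces one switch away from row 0x00 and another one back.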

    Not surprisingly, Inuktitut (written, like Cree, in Canadian Aboriginal
    Syllabics) and Vietnamese are the two writing systems I mentioned in
    UTN #14 as not being compressed well by SCSU, due to its 128-character
    window-switching approach and the dispersal of the characters in those
    writing systems across multiple windows. A system that uses
    256-character windows (rows) instead of 128 would have the same problem,
    probably an even worse one for Vietnamese.
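    To illustrate the dispersal point, here is a deliberately simplified
    Python model (an assumption, not SCSU itself). It counts how often a
    single active window must change, comparing 128-code-point windows in
    which ASCII is always reachable (as with bytes 0x20-0x7F in SCSU's
    single-byte mode) against 256-code-point rows in which ASCII is just
    row 0x00. It ignores SCSU's eight dynamic windows, its predefined window
    positions, and tag-byte costs, so the counts are only indicative; the
    function name window_changes is hypothetical.

```python
# Simplified window-change counter (hypothetical model, NOT real SCSU).
# One window is active at a time; a change is counted whenever a character
# falls outside it. SCSU's multiple dynamic windows and tags are ignored.


def window_changes(text: str, window_size: int,
                   ascii_always_reachable: bool) -> int:
    """Count active-window changes needed to cover every character in text.

    window_size: 128 for SCSU-style windows, 256 for Unicode rows.
    ascii_always_reachable: True models SCSU's single-byte mode, where
    ASCII never needs a window; False models a pure row-switching scheme.
    """
    active = 0  # assumption: start in the window/row that contains U+0000
    changes = 0
    for ch in text:
        cp = ord(ch)
        if ascii_always_reachable and cp < 0x80:
            continue  # ASCII never forces a window change in this model
        window = cp // window_size
        if window != active:
            active = window
            changes += 1
    return changes


if __name__ == "__main__":
    samples = {
        "Tieng Viet": "Ti\u1ebfng Vi\u1ec7t",  # uses U+1EBF and U+1EC7
        "Duong": "\u0110\u01b0\u1eddng",       # uses U+0110, U+01B0, U+1EDD
    }
    for name, text in samples.items():
        print(name,
              window_changes(text, 128, ascii_always_reachable=True),
              window_changes(text, 256, ascii_always_reachable=False))
```

    For "Tiếng Việt" the 128-character model needs 1 window change (both
    "ế" and "ệ" fall in the window starting at U+1E80), while the row model
    needs 4, because every return to an ASCII letter is itself a switch.
    For "Đường", whose letters come from three different 128-character
    windows (U+0100, U+0180, U+1E80), both models need 3.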

    Doug Ewell
    Fullerton, California, USA

    This archive was generated by hypermail 2.1.5 : Mon May 29 2006 - 12:26:43 CDT