Unicode, SMS and year 2012

From: Cristian Secară <orice_at_secarica.ro>
Date: Fri, 27 Apr 2012 11:06:23 +0300

A few years ago there was a discussion here about Unicode and SMS
(Subject: Unicode, SMS, PDA/cellphones). Then and now the situation is
the same, i.e. an SMS text message that uses only characters from the
GSM character set can include 160 characters per message (stream of
7 bit × 160), whereas a message that uses anything else can include
only 70 characters per message (stream of UCS2 16 bit × 70).
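The arithmetic behind the two limits can be sketched in a few lines of
Python. This is only an illustration: the character table below is the
GSM 7-bit default alphabet written from memory of 3GPP TS 23.038, and
the function names are made up for this example.

```python
# GSM 7-bit default alphabet (basic table), written from memory of
# 3GPP TS 23.038; treat it as an assumption and check it against the spec.
# The ESC code (0x1B) is left out because it is not a character by itself.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters, each sent as ESC + code (2 septets).
GSM_EXTENSION = "^{}\\[~]|€"

PAYLOAD_BITS = 160 * 7  # 1120 bits, the same as 70 * 16


def single_sms_capacity(text):
    """Return (encoding name, character limit of one message) for text.

    The GSM limit assumes no extension-table characters are used.
    """
    if all(ch in GSM_BASIC or ch in GSM_EXTENSION for ch in text):
        return ("GSM 7-bit", PAYLOAD_BITS // 7)
    return ("UCS-2", PAYLOAD_BITS // 16)


def septets_needed(text):
    """Septets used by text in the GSM alphabet, or None if it does not fit."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENSION:
            total += 2
        else:
            return None
    return total
```

With this table, "Buna ziua" (no diacritics) stays in the 160-character
class, while "Bună ziua" drops the whole message to the 70-character
class because of a single letter.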

Although my language (Romanian) was and is affected by this
discrepancy, back then I was skeptical about the possibility of
improving anything in this area, mostly because at that time both the
PC and mobile markets suffered from other language problems that were
critical for me (like missing glyphs in fonts, or improper keyboard
implementations).

Things have evolved and now the prospects are much better. Regarding
SMS, at that time Richard Wordingham pointed out that SCSU might be a
proper solution for the SMS encoding [when it comes to non-GSM

Recently I studied as many aspects as I could of SMS standardization,
as part of an effort I started approximately a year ago regarding SMS
language discrimination, which arises simply from the difference in
message length and cost for the same sentence written with diacritical
marks (written correctly for that language) or without diacritical
marks (written incorrectly for that language). Or, for the same reason,
language discrimination between (say) a French message and (say) a
Romanian message, both written correctly.

It turned out that they (ETSI & its groups) created a way to work
around the 70-character limitation, namely the “National Language
Single Shift” and “National Language Locking Shift” mechanisms. These
are described in the 3GPP TS 23.038 standard and were introduced in
Release 8. In short, it is about a character substitution table,
defined per language and applied per character or per message.
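The shift tables are not free: as far as I understand it, the chosen
language is announced in a user data header that is carved out of the
same 140 octets. The sketch below estimates what is left for text. The
header layout (one length octet, then 3 octets per shift element) is an
assumption recalled from 3GPP TS 23.040, and the function names are
hypothetical.

```python
MAX_SEPTETS = 160  # one SMS: 140 octets = 1120 bits = 160 septets
IE_OCTETS = 3      # assumed: element id + length + one language identifier


def septets_left(shift_elements):
    """Septets left for text when the header carries shift elements.

    shift_elements is 0 (no header), 1 (single or locking shift only)
    or 2 (both single and locking shift).
    """
    if shift_elements == 0:
        return MAX_SEPTETS
    header_octets = 1 + IE_OCTETS * shift_elements  # 1 = header length octet
    header_bits = header_octets * 8
    header_septets = -(-header_bits // 7)  # round up to a septet boundary
    return MAX_SEPTETS - header_septets


def max_characters(shift_elements, escaped_chars=0):
    """Characters that fit if escaped_chars of them cost ESC + code."""
    # Each escaped character takes 2 septets, i.e. 1 more than a plain one.
    return septets_left(shift_elements) - escaped_chars
```

Under these assumptions a message using one shift table has 155 septets
left, and one using both has 152, so a text with 10 single-shifted
letters would top out at 145 characters instead of 70.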

Personally I find this to be a stone-age approach, which in my
opinion does not work at all if I enter the message from my PC keyboard
via the phone's PC application (because the language cannot always be
predicted, especially if I am using dead keys). It is true that the
actual SMS stream limit is not very generous, but I wonder whether SCSU
would have been a better approach in terms of i18n. I also don't know
whether SCSU requires a language to be declared in advance, or whether
it simply works out by itself the required window for each character.
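To make the SCSU question more concrete, here is a rough byte-count
model of a naive window strategy. It is not an SCSU encoder: the default
window positions are recalled from UTS #6 and should be treated as an
assumption, the function name is made up, static windows and Unicode
mode are ignored, and anything at or above U+3400 is simply quoted. In
this model the window is picked from the code points alone.

```python
# Default dynamic window start positions, as recalled from UTS #6;
# treat them as an assumption and check them against the report.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600,
                   0x0900, 0x3040, 0x30A0, 0xFF00]
WINDOW_SIZE = 0x80


def scsu_estimate(text):
    """Rough byte count for text under a naive greedy window strategy."""
    windows = list(DEFAULT_WINDOWS)
    active = 0     # index of the currently selected window
    next_slot = 0  # slot to overwrite when a new window is defined
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1  # ASCII assumed to pass through as one byte
        elif windows[active] <= cp < windows[active] + WINDOW_SIZE:
            total += 1  # one byte inside the active window
        else:
            hit = [i for i, base in enumerate(windows)
                   if base <= cp < base + WINDOW_SIZE]
            if hit:
                active = hit[0]
                total += 2  # select an existing window + one byte
            elif cp < 0x3400:
                windows[next_slot] = cp & ~0x7F
                active = next_slot
                next_slot = (next_slot + 1) % len(windows)
                total += 3  # define a window + offset byte + one byte
            else:
                units = 2 if cp > 0xFFFF else 1
                total += 3 * units  # quote each UTF-16 unit individually
    return total
```

For example, "Bună ziua" comes out at 10 bytes in this model against 18
bytes in UCS2, so the 140 octets would stretch much further for mostly
Latin text. Because the model never switches to Unicode mode, it
overstates the cost of ideographic text.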

Apparently SCSU seems to be OK for my language, or Hungarian, or
Bulgarian, etc., but is it also OK for non-Latin and non-Cyrillic
scripts? This is versus the language shift mechanism, which is still
7 bit. Release 10 of that standard includes language locking shift
tables for Turkish, Portuguese, Bengali, Gujarati, Hindi, Kannada,
Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.

Does anyone have more experience with this?

Thank you,

Cristian Secară
Received on Fri Apr 27 2012 - 05:02:03 CDT
