Re: Unicode, SMS and year 2012

From: David Starner
Date: Sun, 29 Apr 2012 23:04:23 -0700

On Sat, Apr 28, 2012 at 6:22 PM, Naena Guru wrote:
> How I see Unicode is as a
> set of character groups, 7-bit, 8-bit (extends and replaces 7-bit), 16-bit,
> and CJKV that use some sort of 16-bit paring.

That's one lens to see Unicode through, but in most cases it's
substantially distorting. Unicode is a set of 1,112,064 code points
usable for characters, divided up into a flat section of 55,296, a
break of 2,048 surrogate code points that can never be characters,
and then another 1,056,768. There are a number of other ways to view
it, but there's no guarantee that U+0370 won't be filled with an
Egyptian hieroglyph, and any view of Unicode that assumes that it
won't is thus not a correct view.
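
As a quick check of that arithmetic, here is a minimal Python sketch
that derives the size of each range from its boundaries. The constant
names are mine, chosen for illustration; the boundary values (U+D800,
U+DFFF, U+10FFFF) are the standard surrogate range and the top of the
code space.

```python
# Boundaries of the Unicode code space (names are illustrative only).
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF

# Code points below the surrogate range: U+0000..U+D7FF.
below = SURROGATE_FIRST
# The surrogate range itself: U+D800..U+DFFF.
surrogates = SURROGATE_LAST - SURROGATE_FIRST + 1
# Code points above the surrogate range: U+E000..U+10FFFF.
above = MAX_CODE_POINT - SURROGATE_LAST
# Everything that can, in principle, be assigned to a character.
usable = below + above

print(below, surrogates, above, usable)
```

Nothing in those numbers says which script lives where; the ranges are
flat, and that is the point.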

> As Unicode says, they are just
> numeric codes assigned to letters or whatever other ideas. It is the task if
> the devices to decide what they are and show them

That is the concept of a character encoding. It has continued to exist
since the first days of computing because plain text seems to encode
something important and distinct from higher levels.

> It shows perfectly when 'dressed' with a
> smartfont.

Except in IE, one of the most common browsers on the market. Except to
anyone using a screen reader.

> It takes about half the bandwidth to transmit that the double-byte set.

Who cares? SMS's restrictions are not technical ones. G.711, the most
common digital compression for telephony, uses 8 kb per second.* One
byte per character or two, that's faster than you can type. Outside
telephony, plain text is trivial; long novels, like Dracula, come in
at under a MB, and download instantaneously for me--partially because
it's automatically gzipped down to 330 KB. At 3 bytes per [...] Even on
not-so-good connections the time taken to download a full novel is
nowhere near the time needed to read it, is always a fraction of the
time needed to download a song, and is less than 1% of the time needed
to download a TV show. [...] is 4 kb of text and 8 kb of images. The
costs you're trying to impose on everyone to save 4 kb just aren't
worth it, especially as you're sending 177 kb of font to avoid it.

* Before anyone starts to mention kb = kilobytes, yes, 64 kilobits /
sec = 8 kb / sec.
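
To put rough numbers on the bandwidth comparison, here is a small
Python sketch. The sample word and the typing speed are my own
assumptions for illustration, not measurements: the string is the word
"Sinhala" written in Sinhala script (five code points), and 10
characters per second is a deliberately generous guess for a fast
typist.

```python
# G.711 runs at 64 kilobits per second, i.e. 8000 bytes per second.
G711_BYTES_PER_SECOND = 64_000 // 8

# Illustrative sample: "Sinhala" in Sinhala script, five code points.
sample = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"

# Size of the same text in two Unicode encoding forms.
utf8_len = len(sample.encode("utf-8"))       # 3 bytes per code point here
utf16_len = len(sample.encode("utf-16-le"))  # 2 bytes per code point here

# Assumed typing speed (hypothetical, generous).
CHARS_PER_SECOND = 10
typed_bytes_per_second = CHARS_PER_SECOND * 3  # worst case above: UTF-8

# Share of a single voice channel that typed text would occupy.
share = typed_bytes_per_second / G711_BYTES_PER_SECOND

print(utf8_len, utf16_len, f"{share:.2%}")
```

Even at three bytes per character, typed text under these assumptions
uses well under 1% of what one phone call takes.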

> In the small market of Singhala, no font is present that
> goes typographically well with Arial Unicode. There is no incentive or money
> to make beautiful fonts for a minority language like Singhala.

I'm sorry; unfortunately, that's what's known as a Hard Problem. There
is nothing any character encoding can do about that.

> I hope both the mobile device industry and the PC side separate fonts and
> characters and allow the users to decide the default font sets in their
> devices.

It'd be nice, but that doesn't have much to do with Unicode.

>This is eminently rational because the rendering of the font
> happens locally, whereas the characters travel across the network.

I don't see the connection. The font is almost always local, whether
or not it's user-selectable.

> This will
> also help those who like me who understand that their language is better
> served by a transliteration solution than a convoluted double-byte solution
> that discourages the natives to use their script.

I see no evidence that using an industry-standard solution that treats
all scripts equally discourages people from using the script. I do
think that "Please get a browser that keeps with times" discourages

Kie ekzistas vivo, ekzistas espero.
Received on Mon Apr 30 2012 - 01:11:18 CDT
