Re: Romanized Singhala - Think about it again

From: Doug Ewell <>
Date: Wed, 4 Jul 2012 12:54:47 -0600

[removing cc list]

Naena Guru wrote:

> On this 4th of July, let me quote James Madison:

[quote from Madison irrelevant to character encoding principles snipped]

> I gave much thought to why many here at the Unicode mailing list
> reacted badly to my saying that Unicode solution for Singhala is bad.

Unicode encodes Latin characters in their own block, and Sinhala
characters in their own block. Many of us disagree with a solution to
encode Sinhala characters as though they were merely Latin characters
with different shapes, and agree with the Unicode solution to encode
them as separate characters. This is a technical matter.
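To make the distinction concrete, here is a small Python sketch that prints the Unicode name of each character. It is an illustration only and does not come from this thread. The Sinhala word is written with escapes so that the code stays plain ASCII. The Latin spelling "sinhala" is an arbitrary transliteration picked for the example. It is not the particular romanization scheme under discussion.

```python
import unicodedata

# The word "Sinhala" in the Sinhala script, using code points from the
# Sinhala block (U+0D80..U+0DFF).
SINHALA_WORD = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"

# An arbitrary Latin transliteration of the same word (illustration only).
LATIN_WORD = "sinhala"

def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

def in_sinhala_block(ch):
    """True if the character's code point lies in the Sinhala block."""
    return 0x0D80 <= ord(ch) <= 0x0DFF

if __name__ == "__main__":
    for label, word in (("Sinhala script", SINHALA_WORD), ("Latin letters", LATIN_WORD)):
        print(label)
        for cp, name in describe(word):
            print(f"  {cp}  {name}")
```

The first word's characters are identified as Sinhala and the second word's as Latin, regardless of the font used to display them. A font switch changes only the glyphs. It cannot change which code points are stored.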

> Earlier I said the Plain Text idea is bad too.

And many of us disagree with that rather vehemently as well, for many

> The responses came as attacks on *my* solution than in defense of
> Unicode Singhala.

It's not personal unless you wish to make it personal. You came onto the
Unicode mailing list, a place unsurprisingly filled with people who
believe the Unicode model is a superior if not perfect character
encoding model, and claimed that encoding Sinhala as if it were Latin
(and requiring a special font to see the Sinhala glyphs) is a better
model. Are you really surprised that some people here disagree with you?
If you write to a Linux mailing list that Linux is terrible and
Microsoft Windows is wonderful, you will see pushback there too.

Here is a defense of Unicode Sinhala: it allows you, me, or anyone else
to create, read, search, and sort plain text in Sinhala, optionally with
any other script or combination of scripts in the same text, using any
of a fairly wide variety of fonts, rendering engines, and applications.
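Here is a rough Python sketch of that claim. It is an illustration, not something from the thread. It shows mixed Latin and Sinhala text being searched, sorted, and round-tripped through UTF-8 with ordinary string operations. The sort uses code-point order only to keep the example simple. That is not proper Sinhala collation, which would normally come from a locale-aware collator such as ICU.

```python
# Mixed-script plain text: Latin and Sinhala in one ordinary string.
SINHALA_WORD = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # the word "Sinhala" in Sinhala script
mixed = f"Unicode {SINHALA_WORD} text"

# Searching needs no special handling: the Sinhala word is a substring like any other.
position = mixed.find(SINHALA_WORD)

# Code-point sort: a crude stand-in for real collation, used here for brevity.
words = sorted(["text", SINHALA_WORD, "Unicode"])

# Encoding to UTF-8 and decoding again returns exactly the same text.
round_trip = mixed.encode("utf-8").decode("utf-8")

print(position, words, round_trip == mixed)
```

No font switch is needed for any of these operations. The program sees Sinhala characters as Sinhala because that is what they are.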

> The purpose of designating naenaguru@ as a spammer is to
> prevent criticism.

The list administrator, Sarasvati, can speak to this issue. Every
mailing list, every single one, has rules concerning the conduct of
posters. I note that your post made it to the list, though, so I'm not
sure what you're on about.

> It is shameful that a standards organization belonging to corporations
> of repute resorts to censorship like bureaucrats and academics of
> little Lanka.

Do not attempt to represent this as a David and Goliath battle between
the big bad Unicode Consortium and poor little Sri Lanka or its
citizens. This is a technical matter.

> I ask you to reconsider:
> As a way of explaining Romanized Singhala, I made some improvements to
> Mainly, it now has near the top of each page a
> link that says, ’switch the script’. That switches the base font of
> the body tag of the page between the Latin and Singhala typefaces.
> Please read the smaller page that pops up.

The fundamental model is still one of representing Sinhala text using
Latin characters, and relying on a font switch. It is still completely
antithetical to the Unicode model.

> I also verified that I hadn’t left any Unicode characters outside
> ISO-8859-1 in the source code -- HTML, JavaScript or CSS. The purpose
> of declaring the character set as iso-8859-1 than utf-8 is to avoid
> doubling and trebling the size of the page by utf-8. I think, if you
> have characters outside iso-8859-1 and declare the page as such, you
> get Character-not-found for those locations. (I may be wrong).

You didn't read what Philippe wrote. Representing Sinhala characters in
UTF-8 takes *fewer* bytes, typically less than half, compared to using
numeric character references like &#3523;&#3538;&#3458;&#3524;&#3517;
&#3517;&#3538;&#3520;&#3539;&#3512;&#3495; &#3465;&#3524;&#3517;.
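The arithmetic can be checked with a short Python sketch. This is an illustration added here, not something from the thread. It decodes the numeric character references above. It then compares the size of the text encoded as UTF-8 with its size when the page is declared ISO-8859-1. In that case, every Sinhala character must be written as a reference.

```python
import html

# The Sinhala text quoted above, as numeric character references (NCRs).
NCR_TEXT = ("&#3523;&#3538;&#3458;&#3524;&#3517; "
            "&#3517;&#3538;&#3520;&#3539;&#3512;&#3495; "
            "&#3465;&#3524;&#3517;")

# Decode the references into real characters.
text = html.unescape(NCR_TEXT)

# Page served as UTF-8: each Sinhala character takes 3 bytes, each space 1.
utf8_size = len(text.encode("utf-8"))

# Page declared ISO-8859-1: Sinhala is outside that repertoire, so each
# character must be spelled as a decimal NCR (7 bytes each here).
ncr_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(f"UTF-8: {utf8_size} bytes, NCRs: {len(ncr_bytes)} bytes")
```

For this sample, the UTF-8 form is 44 bytes and the reference form is 100. Every Sinhala code point lies in U+0D80..U+0DFF, so UTF-8 needs 3 bytes for each one. A reference such as &#3523; needs 7.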

> Philippe Verdy, obviously has spent a lot of time researching the web
> site and even went as far as to check the faults of the web service
> provider, He called my font a hack font without any proof
> of it.

A font that places glyphs for one character in the code space defined
for a fundamentally different character is generally referred to as a
hack (or hacked) font. A Latin-only font that placed a glyph looking
like 'B' in the space reserved for 'A' would also be a hacked font.

> As for those who do not want to think rationally and think Unicode is
> a religion, I can only point to my dilemma:

You need to stop making this "religion" accusation. This is a technical
matter.

This is the last attempt I will make to help show YOU where the water

Doug Ewell | Thornton, Colorado, USA | @DougEwell
Received on Wed Jul 04 2012 - 13:56:16 CDT
