Re: Purpose of plain text (WAS: Re: combining: half, double, triple et cetera ad infinitum) from Doug Ewell on 2011-11-15 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Tue, 15 Nov 2011 11:06:06 -0700

Eli Zaretskii <eliz at gnu dot org> wrote:

> Naena Guru didn't say Unicode was selling something. The intended
> meaning of "You sell something" in the context of the OP is quite
> clear, even though English is not my first language.

He did say "Unicode was created for a commercial reason, particularly
for the benefit of its directors," and later, "Making solutions should
not solely for making solutions for getting rich." Many of us have been
involved with Unicode -- not as directors or corporate executives, but
as developers and other professionals in the field -- with no hope or
expectation of financial gain, except maybe to avoid wasting money
pointlessly on ad-hoc i18n solutions that are difficult to understand,
explain, implement, deploy, and extend.

>> How can I search a group of documents, one written in Devanagari and
>> another in Sinhala and another in Tamil and another in Oriya, for a
>> given string if they all use the same encoding, and the only way to
>> tell which is which is to see them rendered in a particular font?
>
> I don't get it: how can you do that in English or French or German?
> Not even different fonts will tell you which is which. You simply
> need to know the language, period.

That's not what I meant. Because English and French and German use the
same script[1], someone who knows even a tiny subset of these languages
can tell, given a long enough, non-pathological sample, which is which.
You could not similarly tell that a document which appears to contain
इस वाक्यांश was actually supposed to be in Sinhala, if only you were
using the "right" font to view it.

Of course, if we were relying on font hacks, I couldn't write
इस वाक्यांश in this plain-text e-mail message to begin with, could I?

[1] If one wishes to argue that Devanagari and Sinhala and Tamil and
Oriya are all the same script, then see my other post; there is no point
in trying to "sell" this argument to the Unicode list.
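
As an illustration of the point above (not part of the original
message): when each script has its own code points, a program can tell
Devanagari from Sinhala without looking at any font. The sketch below
assumes Python 3 and its standard unicodedata module; the function name
guess_script is hypothetical, and taking the first word of each
character name is only a rough heuristic that happens to work for
scripts such as these.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Guess the dominant script of `text` from Unicode character names.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. "DEVANAGARI LETTER SA", names its script.
    """
    counts = Counter()
    for ch in text:
        # Count letters and combining marks only; skip spaces, digits, etc.
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(guess_script("इस वाक्यांश"))                     # DEVANAGARI
print(guess_script("\u0DC3\u0DD2\u0D82\u0DC4\u0DBD"))  # SINHALA
print(guess_script("this phrase"))                     # LATIN
```

With a font hack that stores every script in the same code points, the
same function would give the same answer for every document, which is
exactly the search problem described above.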

Elsewhere:

>>> FWIW, the latest Firefox 8 has no problems displaying that page,
>>> including the labels on the tabs.
>>
>> I'm running Iceweasel 8, and it displays the tabs as Latin. I would
>> consider it a bug to do otherwise; the font on those tabs should be
>> under my control.
>
> It _is_ under your control. But what do you expect to happen if you
> select a font that doesn't cover the characters on the page, or force
> the browser to use an encoding different from what the page author
> intended? Selecting a font that cannot handle the tricks played by
> that page is no different.

Of course one must use a font that supports Sinhala in order to see
Sinhala characters. That's true for any character encoding.

But with Unicode, if you don't have one of the many available Sinhala
fonts, you'll see a series of boxes, and if your rendering engine
doesn't know how to render Sinhala, you'll see dotted circles and
out-of-order glyphs. None of this is nearly as bad as seeing a
completely different alphabet, most especially Latin, instead of the
correct Sinhala. This is the point David was making with his
üÔÏ ÓÏÏÂÝÅÎÉÅ Ñ×ÌÑÅÔÓÑ ÒÕÓÓËÉÊ ÑÚÙË example.
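
For readers who have not met this kind of garbling, here is a small
sketch of how such a string can arise (again an illustration, not part
of the original message). It assumes the text was encoded as KOI8-R and
then displayed as if it were ISO 8859-1; the byte values in the example
string are consistent with that assumption. The function names are
hypothetical; the codec names "koi8_r" and "latin-1" are standard
Python codecs.

```python
def misread_koi8r_as_latin1(text):
    """Encode Russian text as KOI8-R, then decode the bytes as Latin-1.

    The bytes are unchanged; only their interpretation differs.
    """
    return text.encode("koi8_r").decode("latin-1")


def repair_koi8r_mojibake(garbled):
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("latin-1").decode("koi8_r")


garbled = "üÔÏ ÓÏÏÂÝÅÎÉÅ Ñ×ÌÑÅÔÓÑ ÒÕÓÓËÉÊ ÑÚÙË"
original = repair_koi8r_mojibake(garbled)

# Round trip: the Latin-looking text and the Cyrillic text are the same bytes.
assert original == "Это сообщение является русский язык"
assert misread_koi8r_as_latin1(original) == garbled
```

The reader of the garbled form sees well-formed Latin letters and gets
no hint that anything is missing, which is the same failure mode as a
Sinhala page that shows up as Latin.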

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Received on Tue Nov 15 2011 - 12:11:05 CST
