RE: Purpose of plain text (WAS: Re: combining: half, double, triple et cetera ad infinitum) from Doug Ewell on 2011-11-14 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Mon, 14 Nov 2011 15:30:00 -0700

Naena Guru <naenaguru at gmail dot com> wrote:

> If it came out as Unicode has its only goal as money making, that is not what I meant to say. Nothing can be such. You sell something for the buyer's benefit, right?

Unicode doesn't sell anything, except (I suppose) printed copies of the
standard and admission to conferences.

> I apologize if you feel hurt over it.

I don't feel hurt. I do feel annoyed about the continued
misinformation.

> However, it is probably the main objective. Who works for nothing except odd crazies like me?

You'd be surprised how many people have volunteered their time and
expertise to help improve Unicode.

> When years back I asked why ligatures formed inside Notepad and not inside Word, I had the clear reply that it is owing to a business decision.

That doesn't mean Unicode is broken. It means that some applications
have support for certain text processes that other applications don't
have. Have you ever seen two graphics editors, one of which has more
capabilities than the other? Does that mean the underlying graphics
format is broken?

> Let me try to clearly say what I want to say:
> 1. Unicode came up with the idea of one codepoint for one letter of any language.

Sort of.

> 2. The justification was that on one text stream you could show all the different languages. At least that is what I understood.

Not just "show." You can "perform text operations on" all the different
characters. Not every Unicode-aware application is required to have
fonts and rendering technology for every character or script. Otherwise
nobody would have adopted it.
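
As a rough illustration of "text operations" that need no font at all, here is a minimal Python sketch using the standard unicodedata module. The sample letters (the letter "ka" in four scripts) and the helper name describe are my own choices for illustration, not anything defined by Unicode.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The letter "ka" in four Indic scripts, each with its own code point.
# These samples are illustrative choices, not taken from the thread.
KA_LETTERS = {
    "Devanagari": "\u0915",
    "Sinhala": "\u0D9A",
    "Tamil": "\u0B95",
    "Oriya": "\u0B15",
}

for script, ch in KA_LETTERS.items():
    # No font or rendering engine is involved; this only reads character data.
    print(script, describe(ch))
```

The program can identify, classify and compare these characters even on a system that would show every one of them as an empty box.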

> 3. The above 2 is not practical and does not work even now after so many years

There was never a requirement that all applications can display all
scripts perfectly. There has been continuous improvement over the past
20 years toward making this happen. It does not all happen at once.

> 4. Why Indic code pages do not work so well for text processing is not the fault of Unicode but that of the user groups

I assume you mean 8-bit "code pages." Unicode doesn't have "code
pages."

> 5. However, technology arrived at those countries too late for actual users, not bureaucrats, to understand the mistakes

Can you explain what you feel is wrong with Unicode handling of Indic
text, WITHOUT repeating that not all applications can display everything
perfectly?

> 6. Therefore, I say that there was an undue push by Unicode to complete the standard, by issuing ultimatums for registering ligatures etc.

This is a misrepresentation, and makes no sense.

> Having said all that, all is not so bad. I say transliterate to Latin and make smartfonts. It is a proven success.

How can I search a group of documents, one written in Devanagari and
another in Sinhala and another in Tamil and another in Oriya, for a
given string if they all use the same encoding, and the only way to tell
which is which is to see them rendered in a particular font? That has
been tried before. It is a proven failure.
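
To make the search problem concrete, here is a minimal Python sketch. The document names, the sample word, and the use of "kamala" as the shared Latin transliteration are all assumptions made up for illustration; a real font-based scheme might store the text differently, but the ambiguity would be the same.

```python
def find(docs, needle):
    """Return the sorted names of the documents whose text contains needle."""
    return sorted(name for name, text in docs.items() if needle in text)

# The same three letters (ka, ma, la) encoded with each script's own code points.
unicode_docs = {
    "doc_devanagari": "\u0915\u092E\u0932",
    "doc_sinhala": "\u0D9A\u0DB8\u0DBD",
    "doc_tamil": "\u0B95\u0BAE\u0BB2",
    "doc_oriya": "\u0B15\u0B2E\u0B32",
}

# Hypothetical font-based scheme: every document stores the same Latin
# letters, and only the font chosen at display time says which script it is.
hacked_docs = {name: "kamala" for name in unicode_docs}

tamil_word = "\u0B95\u0BAE\u0BB2"

# With distinct code points, the search finds only the Tamil document.
print(find(unicode_docs, tamil_word))

# With the shared encoding, every document matches and none can be told apart.
print(find(hacked_docs, "kamala"))
```

In the second case the stored data no longer says which script a document is in, so no search, sort or filter can recover it.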

> I do not understand what you meant by "jury-rigged to accommodate visual display order". Did you mean using unexpected shapes for Latin codes? If you meant that, how do you justify earlier versions of Unicode standard giving specific explanation about codepoints that they do not represent shapes and Fraktur and Gaelic could very well use Latin as their underlying codes?

Latin (Antiqua) and Fraktur and Gaelic letters are, intrinsically, the
same letter. That is not true for Devanagari and Sinhala and Tamil and
Oriya letters.

> I think the ability to use text in the computer in the way you expect text to behave in it is very important. For instance, if you have shape representations mapped to code clusters, scanned text could be more accurately digitized.

Go ahead and design your own encoding, then. It may be of use for niche
applications that care only about display and nothing else.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Received on Mon Nov 14 2011 - 16:32:29 CST
