RE: Purpose of plain text (WAS: Re: combining: half, double, triple et cetera ad infinitum) from CE Whitehead on 2011-11-15 (Unicode Mail List Archive)

From: CE Whitehead <cewcathar_at_hotmail.com>
Date: Tue, 15 Nov 2011 14:53:31 -0500

Hi, once more.
> From: doug_at_ewellic.org
> To: naenaguru_at_gmail.com
> CC: unicode_at_unicode.org; kusum510_at_gmail.com; cewcathar_at_hotmail.com
> Subject: RE: Purpose of plain text (WAS: Re: combining: half, double, triple et cetera ad infinitum)
> Date: Mon, 14 Nov 2011 15:30:00 -0700
>
> Naena Guru <naenaguru at gmail dot com> wrote:
>
> > If it came out as Unicode has its only goal as money making, that is not what I meant to say. Nothing can be such. You sell something for the buyer's benefit, right?
>
> Unicode doesn't sell anything, except (I suppose) printed copies of the
> standard and admission to conferences.
>
> > I apologize if you feel hurt over it.
>
> I don't feel hurt. I do feel annoyed about the continued
> misinformation.
>
> > However, it is probably the main objective. Who works for nothing except odd crazies like me?
>
> You'd be surprised how many people have volunteered their time and
> expertise to help improve Unicode.
>
> > When years back I asked why ligatures formed inside Notepad and not inside Word, I had the clear reply that it is owing to a business decision.
>
> That doesn't mean Unicode is broken. It means that some applications
> have support for certain text processes that other applications don't
> have. Have you ever seen two graphics editors, one of which has more
> capabilities than the other? Does that mean the underlying graphics
> format is broken?
>
> > Let me try to clearly say what I want to say:
> > 1. Unicode came up with the idea of one codepoint for one letter of any language.
>
> Sort of.
>
> > 2. The justification was that on one text stream you could show all the different languages. At least that is what I understood.
>
> Not just "show." You can "perform text operations on" all the different
> characters. Not every Unicode-aware application is required to have
> fonts and rendering technology for every character or script. Otherwise
> nobody would have adopted it.
>
> > 3. The above 2 is not practical and does not work even now after so many years
>
> There was never a requirement that all applications can display all
> scripts perfectly. There has been continuous improvement over the past
> 20 years toward making this happen. It does not all happen at once.
>
> > 4. Why Indic code pages do not work so well for text processing is not the fault of Unicode but that of the user groups
>
> I assume you mean 8-bit "code pages." Unicode doesn't have "code
> pages."
>
> > 5. However, technology arrived at those countries too late to for actual users, not bureaucrats, to understand the mistakes
>
> Can you explain what you feel is wrong with Unicode handling of Indic
> text, WITHOUT repeating that not all applications can display everything
> perfectly?
>
> > 6. Therefore, I say that there was an undue push by Unicode to complete the standard, by issuing ultimatums for registering ligatures etc.
>
> This is a misrepresentation, and makes no sense.
>
> > Having said all that, all is not so bad. I say transliterate to Latin and make smartfonts. It is a proven success.
>
> How can I search a group of documents, one written in Devanagari and
> another in Sinhala and another in Tamil and another in Oriya, for a
> given string if they all use the same encoding, and the only way to tell
> which is which is to see them rendered in a particular font? That has
> been tried before. It is a proven failure.
Agreed here. Also, don't some users input text strings in multiple languages and scripts?
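To illustrate the search point above, here is a minimal sketch in Python (standard library only). It assumes nothing beyond the `unicodedata` module; the file names, the `script_of` and `find_in` helpers, and the one-letter "documents" are hypothetical and made up for illustration. It only shows that the letter "ka" has a distinct code point in each of the four scripts, so a plain-text search can tell them apart without any font.

```python
import unicodedata

def script_of(ch):
    """Rough script label: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# Hypothetical one-letter "documents": the letter ka in four Indic scripts.
DOCS = {
    "devanagari.txt": "\u0915",
    "sinhala.txt": "\u0D9A",
    "tamil.txt": "\u0B95",
    "oriya.txt": "\u0B15",
}

def find_in(docs, needle):
    """Return the names of the documents whose text contains needle."""
    return [name for name, text in docs.items() if needle in text]

if __name__ == "__main__":
    for name, text in DOCS.items():
        ch = text[0]
        print(name, "U+%04X" % ord(ch), script_of(ch))
    # Searching for the Sinhala letter matches only the Sinhala document.
    print(find_in(DOCS, "\u0D9A"))
```

If all four scripts shared one set of codes and differed only by font, `find_in` would have no way to separate them.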
>> I do not understand what you meant by "jury-rigged to accommodate visual display order". Did you mean using unexpected shapes for Latin codes? If you meant that, how do you justify earlier versions of Unicode standard giving specific explanation about codepoints that they do not represent shapes and Fraktur and Gaelic could very well use Latin as their underlying codes?
>
> Latin (Antiqua) and Fraktur and Gaelic letters are, intrinsically, the
> same letter. That is not true for Devanagari and Sinhala and Tamil and
> Oriya letters.
>
>> I think the ability to use text in the computer in the way you expect text to behave in it is very important. For instance, if you have shape representations mapped to code clusters, scanned text could be more accurately digitized.
>
> Go ahead and design your own encoding, then. It may be of use for niche
> applications that care only about display and nothing else.

A personal view: First, I think it's worthwhile to work within Unicode, which is where the mainstream of work is being done, even for the dissatisfied.

As for smart fonts, I'm not sure I understand these correctly. I personally think it would be interesting to see language-sensitive fonts that treat Arabic and Persian numbers as different shapes for the same numbers; this could be important in preventing security breaches. But this is not how it's done. That said, I don't think everything can be handled by smart fonts. And in fact, had we relied on smart fonts to display one single set of numbers as either Persian or Arabic, we would have had to wait for the apps/smart fonts to come along. (Perhaps then someone like yourself would have created all the needed fonts.)

Then there are proposals for new Unicode characters, for example the Urdu jazm (syllable coda, termination of a syllable). Displaying these correctly currently relies on language-specific styling, and some fonts can do it and some cannot. Perhaps I should have favored the encoding of a new character here, but for security reasons I decided the characters we already had should be sufficient.
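On the Arabic and Persian numbers mentioned above: Unicode encodes them as two separate digit ranges (Arabic-Indic, U+0660 to U+0669, and Extended Arabic-Indic, U+06F0 to U+06F9) rather than leaving the difference to fonts. The following is a minimal sketch in Python, standard library only; the helper name `to_ascii_digits` is hypothetical. It shows that the two "four" digits carry the same numeric value but are different code points, which is the kind of distinction a security check can test for in plain text.

```python
import unicodedata

# Arabic-Indic digits (used with Arabic) and Extended Arabic-Indic digits
# (used with Persian and Urdu) occupy separate code point ranges.
ARABIC_INDIC = [chr(0x0660 + i) for i in range(10)]
EXTENDED_ARABIC_INDIC = [chr(0x06F0 + i) for i in range(10)]

def to_ascii_digits(text):
    """Replace every decimal digit, in any script, with its ASCII digit."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)

if __name__ == "__main__":
    arabic_four = ARABIC_INDIC[4]
    persian_four = EXTENDED_ARABIC_INDIC[4]
    print(unicodedata.name(arabic_four))
    print(unicodedata.name(persian_four))
    # Same numeric value, different code points.
    print(to_ascii_digits(arabic_four) == to_ascii_digits(persian_four))
    print(arabic_four == persian_four)
```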
As for transliterating and then making a smart font: again, I am unsure I understand smart fonts here. Can all languages be transliterated character-for-character? Arabic, for example, has two aliphs, dagger aliph and standard aliph. The phonetic transliteration into Latin is identical for both.
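The aliph question can be made concrete with a small sketch in Python. The `ROMANIZE` table below is a hypothetical three-entry mapping invented for illustration, not any real transliteration standard; it assumes, as described above, that both aliphs are written with the same Latin letter. The `ambiguous_targets` helper (also hypothetical) lists the Latin letters that cannot be mapped back to a single Arabic character.

```python
from collections import defaultdict

# Hypothetical romanization table, for illustration only.
ROMANIZE = {
    "\u0627": "\u0101",  # ARABIC LETTER ALEF -> a with macron
    "\u0670": "\u0101",  # ARABIC LETTER SUPERSCRIPT ALEF (dagger aliph) -> a with macron
    "\u0628": "b",       # ARABIC LETTER BEH -> b
}

def ambiguous_targets(table):
    """Return the Latin targets that more than one source character maps to."""
    back = defaultdict(list)
    for src, dst in table.items():
        back[dst].append(src)
    return {dst: srcs for dst, srcs in back.items() if len(srcs) > 1}

if __name__ == "__main__":
    for dst, srcs in ambiguous_targets(ROMANIZE).items():
        print(dst, "<-", ["U+%04X" % ord(s) for s in srcs])
```

With a table like this, the Latin text alone does not say which aliph was meant, so something beyond the bare transliteration would have to carry that information.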
Nevertheless, transliterations where the user selects from several characters to get the right character can be very helpful, and can save tons of downloading of whole character sets in the case of languages such as Chinese that have a huge inventory (the code charts do not even download onto my mini at all). Character pickers can work this way, letting the user input a Romanized character, which is just fine for users who know the Latin alphabet, and many do. I am, of course, unfamiliar enough with smart fonts to be unsure whether these would handle the dagger and standard aliph properly. Would they?
(Sorry that I always use Arabic as my example; I know Arabic somewhat, and a few words/phrases of Persian; I have zero familiarity with most other Asian languages. And P.S. I don't make money from participating in the list, though it may help me at some point to "pad my resume" -- that is, to add a comment that I participate in lists; in any case, I do think it is great that the web has supporters among, and technical input from, both individuals and commercial users as well as places like the Library of Congress.)
Best,
--C. E. Whitehead
cewcathar_at_hotmail.com
>
> --
> Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
> www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell 
>

Received on Tue Nov 15 2011 - 13:57:26 CST
