From: William_J_G Overington (email@example.com)
Date: Sat Aug 21 2010 - 02:10:12 CDT
On Wednesday 11 August 2010, Doug Ewell <firstname.lastname@example.org> wrote:
> Maybe (though I don't personally believe so) the concept of "plain text" has become so passé that William's variation selectors for swash e's, and additional ligatures, and weather reporting codes, and Portable Interpretable Object Code may one day be considered "within scope" for Unicode.
Variation selector pairs to access alternate glyphs, additional ligatures, localizable sentences and a portable interpretable object code are not in the same categories.
The matter of ligatures is distinctly different from the other items.
The problem with ligatures as encoded in regular Unicode is that one needs an advanced-format font and an application that is aware of advanced-format fonts. Thus the golden ligatures collection of Private Use Area code points for ligatures, started in 2002, is still of use for producing hardcopy printouts and for making graphics files for people who do not have access to a desktop publishing program that can use an advanced-format font. Hopefully, all desktop publishing programs will one day be able to handle ligatures in the regular Unicode manner. The golden ligatures collection is a solution that can be useful until that time.
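The two approaches can be contrasted in a few lines of Python. This is a rough sketch only: U+FB03 LATIN SMALL LIGATURE FFI is a real compatibility code point, but the Private Use Area code point U+E700 shown here is a hypothetical illustration, not an actual assignment from the golden ligatures collection.

```python
import unicodedata

# Regular Unicode: the text stays as separate letters and an
# advanced-format font substitutes a ligature glyph at display time.
plain = "office"  # f, f, i remain three separate characters

# The compatibility code point U+FB03 hard-codes the ligature in the
# text itself, but it decomposes back to three letters under NFKC:
lig = "o\uFB03ce"
assert unicodedata.normalize("NFKC", lig) == "office"

# A Private Use Area approach likewise stores the ligature as a single
# code point (U+E700 is a made-up example for illustration); unlike
# U+FB03 it carries no normalization mapping, so searching for
# "office" will not find it:
pua = "o\uE700ce"
assert unicodedata.normalize("NFKC", pua) != "office"

print(len(plain), len(lig), len(pua))  # 6 4 4
```

The last assertion illustrates the web-archiving and search-engine problem mentioned later in this message: a Private Use Area encoding hides the underlying letters from any software that does not know the private convention.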
The concept of "plain text" becoming passé is not a necessary condition for encoding into Unicode of character plus variation selector pairs to access alternate glyphs, of localizable sentences, of a portable interpretable object code, or of vector graphics commands. They would be encoded in the same manner as if they were plain text, not necessarily because they are regarded as plain text.
They could easily become encoded in regular Unicode if there is a consensus that that is a desirable thing to happen.
If such a consensus is formed, there is no need for what is regarded as plain text to change. What is encoded in Unicode and what is regarded as plain text need not be the same.
Unfortunately, the present policy appears to be that encoding cannot take place proactively. A policy of proactive encoding need not lead to a free-for-all of encoding, as encoding would only be done after debate and the formation of a consensus. A policy of proactive encoding would, however, sweep away the present requirement that widespread existing usage be demonstrated as a necessary condition for encoding. Such a condition might not be unreasonable where encoding is from letterpress printed books or from stone carving from long ago: however, where the condition is required for modern all-electronic communication, then, in my opinion, the condition is an unreasonable shackle on progress and innovation.
On the specific matter of this thread, the encoding of character plus variation selector pairs to access alternate glyphs: that encoding would not need the allocation of any new code points, only the allocation of character plus variation selector pairs. Those pairs would be unlikely to have any other uses if they are not encoded. There could be a practice that only pairs using variation selector 5 onward were used for accessing alternate glyphs, thus leaving the pairs using the first four variation selectors available for other encoding.
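The mechanics of such a pair can be sketched in Python. The variation selectors VS1 through VS16 occupy U+FE00 through U+FE0F, so variation selector 5 is U+FE04; the swash "e" pair below is a hypothetical example of the idea, not a registered variation sequence.

```python
# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
VS5 = "\uFE04"  # VARIATION SELECTOR-5

# A base character plus a variation selector is a two-code-point
# sequence of otherwise ordinary text. A font that defines the pair
# could show an alternate glyph; a font that does not would simply
# ignore the selector and show the default glyph.
swash_e = "e" + VS5  # hypothetical pair for a swash "e" glyph

assert len(swash_e) == 2
assert [hex(ord(c)) for c in swash_e] == ["0x65", "0xfe04"]

# The suggested convention: reserve the pairs using VS1..VS4
# (U+FE00..U+FE03) for other encoding and use VS5 onward for
# alternate glyph access.
reserved = [chr(cp) for cp in range(0xFE00, 0xFE04)]
print(len(reserved))  # 4
```

The point the sketch illustrates is that no new code points are consumed: both halves of the pair already exist, and the only act of encoding is agreeing on what a given pair means.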
What I find a problem at present is this. If some character plus variation selector pairs for accessing alternate glyphs were encoded into regular Unicode, it seems to me (am I correct in this?) that they would be usable with existing advanced-font-aware application programs immediately: it would just be a matter of one or more fonts that use them becoming available. Yet in order to get them encoded, it appears that many texts using a Private Use Area encoding would first need to be produced, creating problems for web archiving and search engines that a regular Unicode encoding would not create. All the while, texts produced using a Private Use Area encoding would display badly with fonts that do not support the alternate glyphs, whereas proper character plus variation selector pairs would not produce those problems.
I recognize that there may be good reasons of which I am unaware at the time of writing this text for Unicode and ISO not providing facilities for proactive encoding, yet I wonder if it would be a good idea to review the policy now in 2010, in case it is just a matter of policies made long ago still being applied when they are no longer desirable.
Now certainly, if the policy were changed so that proactive encoding is possible when a consensus can be achieved, that does not mean that all, or indeed any, of my own ideas, currently encoded into the Private Use Area, would necessarily be encoded into regular Unicode.
Consider please the matter of emoji. If more emoji are to be encoded, why is it necessary for them first to be used in the Private Use Area, possibly with several different encodings, before being encoded into Unicode and ISO 10646? I say that it would be better to allow ideas for new emoji to be submitted proactively, with a view to encoding some each year. That would encourage user interest and provide for product upgrading.
Returning to the topic of this thread, it seems to me that it would be good for there to be proactive encoding into Unicode of some character plus variation selector pairs to access alternate glyphs. As far as I know, it would do no harm and would be fun.
Regarding the policy that prevents proactive encoding at the present time: is that policy written down in a formal document? If so, is that text publicly available, and is there a procedure whereby a review of the possibility of changing that policy can be requested, please?
21 August 2010
This archive was generated by hypermail 2.1.5 : Sat Aug 21 2010 - 02:13:08 CDT