The Unicode Consortium Discussion Forum

All times are UTC - 6 hours [ DST ]




 Post subject: Variation selectors
Posted: Thu Aug 05, 2010 1:07 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
This thread is for the discussion of issues surrounding variation selectors.

That includes their definition, purpose, and use. Specific proposals for new variation selectors should be discussed in the Character Encoding forum.

The first question: when does one "need" variation selectors?

In Unicode, a variation selector is used after a base character to form a variation sequence.
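As a rough illustration of what "base character + variation selector" looks like at the code point level, here is a minimal Python sketch. The helper names are hypothetical; the ranges are the variation selector blocks (VS1 to VS16, VS17 to VS256) plus the Mongolian free variation selectors.

```python
import unicodedata
from typing import Optional, Tuple

# Code point ranges of the variation selectors.
VS_RANGES = [
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1 .. VARIATION SELECTOR-16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17 .. VARIATION SELECTOR-256
    (0x180B, 0x180D),    # MONGOLIAN FREE VARIATION SELECTOR ONE .. THREE
    (0x180F, 0x180F),    # MONGOLIAN FREE VARIATION SELECTOR FOUR (Unicode 14.0)
]

def is_variation_selector(ch: str) -> bool:
    """Return True if the single character ch is a variation selector."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

def split_variation_sequence(seq: str) -> Tuple[str, Optional[str]]:
    """Split a base + selector pair into (base, selector).
    Returns (seq, None) if seq is not of that form."""
    if len(seq) == 2 and not is_variation_selector(seq[0]) and is_variation_selector(seq[1]):
        return seq[0], seq[1]
    return seq, None

base, vs = split_variation_sequence("\u222A\uFE00")  # UNION + VS1
print(unicodedata.name(base), "+", unicodedata.name(vs))
```

This prints "UNION + VARIATION SELECTOR-1". The selector is a separate, invisible code point that follows the base character in the text.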

Variation sequences are, by definition, restricted to "narrowing down" the selection of possible glyphs for a character. In other words, if a character can normally be depicted with a range of glyphs, like the looped or hooked lowercase "g", a variation sequence, if defined, could select either the hooked or the looped form of the "g". Being able to select both would take two variation sequences, one for each form.

Having a variation sequence that means "don't care which glyph" is synonymous with using the base character (i.e. not using a variation sequence). Therefore, there is never a need to add such a tautological sequence.

(To avoid further confusion: I've used "g" as an example because most people can easily visualize its glyph variants. I'm neither suggesting nor commenting on whether adding such sequences to that particular character is appropriate.)

Whether you "need" a variation sequence to select *one* of the forms of a character depends on whether there is a language, or a scholarly or scientific notation, where you *cannot* substitute the other shape without distorting the text.

Where feasible, one encodes a separate character that is defined to have a restricted glyph range. Where it's not clear that the need for a differentiation is absolute (as with certain forms of mathematical operators), Unicode has instead defined variation sequences for some of the alternate forms.


 Post subject: Re: When are variation selectors not appropriate?
Posted: Thu Aug 05, 2010 1:10 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
Applications that do not support variation sequences should act as if the variation selector were not present.

That normally applies to all text processes, such as searching, sorting, and parsing. It follows that it is inappropriate to rely on a variation sequence as the sole means of distinguishing alternate readings of a text.
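A minimal sketch of what "act as if the variation selector were not present" can mean for searching and sorting, assuming the application simply strips selectors before comparing. The function names are hypothetical. Note that Unicode normalization does not remove variation selectors, so a naive search for a sequence misses the plain base character.

```python
from typing import List

# VS1-VS16, VS17-VS256, and the Mongolian free variation selectors.
_VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF), (0x180B, 0x180D), (0x180F, 0x180F)]

def strip_variation_selectors(text: str) -> str:
    """Return text with every variation selector removed."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in _VS_RANGES)
    )

def vs_insensitive_find(haystack: str, needle: str) -> int:
    """Like str.find, but ignoring variation selectors on both sides.
    The returned index refers to the stripped haystack."""
    return strip_variation_selectors(haystack).find(strip_variation_selectors(needle))

def vs_insensitive_sort(items: List[str]) -> List[str]:
    """Sort as if no variation selectors were present (Python's sort is stable)."""
    return sorted(items, key=strip_variation_selectors)

text = "A \u222A B"          # plain UNION
query = "\u222A\uFE00"       # UNION with serifs
print(text.find(query))                  # -1: naive search misses it
print(vs_insensitive_find(text, query))  # 2
```

Because the variant and the plain character compare equal after stripping, a variation sequence alone cannot carry a distinction that these processes must preserve.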

In the case of the mathematical operators, it was well understood that the alternate forms were mathematically identical in meaning; what could not be established was whether some notational conventions demanded one form over the other in certain contexts. To be safe, the variation sequences allow access to those forms without creating duplicate character codes for the same meaning.

In other words, variation selectors are inappropriate if two different shapes of a character carry distinctly different meanings. Therefore, for IPA, a separate character was encoded for the hooked "g" (U+0261) and for the "a" without a handle (U+0251).
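A small Python check, using only the standard unicodedata module, shows that the IPA letters are separate characters which neither normalization nor case folding merges with the ordinary letters. Text processes therefore keep the distinction without any special handling.

```python
import unicodedata

# Each shape that carries its own meaning in IPA is a separate character.
for ch in ("g", "\u0261", "a", "\u0251"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Neither normalization nor case folding maps them onto the ordinary letters.
for plain, ipa in (("g", "\u0261"), ("a", "\u0251")):
    for form in ("NFC", "NFKC"):
        assert unicodedata.normalize(form, ipa) != plain
    assert ipa.casefold() != plain
```

The loop prints LATIN SMALL LETTER G, LATIN SMALL LETTER SCRIPT G, LATIN SMALL LETTER A, and LATIN SMALL LETTER ALPHA.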


 Post subject: Re: Are all variations selectable?
Posted: Thu Aug 05, 2010 1:18 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
The existing situation in Unicode sometimes requires that a font follow particular conventions to be useful.

For example, any font intended for IPA must display U+0061 as the "a" with a handle, not as the bowl-shaped "a". Otherwise, it is impossible to show the distinction from the IPA character with the bowl-shaped "a". The same applies to "g".

Similar considerations apply to mathematical fonts and Greek characters: some forms of beta, theta, phi, etc. have been encoded separately for mathematical purposes, so the font must supply the "other" shape for the regular character.
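The separately encoded Greek symbol forms can be inspected with Python's unicodedata module. The symbols carry compatibility decompositions to the regular letters, so NFC keeps them distinct while NFKC folds them back. The choice of the three pairs below is just for illustration.

```python
import unicodedata

# Regular Greek letters vs. the separately encoded "symbol" forms.
PAIRS = [("\u03B2", "\u03D0"), ("\u03B8", "\u03D1"), ("\u03C6", "\u03D5")]

for letter, symbol in PAIRS:
    nfc_same = unicodedata.normalize("NFC", symbol) == letter
    nfkc_same = unicodedata.normalize("NFKC", symbol) == letter
    print(f"{unicodedata.name(letter)} / {unicodedata.name(symbol)}: "
          f"NFC equal={nfc_same}, NFKC equal={nfkc_same}")
```

Since the symbols are separate code points, the font's glyph for the regular letter has to be the "other" shape for the contrast to be visible.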

The problem with this approach is that the choice of form for the regular character tends to depend on broad stylistic themes of a font. Serifed fonts may use one form and sans-serif fonts the other, on a systematic basis (this is a bit of an oversimplification).

This would make it impossible to use certain styles of font for math or IPA. The only reason this has not been a huge problem is that one usually uses a serifed font as the regular font for math variables, for example, and such fonts tend to have the correct Greek forms.


 Post subject: Re: what variations should never be selectable?
Posted: Fri Aug 06, 2010 12:39 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
By design, variation selectors can only be used as part of a standardized variation sequence (leaving aside for the moment the issue of ideographic variation sequences, for which different rules apply).

In order to standardize a variation sequence, the variant glyph at a minimum needs to be identified and described. It should also be generically applicable, that is, not restricted to a single font (unlike, for example, the many stylistic variations of the ampersand found only in Poetica Ampersand).

A standardized variation sequence, such as <U+222A, U+FE00>, associates a sequence with a description, such as "UNION with serifs". Here, "with serifs" indicates that the presence of serifs distinguishes the glyph variant from the ordinary glyph (which does not have serifs). In this case (a mathematical operator), the form without serifs is predominant. In other cases, glyph variants occur with more nearly equal frequency; there, it would be problematic to assign a variation sequence to only one of them, as the other one isn't necessarily a "default".
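A minimal sketch of the "sequence plus description" association, assuming a hypothetical in-memory table holding only the one entry cited above. A real implementation would load the full list of standardized variation sequences published by Unicode rather than hard-coding it. Pairs not in the table are treated as if the selector were not present.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: only the entry cited in this post.
STANDARDIZED_VARIANTS: Dict[Tuple[int, int], str] = {
    (0x222A, 0xFE00): "UNION with serifs",
}

def describe(seq: str) -> Optional[str]:
    """Description of a registered base + selector pair, else None."""
    if len(seq) != 2:
        return None
    return STANDARDIZED_VARIANTS.get((ord(seq[0]), ord(seq[1])))

def base_and_variant(seq: str) -> Tuple[str, Optional[str]]:
    """For seq = base + one selector, return (base, description).
    An unregistered pair yields (base, None): the ordinary glyph is used,
    as if the selector were not present."""
    if not seq:
        return seq, None
    return seq[0], describe(seq)

print(base_and_variant("\u222A\uFE00"))  # ('∪', 'UNION with serifs')
print(base_and_variant("\u222A\uFE01"))  # ('∪', None): not in this table
```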

The appearance of the variant glyph is not as tightly restricted as the design of, for example, a logo. It can still vary in all aspects, except that it is expected to retain its distinguishing characteristic, and it should remain recognizable as a glyph for the character.


 Post subject: What about positional forms?
Posted: Sat Aug 07, 2010 10:04 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
Some characters have positional forms. Unlike "random" stylistic variations, these are standard forms for these characters, in the sense that a reader can look at the shape and say "this is the final form of character xxx". Does that mean that they should be encoded using variation sequences?

In the Arabic and Mongolian scripts, almost all characters have positional forms, meaning their glyph shape depends on whether characters or spaces are adjacent to them, on which side, and to some extent on the type of the adjacent character (this is a very simple-minded explanation, for the purposes of this post only).

In Greek, the small sigma has a special form, which is used at the end of words.

Should variation sequences be defined to select these forms? In Arabic, the use of positional forms follows well-established rules. Occasional exceptions are handled with two special characters, ZWJ and ZWNJ (ZERO WIDTH JOINER, U+200D, and ZERO WIDTH NON-JOINER, U+200C). So variation sequences are not needed.
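To make "well-established rules plus ZWJ/ZWNJ for the exceptions" concrete, here is a deliberately simplified Python model. It assumes every letter in a small hypothetical set is dual-joining and ignores right-joining letters (alef, dal, reh, waw, ...), transparent marks, and everything else a real shaping engine handles. The positional form follows from the neighbors; ZWJ forces a connection and ZWNJ breaks one.

```python
from typing import List, Tuple

ZWJ, ZWNJ = "\u200D", "\u200C"

# Simplification: treat these Arabic letters as dual-joining
# (BEH, TEH, SEEN, LAM, MEEM, NOON, HEH, YEH).
DUAL_JOINING = set("\u0628\u062A\u0633\u0644\u0645\u0646\u0647\u064A")

FORMS = {
    (False, False): "isolated",
    (False, True): "initial",   # connects only to the following letter
    (True, True): "medial",
    (True, False): "final",     # connects only to the preceding letter
}

def joins(ch: str) -> bool:
    """In this toy model, a letter or ZWJ connects; anything else (ZWNJ, space) does not."""
    return ch in DUAL_JOINING or ch == ZWJ

def positional_forms(text: str) -> List[Tuple[str, str]]:
    """Return (letter, form) for each joining letter, in logical order."""
    forms = []
    for i, ch in enumerate(text):
        if ch not in DUAL_JOINING:
            continue
        prev_joins = i > 0 and joins(text[i - 1])
        next_joins = i + 1 < len(text) and joins(text[i + 1])
        forms.append((ch, FORMS[(prev_joins, next_joins)]))
    return forms

print(positional_forms("\u0628\u0628\u0628"))  # initial, medial, final
print(positional_forms("\u0628\u200C\u0628"))  # ZWNJ: isolated, isolated
print(positional_forms("\u0628\u200D"))        # ZWJ: initial form standing alone
```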

In Mongolian, the system and its exceptions are more complex, so special variation selectors (the Mongolian free variation selectors) were added just for Mongolian; even so, in most text, positional shape selection is still automatic.

In modern Greek, only sigma retains a positional form, and its final form had already been given an explicit character code in early Greek character sets, so Unicode continued this practice (U+03C2 GREEK SMALL LETTER FINAL SIGMA).
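Even though final sigma is its own character, the positional rule itself is easy to automate. Below is a simplified Python sketch (the regex rule is an approximation of Unicode's Final_Sigma condition, which is defined in terms of cased letters rather than "word characters"), alongside Python's built-in str.lower(), which applies Final_Sigma when lowercasing capital sigma.

```python
import re

FINAL_SIGMA = "\u03C2"  # GREEK SMALL LETTER FINAL SIGMA

def apply_final_sigma(text: str) -> str:
    """Simplified rule: a small sigma (U+03C3) that follows a word character
    and is not followed by one becomes final sigma (U+03C2)."""
    return re.sub(r"(?<=\w)\u03C3(?!\w)", FINAL_SIGMA, text)

print(apply_final_sigma("\u03C3\u03BF\u03C6\u03BF\u03C3"))  # σοφος
# str.lower() applies the Final_Sigma condition to capital sigma:
print("\u03A3\u039F\u03A6\u039F\u03A3".lower())             # σοφος
```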

In the Latin script, the contrast between "long s" and regular (round) "s" is in some sense positional, but the rules are not easy to automate, and even then exceptions would apply. Therefore, again, an explicit character was encoded. (The long s is mandatory in the Fraktur type style, but apart from historical effect it is not really used in any other type style, not even when a document originally typeset in Fraktur is reproduced in a modern typeface.)
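Because long s is a separate character, text containing it does not compare equal to modern spelling, but standard folding operations map it to "s". A short Python check (the sample word is just an illustration):

```python
import unicodedata

LONG_S = "\u017F"  # LATIN SMALL LETTER LONG S
historic = "Wi" + LONG_S + LONG_S + "en"  # as set in Fraktur
modern = "Wissen"

print(unicodedata.name(LONG_S))
print(historic == modern)                                 # False: distinct code points
print(unicodedata.normalize("NFKC", historic) == modern)  # True: compatibility mapping to "s"
print(historic.casefold() == modern.casefold())           # True: case folding maps it to "s"
```

For search, an application would typically compare folded forms, so a reader looking for "Wissen" still finds the historic spelling.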

But what about other characters? In calligraphic or "script" typefaces, the shapes of many letters would ideally change based on context, so that they connect correctly (simple typefaces "cheat" by using an "average" connecting stroke). If the rules are regular, they can be automated, and it would not be necessary to use variation selectors. Fonts would need to be able to expose the positional forms, and layout engines would apply the rules to select the proper forms. Exceptional cases could be handled with ZWJ and ZWNJ; that would be the least surprising way of doing it.

If a text stream containing ZWJ and ZWNJ is encountered on a system lacking the special font or layout engine, the joiners would simply be ignored.
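The division of labor described above (the font exposes positional glyphs, the layout engine picks them, and the joiners only influence the choice) can be sketched in Python. The font tables and glyph names such as "g.init" are hypothetical, and the "any letter connects" rule is a simplification. With a font that exposes no positional forms, the output is just the ordinary glyphs and ZWJ/ZWNJ have no visible effect.

```python
from typing import Dict, List, Tuple

ZWJ, ZWNJ = "\u200D", "\u200C"
FORMS = {(False, False): "isol", (False, True): "init",
         (True, True): "medi", (True, False): "fina"}

# Hypothetical fonts: a script face with a few positional glyphs, and a plain face with none.
SCRIPT_FONT: Dict[Tuple[str, str], str] = {
    ("g", "init"): "g.init", ("g", "fina"): "g.fina", ("o", "medi"): "o.medi",
}
PLAIN_FONT: Dict[Tuple[str, str], str] = {}

def _connects(ch: str) -> bool:
    # Letters and ZWJ connect; ZWNJ, spaces and punctuation break the connection.
    return ch.isalpha() or ch == ZWJ

def shape(text: str, font: Dict[Tuple[str, str], str]) -> List[str]:
    """Return glyph names; joiners steer form selection but never draw anything."""
    glyphs = []
    for i, ch in enumerate(text):
        if ch in (ZWJ, ZWNJ):
            continue
        if not ch.isalpha():
            glyphs.append(ch)
            continue
        prev_c = i > 0 and _connects(text[i - 1])
        next_c = i + 1 < len(text) and _connects(text[i + 1])
        form = FORMS[(prev_c, next_c)]
        # Use the font's positional glyph if it has one; otherwise the default glyph.
        glyphs.append(font.get((ch, form), ch))
    return glyphs

print(shape("go", SCRIPT_FONT))         # ['g.init', 'o']
print(shape("g\u200Co", SCRIPT_FONT))   # ['g', 'o']: ZWNJ breaks the connection
print(shape("g\u200Do", PLAIN_FONT))    # ['g', 'o']: joiner simply ignored
```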


 Post subject: Variation selectors - the short take
Posted: Wed Aug 25, 2010 6:27 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 186
This is a shorter take on the stuff in the first post for this topic.

Q: In what situations does Unicode define variation sequences?

A: Variation selectors are intended as an *exceptional* mechanism for dealing with certain difficult edge cases where the character vs. glyph question is undecidable. To qualify as a standardized variant, an entity must, in most cases, clearly be the *same* character. That means that in most contexts, substituting the base character is not only harmless to the meaning of the text, but ideally not even noticeable to many readers.

