The Unicode Consortium Discussion Forum (CLOSED)

The forum has been closed, but prior postings are accessible for reading.

All times are UTC - 6 hours [ DST ]

 Post subject: Stability of UBA and the ELM
PostPosted: Sun Oct 23, 2011 4:23 pm 
Forum Admin

Joined: Fri Dec 04, 2009 9:13 pm
Posts: 32
The bidi stability clause, in retrospect, was badly written. It doesn't prevent breaking changes to the BIDI algorithm, but does complicate extensions. I think it was fallout from the bad experience we had with the General_Category (GC) property, where we decided not to add new property values because people had switch statements based on the old ones. Because the GC logically had a hierarchy (Symbol, Punctuation, Letter,...) but didn't actually incorporate that structure, moving characters from one punctuation subtype to another would cause old implementations to stop recognizing them as punctuation.
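To make the GC problem concrete, here is a minimal Python sketch. The set `OLD_KNOWN_PUNCTUATION` and both function names are hypothetical, standing in for an older implementation that hard-coded the punctuation subtypes it knew about; only `unicodedata.category` is a real library call. The flat check loses a character whose subtype is outside its list, while a check on the major class (the first letter of the GC value) does not.

```python
import unicodedata

# Hypothetical "old" implementation: a switch-style list of the
# punctuation subtypes it was written against (an assumption for
# illustration, not a record of any real implementation).
OLD_KNOWN_PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Po"}


def is_punctuation_flat(ch: str) -> bool:
    """Treat GC values as an unstructured enumeration."""
    return unicodedata.category(ch) in OLD_KNOWN_PUNCTUATION


def is_punctuation_hierarchical(ch: str) -> bool:
    """Use the implied hierarchy: any GC value starting with 'P'."""
    return unicodedata.category(ch).startswith("P")


if __name__ == "__main__":
    for ch in ["!", "\u00ab", "a"]:
        print(repr(ch), unicodedata.category(ch),
              is_punctuation_flat(ch), is_punctuation_hierarchical(ch))
```

U+00AB has the subtype Pi, which is not in the hypothetical old list, so the flat check rejects it even though it is still punctuation.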

It is quite a different matter when introducing a new type of character, and only applying it to a new character. Old BIDI implementations wouldn't recognize the new character, but if they were updated to the new version of Unicode -- with an attendant, minor, code change -- they could work with the new character. Of course, like other cases with the introduction of a new character, it would be some time before the majority of major implementations supported it, and it could be generally used.

We can, however, respect the stability clause in two alternative ways. One is to have the BIDI algorithm depend on the character code, not the BidiClass. The other is to define a new BidiClassExtension property that carries the new value. There are pluses and minuses to each approach.
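The two alternatives can be sketched roughly as follows, in Python. Everything named here is an assumption for illustration: `NEW_MARK` uses a private-use code point as a stand-in for the proposed character, `"NEW_BEHAVIOR"` is a placeholder for whatever the new rule would be, and `BIDI_CLASS_EXTENSION` is a hypothetical table for the extension property. In both variants the published BidiClass values are left untouched.

```python
import unicodedata

# Hypothetical stand-in for the proposed character (private-use code point).
NEW_MARK = 0xE000


def behavior_by_code_point(cp: int) -> str:
    """Alternative 1: the algorithm special-cases the character code."""
    if cp == NEW_MARK:
        return "NEW_BEHAVIOR"  # placeholder for the new rule
    return unicodedata.bidirectional(chr(cp))


# Alternative 2: a separate (hypothetical) extension property table that
# is consulted before the stable BidiClass property.
BIDI_CLASS_EXTENSION = {NEW_MARK: "NEW_BEHAVIOR"}


def behavior_by_extension(cp: int) -> str:
    """Alternative 2: look up the extension first, then BidiClass."""
    if cp in BIDI_CLASS_EXTENSION:
        return BIDI_CLASS_EXTENSION[cp]
    return unicodedata.bidirectional(chr(cp))
```

The first variant puts the new knowledge into the algorithm's code, the second into data; either way an implementation that has not been updated sees only the old BidiClass for the character.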

 Post subject: Re: Stability of UBA and the LDM proposal
PostPosted: Mon Oct 24, 2011 4:21 pm 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 182
The way I have always parsed the "spirit" of the stability guarantees for the bidi algorithm is that it was stable, except as to the additions of new letters. The policies effectively guaranteed that implementations could be written in a way that only required updating the property tables to account for new characters (leaving aside the occasional "bug fix").

My argument is that, given the universal requirement to support this particular bidi algorithm for the sake of predictability and interoperability, this was a beneficial state of affairs. From the outset, the tradeoff was made that, for example, the character "/" could be supported either as the date separator or as the math operator, but not both. One or the other usage would always need overrides.

In addition, the default character properties were designed such that the addition of characters would cause minimal disruptions. Any strong character would be assigned in an area matching its directional property with the earlier default property value. While the same was not true for punctuation and numeric characters, the implied hope was that at least the edge cases that would show up their different behaviors were infrequent.

As a result, you could expect any existing implementation to show the same ordering for the vast majority of texts containing characters beyond the ones that it was explicitly updated for.
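A rough Python sketch of that maintenance model: the lookup code stays fixed and only the table changes between versions. The names `DEFAULT_RANGES`, `OLD_TABLE`, `NEW_TABLE` and `bidi_class` are hypothetical, the range list is a small illustrative subset rather than the full set of defaults, and falling back to "L" everywhere else is a simplification.

```python
from typing import Dict, List, Tuple

# Illustrative subset of default values for unassigned code points
# (an assumption for this sketch; the real list is longer).
DEFAULT_RANGES: List[Tuple[int, int, str]] = [
    (0x0590, 0x05FF, "R"),   # Hebrew block
    (0x0600, 0x07BF, "AL"),  # Arabic and neighbouring blocks
]


def bidi_class(cp: int, table: Dict[int, str]) -> str:
    """Look up the class in the table, else fall back to the range default."""
    if cp in table:
        return table[cp]
    for start, end, cls in DEFAULT_RANGES:
        if start <= cp <= end:
            return cls
    return "L"  # simplified default for everything else


# Tiny stand-ins for the property tables of two versions of the Standard.
OLD_TABLE: Dict[int, str] = {0x0041: "L", 0x05D0: "R"}
NEW_TABLE: Dict[int, str] = dict(OLD_TABLE)
NEW_TABLE[0x05EF] = "R"  # stands in for a strong character added later

if __name__ == "__main__":
    # The downversion table and the updated table agree on the new letter.
    print(bidi_class(0x05EF, OLD_TABLE), bidi_class(0x05EF, NEW_TABLE))
```

Because the later addition lands in a range whose default already matches its class, the implementation with the old table orders it the same way as the updated one. A character with a novel class has no such default to fall back on.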

Giving a new character a totally novel bidi class (or behavior) destroys this interoperability. The good majority of texts containing this new character would be ordered differently by a downversion implementation. That's especially of concern, because the new character would have been added, deliberately, to achieve a specific effect.

This approach (as well as its proposed alternate) would do away with the implicit guarantees of interoperability that are inherent not only in the particular stability policies, but in the larger attempts to make the UBA as cross-version interoperable as possible.

Further, no matter which route was chosen, this change would destroy reliance on a particular maintenance strategy that had been implicitly blessed by the Consortium (change property tables only). While the changes to each implementation simply to account for the LDM might be small, the problem is that there exist too many implementations, and there is often no good way to know which implementation a text is viewed by.

For these reasons, I argue that any such disruptive change, where necessary, needs an explicit version of the bidi algorithm (as opposed to just a new version of the Standard). It would be a "new" UBA, UBA-2.0 or whatever you'd like to call it. You'd probably be best off collecting additional changes, such as the ones proposed by Kent Karlsson. In addition, the use of this "Super UBA" needs to be embedded in certain Higher Level Protocols (such as HTML5.x) so that users have a chance of predicting which environment supports the new features.

In the particular case of the LDM, I'm not convinced that its design is final enough to spend time on it. There are too many alternative suggestions that merit investigation before putting this up for a decision. I would expect the UTC not to decide on any of these at this round but to direct someone to arrive at a consolidated proposal for more focused public review.
