The Unicode Consortium Discussion Forum

All times are UTC - 6 hours [ DST ]




 Post subject: Unicode model vs code set independent model
PostPosted: Sun Sep 23, 2012 7:32 am 
Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
In Chapter 2 General Structure one can read the following paragraph under Semantics :

Quote:
The Unicode Standard, by supplying a universal repertoire associated with well-defined character semantics, does not require the code set independent model of internationalization and text handling. That model abstracts away string handling as manipulation of byte streams of unknown semantics to protect implementations from the details of hundreds of different character encodings and selectively late-binds locale-specific character properties to characters

The expression "That model" above seems to refer to the code set independent model, but it should refer to the Unicode model, otherwise the sentence doesn't make sense. Am I right ?


 Post subject: Re: Unicode model vs code set independent model
PostPosted: Sun Sep 23, 2012 2:08 pm 
Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
Belloc wrote:
In Chapter 2 General Structure one can read the following paragraph under Semantics :

Quote:
The Unicode Standard, by supplying a universal repertoire associated with well-defined character semantics, does not require the code set independent model of internationalization and text handling. That model abstracts away string handling as manipulation of byte streams of unknown semantics to protect implementations from the details of hundreds of different character encodings and selectively late-binds locale-specific character properties to characters


The expression "That model" above seems to refer to the code set independent model, but it should refer to the Unicode model, otherwise the sentence doesn't make sense. Am I right ?


Nope. Unicode is able to treat character strings as character strings at a very low level, and does not have to abstract out character semantics in order to manipulate strings. Code page based character standards cannot do string manipulation that takes those semantics into account, because a given byte stream can have any number of semantics, so you need an added layer of architecture that interprets the character string once you've done simple string manipulation. Because in Unicode those string manipulations can be done while retaining character semantics, you don't have to re-interpret the strings and go back to perform normalization on a Unicode character stream.
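
To illustrate the point that a given byte stream can have any number of semantics under code pages, here is a minimal Python sketch. The helper name decode_under, the byte 0xE9, and the four code pages are assumptions chosen for illustration; nothing here comes from the standard itself.

```python
import unicodedata

def decode_under(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Interpret the same bytes under several legacy code pages."""
    return {codec: raw.decode(codec) for codec in codecs}

# One byte, four code pages, four different characters.
results = decode_under(b"\xe9", ["latin-1", "cp1251", "cp437", "iso8859-7"])
for codec, ch in results.items():
    print(f"{codec:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Until the code page is known, the byte has no character identity, so any code that handles it before that point can only treat it as an opaque value.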


 Post subject: Re: Unicode model vs code set independent model
PostPosted: Sun Sep 23, 2012 4:58 pm 
Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
With the single exception of your first word, I entirely agree with what you said. I just don't see the equivalence between the quoted paragraph and your explanation. Perhaps because English is not my native tongue. Thanks anyway.


 Post subject: Re: Unicode model vs code set independent model
PostPosted: Sun Sep 23, 2012 5:40 pm 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 185
Belloc wrote:
With the single exception of your first word, I entirely agree with what you said. I just don't see the equivalence between the quoted paragraph and your explanation. Perhaps because English is not my native tongue.


As far as I can figure out what you are arguing, you essentially claim that the words "that model" are an ambiguous or incorrect reference to "the code set independent model". Well, since there isn't any other "model" in the preceding sentence, the reference is rather specific and clear.

The summary of what the code set independent model attempts to do may be a bit terse, but I see no actual discrepancy with the expanded version of the explanation provided by van. Unicode certainly does not provide "late-binding" of locale-specific character properties, because most character properties that Unicode deals in are locale-independent and therefore 'early-bound', to use that term here.

String handling in Unicode is also not done on "byte streams of unknown semantics" - any well-formed stream of code units always resolves to known Unicode characters with known semantics under the Unicode scheme of string handling.
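
As a rough sketch of what "known characters with known semantics" means in practice, the Python snippet below decodes a UTF-8 byte stream and looks up each character's name and general category. The function name describe and the sample input are assumptions for illustration only.

```python
import unicodedata

def describe(data: bytes) -> list[tuple[str, str, str]]:
    """Decode UTF-8 and report code point, name, and general category."""
    # decode() raises UnicodeDecodeError if the code units are ill-formed.
    text = data.decode("utf-8")
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"), unicodedata.category(c))
        for c in text
    ]

for row in describe("é1".encode("utf-8")):
    print(row)
```

No locale is consulted anywhere: the properties are bound to the character as soon as the code units are decoded.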

That leaves, as the only point of confusion, the phrase: "to protect implementations from the details of hundreds of different character encodings". Under the code set independent model, applications are directly exposed to different character encodings - the data they process are always represented in the original encoding. Hence the need to "protect" by attempting to let implementations process these byte streams blindly up to the point where locale-specific properties are bound to characters.

A Unicode-based implementation may have an outer layer that accepts other encodings, but only in terms of providing a mapping to and from Unicode. The core of the implementation always works in Unicode, hence there's no need for the Unicode scheme of string handling to "protect" the implementation. Instead, implementations can fully rely on the identity and properties of the Unicode characters they process at any point in the implementation.
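
The layering described here might look like the following minimal Python sketch, assuming hypothetical function names (ingest, count_letters, emit) and an arbitrary sample string. Only the outer functions ever see an encoding name; the core works on Unicode strings alone.

```python
import unicodedata

def ingest(data: bytes, encoding: str) -> str:
    """Outer layer: map from an external encoding to Unicode."""
    return data.decode(encoding)

def count_letters(text: str) -> int:
    """Core: relies only on Unicode character properties."""
    return sum(1 for c in text if unicodedata.category(c).startswith("L"))

def emit(text: str, encoding: str) -> bytes:
    """Outer layer: map from Unicode back to an external encoding."""
    return text.encode(encoding)

sample = "Привет, мир!"
for enc in ("utf-8", "cp1251", "koi8-r"):
    # The core gives the same answer whatever encoding the bytes arrived in.
    print(enc, count_letters(ingest(emit(sample, enc), enc)))
```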

There are some operations, such as special case transformations or sorting, that do need locale-dependent processing (and therefore properties), and Unicode doesn't provide any obstacle to implementing those kinds of operations. The difference is that it doesn't force the locale dependency onto every single operation, as the code set independent model would do.
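
One familiar example of a locale-dependent case transformation is the Turkish and Azerbaijani dotted/dotless i. The Python sketch below is a deliberately simplified assumption, not the full special-casing rules: the function name upper_tailored and its lang parameter are hypothetical, and only the single i to İ (U+0130) rule is handled.

```python
def upper_tailored(text: str, lang: str = "") -> str:
    """Uppercase text, applying a locale tailoring only when asked to."""
    if lang in ("tr", "az"):
        # Tailoring (simplified): dotted i uppercases to U+0130, not to I.
        text = text.replace("i", "\u0130")
    # Default, locale-independent case mapping for everything else.
    return text.upper()

print(upper_tailored("istanbul"))        # default mapping
print(upper_tailored("istanbul", "tr"))  # Turkish tailoring
```

Every other operation on the string stays locale-independent; the locale enters only at the one call that needs it.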

FYI, there are a number of contributors on the Unicode Editorial Committee whose native language is not English. They usually speak up if some draft text is worded in a way that might be unnecessarily difficult for non-native speakers to understand.

