RE: Benefits of Unicode

From: Peter_Constable@sil.org
Date: Mon Jan 29 2001 - 11:42:20 EST

Next message: Marco Cimarosti: "OT: apologizing (was RE: Chemistry in chinesse (Only in chinesse? ))"
Previous message: Marco Cimarosti: "RE: Benefits of Unicode"
Maybe in reply to: Tex Texin: "Benefits of Unicode"
Next in thread: Jonathan Rosenne: "RE: Benefits of Unicode"

On 01/29/2001 09:50:48 AM "Richard, Francois M" wrote:

>That would be my next question: Although I might have an HTML file encoded
>in iso-8859-1, the parser has to interpret following the markup AND using
>the Unicode repertoire (CCS).
>Is this flexibility taken into consideration anywhere in Unicode?

The CCS for HTML can be the Unicode repertoire, which means that an HTML
document is in principle capable of containing any character in that
repertoire, and that the characters must be interpreted per the
requirements of Unicode. But the HTML spec can still allow for the encoding
to be something other than a Unicode-sanctioned encoding form, and the
particular encoding form, e.g. iso-8859-1, might be capable of supporting
only a subset of the Unicode repertoire. That is in no way a problem for
Unicode: all this non-Unicode stuff is, in the view of Unicode, a
higher-level protocol. Data has to be processed per the specifications of
iso-8859-1 and mapped into the Unicode CCS. Once this level of
interpretation is done, then Unicode's specification and conformance
requirements apply. In other words, choosing Unicode as the CCS means that
once you've determined that a byte sequence in the HTML file represents
the Unicode character U+12A2 (say), you can't treat it as though it were
(say) the Arabic letter beh.
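The two layers described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular browser or parser works: it assumes the input is ISO-8859-1 bytes, and that characters outside Latin-1 are written as HTML numeric character references (e.g. &#x12A2;), which in HTML refer to the Unicode/ISO 10646 document character set regardless of the transfer encoding. The function names (decode_latin1, resolve_ncrs, to_unicode) are hypothetical. Real HTML parsers also handle markup, named entities and extra error-recovery rules that are omitted here.

```python
import re
import unicodedata

def decode_latin1(data: bytes) -> str:
    """Layer 1: interpret bytes per ISO-8859-1.

    Each byte 0x00-0xFF maps to the code point with the same value, so the
    encoding by itself can only reach U+0000..U+00FF.
    """
    return data.decode("iso-8859-1")

# Decimal (&#4770;) and hexadecimal (&#x12A2;) numeric character references only;
# named entities such as &eacute; are deliberately not handled in this sketch.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def resolve_ncrs(text: str) -> str:
    """Layer 2: replace numeric references with the Unicode characters they name."""
    def repl(m):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Simplified error handling (an assumption of this sketch): values beyond
        # U+10FFFF and surrogate code points become U+FFFD REPLACEMENT CHARACTER.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "\ufffd"
        return chr(cp)
    return _NCR.sub(repl, text)

def to_unicode(data: bytes) -> str:
    """Decode the bytes first, then resolve references in the decoded text."""
    return resolve_ncrs(decode_latin1(data))

if __name__ == "__main__":
    text = to_unicode(b"caf\xe9 &#x12A2;")
    last = text[-1]
    print(f"U+{ord(last):04X}", unicodedata.name(last))
    # Once mapped into the Unicode CCS, Unicode semantics apply:
    # this character is not ARABIC LETTER BEH (U+0628).
    print(last == "\u0628")  # prints False
```

The order of the two steps is the point: the byte-level encoding is resolved first as a higher-level protocol, and only the resulting code points are interpreted under Unicode's rules.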

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>

