RE: Decomposed vs Composed accented characters

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Wed Apr 12 2006 - 14:43:17 CST

Next message: Rick McGowan: "Unicode.org server outage"

Previous message: Kenneth Whistler: "Re: Unicode 5.0 Character Count?"
In reply to: Keutgen, Walter: "RE: Decomposed vs Composed accented characters"

Walter Keutgen wrote:

> reading the *draft* standard of which you kindly provided the

ISO makes only a few (IT) standards freely available.
For the others, only the drafts (up to a certain stage) are freely available.

> link, I can only conclude that Otto's reading is correct.

No, you've fallen for the same misleading explanation as Otto.
Please read Ken's excellent response, which is much more detailed than mine.

> See the following quote (copied and pasted):
...
> diacritical MARKS, which are 'no characters' and have
> an encoded representation that may never stand alone, but
> must be followed by a base letter or the space, as
> restricted in the 'repertoire'.
>
> Table 4 defines the character REPERTOIRE

Indeed.

> i.e. the valid combinations.

...of lead byte and tail byte (as well as the valid single-byte codes).
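A rough sketch of what this means for a decoder, in Python. Assumptions, to be checked against the standard itself: the 8-bit form uses bytes 0xC1 to 0xCF for the non-spacing diacritics listed below, and the diacritic byte precedes the base letter. `REPERTOIRE_SUBSET` is a hypothetical, much-reduced stand-in for Table 4 (not its actual content), and `decode_6937_subset` is an illustrative name.

```python
import unicodedata

# Diacritic lead bytes of the 8-bit ISO/IEC 6937 form (as recalled; verify
# against the standard), mapped to the matching Unicode combining marks.
LEAD_TO_COMBINING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}

# HYPOTHETICAL subset of the repertoire: which base letters each lead byte
# may precede. The normative list is Table 4 of ISO/IEC 6937; this is NOT it.
REPERTOIRE_SUBSET = {
    0xC1: set("aeiouAEIOU"),
    0xC2: set("aeiouyAEIOUY"),
    0xC8: set("aeiouyAEIOU"),
    0xCF: set("csznCSZN"),
}


def decode_6937_subset(data: bytes) -> str:
    """Decode 8-bit ISO 6937 bytes, accepting only (lead, base) pairs in the subset."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LEAD_TO_COMBINING:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            base = chr(data[i + 1])
            # A pair is valid only if the repertoire lists it.
            if base not in REPERTOIRE_SUBSET.get(b, set()):
                raise ValueError(f"pair 0x{b:02X} {base!r} not in repertoire subset")
            # Diacritic byte first, base letter second (the reverse of Unicode
            # combining order); NFC then gives the precomposed character.
            out.append(unicodedata.normalize("NFC", base + LEAD_TO_COMBINING[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # plain G0 (ASCII-compatible) byte
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled in this sketch")
    return "".join(out)


print(decode_6937_subset(b"caf\xc2e"))  # café
```

The repertoire check is the point: a pair not listed in the table is simply not a valid code, even though each byte is individually valid. Here `b"\xc1y"` is rejected because grave + y is not in the illustrative subset.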

> But there are contradictions, at least from the usability
> point of view:
>
> In Annex D:
>
> "NOTE 19
> "For spelling the Welsh language correctly, some more letters
...

I'm not sure why they did it that way, but the Welsh letters can be seen
as a "blessed optional extension".

> In 7 bit encoding, escape sequences are necessary, which will
> separate the 'lead byte' from the 'base letter'.
> In my opinion this is a strange property for a precomposed encoding.

No, but using the 7-bit variety *is* strange and cumbersome, and
as far as I know it is never used.
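To illustrate why the 7-bit form is cumbersome, here is a Python sketch that re-expresses 8-bit bytes as a 7-bit stream. It rests on an assumption not stated in the text quoted here: that the supplementary set holding the diacritic bytes is designated as G2 and reached with the ISO/IEC 2022 single shift SS2, written ESC 4/14 in a 7-bit code. `to_7bit` is a hypothetical name.

```python
# ASSUMPTION: the supplementary set (0xA1-0xFE in the 8-bit form) is G2,
# invoked one byte at a time by SS2, i.e. ESC 4/14 (bytes 1B 4E) in a
# 7-bit ISO/IEC 2022 code.
SS2_7BIT = b"\x1bN"


def to_7bit(data8: bytes) -> bytes:
    """Hypothetical converter from 8-bit ISO 6937 bytes to an SS2-based 7-bit stream."""
    out = bytearray()
    for b in data8:
        if b < 0x80:
            out.append(b)           # G0 byte: unchanged
        elif 0xA1 <= b <= 0xFE:
            out += SS2_7BIT         # single shift applies to the next byte only
            out.append(b & 0x7F)    # same table position, in the 7-bit range
        else:
            raise ValueError(f"byte 0x{b:02X} not handled in this sketch")
    return bytes(out)


print(to_7bit(b"caf\xc2e"))  # b'caf\x1bNBe'
```

Under that assumption, every accented letter costs two extra bytes, and the escape precedes the diacritic byte while the base letter still follows the diacritic byte directly.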

> The letter sequence 'lead', as in 'lead byte', does not appear in the text.

No, but that does not change the encoding technically in any way.

> "4.15 repertoire: A specified set of characters that are
> represented by one or more bit combinations of a coded
> "character set.
>
> Why 'or more bit combinations'?

Usually a repertoire has more than one element...

However, reading it closer to the way you are reading it:
it is not uncommon for the same character to be represented
in several different ways (bitwise). As long as one does not
mix the 7-bit and 8-bit versions of 6937, this does
not apply to 6937.
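On the Unicode side, which is what this thread's subject line is about, the same "one character, several bit patterns" situation is the composed vs decomposed distinction. A minimal Python sketch using the standard `unicodedata` module; `same_text` is an illustrative helper name, and comparing NFC forms is just one way to test canonical equivalence.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (equal after NFC normalization)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"      # é as one code point, U+00E9
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)             # False: different code points
print(composed.encode("utf-8").hex())     # c3a9
print(decomposed.encode("utf-8").hex())   # 65cc81
print(same_text(composed, decomposed))    # True
```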

> The standards begins with a clear, not clumsy, combining

It is highly misleading, and therefore clumsy.

...
> sub-application. Anyway the standard seems however not to be
> released.

Yes, it is. It was published in 2001:
http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=31393&ICS1=35&ICS2=40&ICS3=
It is very unlikely to be revised (just reconfirmed), since all ISO efforts
on character standardisation are focused on ISO/IEC 10646.

> 'Annex C' is rather your opinion, but is marked 'informative'.

Annex C is just a summary of table 4, and since a summary may be
faulty, it is only informative. Table 4, however, is normative. (Besides,
I never mentioned Annex C in my earlier posts on this thread.)

/kent k


This archive was generated by hypermail 2.1.5 : Wed Apr 12 2006 - 14:47:43 CST