RE: Decomposed vs Composed accented characters

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Wed Apr 12 2006 - 14:43:17 CST


    Walter Keutgen wrote:

    > reading the *draft* standard of which you kindly provided the

    ISO has a policy of only making a few (IT) standards freely available.
    For the others, only the drafts (up to a point) are freely available.

    > link, I can only conclude that Otto's reading is correct.

    No, you have fallen for the same misleading explanation as Otto.
    Please read Ken's excellent response, which is much more detailed than mine.

    > See the following quote (copied and pasted):
    ...
    > diacritical MARKS, which are 'no characters' and have
    > an encoded representation that may never stand alone, but
    > must be followed by a base letter or the space, as
    > restricted in the 'repertoire'.
    >
    > Table 4 defines the character REPERTOIRE

    Indeed.

    > i.e. the valid combinations.

    ...of lead byte and tail byte (as well as valid single byte codes).
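    To make the "lead byte + tail byte" reading concrete, here is a minimal
    Python sketch of a decoder for a tiny ISO 6937-like subset. Assumptions,
    stated loudly: the lead-byte values below are the commonly cited 6937
    non-spacing diacritic positions (only six of them), the REPERTOIRE set is
    a hypothetical excerpt standing in for table 4, and only 7-bit ASCII is
    accepted as single-byte codes. Note that 6937 puts the diacritic byte
    *before* the base letter, whereas Unicode puts the combining mark *after*
    it; the sketch reorders them and then composes with NFC.

```python
import unicodedata

# Assumed ISO/IEC 6937 non-spacing diacritic lead bytes (a small subset only);
# check against table 4 of the standard before relying on these values.
LEAD_TO_COMBINING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
}

# Hypothetical excerpt of the table 4 repertoire: the (lead byte, base letter)
# pairs accepted here. Any combination not listed is rejected.
REPERTOIRE = {
    (0xC1, "a"), (0xC1, "e"),
    (0xC2, "a"), (0xC2, "e"),
    (0xC3, "o"),
    (0xC4, "n"),
    (0xC8, "u"),
    (0xCB, "c"),
}


def decode_6937_subset(data: bytes) -> str:
    """Decode ASCII bytes plus (lead byte, base letter) pairs from the subset above."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte code: plain ASCII in this sketch.
            out.append(chr(b))
            i += 1
        elif b in LEAD_TO_COMBINING:
            # A lead byte may never stand alone; it needs a following byte.
            if i + 1 >= len(data):
                raise ValueError(f"lead byte 0x{b:02X} at end of input")
            base = chr(data[i + 1])
            if (b, base) not in REPERTOIRE:
                raise ValueError(f"0x{b:02X} + {base!r} is not in the repertoire")
            # Unicode order is base letter first, then the combining mark;
            # NFC then yields the single precomposed code point.
            out.append(unicodedata.normalize("NFC", base + LEAD_TO_COMBINING[b]))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X}")
    return "".join(out)
```

    For example, decode_6937_subset(b"caf\xc2e") gives "café", while
    b"\xc2q" is rejected because that pair is not in the (hypothetical)
    repertoire. The point of the sketch is that validity is decided per
    byte pair, exactly as a table of lead/tail combinations would decide it.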

    > But there are contradictions, at least from the usability
    > point of view:
    >
    > In Annex D:
    >
    > "NOTE 19
    > "For spelling the Welsh language correctly, some more letters
    ...

    I'm not sure why they did it that way, but the Welsh letters can be seen
    as a "blessed optional extension".

    > In 7 bit encoding, escape sequences are necessary, which will
    > separate the 'lead byte' from the 'base letter'.
    > In my opinion this is a strange property for a precomposed encoding.

    No, but using the 7-bit variety *is* strange and cumbersome, and
    as far as I know it is never used.

    > The letter sequence 'lead', as in 'lead byte', does not appear in the
    > text.

    No, but that does not change the encoding technically in any way.

    > "4.15 repertoire: A specified set of characters that are
    > represented by one or more bit combinations of a coded
    > character set."
    >
    > Why 'or more bit combinations'?

    Usually a repertoire has more than one element...

    However, reading it more the way you are reading it:
    it is not uncommon for the same character to be represented
    in several different ways (bitwise). As long as one does not
    mix the 7-bit and 8-bit byte-based versions of 6937, that
    does not apply to 6937.
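    As an illustration of "the same character represented in several
    different ways (bitwise)", here is a short Python sketch using Unicode
    rather than 6937 (an analogy, not a claim about 6937 itself): é can be
    one precomposed code point or a base letter plus a combining mark. The
    two differ in code points and in UTF-8 bytes, and only compare equal
    after normalization.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# Same abstract character, different code point sequences and bytes.
print(precomposed == decomposed)               # False
print(precomposed.encode("utf-8").hex())       # c3a9
print(decomposed.encode("utf-8").hex())        # 65cc81

# Normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```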

    > The standards begins with a clear, not clumsy, combining

    It is highly misleading, and therefore clumsy.

    ...
    > sub-application. Anyway the standard seems however not to be
    > released.

    Yes, it is; it was published in 2001:
    http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=31393&ICS1=35&ICS2=40&ICS3=
    It is very unlikely to be revised (just reconfirmed), since all ISO
    efforts on character standardisation are focused on ISO/IEC 10646.

    > 'Annex C' is rather your opinion, but is marked 'informative'.

    Annex C is just a summary of table 4, and since the summary may be
    faulty, it is only informative. Table 4, however, is normative.
    (Besides, I never mentioned Annex C in my earlier posts in this thread.)

                    /kent k



    This archive was generated by hypermail 2.1.5 : Wed Apr 12 2006 - 14:47:43 CST