Re: Irish dotless I (was: Languages with letters that always take diacriticals)

From: Pavel Adamek (pavel.adamek@ima.cz)
Date: Fri Mar 19 2004 - 11:47:19 EST


    > > COMBINING C BEFORE
    > > <H><COMBINING C BEFORE> = CH
    >
    > Shhh. It's not April 1 yet.

    Of course I do not want to add this character to Unicode,
    I was only thinking about possibilities.
    The document
     "An operational model for characters and glyphs"
    says:
    -----
    Even within the content domain,
    the nature of the character coding employed
    for textual data affects the type or types of
    processing to be performed on the data; no
    single coding can optimize more than a few
    such potential processes.
    ------

    From the viewpoint of sorting,
    the coding <H><COMBINING C BEFORE>
    would be much better than
    <C><COMBINING H AFTER>.
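
    A minimal sketch of the sorting argument, assuming the Czech
    convention in which CH is a single letter ordered after H.
    Neither combining character exists in Unicode; the private-use
    code points U+E000 and U+E001 are hypothetical stand-ins, and the
    helper names are invented for illustration only.

```python
# Hypothetical stand-ins: Unicode defines no such characters, so two
# private-use code points are borrowed purely for illustration.
COMBINING_C_BEFORE = "\uE000"  # <H><COMBINING C BEFORE> would display as "ch"
COMBINING_H_AFTER = "\uE001"   # <C><COMBINING H AFTER> would display as "ch"


def encode_h_first(word):
    """Code the digraph as <H><COMBINING C BEFORE>."""
    return word.replace("ch", "h" + COMBINING_C_BEFORE)


def encode_c_first(word):
    """Code the digraph as <C><COMBINING H AFTER>."""
    return word.replace("ch", "c" + COMBINING_H_AFTER)


def decode(word):
    """Turn either hypothetical coding back into the visible digraph."""
    word = word.replace("h" + COMBINING_C_BEFORE, "ch")
    return word.replace("c" + COMBINING_H_AFTER, "ch")


words = ["chata", "cena", "hrad", "duha", "had", "jablko", "chyba"]

# Plain code-point sort, no collation tables involved.
sorted_h_first = [decode(w) for w in sorted(encode_h_first(w) for w in words)]
sorted_c_first = [decode(w) for w in sorted(encode_c_first(w) for w in words)]

print(sorted_h_first)  # CH words land after all H words
print(sorted_c_first)  # CH words land between C and D
```

    With the H-first coding, a naive binary sort already yields the
    dictionary order (..., had, hrad, chata, chyba, jablko), because
    the stand-in sorts above every basic Latin letter. With the
    C-first coding the same sort places the CH words between C and D,
    so a tailored collation would still be needed.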

    Also from the viewpoint of sorting and searching,
    it would be better not to have separate characters
    for LATIN CAPITAL LETTERs and LATIN SMALL LETTERs,
    but rather case-insensitive LATIN LETTERs
    plus several zero-width characters such as
    BEGIN OF SENTENCE,
    BEGIN OF PROPER NAME,
    BEGIN OF GERMAN SUBSTANTIVE,
    BEGIN OF ENGLISH VERSE,
    which would provide the context for choosing a capital or a small glyph.
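
    A rough sketch of how such a coding might behave, again only as
    a thought experiment. The two markers are hypothetical and are
    represented by private-use code points; render() and
    strip_markers() are invented names, and the rule "capitalize the
    first letter after a marker" is an assumption about how a
    renderer could use the context.

```python
# Hypothetical zero-width markers; private-use code points as stand-ins.
BEGIN_OF_SENTENCE = "\uE010"
BEGIN_OF_PROPER_NAME = "\uE011"
MARKERS = {BEGIN_OF_SENTENCE, BEGIN_OF_PROPER_NAME}


def render(coded):
    """Pick a capital glyph for the first letter that follows a marker."""
    out = []
    capitalize_next = False
    for ch in coded:
        if ch in MARKERS:
            capitalize_next = True      # marker itself has no glyph
        elif capitalize_next and ch.isalpha():
            out.append(ch.upper())      # context asks for a capital glyph
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)


def strip_markers(coded):
    """Drop the markers, leaving only the caseless letters."""
    return "".join(ch for ch in coded if ch not in MARKERS)


coded = (BEGIN_OF_SENTENCE + "yesterday "
         + BEGIN_OF_PROPER_NAME + "pavel wrote to the list.")

rendered = render(coded)
found = "pavel" in strip_markers(coded)  # search needs no case folding

print(rendered)
print(found)
```

    Searching and sorting then operate on a single set of letters,
    and the choice of capital or small glyph is left entirely to the
    rendering step.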

          P.A.



