Re: Public Review Issues Updated

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 29 2004 - 18:10:32 EDT


    From: "Peter Constable" <petercon@microsoft.com>
    > I wrote:
    > > If you introduce c-stroke and C-stroke, they should come immediately
    > > as a pair
    >
    > It does not necessarily have to happen that way. There are plenty of
    > cases of lowercase letters without uppercase counterparts in the UCS.
    > If I had had good evidence for C-stroke, I would have proposed it at
    > the same time, but I did not. The C-stroke can be proposed after the
    > c-stroke has already been added to the standard, and since UTC has
    > already accepted the c-stroke, that is what would necessarily have to
    > happen.

    OK, so we agree that unifying currency symbols with letters is a bad
    idea. That is also a good justification for the separate encoding of
    the CEDI currency sign.

    So now we are left with orthographic/phonetic letters. c-stroke is one
    that was covered in your searches. But now that we know that capital
    C-stroke is also used, can Unicode add a case mapping for c-stroke
    later, if C-stroke is encoded afterwards?

    Aren't case mappings normative properties, and thus subject to the
    stability policy? What would happen if, for example, these characters
    had to be used in book or chapter titles (uppercased or in small
    capitals)? How could all the other letters be converted from lowercase
    to uppercase while the c-stroke alone stays unchanged? The same
    problem arises for people's names or toponyms (uppercase letters are
    often needed, and sometimes required, for example when writing postal
    addresses or filling in some administrative forms).

    If C-stroke is left unencoded, people will need to use hacks such as
    adjusting the font size for this letter only, or using a plain C plus
    a combining overlay to print the missing character properly. I think
    this may already happen today, because users work with the existing
    standard and the characters it currently supports, even though
    decomposed characters with overlays are a hack.

    This already happens for African languages whose Latin letters with
    horizontal bars are missing, and various hacks, more or less
    successful, are used to render the overlay. To solve the problem,
    specialized fonts are created that correctly render the decomposed
    combining sequence. (I have found such cases in some existing fonts
    developed and published by SIL.org, demonstrating that the missing
    uppercase letters were actually needed. SIL is clearly interested in
    supporting actual languages in existing communities, and in easing
    the transcription of sacred texts, notably the Bible and the Quran,
    into languages that are not yet supported, using their local native
    orthography, which was created to cover each language's unique
    phonetic, grammatical, and semantic rules.)

    This is notably the case in Africa, South Asia, Mexico, and native
    South American communities, where these sacred texts have long been
    needed and awaited to help support education.

    With the Latin script alone, I have seen numerous examples where the
    existence of only the lowercase letters in Unicode caused problems,
    because the uppercase version is also needed for modern usage, which
    adopts the presentation forms of the other Latin-written European
    languages also used in the same regions. The use of uppercase,
    titlecase, and lowercase has thus been accepted as a de facto
    standard for the Latin script, which is now used to write many
    languages of various origins, linguistic families, and subgroups.
    Sooner or later, more languages will be romanized (in fact, some ISO
    standards require a romanization of the native script for the
    exchange of information on toponyms, trademarks, people's names... or
    simply to get a presence on the web).

    It is quite natural to expect that all the missing uppercase versions
    of letters currently encoded only in lowercase will be needed in the
    near future. Today this is easy for letters with non-overlay
    diacritics, but letters with overlay diacritics will remain a problem
    for correct case conversion. They can be written today with combining
    overlays, but the existence of a precomposed letter in only one case
    complicates things. So people will instead recommend using the
    decomposed letters for as long as there is no case pair. This will
    simply ease string handling, and it will still work with a very
    simple glyph substitution rule in fonts such as those prepared by SIL
    for various linguistic communities.
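    A minimal Python sketch of why the decomposed spelling eases string
    handling, assuming c-stroke is written as c followed by U+0337
    COMBINING SHORT SOLIDUS OVERLAY (the choice of overlay mark is only
    illustrative): the combining mark has no case, so only the base letter
    is converted, the round trip back to lowercase loses nothing, and NFC
    leaves the sequence as it is.

```python
import unicodedata

# Assumption: c-stroke spelled as "c" + U+0337 COMBINING SHORT SOLIDUS OVERLAY.
# The choice of overlay mark is illustrative only.
OVERLAY = "\u0337"
c_stroke = "c" + OVERLAY

# Combining marks have no case, so case conversion only touches the base letter.
upper = c_stroke.upper()
print(upper == "C" + OVERLAY)     # True
print(upper.lower() == c_stroke)  # True: the round trip is lossless

# NFC has no precomposed form for this pair, so the decomposed spelling
# survives normalization unchanged (still two code points).
print(unicodedata.normalize("NFC", c_stroke) == c_stroke)  # True
```

    A font then only needs to map C + U+0337 and c + U+0337 to suitable
    glyphs, which is the kind of simple substitution rule SIL-style fonts
    already rely on.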

    If we want to avoid complexity and allow uppercase, small-caps, and
    titlecase rendering styles for languages using those letters, it
    seems reasonable to expect that the communities using the Latin
    letters with overlays will also easily recognize the uppercase
    versions.

    I accept that there is no urgent need to encode an uppercase version
    that has not yet been found, but the standard should be prudent and
    make an explicit reservation in its stability policy, so that the
    normative case mappings may later be changed to include a possible
    future uppercase version, if its existence is demonstrated. (I bet
    that users of the lowercase letters will rapidly adopt the additional
    character, because it is quite natural to want more freedom in
    presentation style.)

    After all, the original Latin script had only uppercase letters;
    lowercase was introduced as a matter of style, to improve the
    readability of text, by borrowing some narrower and cursive forms
    from handwriting. I see the separation of lowercase and uppercase
    letters as an invention of printers or typographers, to fit more text
    on the same page (notably when paper was so expensive and rare, before
    its manufacture was simplified by the chemical processing of wood).

    Today, most written text is printed, and the current technologies will
    allow more freedom in presentation styles, including the case variations.



    This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 18:49:04 EDT