Re: Uppercase ß is coming? (U+1E9E)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 07 2007 - 17:06:50 CDT


    > > Adam Twardoch wrote:
    > >> ... would make as little sense as encoding the
    > >> uppercase "ß" as "S ZWJ S".

    But of course stating it that way distorts the sense of the argument,
    anyway. The counterproposal is to say that given existing
    Unicode conventions, one could simply say that in those minority
    contexts where one wishes to display an <S, S> sequence as
    an uppercase [ß], use of a ZWJ to maintain a plain text distinction
    and a ligature from a font for presentation could suffice.
    That isn't *encoding* uppercase [ß] as "S ZWJ S"; it is
    displaying <S, ZWJ, S> with a ligature uppercase [ß] glyph.

    And John Hudson's argument about this is that using existing
    mechanisms might work better as a practical matter, because
    it has graceful fallback behavior.
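
    To make the plain-text distinction and the fallback concrete, here
    is a minimal sketch in Python. The sample word and the helper name
    strip_joiners are hypothetical, chosen only for illustration; the
    font-side ligature substitution happens at rendering time and is
    not modelled here.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Ordinary uppercase spelling, and the same word with a joiner
# marking the <S, S> pair that a font could show as one glyph.
plain = "STRASSE"
marked = "STRA" + "S" + ZWJ + "S" + "E"

def strip_joiners(text):
    """Drop every ZWJ, which is what a joiner-unaware process sees."""
    return text.replace(ZWJ, "")

# The two spellings stay distinct as plain text ...
print(marked == plain)
# ... but collapse to the ordinary "SS" spelling once ZWJ is ignored.
print(strip_joiners(marked) == plain)
```

    A renderer without the ligature would simply show "SS", which is
    the fallback behavior referred to above.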

    But those advocating *for* uppercase [ß] don't seem to be
    making practical arguments here, as best I can tell. The
    argumentation is *essentialist* in nature: uppercase [ß] *is*
    a letter, not a ligature, *therefore* it *must* be encoded
    as a character.

    I've been around the bend enough times to realize there isn't
    much mileage to be gained in trying to argue down
    essentialists, but I would like them to at least consider
    the parallel with folks who have been arguing for years,
    for example, that "ksa" in Devanagari *is* a letter, and therefore
    must be encoded as a character.

    > >> I strongly believe that "SS" is an anachronic, still-in-use but
    > >> slowly-to-vanish poor man’s solution to write the uppercase "ß".

    I'm perfectly willing to concede that writing systems change,
    and the status of elements within them may change diachronically.
    There are plenty of such examples in the Latin script, as we
    all know. And it may well be that ß is in the middle of such
    a transition. As Asmus noted, its "letterhood" is now officially
    recognized in the German orthography, and as Adam and others
    talking about the nature of Latin as a bicameral script have
    been wont to point out, that means growing pressure for it
    to acquire an uppercase form, whether we like it or not. Certainly
    this echoes the process whereby many lowercase IPA letters
    have acquired uppercase forms by dint of usage in language
    orthographies.

    But Adam here is talking as if the future course of history
    here is predestined. There apparently is a camp of people
    who think that not only is uppercase [ß] a letter and
    deserving of encoding as a character, but it will inevitably
    be reckoned as the rightful uppercase mapping of ß, with
    further attendant changes to formal orthographic rules.

    John Hudson responded:

    > > I suspect, and indeed hope, that you are right. ...[but] having a
    > > single lowercase character with two different uppercase mappings, one
    > > currently standard and enshrined in existing casing rules and
    > > implementations, one that might one day become standard and require
    > > some kind of overriding implementation, seems to me a bit of a
    > > standardisation and software development nightmare.
    > >

    And Asmus replied:

    > The 'nightmare' is not with the characters, but with the potential that
    > officially sanctioned rules might change.

    ... which Adam has as much as said is the future course of history.

    But I don't think Asmus' pooh-poohing the concerns of John about
    the character implementation issue does justice to the real
    issues here.

    The proposal formally suggests that uppercase [ß] get a lowercase
    mapping to ß, but that ß, for stability, not get an uppercase
    mapping to uppercase [ß]. That would be, to the best of my knowledge,
    an unprecedented kind of case mapping in the UCD, and has its
    own stability issue: there will be *years* of carping and rabblerousing
    that will follow on from that decision, as the camp which believes
    that the natural, self-evident, and essential casemapping
    relations should be:

        ß <--> uppercase [ß]
       ss <--> SS
       
    will attempt to get the UnicodeData case mappings (and implementations
    that follow from that) and case foldings "fixed" to reflect that
    inevitable rightness.
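
    The asymmetry can be sketched in a few lines of Python. The tables
    below are hypothetical and written out by hand for illustration;
    they are not taken from UnicodeData.txt, and map_case is an assumed
    helper name, not any library's API.

```python
SHARP_S = "\u00DF"       # ß
CAP_SHARP_S = "\u1E9E"   # the code point named in the subject line

# Hypothetical tables, for illustration only.
PROPOSED_UPPER = {SHARP_S: "SS"}          # existing rule, left untouched
PROPOSED_LOWER = {CAP_SHARP_S: SHARP_S}   # the one new, one-way mapping
SYMMETRIC_UPPER = {SHARP_S: CAP_SHARP_S}  # what the other camp would want

def map_case(text, table, fallback):
    """Map each character through `table`, else through `fallback`."""
    return "".join(table.get(ch, fallback(ch)) for ch in text)

word = "stra" + SHARP_S + "e"

# As proposed: uppercasing goes to SS, so the round trip loses ß.
upper_now = map_case(word, PROPOSED_UPPER, str.upper)
back_now = map_case(upper_now, PROPOSED_LOWER, str.lower)

# Symmetric pair: the round trip is lossless, but the uppercase
# result differs from what existing implementations produce.
upper_sym = map_case(word, SYMMETRIC_UPPER, str.upper)
back_sym = map_case(upper_sym, PROPOSED_LOWER, str.lower)

print(upper_now, back_now == word)
print(upper_sym == upper_now, back_sym == word)
```

    Any software that compares or stores the uppercase result would see
    different output depending on which of the two tables it ships.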

    But any changes in such a direction *are* the kind of software
    development nightmare that John Hudson is warning about.

    I won't bother trying to get them to pledge that they won't ask
    for that, because they may well say so now (as the proposal does),
    but then simply turn around and ask for the changes anyway.

    Asmus went on to say:

    > There's absolutely nothing
    > that can prevent such a change, even if it were not to involve new
    > characters. For example, assume that the solution of using 'SZ' in
    > contrast to 'SS' became official. It would equally invalidate all
    > software and throw confusion even into (fuzzy) search and sorting, with
    > the potential of dragging lower case 'sz' into the fray.

    No doubt that would be the case.

    >
    > That's why the proposers, correctly in my opinion, did not base their
    > proposal on speculation on the direction of potential future reform, but
    > limited themselves to documenting the existing usage, which clearly can
    > be supported and deserves to be supported.

    But I just don't buy that argument. The "existing usage" can
    be supported with existing characters and with properly designed
    fonts, actually. I think this comes back down to the essentialist
    argument again. There is a group of German users and scholars
    who believe that uppercase [ß] *is* a character, and it is
    *that* which deserves to be supported, apparently.

    I have yet to see cogent technical arguments for what real
    issues are being addressed here, other than the need to *display*
    uppercase [ß] glyphs on demand. The text processing arguments
    have all been mumbo-jumbo and handwaving so far.

    Furthermore, while the proposers may not have "base[d] their
    proposal on speculation on the direction of potential future
    reform", it is pretty clear from the discussion on this list
    that the decision to encode an uppercase [ß] is smack in the
    middle of such speculation, and encoding it will be used as
    a lever to make further changes. Hence the (overly) passionate
    opposition, as well as the (overly) passionate support for the
    proposal, in my opinion.

    > I remember writing before somewhere that I think their proposal should
    > be accepted as presented.

    Ah, but it has been a while since I've seen a single character
    encoding proposal engender this much debate and controversy.

    It may well be accepted as presented, but that is unlikely to
    happen with any clear consensus.

    --Ken


