Re: Uppercase is coming? (U+1E9E)

From: Asmus Freytag
Date: Mon May 07 2007 - 18:35:09 CDT

    On 5/7/2007 3:06 PM, Kenneth Whistler wrote:
    >>> Adam Twardoch wrote:
    >>> ... would make as little sense as encoding the
    >>>> uppercase "ß" as "S ZWJ S".
    > But of course stating it that way distorts the sense of the argument,
    > anyway. The counterproposal is to say that given existing
    > Unicode conventions, one could simply say that in those minority
    > contexts where one wishes to display an <S, S> sequence as
    > an uppercase [], use of a ZWJ to maintain a plain text distinction
    > and a ligature from a font for presentation could suffice.
    > That isn't *encoding* uppercase [] as "S ZWJ S"; it is
    > displaying <S, ZWJ, S> with a ligature uppercase [] glyph.
    In light of what the discussion has yielded so far, I find this line of
    argument highly disingenuous. It has been demonstrated that, in the
    context of German, the choice to retain ß in ALL UPPERCASE is a clear
    statement that "SS" is not the desired *text*.

    So, the question can never be whether one wants another glyph for "SS",
    but whether one needs another form of ß.
    > And John Hudson's argument about this is that using existing
    > mechanisms might work better as a practical matter, because
    > it has graceful fallback behavior.
    If fallback behavior were the issue, falling back to a lowercase ß would
    be more
    > But those advocating *for* uppercase [] don't seem to be
    > making practical arguments here, as best I can tell. The
    > argumentation is *essentialist* in nature: uppercase [] *is*
    > a letter, not a ligature, *therefore* it *must* be encoded
    > as a character.
    The argument is essentially essentialist, but it doesn't proceed the way
    that you are trying to summarize here. The argument is that ß is a
    letter, and that recent reform has endorsed that view by giving that
    letter a less ambiguous role in the (standard) orthography, where it now
    is used *in contrast* to 'ss' to mark long vowels.

    Therefore, the argument goes, uppercase ß is *in essence* a form of ß,
    and never a form of "SS".

    > I've been around the bend enough times to realize there isn't
    > much mileage to be gained in trying to argue down
    > essentialists, but I would like them to at least consider
    > the parallel with folks who have been arguing for years,
    > for example, that "ksa" in Devanagari *is* a letter, and therefore
    > must be encoded as a character.
    Arguments about ksa in Devanagari are of little help in this context,
    and neither are aspersions cast at (groups of) people.
    >>>> I strongly believe that "SS" is an anachronic, still-in-use but
    >>>> slowly-to-vanish poor man's solution to write the uppercase "ß".
    > I'm perfectly willing to accede that writing systems change,
    > and the status of elements within them may change diachronically.
    > There are plenty of such examples in the Latin script, as we
    > all know. And it may well be that ß is in the middle of such
    > a transition. As Asmus noted, its "letterhood" is now officially
    > recognized in the German orthography, and as Adam and others
    > talking about the nature of Latin as a bicameral script have
    > been wont to point out, that means growing pressure for it
    > to acquire an uppercase form, whether we like it or not. Certainly
    > this echoes the process whereby many lowercase IPA use letters
    > have acquired uppercase forms by dint of usage in language
    > orthographies.
    The fact that there is persistent minority variation in the orthography
    on this issue is in itself very telling, because the popular view of
    German writers is that their orthography unambiguously follows mandated
    rules. (Contrast this with the popular view of Americans about their own
    orthography, and ponder this "thru" and "through").
    > But Adam here is talking as if the future course of history
    > here is predestined.
    The "slowly-to-vanish" in the above quote may make it seem that way, but
    I wouldn't over-interpret that. *If* you care to make a prediction about
    the future, a continuation of the trend seems likely. However, it has
    taken over 10 years to get the latest spelling reform digested, and that
    one happened 95 years after the 1901 reform. In such time spans, a lot
    of things can happen, and Adam surely is aware of that as well.
    > There apparently is a camp of people
    > who think that not only is uppercase [] a letter and
    > deserving of encoding as a character, but it will inevitably
    > be reckoned as the rightful uppercase mapping of ß, with
    > further attendant changes to formal orthographic rules.
    I'm sure there is. You can call them "friends of the uppercase ß" if
    that makes you feel better. I think it's great when letters can have
    fan-clubs. However, the fact that there are some people who are not only
    fascinated by watching slow changes in their language take place but
    wish to take the side in favor of a particular direction, does not enter
    into the analysis of the minority orthography and its use of ß in ALL
    UPPERCASE context.
    > John Hudson responded:
    >>> I suspect, and indeed hope, that you are right. ...[but] having a
    >>> single lowercase character with two different uppercase mappings, one
    >>> currently standard and enshrined in existing casing rules and
    >>> implementations, one that might one day become standard and require
    >>> some kind of overriding implementation, seems to me a bit of a
    >>> standardisation and software development nightmare.
    > And Asmus replied:
    >> The 'nightmare' is not with the characters, but with the potential that
    >> officially sanctioned rules might change.
    > ... which Adam has as much as said is the future course of history.
    And which Asmus has said that it's too soon to tell.
    > But I don't think Asmus' pooh-poohing the concerns of John about
    > the character implementation issue does justice to the real
    > issues here.
    > The proposal formally suggests that uppercase [] get a lowercase
    > mapping to ß, but that ß, for stability, not get an uppercase
    > mapping to uppercase []. That would be, to the best of my knowledge,
    > an unprecedented kind of case mapping in the UCD,
    The issue itself is without precedent. A formal solution would
    (hypothetically) be to have localization based on the level of
    'orthography' and not on the level of language-country pairs. This
    cannot be a real solution since trying to tag texts correctly and making
    sure software is configured correctly at all times is unrealistic in the
    extreme - the more so, as the ALL UPPERCASE context itself is so
    restricted in German writing.

    Therefore the proposers wisely don't suggest such a thing, but consider
    a certain lack of automatic case conversion in this instance a small
    price to pay - as long as the text, once encoded, is transmitted and
    displayed correctly.

    Given that ß is used in ALL UPPERCASE context today, even in official
    documents, the existing case mappings tell only a partial story. Mapping
    ß to itself on uppercasing would match what I have called the 'minority'
    usage. Allowing a true uppercase form that maps to lowercase ß leaves
    that problem completely unaddressable, but adds no new problems. It is
    already impossible to round-trip from lower to upper case forms and
    back. All the change would do is to allow (manual) support for a
    character that fits somewhat better in an ALL UPPERCASE environment, but
    that is case folded correctly to ß and not ss.

    For users of the majority orthography, the side-effect of these mappings
    is to allow easy conversion from non-standard to standard texts via
    repeated case mappings. An unintended benefit, but, for the current (not
    future) situation, a plus.
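
    The mappings described above can be sketched as follows. This is only
    an illustration, assuming a present-day Python 3 whose Unicode data
    already contains U+1E9E (the code point named in the subject line) with
    the mappings as proposed: U+1E9E lowercases to ß, while ß keeps
    uppercasing to "SS". The names round_trip, majority, minority and
    normalized are made up for this example.

```python
# Sketch only: assumes Python 3 with a Unicode database that includes U+1E9E.
SHARP_S = "\u00DF"          # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1E9E"  # ẞ LATIN CAPITAL LETTER SHARP S


def round_trip(text):
    """Uppercase, then lowercase again (hypothetical helper)."""
    return text.upper().lower()


# Majority orthography: ß expands to "SS" on uppercasing, so lower -> upper
# -> lower does not give back the original word.
majority = "stra\u00DFe".upper()        # "STRASSE"
lost = round_trip("stra\u00DFe")        # "strasse", the ß is gone

# Minority usage: the distinct uppercase form lowercases back to ß ...
minority = "STRA\u1E9EE"
restored = minority.lower()             # "straße"

# ... and repeated case mapping converts it to the standard spelling.
normalized = minority.lower().upper()   # "STRASSE"
```

    Nothing here maps ß *to* the uppercase form automatically; producing it
    stays a manual choice, which is the limitation accepted above.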
    > and has its
    > own stability issue: there will be *years* of carping and rabblerousing
    > that will follow on from that decision, as the camp which believes
    > that the natural, self-evident, and essential casemapping
    > relations should be:
    > ß <--> uppercase []
    > ss <--> SS
    > will attempt to get the UnicodeData case mappings (and implementations
    > that follow from that) and case foldings "fixed" to reflect that
    > inevitable rightness.
    Well, it's always easy to rouse a rabble, but to make this change would
    require that the standard orthography be changed, so it's a simple
    matter of directing said rabble in that direction. If they can convince
    their fellow countrymen that it's worth enduring another orthography
    reform, then so much more power to them (hardly likely, *if* one cared
    to predict the future based on recent experience).
    > But any changes in such a direction *are* the kind of software
    > development nightmare that John Hudson is warning about.
    And as I observed, as quoted below, that applies to *any* change. The
    unhappiness with the current crop of spell-checkers, trying to
    faithfully implement the last reform, has reached epidemic proportions in
    Germany: ink is spilled by the book on that subject. Unicode can't
    simply say: "no reforms, you've got what you got, and don't throw a fit"
    and be done with it.
    > I won't bother trying to get them to pledge that they won't ask
    > for that, because they may well say so now (as the proposal does),
    > but then simply turn around and ask for the changes anyway.
    Personalizing it in this way is not helpful. If the orthography changes,
    Unicode will be asked to change, and will have to follow. If the
    orthography doesn't change, Unicode may be asked, whether by the authors
    of the current proposal or by unrelated enthusiasts, but in neither case
    does it have to act.
    > Asmus went on to say:
    >> There's absolutely nothing
    >> that can prevent such a change, even if it were not to involve new
    >> characters. For example, assume that the solution of using 'SZ' in
    >> contrast to 'SS' became official. It would equally invalidate all
    >> software and throw confusion even into (fuzzy) search and sorting, with
    >> the potential of dragging lower case 'sz' into the fray.
    > No doubt that would be the case.
    And, as we argued that the distinction between "SS" and something else
    (expressed as ß in lower case) is what's ultimately desired, it's not
    clear that one can predict that it will be the uppercase ß. It may well
    be the "SZ", even though, today, that does not seem likely (because it's
    even uglier than the uppercase ß, even though it's had precedent...).
    >> That's why the proposers, correctly in my opinion, did not base their
    >> proposal on speculation on the direction of potential future reform, but
    >> limited themselves to documenting the existing usage, which clearly can
    >> be supported and deserves to be supported.
    > But I just don't buy that argument. The "existing usage" can
    > be supported with existing characters and with properly designed
    > fonts, actually.
    Not unless you are thinking about fonts that have a variant form of ß in
    ALL UPPERCASE context. There are fonts that are ALL UPPERCASE, such as
    Augsburger Titling, but asking for a font change for uppercase somehow
    does not seem to follow the precedent set by Unicode's encoding model.

    Had the encoding model been one of using a COMBINING UPPERCASE FORM
    SELECTOR applied to the lower case character, the whole issue would now
    not be discussed, or not in the same way.
    > I think this comes back down to the essentialist
    > argument again. There is a group of German users and scholars
    > who believe that uppercase [] *is* a character, and it is
    > *that* which deserves to be supported, apparently.
    Based on the view that ß *is not* a form of "ss".

    Once you accept that, and recent changes have made that more compelling,
    then you ask yourself why it is necessary to insist that it cannot be
    represented in distinction to SS in uppercase, and why, if it is a letter,
    those that do want to retain the distinction should do so
    (unaccountably) via glyph selection only, when at its core it's a
    semantic issue.
    > I have yet to see cogent technical arguments for what real
    > issues are being addressed here, other than the need to *display*
    > uppercase [] glyphs on demand. The text processing arguments
    > have all been mumbo-jumbo and handwaving so far.
    I think the arguments are no less cogent than the arguments why you
    should not type 8 when you
    mean B and rely on font mechanisms to address the inconsistency from
    > Furthermore, while the proposers may not have "base[d] their
    > proposal on speculation on the direction of potential future
    > reform", it is pretty clear from the discussion on this list
    > that the decision to encode an uppercase [] is smack in the
    > middle of such speculation, and encoding it will be used as
    > a lever to make further changes.
    Here you take Unicode's role much too seriously in the context of ongoing
    developments in German writing. Even if Unicode were so important that it
    could be used to lever a change -- it is not Unicode's task to take sides
    in how a community of users wants to evolve their shared writing system.
    That would be just the kind of thing that Japanese users have always
    (unjustly) accused Unicode of doing. So why start now?


    >> I remember writing before somewhere that I think their proposal should
    >> be accepted as presented.
    > Ah, but it has been awhile since I've seen a single character
    > encoding proposal engender this much debate and controversy.
    > It may well be accepted as presented, but it is unlikely to
    > do so with any clear consensus.
    The character has been problematic in software support from the start
    (requiring a string expansion upon uppercasing) precisely because the
    1901 orthography represents a snapshot in the evolution of the use of
    this character in writing German. The reasons for the minority
    orthographic practice of retaining the ß in ALL UPPERCASE context have
    to do with the same interim nature of the solution recommended at that
    time. As a *universal* character encoding standard, Unicode must not
    only remain neutral as far as further development of German orthography
    is concerned, whether sanctioned or unofficial, but must cater to both.
    Reading this rather lengthy set of counter-arguments has convinced me
    that the correct starting point is indeed at the character semantic
    level, which casts the question as what form an ß should have that has
    been retained in ALL UPPERCASE so as to allow a distinction to SS for
    semantic reasons.

    Once you argue that this form is not the same as that of the lowercase
    letter, you then would have to argue why, suddenly, Unicode decides to
    depart from precedent and picks a glyph variation solution, rather than
    an uppercase form. If your rationale is adherence to that aspect of the
    standard German orthography that the use of ß in ALL UPPERCASE TEXT (in
    whatever form) is meant to repudiate, then that's a poor argument indeed.

    In conclusion, I find myself on the side of the "semanticists" and urge
    that Unicode
    find a way to approve this proposal as presented.

