Re: Uppercase ß is coming? (U+1E9E)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 07 2007 - 21:00:47 CDT

    Asmus wrote:

    > On 5/7/2007 3:06 PM, Kenneth Whistler wrote:
    > >>> Adam Twardoch wrote:
    > >>> ... would make as little sense as encoding the
    > >>>
    > >>>> uppercase "ß" as "S ZWJ S".
    > >>>>
    > >
    > > But of course stating it that way distorts the sense of the argument,
    > > anyway. The counterproposal is to say that given existing
    > > Unicode conventions, one could simply say that in those minority
    > > contexts where one wishes to display an <S, S> sequence as
    > > an uppercase [], use of a ZWJ to maintain a plain text distinction
    > > and a ligature from a font for presentation could suffice.
    > > That isn't *encoding* uppercase [] as "S ZWJ S"; it is
    > > displaying <S, ZWJ, S> with a ligature uppercase [] glyph.
    > >
    > In light of what the discussion has yielded so far, I find this line of
    > argument highly
    > disingenuous. It has been demonstrated, that in the context of German
                                ^^^^^^^^^^^^
                                
    Asserted, actually. To say it is demonstrated is simply to
    assume the conclusion here.
     
    > orthography,
    > the choice to retain ß in ALL UPPERCASE is a clear statement that "SS"
    > is not
    > the desired *text*.

    And as for any complex script, you can assert that all you want,
    but what matters in the end is whether text presentation and
    other text processing reflects users' expectations -- and not
    whether the character encoding required to obtain that effect
    consists of one, two, or three encoded characters.

    What I see is evidence that the presentation issues could be
    handled with existing mechanisms, without introducing another
    encoded character. What I don't see is unanimity about what
    users' expectations are for other aspects of text processing
    regarding this -- instead, what I see is evidence of disagreement
    among German users regarding what *is* the expected behavior.
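
As a rough illustration of the existing-mechanism approach described above, the following Python sketch (not part of the original exchange; the sample word and all names are chosen only for illustration) shows that an <S, ZWJ, S> sequence stays distinct from plain <S, S> in the encoded text, while a process that ignores U+200D ZERO WIDTH JOINER recovers the standard "SS" spelling. Whether a font actually supplies a ligature glyph for the sequence is outside what this code can show.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

standard = "STRASSE"                      # standard all-caps spelling with <S, S>
marked = "STRA" + "S" + ZWJ + "S" + "E"   # same word with <S, ZWJ, S>


def strip_zwj(text: str) -> str:
    """Drop every ZWJ, as a process that ignores the joiner would."""
    return text.replace(ZWJ, "")


# The two spellings are different code point sequences in plain text ...
distinct_in_plain_text = standard != marked
# ... but removing the joiner falls back to the standard spelling.
falls_back_to_standard = strip_zwj(marked) == standard
# ZWJ is a format character (general category Cf).
zwj_category = unicodedata.category(ZWJ)
```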

    >
    > So, the question can never be whether one wants another glyph for "SS",
    > but whether
    > one needs another form of ß.

    Of course the question can be that. Particularly when the
    standard orthographic rules assert that the uppercase of ß *is* SS.

    > > And John Hudson's argument about this is that using existing
    > > mechanisms might work better as a practical matter, because
    > > it has graceful fallback behavior.
    > >
    > If fallback behavior was the issue, falling back to a lowercase ß would
    > be more
    > correct.

    *If* you assume that the expectation is that the fallback
    should be to the ß, because the user wanted it that way in
    the first place. But not if your expectation is that the
    fallback should be to the standard orthography.

    And fallback behavior is only *an* issue for implementation, not
    *the* issue in question, of course. If you assume that display
    of a .notdef glyph is good enough for when fonts are in transition
    (or not even transitioning), then fine. But it is highly unlikely
    that anybody is going to get out there with fonts that would
    fall back to displaying (lowercase) ß for the new character --
    what would be the point? That isn't a proper *fallback* for
    display -- although it might be a proper folding.

    > The argument is essentially essentialist, but it doesn't proceed the way
    > that you
    > are trying to summarize here. The argument is that ß is a letter, and
    > that recent
    > reform has endorsed that view by giving that letter a less ambiguous
    > function
    > in the (standard) orthography, where it now is used *in contrast* to
    > 'ss' to mark
    > long vowels.
    >
    > Therefore, the argument goes, uppercase ß is *in essence* a form of ß, and
    > never a form of "SS".

    I got that. In fact I got that about 50 postings back in this thread,
    when I explicitly pointed out the source of uppercase ß.

    And none of that vitiates the fact that ß itself has *equivalence*
    relations to "ss", and that the parallel uppercase ß would
    then have equivalent equivalence relations to "SS". (And both
    have, apparently, less prominent, but also equivalent equivalence
    relations to "sz" and "SZ", respectively.)

    > The fact that there is persistent minority variation in the orthography
    > on this issue is in
    > itself very telling, because of the fact that the popular view of German
    > writers is that
    > their orthography unambiguously follows mandated rules. (Contrast this
    > to the popular
    > view of Americans about their own orthography, and ponder this "thru"
    > and "through").

    What, so? Gee. German orthography is more encumbered with rules
    and with cultural expectations about not breaking the rules, so
    if people persist in breaking the rules anyway that is evidence
    that Unicode has to encode a character?

    Or perhaps if you don't expect people to ponder it that way, you
    would care to make your argument explicit instead of expressing
    it with winks and nods.

    > > There apparently is a camp of people
    > > who think that not only is uppercase [] a letter and
    > > deserving of encoding as a character, but it will inevitably
    > > be reckoned as the rightful uppercase mapping of ß, with
    > > further attendant changes to formal orthographic rules.
    > >
    > I'm sure there is. You can call them "friends of the uppercase ß" if
    > that makes you feel
    > better. I think it's great when letters can have fan-clubs. However, the
    > fact that there
    > are some people who are not only fascinated by watching slow changes in
    > their language
    > take place but wish to take the side in favor of a particular direction,
    > does not enter
    > into the analysis of the minority orthography and its use of ß in ALL
    > UPPERCASE context.

    Not into the analysis. But it does factor into the decision making
    when adding *characters* to the standard.

    > > But I don't think Asmus' pooh-poohing the concerns of John about
    > > the character implementation issue does justice to the real
    > > issues here.
    > >
    > > The proposal formally suggests that uppercase [] get a lowercase
    > > mapping to ß, but that, for stability, ß not get an uppercase
    > > mapping to uppercase []. That would be, to the best of my knowledge,
    > > an unprecedented kind of case mapping in the UCD,

    > The issue itself is not unprecedented.

    Precedent, please?

    I assume you are talking about the discussions of casefolding
    stability, which now specify that if there is an existing
    *uppercase* letter in the standard but no lowercase for it,
    that a lowercase paired letter cannot be added later, as
    casefolding stability would prevent adding a tolowercase() mapping
    for it, and failing that, the expectations about the case
    relation would not be met.

    What I'm looking for is an existing lowercase letter for which
    we have added an uppercase letter and a tolowercase() mapping
    for it but did not (and could not) add a touppercase() mapping
    for the preexisting lowercase letter. Does *that* precedent
    exist?

    The only other precedent I can think of is the sad tale of
    glottal stop -- and in *that* instance the final answer was
    to add a *pair* of orthographic uppercase and lowercase
    glottal stops, *distinguished* from the preexisting nominally
    lowercase (but not casemapped) glottal stop. That precedent
    actually doesn't help much, however, as I don't think *anybody*
    is going to get on board with introducing a new pair of
    sharp-s characters for the case pair, distinct from the
    existing sharp-s.
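
The glottal stop situation mentioned above can be inspected with Python's `unicodedata` module and the built-in string case methods. This is only a sketch, assuming a Python 3 runtime whose Unicode database includes U+0241 and U+0242; the variable names are illustrative and not from the original message.

```python
import unicodedata

CASELESS = "\u0294"  # preexisting glottal stop, not case-mapped
CAPITAL = "\u0241"   # orthographic uppercase glottal stop
SMALL = "\u0242"     # orthographic lowercase glottal stop

# Character names as recorded in the Unicode Character Database.
names = [unicodedata.name(c) for c in (CASELESS, CAPITAL, SMALL)]

# The preexisting letter is left alone by both case conversions.
caseless_unchanged = (
    CASELESS.upper() == CASELESS and CASELESS.lower() == CASELESS
)

# The added pair maps to each other in both directions.
pair_round_trips = CAPITAL.lower() == SMALL and SMALL.upper() == CAPITAL
```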

    > A formal solution would
    > (hypothetically) be to have localization based on the level of
    > 'orthography' and not on the level of language-country pairs. This
    > cannot be a real solution since trying to tag texts correctly and making
    > sure software is configured at all times correctly is unrealistic to the
    > extreme - the more so, as the ALL UPPERCASE context itself is so
    > restricted in German writing.
    >
    > Therefore the proposers wisely don't suggest such a thing, but consider
    > a certain lack of automatic case conversion in this instance a small
    > price to pay - as long as the text, once encoded, is transmitted and
    > displayed correctly.

    That is all fine and dandy, as long as no touppercase() mapping is
    ever done for ß. And the proposal is clear about that.

    What I am calling into question is the stability of *that* situation,
    given both the nature of the standard and its implementations,
    and the professed desired behavior (in the alternate universe
    in the indefinite future) for the character by the "friends of the
    uppercase ß".

    > Given that ß is used in ALL UPPERCASE context today, even in official
    > settings,
    > the existing case mappings tell only a partial story. Mapping ß to
    > itself on uppercasing
    > would match what I have called the 'minority' usage. Allowing a true
    > uppercase form
    > that maps to lowercase ß leaves that problem completely unaddressable,
    > but adds
    > no new problems.

    Two different solutions. Which are you advocating that the UTC
    actually do, then?

    > It is already impossible to round-trip from lower to
    > upper case
    > forms and back. All the change would do, is to allow (manual) support
    > for a character
    > that fits somewhat better in an ALL UPPERCASE environment, but that is case
    > folded correctly to ß and not ss.

    Yes.

    >
    > For users of the majority orthography, the side-effect of these mappings
    > is to allow easy conversion from non-standard to standard texts via
    > repeated case mappings. An unintended benefit, but, for the current (not
    > future) situation, a plus.

    I don't really see the difference. That is just the roundabout
    (and text-changing) way of ending up with "SS" in the text again.
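
The one-way mappings under discussion can be observed with Python's built-in string methods. This is a sketch only, assuming a Python 3 runtime whose Unicode database already contains U+1E9E with a lowercase mapping to U+00DF and no uppercase mapping from U+00DF back to it; the constant and variable names are illustrative.

```python
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # the proposed uppercase character, U+1E9E

# Standard orthography: uppercasing the lowercase letter yields "SS".
upper_of_sharp_s = SHARP_S.upper()

# Lower -> upper -> lower does not round-trip; it ends at "ss".
round_trip = SHARP_S.upper().lower()

# The proposed character lowercases to the existing sharp s ...
lower_of_capital = CAPITAL_SHARP_S.lower()

# ... so repeated case mapping (lower, then upper) ends with "SS" again.
repeated = CAPITAL_SHARP_S.lower().upper()

# Whole-word version of the same effect.
word_after_repeated_mapping = ("STRA" + CAPITAL_SHARP_S + "E").lower().upper()
```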

    > > and has its
    > > own stability issue: there will be *years* of carping and rabblerousing
    > > that will follow on from that decision, as the camp which believes
    > > that the natural, self-evident, and essential casemapping
    > > relations should be:
    > >
    > > ß <--> uppercase []
    > > ss <--> SS
    > >
    > > will attempt to get the UnicodeData case mappings (and implementations
    > > that follow from that) and case foldings "fixed" to reflect that
    > > inevitable rightness.
    > >
    > Well, it's always easy to rouse a rabble, but to make this change would
    > require that the standard orthography be changed,

    No, it would require adding 8 bytes in UnicodeData.txt:

    00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;

    00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;1E9E;;1E9E

    and removing one line from SpecialCasing.txt.

    And it takes a considerably smaller rabble to get the UTC to
    change 8 bytes in UnicodeData.txt than it takes to change the
    standard German orthography.
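
To make the "8 bytes" concrete, here is a small Python sketch that splits the two UnicodeData.txt lines quoted above into their semicolon-separated fields and reports what differs. The field labels are my own shorthand for the 15 UnicodeData.txt fields, not identifiers taken from the message.

```python
OLD = "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;"
NEW = "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;1E9E;;1E9E"

# Illustrative labels for the 15 fields of a UnicodeData.txt record.
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment",
    "simple_uppercase", "simple_lowercase", "simple_titlecase",
]


def parse(line: str) -> dict:
    """Map each field label to its value in one UnicodeData.txt line."""
    return dict(zip(FIELDS, line.split(";")))


old, new = parse(OLD), parse(NEW)

# Fields whose value differs between the two versions of the line.
changed = {k: (old[k], new[k]) for k in FIELDS if old[k] != new[k]}

# Size difference of the two lines in bytes.
added_bytes = len(NEW.encode("ascii")) - len(OLD.encode("ascii"))
```

Only the uppercase and titlecase fields change, each gaining the four characters `1E9E`.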

    And then, as now, the onus would be on implementers to see how
    they further adjusted their implementations to try to meet
    the changing expectations of both the standard orthography
    and those advocating this alternative casing.

    > so it's a simple
    > matter of directing said rabble in that direction. If they can convince
    > their fellow countrymen that it's worth enduring another orthography
    > reform, then so much more power to them (hardly likely, *if* one cared
    > to predict the future based on recent experience).

    It doesn't actually take a mass
    movement and governmentally-mandated orthography change to
    make changes in the Unicode Standard that might or might not
    be better for representation of German data in Unicode. In fact,
    that is exactly what the current exercise seems to be all about.

    > > But any changes in such a direction *are* the kind of software
    > > development nightmare that John Hudson is warning about.
    > >
    > And as I observed, as quoted below, that applies to *any* change. The
    > unhappiness with the current crop of spell-checkers, trying to
    > faithfully implement the last reform, has reached endemic proportions in
    > Germany: ink is spilled by the book on that subject. Unicode can't
    > simply say: "no reforms, you've got what you got, and don't throw a fit"
    > and be done with it.

    And you think that *less* ink will be spilled or that implementations
    will be *any* better able to meet everyone's expectations, *after*
    the addition of this character? I rather doubt it, actually.

    > > I won't bother trying to get them to pledge that they won't ask
    > > for that, because they may well say so now (as the proposal does),
    > > but then simply turn around and ask for the changes anyway.
    > >
    > Personalizing it in this way is not helpful. If the orthography changes,
    > Unicode will be asked to change, and will have to follow.

    No, it doesn't. An orthography change does not automatically
    mandate a character encoding change. It *might*. But the case
    needs to be made, in any such instance, whether existing characters
    meet the needs of the existing and the new orthography, or not.

    I agree with you that *if* the German orthography changes, and it
    is clearly determined that ß and uppercase [] are a case pair,
    and must be cased to each other and *not* to ss or SS, then
    implementations will have to follow, and that under those
    circumstances, having an uppercase [] as a distinctly encoded
    character would be clearly the best choice. But that also implies
    having the touppercase() mapping in place for it.

    > If the
    > orthography doesn't change, Unicode may be asked, whether by the authors
    > of the current proposal or by unrelated enthusiasts, but in neither case
    > does it have to act.

    Easier said than not done, actually. Many odd things have happened
    to the standard as the result of such things.

    > > But I just don't buy that argument. The "existing usage" can
    > > be supported with existing characters and with properly designed
    > > fonts, actually.
    > Not unless you are thinking about fonts that have a variant form of ß in
    > ALL UPPERCASE context. There are fonts that are ALL UPPERCASE, such as
    > Augsburger Titling, but asking for a font change for uppercase seems
    > somehow not following the precedent set by Unicode's encoding model.

    See back to top of document. Display of uppercase [] as a ligature
    in a font for <S, ZWJ, S> would get display for existing practice.
    You don't agree, apparently, because you're assuming a premise I am not, but
    there you have it. So no, I am not assuming use of glyph variants
    for casing.

    > Had the encoding model been one of using a COMBINING UPPERCASE FORM
    > SELECTOR applied to the lower case character, the whole issue would now
    > not be discussed, or not in the same way.
    > > I think this comes back down to the essentialist
    > > argument again. There is a group of German users and scholars
    > > who believe that uppercase [] *is* a character, and it is
    > > *that* which deserves to be supported, apparently.
    > >
    > Based on the view that ß *is not* a form of "ss".
    >
    > Once you accept that,

    I accept that.

    > and recent changes have made that more compelling,
    > then you ask yourself why it is necessary to insist that it cannot be
    > represented in distinction to SS in uppercase, and why, if it is a letter,
    > those that do want to retain the distinction should do so
    > (unaccountably)

    Not unaccountably, by the way.

    > via glyph selection only, when

    Not via glyph selection only, by the way.

    > at its core, it's a semantic issue.

    And not all semantic issues in text are carried directly by
    a character distinction, anyway. So claiming this is semantic
    constitutes only part of an argument for actually encoding the
    character.

    > > I have yet to see cogent technical arguments for what real
    > > issues are being addressed here, other than the need to *display*
    > > uppercase [] glyphs on demand. The text processing arguments
    > > have all been mumbo-jumbo and handwaving so far.
    > >
    > I think the arguments are no less cogent than the arguments why you
    > should not type 8 when you
    > mean B and rely on font mechanisms to address the inconsistency from
    > context.

    I see. Can you remind me again why bringing up Devanagari ksa
    was an irrelevancy? Or wait! If I can type 8 when I mean B, why
    not just go all the way and type ß when I mean 8? ;-)

    > > Furthermore, while the proposers may not have "base[d] their
    > > proposal on speculation on the direction of potential future
    > > reform", it is pretty clear from the discussion on this list
    > > that the decision to encode an uppercase [] is smack in the
    > > middle of such speculation, and encoding it will be used as
    > > a lever to make further changes.
    > Here you take Unicode's role much too seriously in the context of ongoing
    > historical
    > developments in German writing. Even if Unicode was so important that it
    > could
    > be used to lever a change -- it is not Unicode's task to take sides in
    > how a
    > community of users wants to evolve their shared writing system.

    Nor do I think that is Unicode's task. I think you realize
    that I have always been a very strong advocate of the position
    that the Unicode Standard is *not* an orthographic or spelling
    standard.

    I am saying that the existence of the uppercase [] will be
    used as a lever by *others* (not the Unicode Consortium) to
    try to open the door to further change. And this is not
    the first time that proponents of orthographic reform have
    come to the Unicode Standard with character encoding proposals
    that would ease their path towards making other changes
    in orthography. So it is an issue that the UTC needs to
    take into account as relevant to their decision-making.

    > That
    > would be
    > just the kind of thing that Japanese users have always (unjustly)
    > accused Unicode
    > of doing. So why start now?

    Well, indeed! Why start now? Wait until the German government
    mandates the orthographic change and *then* encode the character. :-)

    > As a *universal* character encoding standard, Unicode must
    > not only
    > remain neutral as far as further development of German orthography is
    > concerned,
    > whether sanctioned or unofficial, but must cater to both.

    O.k.
     
    >
    > Reading this rather lengthy set of counter-arguments has convinced me
    > that the correct
    > starting point is indeed at the character semantic level, which casts
    > the question as
    > to what form should an ß have that has been retained in ALL UPPERCASE
    > context
    > so as to allow a distinction to SS for semantic reasons.

    And we are right back around to the start of the argument.

    But the alternative way to cast the question is how should an
    uppercase [] glyph for display be represented in text and
    be retained in contexts where users expect it to be. And stated
    that way, it doesn't assume the answer you are supporting.

    > Once you argue that this form is not the same as that of the lowercase
    > letter, you then
    > would have to argue why, suddenly, Unicode decides to depart from
    > precedent and
    > picks a glyph variation solution, rather than an uppercase form. If your
    > rationale is
    > adherence to that aspect of the standard German orthography that the use
    > of ß
    > in ALL UPPERCASE TEXT (in whatever form) is meant to repudiate, then that's
    > a poor argument indeed.

    Thank you. I take pride in my poor argumentation.

    --Ken

    >
    > In conclusion, I find myself on the side of the "semanticists" and urge
    > that Unicode
    > find a way to approve this proposal as presented.
    >
    > A./


