Re: U+0140

From: Kenneth Whistler
Date: Mon Apr 19 2004 - 18:49:54 EDT

    Peter Kirk continued this...

    > On 19/04/2004 13:03, Kenneth Whistler wrote:
    > >... Those other middle dots give
    > >people textual representation alternatives now, if they need to make
    > >distinctions, and textual rendering alternatives, if they need to make
    > >middle dots which display with slightly different heights, sizes, or
    > >spacings, depending on the rendering requirements.
    > >
    > >
    > Ken, does Unicode specify height, size and spacing distinctions between
    > the various middle dots which you listed?


    > If I understand correctly, it
    > certainly doesn't do so exhaustively.


    > So in effect what you are
    > suggesting here is that people make and use their own private
    > distinctions between characters which are not defined by Unicode.

    Not at all.

    I am suggesting that people who use Unicode characters *will* use them
    according to their identity. However, that doesn't mean that identification
    of a character neatly solves all issues of its rendering, nor will it
    automatically make things neat and tidy when people use characters in
    different contexts which may have different rendering concerns.

    The Unicode Standard is not prescriptive about rendering, beyond the
    basics required to simply ensure correct mapping of textual content
    into streams of characters. If one font vendor wants to have a raised
    glyph for the MIDDLE DOT and another wants to have a lowered glyph for
    the same character, it is not the Unicode Standard's business to put
    the two vendors in a room until one gives up and admits the other one
    is correct.

    > This
    > sounds very like advising people to ignore Unicode character identities
    > and properties and do their own thing. Rather strange advice from
    > someone in your position, surely?

    I love the way you put positions in people's mouths.

    By the way, I challenge you to point to the Unicode character properties
    in the Unicode Character Database which define the relative position for
    middle dots with respect to x-height of a font, or the spacing of
    middle dots, for example.

    > Surely, in the current situation and if further proliferation of middle
    > dots is considered undesirable,

    It is undesirable, yes.

    > users should be advised to presume that
    > distinctions between middle dots are not a plain text matter

    No, they should not. Because the existence of multiple different
    middle dots in the standard which are *not* canonical equivalents
    of each other makes it a plain text matter.
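One way to check the point about canonical equivalence is a minimal sketch using Python's standard `unicodedata` module. The selection of dots below is only an illustrative sample, not the list from the earlier message, and the helper name `canonically_equivalent` is made up for this example. It compares strings after NFD normalization, which is the usual test for canonical equivalence. For contrast it includes U+0387 GREEK ANO TELEIA, which does have a canonical decomposition to U+00B7.

```python
import unicodedata

# Illustrative sample of dot-like characters (an assumption of this sketch,
# not an exhaustive or authoritative list).
SAMPLE_DOTS = ["\u00B7", "\u2027", "\u2219", "\u22C5", "\u30FB"]


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Collect every pair of distinct sample dots that normalization would merge.
merged_pairs = [
    (a, b)
    for i, a in enumerate(SAMPLE_DOTS)
    for b in SAMPLE_DOTS[i + 1:]
    if canonically_equivalent(a, b)
]

for ch in SAMPLE_DOTS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print("pairs merged by normalization:", merged_pairs)

# Contrast: GREEK ANO TELEIA is a canonical equivalent of MIDDLE DOT.
print("U+0387 equivalent to U+00B7:", canonically_equivalent("\u0387", "\u00B7"))
```

No pair in the sample is merged, so a process that normalizes text still sees each of these dots as a different character, which is why the choice among them is a plain text matter.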

    > and so
    > should be handled by markup, including language selection.

    In some cases, yes -- it depends on the effect which is intended,
    and the context and application it occurs in.

    > And if (as I just suggested on the Hebrew list might be true of some
    > variant Hebrew pointing systems) someone finds a well documented script
    > in which a true middle dot and an x-height dot are used contrastively,
    > the correct approach would be either to accept, reluctantly, that at
    > least one new dot needs to be encoded; or else for Unicode to define
    > clearly which existing character should be used for which dot in this
    > script.

    Or: None of the Above

    The users of characters for particular domains bear their own
    responsibility to define their usage. It is not up to the Unicode
    Consortium to go around defining everyone's spelling rules and
    orthographic conventions for them.

    If there are things unclear in the standard which are making its
    use difficult for people in certain cases, then that is certainly
    a concern of the Unicode Technical Committee. And if someone
    brings in convincing evidence of the existence of a semantically
    significant plain text distinction between two dots that cannot
    plausibly be handled by *any* combination of the multitudinous dot
    characters already present in the standard, then the UTC might
    consider that sufficient justification to encode yet another
    middle dot.

    Given, however, the fact that there already are so many dot characters,
    and given that their rendering often varies by font, the chance of
    getting some additional pair of dot distinctions by height on the
    line canonized with yet another dot encoding seems slim to me.

    It is a will-o'-the-wisp to expect any and all multilingual
    Unicode text to display "correctly" to any arbitrary n-th degree
    of typographical rectitude with any and all Unicode-conformant
    fonts. The use of specific fonts with specific designs is
    *precisely* to enable plain text (or marked-up text, for that
    matter) to be displayed as desired for particular contexts.

    The criterion for Unicode plain text is basically *legible*

    > The worst thing that could happen would be for different text
    > providers to make different and incompatible selections among the
    > existing characters, leading to total confusion. But that seems to be
    > the approach which you, Ken, are advocating.

    I see. And thank you, Peter, for pointing that error out to me.

    Text providers have their own responsibility to ensure that
    they are using interoperable conventions for the representation
    of text.

    The Unicode Standard does not tell providers of Latin text whether
    they should interchange text using macrons over long vowels or
    without, or using IPA length marks or middle dots or some other
    convention, nor in all uppercase or in mixed case. It *does*
    specify that the sequence <o, combining-macron> is canonically
    equivalent to <o-macron>, so that text processes that deal with
    Latin (or any other) text should treat the interpretation of
    those two sequences as the same. That's the difference.
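The equivalence of <o, combining-macron> and <o-macron> can be observed with a short sketch, again assuming Python's standard `unicodedata` module. The variable names are illustrative only. The two sequences differ as raw code points but normalize to the same form.

```python
import unicodedata

precomposed = "\u014D"   # LATIN SMALL LETTER O WITH MACRON
sequence = "o\u0304"     # LATIN SMALL LETTER O + COMBINING MACRON

# As raw code point sequences the two strings are different.
raw_equal = precomposed == sequence

# After normalization to the same form they compare equal.
nfc_equal = unicodedata.normalize("NFC", sequence) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == sequence

print("raw comparison:", raw_equal)
print("equal after NFC:", nfc_equal)
print("equal after NFD:", nfd_equal)
```

A text process that compares or searches Latin text would typically normalize both sides first, so that the two spellings are interpreted as the same.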

