Re: U+0140

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 19 2004 - 18:49:54 EDT

Next message: Kenneth Whistler: "Re: Diacritic Property and Phillipine Viramas"

Previous message: Peter Constable: "RE: U+0140"
Maybe in reply to: Patrick Andries: "Re: U+0140"
Next in thread: Asmus Freytag: "Re: U+0140"
Reply: Asmus Freytag: "Re: U+0140"

Peter Kirk continued this...

> On 19/04/2004 13:03, Kenneth Whistler wrote:
>
> >... Those other middle dots give
> >people textual representation alternatives now, if they need to make
> >distinctions, and textual rendering alternatives, if they need to make
> >middle dots which display with slightly different heights, sizes, or
> >spacings, depending on the rendering requirements.
> >
> >
>
> Ken, does Unicode specify height, size and spacing distinctions between
> the various middle dots which you listed?

No.

> If I understand correctly, it
> certainly doesn't do so exhaustively.

Correct.

> So in effect what you are
> suggesting here is that people make and use their own private
> distinctions between characters which are not defined by Unicode.

Not at all.

I am suggesting that people who use Unicode characters *will* use them
according to their identity. However, identifying a character does not
neatly solve all the issues of its rendering, nor will it
automatically make things neat and tidy when people use characters in
different contexts which may have different rendering concerns.

The Unicode Standard is not prescriptive about rendering, beyond the
basics required to simply ensure correct mapping of textual content
into streams of characters. If one font vendor wants to have a raised
glyph for the MIDDLE DOT and another wants to have a lowered glyph for
the same character, it is not the Unicode Standard's business to put
the two vendors in a room until one gives up and admits the other one
is correct.

> This
> sounds very like advising people to ignore Unicode character identities
> and properties and do their own thing. Rather strange advice from
> someone in your position, surely?

I love the way you put positions in people's mouths.

By the way, I challenge you to point to the Unicode character properties
in the Unicode Character Database which define the relative position for
middle dots with respect to x-height of a font, or the spacing of
middle dots, for example.
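To make the challenge concrete, here is a minimal Python sketch. It is an illustration added here, and it assumes only the standard `unicodedata` module, which exposes a subset of the UCD. The sketch prints the properties recorded for U+00B7 MIDDLE DOT. Its general category, bidi class, combining class and decomposition say what the character *is*. None of them specifies its height relative to the x-height or its spacing.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Collect a few UCD properties that Python's unicodedata module exposes."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "general category": unicodedata.category(ch),
        "bidi class": unicodedata.bidirectional(ch),
        "canonical combining class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch) or "(none)",
        "east asian width": unicodedata.east_asian_width(ch),
    }

# U+00B7 MIDDLE DOT: none of these properties describes glyph height or spacing.
props = describe("\u00b7")
for key, value in props.items():
    print(f"{key:>26}: {value}")
```

Glyph position and side bearings live in the font, not in any of these fields.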

>
> Surely, in the current situation and if further proliferation of middle
> dots is considered undesirable,

It is undesirable, yes.

> users should be advised to presume that
> distinctions between middle dots are not a plain text matter

No, they should not, because the existence of multiple different
middle dots in the standard which are *not* canonical equivalents
of each other makes it a plain text matter.
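The point can be checked with a small Python sketch. It assumes the standard `unicodedata` module and uses an illustrative, not exhaustive, selection of dot characters. Normalizing to NFC or NFD leaves each of these code points unchanged. Canonical normalization therefore never folds one into another, and the choice among them is carried in the plain text itself.

```python
import unicodedata

# An illustrative selection of dot-like characters; not an exhaustive list.
DOTS = [
    "\u00b7",  # MIDDLE DOT
    "\u2027",  # HYPHENATION POINT
    "\u2219",  # BULLET OPERATOR
    "\u22c5",  # DOT OPERATOR
    "\u30fb",  # KATAKANA MIDDLE DOT
]

for ch in DOTS:
    nfc = unicodedata.normalize("NFC", ch)
    nfd = unicodedata.normalize("NFD", ch)
    # Neither canonical form changes the character: none has a canonical decomposition.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<20} "
          f"NFC unchanged: {nfc == ch}  NFD unchanged: {nfd == ch}")

# After normalization the five code points are still five distinct strings.
normalized = {unicodedata.normalize("NFC", ch) for ch in DOTS}
print(len(normalized))
```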

> and so
> should be handled by markup, including language selection.

In some cases, yes -- it depends on the effect which is intended,
and the context and application it occurs in.

>
> And if (as I just suggested on the Hebrew list might be true of some
> variant Hebrew pointing systems) someone finds a well documented script
> in which a true middle dot and an x-height dot are used contrastively,
> the correct approach would be either to accept, reluctantly, that at
> least one new dot needs to be encoded; or else for Unicode to define
> clearly which existing character should be used for which dot in this
> script.

Or: None of the Above

The users of characters for particular domains bear their own
responsibility to define their usage. It is not up to the Unicode
Consortium to go around defining everyone's spelling rules and
orthographic conventions for them.

If there are things unclear in the standard which are making its
use difficult for people in certain cases, then that is certainly
a concern of the Unicode Technical Committee. And if someone
brings in convincing evidence of the existence of a semantically
significant plain text distinction between two dots that cannot
plausibly be handled by *any* combination of the multitudinous dot
characters already present in the standard, then the UTC might
consider that sufficient justification to encode yet another
middle dot.

Given, however, the fact that there already are so many dot characters,
and given that their rendering often varies by font, the chance of
getting some additional pair of dot distinctions by height on the
line canonized with yet another dot encoding seems unlikely to me.

It is a will-o'-the-wisp to expect any and all multilingual
Unicode text to display "correctly" to any arbitrary n-th degree
of typographical rectitude with any and all Unicode-conformant
fonts. The use of specific fonts with specific designs is
*precisely* to enable plain text (or marked-up text, for that
matter) to be displayed as desired for particular contexts.

The criterion for Unicode plain text is basically *legible*
text.

> The worst thing that could happen would be for different text
> providers to make different and incompatible selections among the
> existing characters, leading to total confusion. But that seems to be
> the approach which you, Ken, are advocating.

I see. And thank you, Peter, for pointing that error out to me.

Text providers have their own responsibility to ensure that
they are using interoperable conventions for the representation
of text.

The Unicode Standard does not tell providers of Latin text whether
they should interchange text using macrons over long vowels or
without, or using IPA length marks or middle dots or some other
convention, nor in all uppercase or in mixed case. It *does*
specify that the sequence <o, combining-macron> is canonically
equivalent to <o-macron>, so that text processes that deal with
Latin (or any other) text should treat the interpretation of
those two sequences as the same. That's the difference.
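The canonical equivalence described here can be checked with a minimal Python sketch that uses the standard `unicodedata` module. The precomposed U+014D LATIN SMALL LETTER O WITH MACRON and the sequence U+006F U+0304 are different code point sequences. Both normalize to the same form, so a process that compares normalized text treats them as the same. The helper `canonically_equal` is a hypothetical name introduced only for this illustration.

```python
import unicodedata

precomposed = "\u014d"   # LATIN SMALL LETTER O WITH MACRON
decomposed = "o\u0304"   # LATIN SMALL LETTER O + COMBINING MACRON

# The raw strings differ in length and content...
print(precomposed == decomposed)  # False

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# ...but they are canonically equivalent, so normalization unifies them.
print(canonically_equal(precomposed, decomposed))               # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# By contrast, MIDDLE DOT and DOT OPERATOR are not canonically equivalent.
print(canonically_equal("\u00b7", "\u22c5"))  # False
```

Comparing under NFD (or NFC) is the usual way to honor canonical equivalence. It still preserves every distinction the standard encodes as separate, non-equivalent characters.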

--Ken


This archive was generated by hypermail 2.1.5 : Mon Apr 19 2004 - 19:37:44 EDT