Re: Rendering Raised FULL STOP between Digits from Richard Wordingham on 2013-03-23 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sat, 23 Mar 2013 11:45:31 +0000

On Fri, 22 Mar 2013 18:49:24 -0700
Asmus Freytag <asmusf_at_ix.netcom.com> wrote:

> On 3/22/2013 6:17 PM, Richard Wordingham wrote:
> > On Fri, 22 Mar 2013 18:01:14 -0700
> > Asmus Freytag <asmusf_at_ix.netcom.com> wrote:
> >
> >>> On 03/21/2013 04:48 PM, Richard Wordingham wrote:
> >>>> However, distinguishing U+00B7 and U+0387 would fail
> >>>> spectacularly if the text had been converted to form NFC before
> >>>> you received it.
> >> That's a claim for which the evidence isn't yet solid and if it
> >> could be made solid would make that claim very interesting.
>
> Distinguishing the character codes will fail trivially.

Exactly. That is the point I made.
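
For readers who want to see the effect, here is a minimal sketch using
Python's standard unicodedata module. It relies only on the fact that
U+0387 GREEK ANO TELEIA has a canonical singleton decomposition to
U+00B7 MIDDLE DOT; the helper name survives_nfc and the sample string
are illustrative assumptions, not anything from the thread.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT


def survives_nfc(ch: str) -> bool:
    """Return True if NFC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFC", ch) == ch


# A Greek word followed by U+0387 (sample text chosen for illustration).
greek = "\u03ba\u03b1\u03bb\u03cc" + ANO_TELEIA
normalized = unicodedata.normalize("NFC", greek)

# After normalization the code point distinction is gone: the final
# character is U+00B7, so any test for U+0387 can no longer succeed.
print(f"before: U+{ord(greek[-1]):04X}  after: U+{ord(normalized[-1]):04X}")
```

Any receiver that sees only the normalized text therefore cannot
recover which of the two code points the author typed.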

> The question
> is whether analysis or processing of the text will "fail
> spectacularly". The latter is the true test of whether the
> unification is "broken".

I did not claim that such analysis would fail spectacularly. The
root of the problem is that there are at least four uses of mid point
that we cannot yet definitively call wrong:

1) Ano teleia;
2) Internal boundary in Catalan (actually, this is arguably wrong),
Occitan and other languages;
3) Traditional British decimal point (not formally confirmed as
acceptable, but common practice where technology has not suppressed
this part of British culture); and
4) A phonetic symbol used for transliterating Tangut.

There are other uses, but usually they reflect an origin in an 8-bit
encoding.

The character properties of U+00B7 have been crafted to support the
first two, and I don't see any problem with further adjusting them
to support the first three. Trailing decimal points may be
interpreted as ano teleia, but semantically that's no worse than the
handling of a trailing full stop in a number. Extending the
properties to cover all four uses looks difficult, but then, the
character properties of U+002E FULL STOP can't fully support all its
uses.

Usually one can tell the four uses apart, but not always. Greek and
Tangut don't mix well, and hard line breaks can obscure the differences
between uses (1) and (2). In most cases, it is known how a text uses
U+00B7, but there might very well not be an interface for conveying
such information to analysis software.
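
As a rough illustration of telling the uses apart from context alone,
here is a hedged Python sketch. The function guess_use, its labels and
its rules are my own assumptions for the example, not an established
algorithm: it looks only at the neighbouring characters, makes no
attempt to recognise the Tangut transliteration use, and reports
"ambiguous" for a trailing decimal point, which is exactly the case
that may be read as ano teleia.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def is_greek(ch: str) -> bool:
    """True for a letter whose Unicode name mentions GREEK."""
    return ch.isalpha() and "GREEK" in unicodedata.name(ch, "")


def guess_use(text: str, i: int) -> str:
    """Guess how the U+00B7 at index i is being used (heuristic only)."""
    if text[i] != MIDDLE_DOT:
        raise ValueError("no U+00B7 at that index")
    prev = text[i - 1] if i > 0 else ""
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if prev.isdigit() and nxt.isdigit():
        return "decimal point"
    if prev.isdigit():
        # Trailing decimal point or punctuation after a number: undecidable here.
        return "ambiguous"
    if is_greek(prev) and not nxt.isalpha():
        return "ano teleia"
    if prev.isalpha() and nxt.isalpha():
        return "internal boundary"
    return "ambiguous"


for sample, index in [("3\u00b714", 1), ("col\u00b7legi", 3),
                      ("\u03ba\u03b1\u03bb\u03cc\u00b7 ", 4), ("3\u00b7", 1)]:
    print(repr(sample), "->", guess_use(sample, index))
```

A hard line break between the letters of use (2) would leave the dot
with no following letter, and the sketch would then fall through to
"ambiguous", which is the obscuring effect described above.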

Richard.
Received on Sat Mar 23 2013 - 06:49:52 CDT
