Re: Rendering Raised FULL STOP between Digits from Philippe Verdy on 2013-03-27 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 27 Mar 2013 20:07:35 +0100

2013/3/27 Asmus Freytag <asmusf_at_ix.netcom.com>:
> At the moment, the statement that the existing encoding is actually
> implementable is something that must be considered unproven (enough issues
> have been pointed out for various elements of the unification already to
> allow such a conclusion).
>
> What we are not getting closer to is a rational understanding of how to
> improve this situation. "Random" addition of middle dot characters for some
> purpose is just as bad as pretending everything is fine with the status quo.

We are not discussing "random" additions; we want to correctly handle
use cases that are in fact needed very frequently.

For example, the Catalan syllable breaker is not a "random" case: it
is heavily used and required by the standard orthography (and Catalan
is not a minor language; we cannot just ignore it).

Dots and hyphens are used very frequently, and they are far too
overloaded in their original ASCII-only encoding. The same goes for
apostrophes and quotes. This causes enough nightmares when trying to
parse text, and it's unbelievable that there's no solution to augment
the text with distinct characters, variant selectors, or some other
format controls to disambiguate these uses. The distinction is really
semantic and bears on essential character properties (which have long
been in Unicode, like the general category).
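
To make the point about the general category concrete, here is a
minimal sketch using Python's standard unicodedata module. The list of
sample characters and the helper name describe() are my own choices
for illustration, not anything proposed in this thread. It only shows
that the overloaded ASCII characters carry a single general category
each, while some already-encoded alternatives carry different ones.

```python
import unicodedata

# Overloaded characters next to some already-encoded alternatives.
# The selection is illustrative only.
SAMPLES = [
    "\u002E",  # FULL STOP
    "\u00B7",  # MIDDLE DOT
    "\u002D",  # HYPHEN-MINUS
    "\u2212",  # MINUS SIGN
    "\u0027",  # APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
]


def describe(ch):
    """Return (code point, general category, character name) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    for ch in SAMPLES:
        print(*describe(ch))
```

A parser that sees only U+0027 gets punctuation (Po) whether the
writer meant a quote or a letter-like apostrophe; U+02BC, by contrast,
is a modifier letter (Lm).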

A solution based on an upper-layer protocol will not work (for
example in filenames, in databases of toponyms, or in archived legal
documents whose interpretation should not cause any trouble, including
when these documents are converted or exported to many other formats).
We are here exactly within the definition of linguistic rules for each
language, some of them highly standardized, and these would require a
stricter, less ambiguous encoding. The time of ASCII-only is over. The
UCS offers many new unused possibilities, as well as many existing
technical solutions, and these should not be based just on a heuristic
(which will always break in many cases). Users want to be sure that
their text will not be misinterpreted, or rendered in an ambiguous or
wrong way.
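
As a rough illustration of why a heuristic alone is fragile, here is a
small Python sketch that guesses the role of a MIDDLE DOT (U+00B7)
from its two neighbours. The function name, the role labels and the
rules themselves are hypothetical, invented for this example; no
implementation discussed in this thread is being quoted.

```python
MIDDLE_DOT = "\u00B7"


def guess_middle_dot_role(text, i):
    """Guess the role of the MIDDLE DOT at text[i] from its neighbours only.

    Hypothetical rules for illustration; a real text gives no guarantee
    that any of these guesses matches what the writer meant.
    """
    if text[i] != MIDDLE_DOT:
        raise ValueError("no MIDDLE DOT at this index")
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.lower() == "l" and after.lower() == "l":
        # Looks like the Catalan l·l, but the language is not known.
        return "catalan-geminate"
    if before.isdigit() and after.isdigit():
        # A raised decimal point and a multiplication dot look the same here.
        return "decimal-or-product"
    return "unknown"


if __name__ == "__main__":
    for sample, index in [("paral\u00B7lel", 5), ("3\u00B714", 1), ("a\u00B7b", 1)]:
        print(sample, guess_middle_dot_role(sample, index))
```

Even in this tiny sketch the digit case cannot be resolved from the
neighbours: nothing in the encoded text says whether the dot is a
decimal separator or a multiplication sign.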

Even if the proposed solutions seem "novel", this should not block us.
And even a "novel" solution can work compatibly with the huge existing
corpus of texts, which will remain as ambiguous as they are. The novel
encoding solution can perfectly well provide a fallback mechanism
where it adopts the old compatibility scheme (similar to ASCII).

Of course, nothing will prevent anyone from using characters as they
want in "random" cases, even if this breaks all commonly accepted
properties and behaviors. But this should be distinguished from
frequently used cases whose rules have long been formulated in
well-known languages (except that texts now have to live in an
environment which is more and more multilingual, in which it's not
possible to just infer which language to select in order to apply its
well-known rules). We have no other solution than providing explicit
"hints" in the encoded texts (and forgetting the time of ASCII-only,
except in some technical domains like programming languages and
transport/storage protocols, which have their own internal syntaxes
and do not really qualify as "plain text").
Received on Wed Mar 27 2013 - 14:10:00 CDT
