From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Aug 10 2003 - 18:23:26 EDT
On Sunday, August 10, 2003 9:17 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:
> On 10/08/2003 10:09, Michael Everson wrote:
>
> > It is the formally specified way to represent what you say you want
> > to represent. If an implementation doesn't do that nicely enough,
> > complain to the implementors. (This has already been suggested to
> > you.)
>
> As has already been clearly pointed out by Philippe, Kent and myself
> (and ignored by those opposed to any change), the combination SPACE +
> diacritic does not have the required categories, properties and
> specification for the function it is supposed to perform. Either these
> categories etc need to be adjusted (and I don't expect the general
> category of SPACE to be changed!), or some exceptional mechanism needs
> to be clearly defined, or, by far the simplest solution, a new base
> character can be defined which, when combined with the diacritic, has
> the required categories and properties.

That's exactly what I suggested (and I used the word "suggest"). I
wanted to show that SPACE or NBSP is an inaccurate way to represent a
spacing diacritic as a normal symbol, because the properties of that
combination are undocumented. Given the lack of formal documentation
(no one here has demonstrated that such a sequence with SPACE is
really documented as such anywhere in the Unicode specs), this legacy
usage is still just a hack. It works sometimes, but not always as
intended, because it contradicts other principles, such as the
inheritance of the base character's properties by the whole combining
sequence that uses it.
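
To make the property question concrete, here is a minimal sketch in
Python (the language and the helper name `describe` are my own choices
for illustration, not anything from the standard). It assumes only the
standard `unicodedata` module, which reports properties per code point
and says nothing about the sequence as a whole:

```python
import unicodedata

def describe(ch):
    """Return (character name, general category) for one code point."""
    return (unicodedata.name(ch), unicodedata.category(ch))

SPACE = "\u0020"
COMBINING_ACUTE = "\u0301"
SPACING_ACUTE = "\u00b4"  # ACUTE ACCENT, an existing spacing accent

# The sequence SPACE + COMBINING ACUTE ACCENT, code point by code point.
# Its base character is a space separator, not a symbol.
for ch in SPACE + COMBINING_ACUTE:
    print(f"U+{ord(ch):04X}", describe(ch))

# The existing spacing accent has a symbol category of its own.
print(f"U+{ord(SPACING_ACUTE):04X}", describe(SPACING_ACUTE))

# Its compatibility decomposition is exactly SPACE + combining mark, so
# NFKD turns a symbol into a sequence whose base is a space.
decomposed = unicodedata.normalize("NFKD", SPACING_ACUTE)
print([f"U+{ord(c):04X}" for c in decomposed])
```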

And even if SPACE+diacritic is now officially documented as producing
a symbol, its properties are still not defined (so they vary among
implementations and are not interoperable), and it still gives
problems with the huge legacy use of SPACE as a padding character and
with whitespace normalization as in XML, HTML and SGML.
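
The padding and normalization problem can be sketched as follows. The
function `collapse_whitespace` is a hypothetical stand-in that only
approximates the whitespace collapsing and trimming done by XML, HTML
and SGML processors; it is not a conforming implementation of any of
them:

```python
import re

def collapse_whitespace(text):
    """Collapse runs of space, tab, CR and LF to one space, then trim.

    An approximation of markup-style whitespace normalization, written
    for illustration only.
    """
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

spacing_acute = "\u0020\u0301"  # SPACE + COMBINING ACUTE ACCENT

# At the start of a value the base SPACE is trimmed away, and the
# combining mark is left without the base it was meant to sit on.
print(ascii(collapse_whitespace(spacing_acute)))

# After ordinary padding, the base SPACE is merged with the padding
# spaces, so it can no longer be told apart from them.
print(ascii(collapse_whitespace("ab  " + spacing_acute)))
```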

In addition, it still does not solve the problems of its insertion
within words, of its directionality for BiDi, or of its parsing for
breaking (line breaking, word breaking, ...), where distinct base
character(s) would be needed for correct interpretation.
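
A rough illustration of the word and BiDi issue, again in Python. The
whitespace split below is a deliberately naive tokenizer that I use as
an assumption; it is not the Unicode line breaking or word breaking
algorithm, and a real implementation may behave differently:

```python
import unicodedata

SPACE = "\u0020"
COMBINING_ACUTE = "\u0301"

# Bidi classes of the two code points: the base is whitespace and the
# mark is a non-spacing mark, so nothing here marks the pair as a symbol.
bidi_classes = (
    unicodedata.bidirectional(SPACE),
    unicodedata.bidirectional(COMBINING_ACUTE),
)
print(bidi_classes)

# A word with the SPACE-based spacing accent inserted in its middle.
word = "ab" + SPACE + COMBINING_ACUTE + "cd"

# The naive tokenizer sees the base SPACE as a separator: the word is
# cut in two and the mark ends up at the start of the second piece.
pieces = word.split()
print([ascii(p) for p in pieces])
```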

Yes, I have read your comment, and yes, I know that SPACE+diacritic
is widely used. But that use comes with many unsolved problems, which
one could legitimately want to solve with more precise:
- definition of such a combining sequence with SPACE
- definition of its properties
- documentation within the Unicode breaking algorithms
- adjustments to the BiDi specs
- etc.

If all these adjustments are made, there will be many of them, each
handled as an exception to the normal rules, when a much simpler
approach (which would not require all these changes to the specs)
would be to define one or more other, more explicit base characters
for the appropriate function.

If Ken, Michael, Kent and other respected UTC members can't see the
problem, who will? Please consider the problem itself, and don't focus
too much on the exact terminology that you would have used yourself to
better describe the problem and its solutions.
I am not discussing the terminology itself, but the lack of
documentation and support for what seems to be a true interoperability
problem. So please don't flame me with sarcasm. That is not the
subject of my messages, which are not meant to comment on the
respective Unicode expertise of respected UTC members...

Sorry if this message still seems too long to you. But each time I
try to be short, I am flamed for inaccuracy or imprecision, or
suspected of claiming something about the standard when in fact I am
discussing not what is currently in the standard itself, but what is
missing from it and causes problems. It's easy to be short if you only
refer to the standard itself and only respond as if this list were
just a FAQ.

-- Philippe.
Spam not tolerated: any unsolicited message will be reported to your
Internet service providers.