From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Aug 09 2003 - 18:31:19 EDT
On Saturday, August 09, 2003 11:14 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:
> On 09/08/2003 13:41, John Cowan wrote:
>
> > Peter Kirk scripsit:
> >
> > > The gap may not be large, but Philippe, John H and I have
> > > identified a real gap. Why this antagonism against filling it?
> >
> > What you have identified is a set of implementation defects, not
> > problems with the Unicode Standard. The standard way to do what
> > you want is to precede the combining mark with SP or NBSP. If that
> > "doesn't work", then the implementation that makes it not work
> > needs to be fixed.
>
> Tell Microsoft! (See Noah Levitt's posting.)
And the W3C or SGML committees with the *ML character model!
> If this is indeed "The standard way to do what you want", then the
> standard needs to make it clear that the sequence of <space, combining
> mark> or <NBSP, combining mark> has the properties which I want, i.e.
> it has the width of the combining mark alone, and not the full width
> of a space, and does not expand for justification, is not a line
> breaking opportunity, does not in fact have any of the properties of
> a space. I expect to see such a clarification in the next edition of
> the Unicode Standard.
Don't forget the issues created by the fact that in many cases there is
no alternative to using "defective" sequences, hoping that the
implementation will render the diacritic alone, without the dotted circle,
and will space it correctly. For now, the workaround of putting
some (unspecified) control character before the diacritic is really
a trick and is not interoperable. It also complicates plain-text search,
since there is no predictable or stable base character to
match against this diacritic. In addition, many input methods and keyboard
drivers will not allow you to enter such a "defective" sequence, meaning that,
for example, the word "Yerushala(y)im" cannot be entered and searched for
exactly within a large text, as the implied invisible letter has no stable
representation.
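The search problem can be illustrated with a minimal Python sketch. It assumes
three hypothetical carriers for an isolated HEBREW POINT HIRIQ (SPACE, NBSP and
CGJ); the choice of carriers and the names `candidates` and `distinct_forms`
are illustrative only, not anything the standard prescribes. It counts how many
distinct strings remain after normalization.

```python
import unicodedata

HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ, a combining mark

# Three hypothetical ways to give the isolated mark something to sit on.
candidates = {
    "SPACE": "\u0020" + HIRIQ,
    "NBSP": "\u00a0" + HIRIQ,
    "CGJ": "\u034f" + HIRIQ,
}

def distinct_forms(form):
    """Return the set of distinct strings left after normalizing each candidate."""
    return {unicodedata.normalize(form, s) for s in candidates.values()}

for form in ("NFC", "NFKC"):
    print(form, len(distinct_forms(form)))
```

NFC leaves all three encodings distinct. NFKC folds NBSP into SPACE but keeps
the CGJ form separate, so an exact-match search still has to guess which
carrier the author used.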
Note that the CGJ solution will not work when the isolated diacritic must
come at the start of a word or breakable token: in that case, the solution with
SPACE is really tricky because of the special treatment of SPACE, notably
in HTML, SGML, XML and often SQL, which "normalize" whitespace.
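A rough Python sketch of why a leading SPACE is fragile. The function
`collapse_whitespace` is a simplified stand-in for whitespace normalization
(collapse runs of the four XML white space characters, then trim both ends);
it is an assumption made for illustration, not the behaviour of any particular
parser or database.

```python
import re
import unicodedata

# The four characters XML treats as white space: space, tab, CR, LF.
XML_WS = " \t\r\n"

def collapse_whitespace(text):
    """Collapse runs of XML white space to one space and trim both ends."""
    return re.sub("[" + XML_WS + "]+", " ", text).strip(XML_WS)

def starts_defective(text):
    """True if the text begins with a combining mark that has no base."""
    return bool(text) and unicodedata.combining(text[0]) != 0

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
with_space = " " + ACUTE + "abc"
with_nbsp = "\u00a0" + ACUTE + "abc"

print(starts_defective(collapse_whitespace(with_space)))
print(starts_defective(collapse_whitespace(with_nbsp)))
```

Under this model the SPACE carrier is trimmed away and the mark is left with no
base, while the NBSP carrier survives because NBSP is not in the XML white
space set.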
Thankfully, the existing spacing diacritics do not have these problems, as
they are not canonically equivalent to the suggested SPACE+diacritic
sequence, which is only their "compatibility equivalent". However, they are
only part of a solution: they exist for some diacritics (not ALL), and they
only cover use as symbols, not use as regular letters within the same word
with surrounding letters.
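The canonical versus compatibility distinction can be checked with Python's
`unicodedata` module. This is only a sketch over four spacing diacritics from
Latin-1; the dictionary `spacing` and the helper `decompositions` are
illustrative names.

```python
import unicodedata

# A few spacing diacritics from the Latin-1 block.
spacing = {
    "ACUTE ACCENT": "\u00b4",
    "DIAERESIS": "\u00a8",
    "MACRON": "\u00af",
    "CEDILLA": "\u00b8",
}

def decompositions(ch):
    """Return the (canonical, compatibility) decompositions of ch."""
    return (unicodedata.normalize("NFD", ch),
            unicodedata.normalize("NFKD", ch))

for name, ch in spacing.items():
    nfd, nfkd = decompositions(ch)
    print(name,
          "canonical:", [hex(ord(c)) for c in nfd],
          "compatibility:", [hex(ord(c)) for c in nfkd])
```

Canonical decomposition (NFD) leaves each character unchanged, while
compatibility decomposition (NFKD) turns it into SPACE followed by the
corresponding combining mark.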
So there are really two gaps: a small gap for missing spacing diacritics
used as symbols, and a large gap for all isolated diacritics used within
a word (which the CGJ solution only fills in the middle or at the end of a
word, but not at its start).
-- Philippe. Spam not tolerated: any unsolicited message will be reported to your Internet service providers.