From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Apr 24 2004 - 19:00:18 EDT
On 24/04/2004 15:16, Ernest Cline wrote:
>
>
>
>>[Original Message]
>>From: Peter Kirk <peterkirk@qaya.org>
>>
>>On 24/04/2004 11:22, Ernest Cline wrote:
>>
>>
>>
>>>...
>>>
>>>
>>>As someone who has put a lot of thought into variation selectors, let me
>>>point out something. In the case of B M1 M2 VS, what would the variation
>>>selector indicate as being varied if such a thing were to be allowed? ...
>>>
>>>
I have re-read section 15.6 of the standard. It is absolutely clear that
a VS applies only to the immediately preceding character, and not to a
complete combining sequence:
"A variation sequence, which always consists of a base character
followed by the variation selector,..."
There is no suggestion that more than a single character may precede the VS.
>>>...Since variation selectors are combining marks, then just like any other
>>>combining marks they should be viewed as being applied to the entire
>>>combining sequence up to that point, and hence should be viewed as
>>>indicating a variant of B M1 M2, and not of just the preceding mark. ...
>>>
>>>
Whether or not this applies to other combining marks, it explicitly does
not apply to VSs. Well, it is of course also explicit that any sequence
of a combining mark followed by a VS is not sanctioned for standard use.
>>>... Any other treatment complicates things too much.
>>>
>>>
Some other treatment is clearly what the UTC had in mind.
>>I always assumed that VS's are intended to apply to just the immediately
>>preceding character, and not to a whole combining character sequence. In
>>my opinion, "Any other treatment complicates things too much." But
>>perhaps there are others who can tell us what the UTC intended for this.
>>
>>
>
>Which is why, as things currently stand, the standard calls for the only
>legal sequences to involve base characters only. To quote from Section 15.6:
>
>"The base character in a variation sequence is never a combining
>character or a decomposable character. The variation selectors
>themselves are combining marks of combining class 0 ..."
>
>In order to get Variation Selectors even able to be applied to
>other combining marks one would need to change the way
>Variation Selectors work, and doing that is what would complicate
>things too much.
>
>
>
I agree that a change is necessary. I disagree that it would complicate
things too much.
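
The properties quoted from Section 15.6 can be checked against the
character database. What follows is only a minimal sketch, assuming
Python's standard unicodedata module (its data reflects whichever Unicode
version ships with the interpreter). The three characters are chosen
purely for illustration, and the helper name describe is hypothetical.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET, a base character
QAMATS = "\u05B8"  # HEBREW POINT QAMATS, a combining mark
VS1 = "\uFE00"     # VARIATION SELECTOR-1

def describe(ch):
    """Return (name, general category, canonical combining class)."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))

for ch in (BET, QAMATS, VS1):
    print(describe(ch))
```

The VS reports general category Mn with combining class 0, while the
qamats reports a non-zero combining class. That difference is what
matters for canonical ordering.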
>>>Thus in the case of the vowel marks, one could add a series of variation
>>>sequences with one for each base character that the variant vowel
>>>mark would be used with. If this causes too many other problems, ...
>>>
>>>
>>It would indeed if someone considers that every such combining sequence
>>has to be enumerated and defined individually. But if one simply says
>>that every combining sequence containing e.g. the sequence <QAMATS, VS1>
>>is legal and represents use of the variant qamats glyph, then there is
>>no problem.
>>
>>
>
>There are tons of problems once one adds in other combining marks
>being applied to the character as well, because then under normalization,
>unless the mark you were applying the variation selector to is of
>combining class 0, you can't assure that the variation selector will
>stay with the mark. Having the existing Variation Selectors behave
>in that way would break the normalization stability guarantee, ...
>
This is untrue. Normalisation stability does not apply when the text is
changed, and inserting a variation selector is a change to the text. I
have never suggested changing the combining class or other normalisation
properties of existing VSs. The way to ensure that a VS stays with the
mark it applies to is to ensure that in the part of the combining
character sequence before the VS all combining characters are already in
canonical order. Well, I can see that there are potential problems where
there are canonical decompositions (which are not composition
exclusions), but that does not apply to the cases I am interested in.
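
Both sides of this point can be tried out with an existing VS. The
following is a sketch only, assuming Python's unicodedata module and
using <QAMATS, VS1> as a purely hypothetical sequence (it is not a
sanctioned variation sequence). The helper names nfd and show are
illustrative.

```python
import unicodedata

BET, QAMATS, DAGESH, VS1 = "\u05D1", "\u05B8", "\u05BC", "\uFE00"

def nfd(s):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)

def show(s):
    """Render a string as a list of code points for easy comparison."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Case 1: the VS directly follows the qamats, and a dagesh comes after.
# The VS has combining class 0, so nothing is reordered across it.
ordered = BET + QAMATS + VS1 + DAGESH

# Case 2: the dagesh (class 21) is typed before the qamats (class 18).
# Canonical ordering swaps the two marks, and the VS ends up after
# the dagesh instead of the qamats.
unordered = BET + DAGESH + QAMATS + VS1

for label, text in (("ordered", ordered), ("unordered", unordered)):
    print(label, show(text), "->", show(nfd(text)))
```

In the first case the text is unchanged by normalisation. In the second
the VS is separated from the qamats, which is why the marks before the VS
need to be in canonical order when the VS is inserted.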
>... so that
>can't be done, so you would need to introduce new Variation
>Selectors that would behave in this novel fashion.
>
>In order to do so, under the existing combining class framework you
>would need to add variation selectors with the same combining class
>as the marks they work with. An alternative would be to add yet another
>property for these new Variation Selectors so as to have them go outside
>the existing canonical combining class rules when it comes to
>canonical ordering. Either way, it won't work properly with existing
>implementations, involves a lot more work than adding another
>vowel mark, and will not solve the problem of legacy data using the
>vowel mark for both the main version and its variant. ...
>
The former, VSs with various combining classes, would work perfectly
well with existing implementations as soon as they have been updated
with character data for these new characters. Adding a new mark has no
advantage over this, as it also cannot be used until the character data
is updated. It also has a disadvantage: once the character data has been
updated, the VS, being default ignorable, is simply ignored when a font
which does not support it is used, whereas the new mark is supported
only when it is included in a font. There will always be a legacy data
problem, but the VS mechanism was defined precisely to minimise this
problem, and as such it has the potential of minimising it for combining
characters just as it does for base characters.
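
The fallback behaviour can be modelled roughly as dropping the VS and
keeping the plain mark. This is a sketch under stated assumptions:
Python's unicodedata module exposes no default-ignorable property, so the
two variation selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF) are
hard-coded, and the function names are hypothetical.

```python
# Variation selector ranges, hard-coded as an assumption of this sketch.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))

def is_variation_selector(ch):
    """True if ch lies in one of the hard-coded VS ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

def fallback_text(s):
    """Model a renderer without VS support: the selector is ignored."""
    return "".join(ch for ch in s if not is_variation_selector(ch))

# Hypothetical sequence: bet, qamats, VS1.
marked = "\u05D1\u05B8\uFE00"
print(fallback_text(marked) == "\u05D1\u05B8")
```

Text carrying the VS therefore degrades to the ordinary qamats rather
than to a missing glyph.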
>... I just don't
>see the benefits justifying the costs. If there were a number of use
>cases for doing this, it might justify the effort required, but for only
>a couple of vowel marks, I can't see it.
>
>
Well, it is more than a couple, and anyway I don't see the costs as
being high. On the Hebrew list I listed yesterday six candidates for
definition as variation sequences, each of one Hebrew combining mark
plus a variation selector. Five of these sequences have the potential of
solving an issue for which a proposal either has been made or is being
considered, and for which the alternative would probably be to define a
new character. (The sixth had apparently been rejected as too marginal:
it probably doesn't merit a separate character but might be worth
defining as a variation sequence.) So potentially we save five new
characters by using either an already defined VS or a special one
defined for Hebrew. I have just thought of a seventh possible sequence,
although in this case the alternate glyph is already encoded as an
alphabetic presentation form (U+FB1E). There is also the possibility of
using VSs to indicate alternative pointing schemes. These are all in
Hebrew. There may well be similar examples in other scripts - in fact I
vaguely remember seeing that some texts (German black letter, I think)
distinguish umlaut from diaeresis, and this is something which could be
handled by a combining character VS (although here there are problems
with normalisation composition). So this is potentially a large field!
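
The composition problem mentioned for umlaut and diaeresis can be seen
directly. Again this is only a sketch, assuming Python's unicodedata
module and a hypothetical <U+0308, VS1> sequence that is not defined in
the standard.

```python
import unicodedata

# a, COMBINING DIAERESIS, VARIATION SELECTOR-1 (hypothetical sequence)
marked = "a" + "\u0308" + "\uFE00"

# NFC composes a + U+0308 into the precomposed U+00E4, so the VS now
# follows the composed letter rather than the combining mark.
composed = unicodedata.normalize("NFC", marked)

print([f"U+{ord(c):04X}" for c in composed])
```

After NFC the VS appears to apply to the precomposed letter, so a
combining-mark VS would need special handling wherever a canonical
composition exists.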
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/