Re: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 29 2010 - 23:04:26 CDT

Next message: Martin J. Dürst: "Re: High dot/dot above punctuation?"

Previous message: Juanma Barranquero: "Re: High dot/dot above punctuation?"
Maybe in reply to: Mark Davis ☕: "Re: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)"
Next in thread: CE Whitehead: "RE: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)"

"karl williamson" <public@khwilliamson.com> wrote:
> This discussion doesn't make sense to me. The original proposal to
> encode 19DA says that there is one set of digits in New Tai Lue, but
> there is an extra digit '1' (the one that got put at 19DA), used when
> the other digit '1' is visually confusable with another character in the
> script, which it resembles. That makes it sound like the two are
> essentially used as glyph variants of each other, and are
> interchangeable as far as the computer recognizing an input number.

Yes, treating this digit as an exception will work for INPUT, but you
still have a problem for output, because your library will need to
know when to output the variant: if you always use the default digit
1, you'll create a string that is possibly confusable to the reader,
notably if it appears alone with no other digit.

So you'll still need an exception on output to replace one or several
of these digits 1 with the variant, or you'll decide to always use the
variant (which causes no confusion), but I'm not sure that such use
would be valid in the target language. There are possibly complex
rules deciding when the variant is needed and accepted, or when the
default form is preferable and not confusable.
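
To make the input/output asymmetry concrete, here is a minimal Python
sketch. It assumes only the code points of the New Tai Lue digits
(U+19D0..U+19D9, plus the variant digit one at U+19DA). The function
names and the prefer_variant_one callback are hypothetical, and since
the real rule for choosing the variant is not known here, it is left
to the caller.

```python
# New Tai Lue digits 0-9 occupy U+19D0..U+19D9; U+19DA is the variant digit one.
NTL_ZERO = 0x19D0
NTL_VARIANT_ONE = "\u19DA"


def parse_new_tai_lue(text):
    """Input side: both forms of digit one are read as the value 1."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        if ch == NTL_VARIANT_ONE:
            digit = 1
        elif NTL_ZERO <= ord(ch) <= NTL_ZERO + 9:
            digit = ord(ch) - NTL_ZERO
        else:
            raise ValueError("not a New Tai Lue digit: %r" % ch)
        value = value * 10 + digit
    return value


def format_new_tai_lue(number, prefer_variant_one=lambda digits, index: False):
    """Output side: a caller-supplied policy (hypothetical) decides, for each
    digit one, whether the variant form should be emitted instead."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(number)
    out = []
    for index, d in enumerate(digits):
        if d == "1" and prefer_variant_one(digits, index):
            out.append(NTL_VARIANT_ONE)
        else:
            out.append(chr(NTL_ZERO + int(d)))
    return "".join(out)
```

Parsing needs no context at all, whereas formatting cannot be written
without some policy, e.g. "use the variant when the digit stands
alone" would be lambda digits, index: len(digits) == 1, which is only
one guess among many possible rules.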

For Arabic there are clearly two separate sets of digits, but the
possibility of mixing them arbitrarily is still a problem for IDNA (if
both sets are accepted), notably because most digits (all except 4 to
6) look completely identical in both sets. So registries will have to:
- either accept one set and reject the other one,
- or accept both, but only one within the same domain label, also
reserving the label that uses the other set (as if the two labels
were canonically equivalent).
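
The second option can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the two sets are taken
to be the Arabic-Indic digits (U+0660..U+0669) and the Extended
Arabic-Indic digits (U+06F0..U+06F9), and the function names are
hypothetical, not part of any IDNA library.

```python
ARABIC_INDIC_ZERO = 0x0660           # U+0660..U+0669
EXTENDED_ARABIC_INDIC_ZERO = 0x06F0  # U+06F0..U+06F9


def digit_set(ch):
    """Return the name of the digit set a character belongs to, or None."""
    cp = ord(ch)
    if ARABIC_INDIC_ZERO <= cp <= ARABIC_INDIC_ZERO + 9:
        return "arabic-indic"
    if EXTENDED_ARABIC_INDIC_ZERO <= cp <= EXTENDED_ARABIC_INDIC_ZERO + 9:
        return "extended-arabic-indic"
    return None


def uses_single_digit_set(label):
    """True if the label does not mix the two digit sets."""
    sets = {digit_set(ch) for ch in label} - {None}
    return len(sets) <= 1


def variant_label(label):
    """Return the label written with the other digit set, i.e. the label a
    registry would reserve together with the registered one."""
    out = []
    for ch in label:
        kind = digit_set(ch)
        if kind == "arabic-indic":
            out.append(chr(ord(ch) - ARABIC_INDIC_ZERO + EXTENDED_ARABIC_INDIC_ZERO))
        elif kind == "extended-arabic-indic":
            out.append(chr(ord(ch) - EXTENDED_ARABIC_INDIC_ZERO + ARABIC_INDIC_ZERO))
        else:
            out.append(ch)
    return "".join(out)
```

A registry following this policy would refuse any label for which
uses_single_digit_set() is false, and on registration would block
variant_label(label) as well.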

Such equivalences (which are definitely not canonical) can be handled
by tailored collation compares (operating at collation level 2 only,
when non-IDN registries operate only at level 1), where IDN registries
will use their own tailoring. I just see the IDN "StringPrep" as a
particular application of the general concept of collation mappings
(except that it was not designed on a linguistic basis, but an IDN
registry can be viewed as a locale for collation purposes). All these
complex rules and mappings of IDN can be written in terms of a set of
collation rules, added on top of the DUCET.
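
As a toy model of the idea of treating such equivalences as collation
levels, the Python sketch below builds two-level comparison keys by
hand. It is not the UCA algorithm and does not use the DUCET or any
real collation library: the key layout and the sort_key name are
assumptions made purely for illustration, and only the two Arabic
digit sets are folded together.

```python
def sort_key(label, levels=2):
    """Build a simplified multi-level key.

    Level 1 (primary) folds both Arabic digit sets to their numeric value.
    Level 2 (secondary) records which set each digit came from.
    """
    primary = []
    secondary = []
    for ch in label:
        cp = ord(ch)
        if 0x0660 <= cp <= 0x0669:    # Arabic-Indic digits
            primary.append(("digit", cp - 0x0660))
            secondary.append(1)
        elif 0x06F0 <= cp <= 0x06F9:  # Extended Arabic-Indic digits
            primary.append(("digit", cp - 0x06F0))
            secondary.append(2)
        else:                         # everything else: plain code point order
            primary.append(("other", cp))
            secondary.append(0)
    key = (tuple(primary),)
    if levels >= 2:
        key += (tuple(secondary),)
    return key
```

Two labels that differ only in the digit set get equal keys with
levels=1 and distinct keys with levels=2, so the choice of comparison
level decides whether they count as the same label.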

This archive was generated by hypermail 2.1.5 : Thu Jul 29 2010 - 23:07:14 CDT