Re: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)

From: karl williamson (public@khwilliamson.com)
Date: Thu Jul 29 2010 - 17:01:22 CDT


    Mark Davis ☕ wrote:
    >
    > Mark
    >
    > /— Il meglio è l’inimico del bene —/
    >
    >
    > On Thu, Jul 29, 2010 at 05:57, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    >
    > "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
    > >
    > > On 2010/07/29 13:33, karl williamson wrote:
    > > > Asmus Freytag wrote:
    > > >> On 7/25/2010 6:05 PM, Martin J. Dürst wrote:
    > >
    > > >>> Well, there actually is such a script, namely Han. The digits
    > > >>> (一、二、三、四、五、六、七、八、九、〇) are used both as letters
    > > >>> and as decimal place-value digits, and they are scattered widely,
    > > >>> and of course there is a lot of modern living practice.
    > >
    > > >> The situation is worse than you indicate, because the same
    > > >> characters are also used as elements in a system that doesn't use
    > > >> place-value, but uses special characters to show powers of 10.
    > >
    > > No. Sequences of numeric Kanji are also used in names and word-plays,
    > > and as sequences of individual small numbers.
    >
    > (1) Existing exception :
    >
    > There's one example of a digit which has a numeric type = decimal, AND
    > is encoded in a "scattered" way:
    >
    > 19DA;6618;᧚;New Tai Lue Tham Digit One;Nd;0;L;...;1;1;1;N
    >
    > The other nine decimal digits for the Tham variant of the New Tai Lue
    > digits are borrowed from another sequence of decimal digits, starting
    > at U+19D0 (for digit zero) with the exception of U+19D1 which is
    > replaced (for digit one). Both sets are assigned in the same
    > "New_Tai_Lue" script property value.
    >
    > So the additional stability proposal will not be enforceable.
    >
    >
    > On the contrary. Were we to want such a policy, the implication would be
    > either to:
    > (a) change the type of 19DA from Nd to No (what I think would be the
    > right thing to do)
    > (b) grandfather in the character.

    This discussion doesn't make sense to me. The original proposal to
    encode 19DA says that there is one set of digits in New Tai Lue, plus
    an extra digit '1' (the one that got put at 19DA), used when the
    other digit '1' would be visually confusable with another character
    in the script that it resembles. That makes it sound like the two are
    essentially glyph variants of each other, and are interchangeable as
    far as a computer recognizing an input number is concerned.

    Thus, it is appropriate to keep it as Nd, and it isn't scattered,
    because it is adjacent to the block of 10 digits. My original proposal
    accounted for this case, asking that the slot or two immediately above
    the digit '9' be unassigned initially in a new script encoding, just in
    case a situation like this one arises again.

    One thing that I should have brought up earlier in this discussion is
    that, as an implementor, I can deal with existing exceptions. I may not
    want to, and may choose not to if my subjective calculation of
    benefit/cost indicates it's not worthwhile. Given the existing pattern
    of code point assignments, I saw an efficient way to implement things.
    And if future Unicode versions retain this pattern, neither I nor my
    successors will have to change our code to move to a new version.
    Changing code takes a significant amount of time and effort. Keeping
    new versions of Unicode on the same paradigms as previous versions
    means that implementations of those new versions will be available
    sooner than otherwise, and may even decide whether they get adopted at
    all. I was unaware of the subtleties in Han and Arabic. Those can be
    handled as exceptions, but making new exceptions is really contrary to
    Unicode's interests. So it really isn't about current counterexamples;
    there's nothing much that can be done about them. It's about adopting
    guidelines to keep from unnecessarily creating new exceptions.
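
    To illustrate the kind of implementation that relies on this pattern,
    here is a minimal Python sketch. It is not the code referred to in
    this message; the function names decimal_runs and digit_value are
    hypothetical. It assumes that characters with General_Category Nd
    come in contiguous runs whose length is a multiple of 10, each group
    of 10 starting at the digit zero, so a digit's value is just its
    offset within the run. An extra Nd digit placed directly after a
    block's '9', as described for 19DA above, would break that assumption
    and need its own special case; which category Python reports for
    19DA depends on the Unicode version it ships with.

```python
import unicodedata


def decimal_runs(limit=0x110000):
    """Collect (start, end) ranges of consecutive Nd code points below `limit`."""
    runs = []
    start = None
    for cp in range(limit):
        is_nd = unicodedata.category(chr(cp)) == "Nd"
        if is_nd and start is None:
            start = cp                      # a new run of digits begins
        elif not is_nd and start is not None:
            runs.append((start, cp - 1))    # the run ended at the previous code point
            start = None
    if start is not None:
        runs.append((start, limit - 1))
    return runs


def digit_value(ch, runs):
    """Return the decimal value of `ch`, or None if it is not in any Nd run.

    Assumption (not guaranteed by this code): every run is made of
    complete blocks of 10 digits in ascending order starting at zero.
    """
    cp = ord(ch)
    for start, end in runs:
        if start <= cp <= end:
            return (cp - start) % 10
    return None


if __name__ == "__main__":
    runs = decimal_runs()
    # Cross-check the offset rule against the library's own digit values.
    mismatches = [
        hex(cp)
        for start, end in runs
        for cp in range(start, end + 1)
        if digit_value(chr(cp), runs) != unicodedata.decimal(chr(cp), None)
    ]
    print("runs:", len(runs), "mismatches:", mismatches)
```

    The cross-check in the main block is the interesting part: any code
    point it lists is an exception to the pattern that an implementation
    of this style would have to handle separately.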


