RE: Hexadecimal digits?

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 11 2003 - 14:55:43 EST

    Jill Ramonsky summarized:

    > In summary then, suggestions which seem to cause considerably less
    > objection than the Ricardo Cancho Niemietz proposal are:
    > (1) Invent a new DIGIT COMBINING LIGATURE character, which allows you to
    > construct any digit short of infinity
    > (2) Use ZWJ for the same purpose
    > (3) Invent two new characters BEGIN NUMERIC and END NUMERIC which force
    > reinterpretation of intervening letters as digits

    Actually, I don't think these cause considerably less objection.
    They are simply suggestions which Philippe has made, and which
    haven't been potshot sufficiently yet on the list.

    I note that Philippe and you *have* reached consensus that you are
    talking about extending the list of digits, and are not concerned
    about Ricardo Cancho Niemietz's issue of fixed-width digit display.

    > I'm going to ignore as irrelevant any suggestions as to how to make a
    > sort work, and focus instead on how to make digits >9 work.

    O.k. And I won't veer into any of the sorting issues, either.

    > For all of
    > these reasons, I don't think that ZWJ fits the bill, though I'd be happy
    > to be convinced otherwise if my reasoning is flawed.

    I don't think your reasoning is flawed here at all. The ZWJ is a
    cursive control and ligation control. Its effect, if any, should
    be on the *appearance* of neighboring characters. So if someone
    decided, for example, that the "00" in 2003 looked better ligated
    and wanted to create a font to do so, they could hint text with
    a ZWJ to indicate when the sequence "00" should ligate into a
    single glyph and when not. You can't expect generic support for that
    kind of visual ligation to morph, on all the system platforms, into
    a completely orthogonal concept of treating ligated digit sequences
    as digits in their own right.

    >
    > The option of BEGIN NUMERIC and END NUMERIC is also a pretty good one,
    > ... However, there is another reason why I don't think
    > this is the best solution - it's not stateless. From a random point in a
    > string, you'd have to parse backwards and forwards to figure out how to
    > interpret everything. ...

    I also concur with this argument. Creating new stateful controls
    for this is a non-starter. If people want stateful sequence-spanning
    attribute designations like this, they should accomplish it in XML
    or something similar, which has this kind of apparatus built in.

    > For all of these reasons, my preference is for DIGIT COMBINING LIGATURE.

    This option fails for some of the same reasons as the use of ZWJ. It
    doesn't have the problem of being a misapplication of an existing
    format control character, so it would at least be semantically clear.
    But it has the same rendering issues. To quote your analysis for
    ZWJ, mutatis mutandis:

    > It would of course be COMPLETELY
    > DISASTEROUS if the hex string "2F" were to be (correctly, in this
    > scheme) represented as ('2' + '1' + DCL + '5') and then rendered as
    > "215" by an unaware renderer.

    ... which it would be.
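
    A minimal sketch of that failure mode, in Python. Since no DIGIT
    COMBINING LIGATURE is encoded, U+200D ZWJ stands in for it here, and
    the function render_unaware is a hypothetical model of a renderer
    that simply drops format controls (General_Category Cf) it does not
    understand. Real renderers may behave differently.

```python
import unicodedata

# Assumption: ZWJ stands in for the proposed (unencoded) DIGIT COMBINING LIGATURE.
DCL = "\u200d"

def render_unaware(text):
    """Model a renderer that silently ignores format controls (category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# The hex string "2F" under the proposed scheme: '2', then '1' + DCL + '5' for digit 15.
encoded = "2" + "1" + DCL + "5"
print(render_unaware(encoded))  # 215
```

    The reader of the output has no way to tell this apart from the
    ordinary three-digit numeral.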

    You could, of course, avoid this problem if the "DIGIT COMBINING LIGATURE"
    were actually just a visible symbol, rather than an invisible
    format control that would have dubious support in most platform
    software. For example, you could simply make use of an
    existing symbol and *define* it to be your {digit combining ligature}
    symbol. Thus, for "2F", you could have, e.g.:

                    21¤5
                    
    where ¤ is defined as a digit composition operator, defaulting to
    decimal digit composition. Thus:

                    21¤5<radix16> = 0x2F = 47
                    21¤5<radix36> = 87
                    21¤5<radix97> = 209
                    ...
                    
    And 21¤5<radix8> is an error, digit out of range.

    Note that to evaluate any such expression, you still need to know
    the radix implicitly (or explicitly), just as 777 = 777 if the
    radix is 10, but 0x777 = 1911 if the radix is 16.
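
    For concreteness, here is one way such a convention might be
    evaluated, sketched in Python. The names parse_digits and evaluate
    are illustrative only, and the rule that ¤ glues the decimal digits
    on either side of it into a single digit value is my reading of the
    examples above, not an established notation.

```python
COMPOSE = "\u00a4"  # ¤ CURRENCY SIGN, used here as the digit composition operator

def parse_digits(text):
    """Split a numeral into digit values; ¤ joins decimal digits into one digit."""
    digits = []
    join_next = False
    for ch in text:
        if ch == COMPOSE:
            if not digits or join_next:
                raise ValueError("misplaced composition operator")
            join_next = True
        elif ch in "0123456789":
            d = int(ch)
            if join_next:
                # Decimal digit composition: 1¤5 becomes the single digit 15.
                digits[-1] = digits[-1] * 10 + d
                join_next = False
            else:
                digits.append(d)
        else:
            raise ValueError("unexpected character: %r" % ch)
    if join_next:
        raise ValueError("dangling composition operator")
    return digits

def evaluate(text, radix):
    """Evaluate the numeral in the given radix, which must be supplied by the caller."""
    value = 0
    for d in parse_digits(text):
        if d >= radix:
            raise ValueError("digit out of range")
        value = value * radix + d
    return value

print(evaluate("21\u00a45", 16))  # 47
print(evaluate("21\u00a45", 36))  # 87
print(evaluate("21\u00a45", 97))  # 209
print(evaluate("777", 16))        # 1911
```

    Calling evaluate("21¤5", 8) raises ValueError, matching the
    digit-out-of-range case above.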

    The mathematically inclined out there could probably generalize
    this scheme to allow any digit (composed or not) to be an operand
    of the digit composition operator, for greater generality. And in
    fact this seems such an obvious kind of approach to generalizing
    the concept of "digit" that I'd be surprised if there wasn't already
    a mathematical literature on the topic and some more or less
    accepted mathematical symbology to deal with this.

    Now the drawback of a mathematically defined approach to the problem
    is that you couldn't really expect systems software to automatically
    support digit formation and evaluation in such a scheme. But
    aren't we really talking about specialized applications here, anyway?
    I'm not hearing any groundswell of support here for a wonderful
    idea that all the platform and library vendors and language
    standardization committees have overlooked all these years in
    supporting hex. Instead, any such scheme for extending digits has
    to deal with the ground facts that hexadecimal *is* supported
    already in those "oceans of data" and "rivers of code" already
    mentioned earlier in the thread. That this is done by overloading
    the semantics of A-F and a-f may displease the purists out there,
    but it is still the case. Those oceans aren't going to dry up,
    and those rivers are not going to be suddenly diverted. So that
    leaves you once again in the position of advocating a specialized
    mathematical application of digit extension. And for that, I don't
    see any particular barrier to simply using existing characters to
    devise an appropriate symbolic convention for the generalized
    case, the way mathematicians have been behaving for centuries now.

    >
    > So it would seem I now have the choice of either contacting Ricardo and
    > suggesting this alternative to him, or arguing against him and then
    > submitting a counter-proposal. I don't know which approach is likely to
    > be most productive.

    As before, I don't see either approach as likely to gain any
    traction in the encoding committees, given the scope of the problem
    and the likelihood of complications if anything remotely like
    'A'..'F' got encoded again explicitly as hex digits.

    --Ken



    This archive was generated by hypermail 2.1.5 : Tue Nov 11 2003 - 15:42:09 EST