RE: Hexadecimal digits?

From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Tue Nov 11 2003 - 10:11:25 EST


    Lots of useful and sensible opinions to which to reply, quoted below.
    I'll try to reply to all of them at once.

    In summary then, suggestions which seem to cause considerably less
    objection than the Ricardo Cancho Niemietz proposal are:
    (1) Invent a new DIGIT COMBINING LIGATURE character, which allows you to
    construct a single digit of any finite value
    (2) Use ZWJ for the same purpose
    (3) Invent two new characters BEGIN NUMERIC and END NUMERIC which force
    reinterpretation of intervening letters as digits

    I infer some confusion among contributors to this thread, some of whom
    are still talking to me as though I'm only interested in a sort
    algorithm and nothing else. I thought I'd made it clear that that was
    merely an insignificant example of a more general overall concept, so
    I'm going to ignore as irrelevant any suggestions as to how to make a
    sort work, and focus instead on how to make digits >9 work.

    To address Peter's question, "why not just use ZWJ?", the answer is
    partly ignorance, and partly concern over how a high-digit-unaware
    renderer would handle things. It would of course be COMPLETELY
    DISASTEROUS if the hex string "2F" were to be (correctly, in this
    scheme) represented as ('2' + '1' + ZWJ + '5') and then rendered as
    "215" by an unaware renderer. I would also be concerned about ambiguity.
    I'd want the combined character to be unambiguously a single digit with
    a computable value. Ignorance came into play also because I just didn't
    realise you could do that with ZWJ, and I'm not convinced that ('1' +
    ZWJ + '5') would be universally understood as the hex digit we normally
    write as F. I guess I see the option of DIGIT COMBINING LIGATURE as
    maybe a bit like FRACTION SLASH, in that it makes /clear/ that the thing
    you are composing is a number (a digit, in the case of DIGIT COMBINING
    LIGATURE, and a fraction in the case of FRACTION SLASH). The existence
    of DIGIT COMBINING LIGATURE would also give us a place in the code
    charts where its exact usage algorithm could be specified. For all of
    these reasons, I don't think that ZWJ fits the bill, though I'd be happy
    to be convinced otherwise if my reasoning is flawed.
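
    The rendering concern above can be sketched roughly as follows. This is
    only an illustration, not a specification: the function names
    encode_hex_with_zwj, render_unaware and decode_aware are hypothetical,
    and the "unaware renderer" is modelled simply as one that treats ZWJ as
    invisible.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def encode_hex_with_zwj(hex_string):
    """Encode hex digits; values above 9 become '1' + ZWJ + units digit.

    Hypothetical scheme from the discussion: F (15) is written '1' ZWJ '5'.
    """
    out = []
    for ch in hex_string:
        value = int(ch, 16)
        if value < 10:
            out.append(str(value))
        else:
            out.append("1" + ZWJ + str(value - 10))
    return "".join(out)


def render_unaware(text):
    """Model a renderer that ignores ZWJ and shows the plain digits."""
    return text.replace(ZWJ, "")


def decode_aware(text):
    """Model an aware parser: a ZWJ-joined pair counts as one hex digit."""
    value = 0
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == ZWJ:
            digit = int(text[i] + text[i + 2])  # e.g. '1' ZWJ '5' is 15
            i += 3
        else:
            digit = int(text[i])
            i += 1
        value = value * 16 + digit
    return value


encoded = encode_hex_with_zwj("2F")
print(render_unaware(encoded))  # what an unaware renderer would display
print(decode_aware(encoded))    # the value an aware parser would compute
```

    Under these assumptions the aware parser recovers the value 47 (hex
    2F), while the unaware renderer shows three decimal-looking digits.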

    The option of BEGIN NUMERIC and END NUMERIC is also a pretty good one,
    and has the staggering backward compatibility property that if the hex
    string "2F" were to be (correctly, in this scheme) represented as (BEGIN
    NUMERIC + '2' + 'F' + END NUMERIC) it would be rendered as "2F" by an
    unaware renderer, which is, of course, perfect. It does have the
    /dis/advantage, however, that there appears to be no way to specify in
    the existing code charts what the numeric value of a given letter ought
    to be. For example, how should a hex-aware interpreter interpret (BEGIN
    NUMERIC + 'j' + END NUMERIC)? This is still a good option, of course,
    but it would need to be supplemented by an additional code chart. This is
    because everything between BEGIN NUMERIC and END NUMERIC would have
    different properties. However, there is another reason why I don't think
    this is the best solution - it's not stateless. From a random point in a
    string, you'd have to parse backwards and forwards to figure out how to
    interpret everything. It also creates problems for concatenation and
    substringing. What's more, it perpetuates the appallingly monstrous meme
    that the /case/ of hex "2F" is somehow important, when in fact we should
    be clear that all digits are caseless, and that the /apparent/ case of
    digits ten to fifteen is merely an artifact.
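
    The statelessness objection can be illustrated with a small sketch.
    BEGIN NUMERIC and END NUMERIC do not exist, so two private-use code
    points stand in for them here; those code points and the helper names
    render_unaware and is_digit_at are assumptions made purely for
    illustration.

```python
# Private-use stand-ins for the proposed (non-existent) controls.
BEGIN_NUMERIC = "\ue000"
END_NUMERIC = "\ue001"


def render_unaware(text):
    """Model a renderer that ignores both controls entirely."""
    return text.replace(BEGIN_NUMERIC, "").replace(END_NUMERIC, "")


def is_digit_at(text, index):
    """Decide whether text[index] is a digit under the bracketing scheme.

    The answer depends on state carried from the start of the string,
    so the whole prefix has to be scanned.
    """
    inside = False
    for i, ch in enumerate(text):
        if ch == BEGIN_NUMERIC:
            inside = True
        elif ch == END_NUMERIC:
            inside = False
        elif i == index:
            return inside or ch in "0123456789"
    return False


text = "A" + BEGIN_NUMERIC + "2F" + END_NUMERIC + "F"
print(render_unaware(text))      # displayed with the controls dropped
print(is_digit_at(text, 3))      # the 'F' inside the numeric span
print(is_digit_at(text, 5))      # the 'F' after END NUMERIC
print(is_digit_at(text[3:], 0))  # same inner 'F', but in a substring
```

    The last call is the substringing problem in miniature: the same 'F'
    is a digit in the full string but reads as a letter once the opening
    control has been cut away.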

    Finally, there's Mark's observation that there may be some legitimate
    use for digits >15.

    For all of these reasons, my preference is for DIGIT COMBINING LIGATURE.

    So it would seem I now have the choice of either contacting Ricardo and
    suggesting this alternative to him, or arguing against him and then
    submitting a counter-proposal. I don't know which approach is likely to
    be most productive.

    Jill

    > -----Original Message-----
    > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]

    > Another solution could be a formatting control that overrides the
    > interpretation of a sequence of characters as digits rather
    > than as letters

    > Here I just suggested a few things for your problem of natural sort or
    > semantic analysis, but I don't need it and I won't defend this idea.
    > It's up to you to defend your opinion and make an alternate proposal
    > for WG2. Clearly you take your distance from the other very
    > problematic proposal to encode figure-width letters...

    > -----Original Message-----
    > From: Peter Kirk [mailto:peterkirk@qaya.org]

    > So, Jill, could you get much of what you want by encoding your hex
    > digits as ligatures between regular digits, e.g. <U+0031, ZWJ,
    > U+0030...0035>? They would have the properties of digits, and could be
    > tailored for collation, as contractions, where you need them. I'm not
    > sure why you suggest a special DIGIT COMBINING LIGATURE, why not just
    > use ZWJ?

    > -----Original Message-----
    > From: Mark E. Shoulson [mailto:mark@kli.org]

    > If/when Tengwar gets coded, it will have digits for 10 and 11, as it
    > uses base-12.
    > I would say that to the extent that all this is a good idea, we
    > shouldn't code lots of different ones (A,B for the computer crowd, X,E
    > for the Dozenal crowd); let glyph-variants handle it.
    > (as an oddball addition: if the maximum base we're really trying to
    > support is 16, it might be handy to have a "16" digit as well,
    >
    > ~mark


