Re: New Public Review Issue

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Aug 02 2005 - 12:08:18 CDT

    On Thu, 28 Jul 2005, Rick McGowan wrote:

    > 74 Change to Default Localization for NaN in CLDR
    >
    > There has been a request to change the default localization for a NaN from
    > the character U+FFFD REPLACEMENT CHARACTER to another representation. The
    > NaN floating-point value means "Not a Number", and represents an undefined
    > result of a mathematical operation.

    Maybe we can have a preliminary discussion of this issue on this list, to
    avoid missing something obvious. I think the key question is whether the
    value of NaN must be a single character, as currently defined in the prose
    of the LDML specification:

    "NaN is represented as a single character, typically (\uFFFD). This
    character is determined by the localized number symbols."

    ( http://www.unicode.org/reports/tr35/#Number_Format_Patterns
    under the heading "Special values")

    I can see reasons for requiring that the general, culturally neutral
    symbol for NaN be a single Unicode character (though we really haven't got
    a suitable character for it now). I can even see reasons for using the
    symbol that has been used in Java. But shouldn't _localization_ aim at
    allowing data to be rendered in a format that is understandable to people,
    without needing to know special conventions, and with the information
    presented in a natural language known to each user, if possible, or at
    least using abbreviations that they are familiar with?

    Thus, it would seem logical to allow any string as the value of NaN and to
    assume that typical localized values are strings like "Not a Number" or
    "undefined result", in different languages. After this, issue 74 could be
    considered in a new context. (Keeping the current default value would be
    one option, and "NaN" might be another.)
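
    A minimal sketch of what string-valued special values could look like.
    The NumberSymbols class, the format_number function, and the sample
    strings are hypothetical illustrations, not the LDML data model or real
    CLDR data; the U+221E default for infinity is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberSymbols:
    """Hypothetical per-locale symbols; not the actual LDML data model."""
    nan: str = "\uFFFD"        # U+FFFD REPLACEMENT CHARACTER, the current default
    infinity: str = "\u221E"   # U+221E INFINITY, assumed here as the default


def format_number(value: float, symbols: NumberSymbols) -> str:
    """Format a float, substituting the locale's strings for special values."""
    if math.isnan(value):
        return symbols.nan
    if math.isinf(value):
        sign = "-" if value < 0 else ""
        return sign + symbols.infinity
    return f"{value:g}"


# Illustrative values only; these are not taken from CLDR.
neutral = NumberSymbols()
english = NumberSymbols(nan="Not a Number", infinity="infinity")

print(ascii(format_number(float("nan"), neutral)))
print(format_number(float("nan"), english))
print(format_number(float("-inf"), english))
```

    Nothing in such a formatter depends on the NaN value being exactly one
    character long; the single-character default is just one possible string.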

    Am I missing something (obvious or non-obvious) here?

    I would expect that the value of NaN will mostly be used in
    localized output from numerical calculations and diagnostic messages.
    In diagnostic messages (assuming that program execution is, for some
    reason, aborted due to a computation producing NaN) I would expect the
    value of NaN to appear more or less standalone, so it could be of any
    reasonable length. But is the one-character requirement based on an idea
    of filling a numeric output field of some prescribed width by a character,
    used in as many copies as needed for the fill? (Much like we used to see
    fields filled with **** in FORTRAN output.) For _such_ purposes, the NaN
    indicator would need to be a single character. However, would it make sense
    to localize such data? (If the restriction is indeed based on such
    considerations and if localization is regarded as useful, I think the LDML
    specification should explain this, to make it easier for people to make
    reasonable proposals for what the value might be in a given locale.)
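
    The field-filling scenario can be sketched as follows. This is only an
    illustration of why a fill indicator would have to be one character; the
    fill_field and place_in_field helpers are hypothetical and do not
    describe how any actual formatter or the LDML specification works.

```python
def fill_field(indicator: str, width: int) -> str:
    """Fill a fixed-width output field with copies of a one-character indicator."""
    if len(indicator) != 1:
        raise ValueError("field filling needs a single-character indicator")
    return indicator * width


def place_in_field(text: str, width: int, overflow: str = "*") -> str:
    """Right-align text in the field; fill with the overflow character if it is too long."""
    if len(text) > width:
        return fill_field(overflow, width)
    return text.rjust(width)


# A one-character indicator can be repeated to any field width.
print(ascii(fill_field("\uFFFD", 3)))

# A short string fits into a wide enough field...
print(place_in_field("NaN", 8))

# ...but a longer localized string overflows, FORTRAN-style.
print(place_in_field("Not a Number", 8))
```

    A multi-character value works only when the field is wide enough, which
    is the practical trade-off behind a one-character requirement.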

    Similar considerations apply to infinity (with the exception that there is
    a widely known reasonable one-character default value for it; but people
    might still find a word, like "infinity", more widely understood).

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

