Re: Question about formatting numerals

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Sep 21 2006 - 13:04:12 CDT

    On Thu, 21 Sep 2006, Addison Phillips wrote:

    > Jukka K. Korpela wrote:
    - -
    >> but I'm pretty sure that actual data almost universally contains just
    >> normal spaces.
    >
    > That's probably not true. User input may be "regular spaces", but I think
    > you'll find that computer systems generate non-breaking spaces.

    Some systems may, but I don't think that's common at all. Think about all
    the texts written using text editors or word processors, by people who
    rarely even know about the no-break space, still less use it regularly.
    Their programs hardly ever convert spaces to no-break spaces. Numeric data
    written in text format by programs tends to come from I/O routines that
    use no thousands separator, though they might sometimes use a period or a
    comma or even a space. But hardly ever a no-break space.

    > However, here we are dealing with a
    > recommendation to content authors. For a number, using a non-breaking space
    > will prevent things like line-breaking from interfering with text legibility.

    It will, but especially in justified text, it has a price. Besides, for a
    number, it would be rather trivial for a rendering engine to avoid (by
    default) a line break between sequences of digits even when they are
    separated by a space. (Actually, should this be taken into account in
    Unicode line breaking rules, by adding NU SP* × NU or at least NU SP × NU
    there? Just a thought.)
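
    To illustrate the suggested rule NU SP* × NU, here is a minimal sketch
    in Python. It is not the Unicode line breaking algorithm: it assumes
    that only U+0020 SPACE creates break opportunities, it approximates the
    NU class by general category Nd, and the function name
    break_opportunities is made up for this example.

```python
import unicodedata

def is_nu(ch):
    # Assumption: approximate line breaking class NU by general category Nd.
    return unicodedata.category(ch) == "Nd"

def break_opportunities(text, keep_numbers_together=True):
    """Return the offsets before which a line break would be allowed.

    Simplified model: a break is allowed after each run of U+0020 SPACE,
    except (optionally) between two digits, i.e. NU SP* x NU.
    """
    breaks = []
    i = 0
    while i < len(text):
        if text[i] != " ":
            i += 1
            continue
        start = i
        while i < len(text) and text[i] == " ":
            i += 1  # consume the whole run of spaces (SP*)
        if start == 0 or i == len(text):
            continue  # leading or trailing spaces: nothing to break between
        if keep_numbers_together and is_nu(text[start - 1]) and is_nu(text[i]):
            continue  # digit, spaces, digit: suppress the break
        breaks.append(i)
    return breaks

sample = "It costs 1 234 567 euros"
print(break_opportunities(sample))                               # [3, 9, 19]
print(break_opportunities(sample, keep_numbers_together=False))  # [3, 9, 11, 15, 19]
```

    With the rule enabled, the only break opportunities left are before
    "costs", before the number and before "euros", so the digit groups stay
    on one line without any no-break space in the data.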

    >> I wouldn't be so worried about conversions to legacy encodings when using
    >> Unicode for new data.
    >
    > I would, simply because users will wish to utilize text in many places that
    > use legacy encodings. It is bad to have your number suddenly and inexplicably
    > become "123?445?789".

    You have a very good point here, but I don't think it's about legacy
    encodings. Rather, it's about more limited character repertoires and about
    legacy software. If you cut and paste numbers from, say, a text document
    into a spreadsheet program, you may find out that fixed-width spaces won't
    be recognized as spaces at all - even if no encoding problems are
    involved. But on similar grounds, you may run into problems with no-break
    spaces, too. Legacy software with simple ASCII-oriented input routines may
    misbehave when it sees a no-break space.
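
    Both failure modes can be reproduced with a short Python sketch. THIN
    SPACE stands in here for the fixed-width spaces, and ascii_parse is a
    hypothetical stand-in for an ASCII-oriented input routine, not code
    taken from any real spreadsheet program.

```python
NBSP = "\u00a0"        # NO-BREAK SPACE, present in ISO-8859-1
THIN_SPACE = "\u2009"  # THIN SPACE, one of the fixed-width spaces, absent from ISO-8859-1

def to_legacy(text, encoding):
    """Round-trip text through an encoding, replacing what it cannot represent."""
    return text.encode(encoding, errors="replace").decode(encoding)

def ascii_parse(text):
    """Hypothetical ASCII-oriented input routine: only U+0020 separates groups."""
    cleaned = text.replace(" ", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError("unexpected character in %r" % text)
    return int(cleaned)

groups = ["123", "445", "789"]
thin = THIN_SPACE.join(groups)
nbsp = NBSP.join(groups)

print(to_legacy(thin, "latin-1"))          # 123?445?789
print(to_legacy(nbsp, "latin-1") == nbsp)  # True: NBSP survives ISO-8859-1
print(to_legacy(nbsp, "ascii"))            # 123?445?789

print(ascii_parse("123 445 789"))          # 123445789
for label, value in (("thin space", thin), ("no-break space", nbsp)):
    try:
        ascii_parse(value)
    except ValueError:
        print(label, "rejected")           # both are rejected, no encoding involved
```

    The first three lines show the repertoire problem; the last loop shows
    the software problem, which appears even though the string never leaves
    Unicode.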

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

