Re: Character proposal: SUBSCRIPT TEN

From: Leo Broukhis (leob@mailcom.com)
Date: Wed Jan 16 2008 - 18:41:57 CST

    On Jan 16, 2008 1:42 PM, Kenneth Whistler <kenw@sybase.com> wrote:

    > GOST 10859 and ALCOR were effectively dead encodings long before
    > Unicode even got started collecting repertoire,

    It might seem funny, but I've heard of operational BESM-6 machines
    (that used the GOST encoding) somewhere in Russia as recently as last
    year, at some military installation where it's easier to keep paying
    for maintenance, electricity and cooling than to take on the headache
    of upgrading the system.

    > > It cannot be replaced by SUBSCRIPT ONE + SUBSCRIPT ZERO, because it
    > > has to occupy one character position for the sake of text aligned for
    > > a fixed-width font.
    >
    > That's debatable. For transcoding obscure character encodings,
    > there really is no requirement that you have one-to-one
    > mappings for every character. You can certainly represent
    > the subscript 10 in GOST 10859 with <2081, 2080> in Unicode
    > and convert it back losslessly with no problem.

    Lossless conversion is fine, but I'm interested in a portable, exact
    representation of a GOST printout. I would not object to a rich text
    approach if there were a way to do it, e.g. if something like
    <halfwidth>&#x2081;&#x2080;</halfwidth> existed and could do the job.
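
    The lossless round trip suggested in the quote above can be sketched
    in a few lines of Python. The GOST code values here (0x1F for
    subscript ten, 0-10 for digits and the decimal point) are hypothetical
    placeholders, not the real GOST 10859 table, and the sketch assumes the
    source repertoire has no standalone subscript one or subscript zero,
    which is what makes the pair decode unambiguously. It also makes the
    fixed-width problem visible: one printer column becomes two code points.

```python
# Sketch: GOST 10859 -> Unicode -> GOST round trip, expanding the single
# subscript-ten code into <U+2081, U+2080>.
# ASSUMPTION: all GOST code values below are HYPOTHETICAL placeholders.

GOST_SUB_TEN = 0x1F            # hypothetical code for subscript ten
SUB_TEN_SEQ = "\u2081\u2080"   # SUBSCRIPT ONE, SUBSCRIPT ZERO

# Toy table: codes 0-9 are the digits, code 10 is the decimal point.
TABLE = {code: str(code) for code in range(10)}
TABLE[10] = "."
REVERSE = {ch: code for code, ch in TABLE.items()}


def gost_to_unicode(codes):
    """Decode GOST codes; subscript ten becomes two Unicode code points."""
    return "".join(SUB_TEN_SEQ if c == GOST_SUB_TEN else TABLE[c] for c in codes)


def unicode_to_gost(text):
    """Encode back; every <2081, 2080> pair collapses to one GOST code."""
    codes = []
    i = 0
    while i < len(text):
        if text.startswith(SUB_TEN_SEQ, i):
            codes.append(GOST_SUB_TEN)
            i += len(SUB_TEN_SEQ)
        else:
            codes.append(REVERSE[text[i]])
            i += 1
    return codes


codes = [1, 10, 5, GOST_SUB_TEN, 3]      # "1.5", subscript ten, "3"
text = gost_to_unicode(codes)
assert text == "1.5" + SUB_TEN_SEQ + "3"
assert unicode_to_gost(text) == codes    # lossless round trip
print(len(codes), len(text))             # 5 6: one column became two
```

    The conversion is indeed lossless, but any code that counts characters
    to align columns now sees six where the printout had five.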

    > > What should an emulator of a computer that used GOST 10859 or ALCOR
    > > produce, then?
    >
    > For an emulator you would have various options, including
    > mapping of the sequence <2081, 2080> to your fixed-width
    > ACPU-128 drum printer font glyph for a subscript 10. Or,
    > if your emulator is making one-to-one character to glyph
    > assumptions, then you use a PUA value to stand in for the
    > sequence, and map *that* to your fixed-width glyph.

    Correct me if I'm wrong, but AFAIK the ways to attach private glyphs
    to network documents are neither standardized nor widely supported yet.
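
    The PUA option from the quoted reply can be sketched in Python as
    follows. U+E000 is an arbitrary Private Use Area choice made only for
    illustration, and the helper names are hypothetical; nothing outside
    such an emulator would know what that code point means, which is the
    interchange problem raised just above.

```python
import unicodedata

# Sketch of the emulator option: stand in for <U+2081, U+2080> with one
# Private Use Area code point so one character maps to one glyph cell.
# U+E000 is an ARBITRARY, hypothetical choice.
SUB_TEN_SEQ = "\u2081\u2080"
PUA_SUB_TEN = "\ue000"
assert unicodedata.category(PUA_SUB_TEN) == "Co"  # Co = private use


def to_cells(text):
    """Collapse each subscript-ten pair into the PUA stand-in."""
    # Assumes the input contains no U+E000 of its own.
    return text.replace(SUB_TEN_SEQ, PUA_SUB_TEN)


def from_cells(cells):
    """Expand the PUA stand-in back to the interchangeable sequence."""
    return cells.replace(PUA_SUB_TEN, SUB_TEN_SEQ)


def pad_to_column(text, width):
    """Pad a line to a fixed number of printer columns."""
    cells = to_cells(text)
    return cells + " " * (width - len(cells))


line = "1.5" + SUB_TEN_SEQ + "3"
print(len(line), len(to_cells(line)))   # 6 5
assert from_cells(pad_to_column(line, 8)).rstrip() == line
```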

    > sponsorship is not required to simply add one more symbol
    > for compatibility with an old encoding to the standard.

    That is good to know. I looked at the submission page before joining
    the list, and I'm following the suggestion to discuss proposals first.

    > However, justification in terms of emulation of long unused
    > character sets and computing machinery isn't a very strong
    > case, since emulation software is *software*, after all, and
    > always has plenty of options to deal with such problems
    > creatively, as long as all the component pieces needed for
    > character representation are present in Unicode.

    Typesetting software has such options too, but that did not seem to
    stop people from requesting and acquiring separate code points for
    monospaced letters and digits (U+1D670 - U+1D6A3, U+1D7F6 - U+1D7FF).
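
    For comparison, the monospace ranges cited above are contiguous, so a
    converter can reach them with simple offsets. The Python sketch below
    (the function name is hypothetical) also shows that NFKC normalization
    folds these characters back to plain ASCII, since they carry <font>
    compatibility decompositions: a pure presentation distinction that was
    nevertheless encoded.

```python
import unicodedata

# Start of each contiguous monospace run in Mathematical Alphanumeric Symbols.
MONO_CAPITAL_A = 0x1D670   # MATHEMATICAL MONOSPACE CAPITAL A (Z is U+1D689)
MONO_SMALL_A = 0x1D68A     # MATHEMATICAL MONOSPACE SMALL A (z is U+1D6A3)
MONO_DIGIT_ZERO = 0x1D7F6  # MATHEMATICAL MONOSPACE DIGIT ZERO (9 is U+1D7FF)


def to_monospace(text):
    """Map ASCII letters and digits to their monospace code points."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(MONO_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(MONO_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(MONO_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            out.append(ch)  # everything else passes through unchanged
    return "".join(out)


mono = to_monospace("Az9")
print([unicodedata.name(c) for c in mono])
# ['MATHEMATICAL MONOSPACE CAPITAL A', 'MATHEMATICAL MONOSPACE SMALL Z',
#  'MATHEMATICAL MONOSPACE DIGIT NINE']
assert unicodedata.normalize("NFKC", mono) == "Az9"
```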

    If we're to follow the spirit of UTN #28, we should add a mathematical
    decimal exponent base character, if only to allow an unambiguous
    scientific representation of reals in math texts. What does 1.5e+3
    really mean without a U+2062 (INVISIBLE TIMES) before the 'e': 1500
    (1.5 × 10^3) or 7.077 (1.5 × e + 3, with e as Euler's number)?
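
    The arithmetic behind the two readings can be checked with a short
    Python snippet; the variable names are purely illustrative.

```python
import math

# The two readings of "1.5e+3" contrasted above.
as_exponent = float("1.5e+3")    # e as the exponent marker: 1.5 * 10**3
as_product = 1.5 * math.e + 3    # e as Euler's number: 1.5 * e + 3

print(as_exponent)               # 1500.0
print(round(as_product, 3))      # 7.077
```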

    Subscripts after numbers already have a different meaning: they
    indicate the base of the numeral system (as in 101₂ = 5).
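
    To make the clash concrete, here is a hypothetical Python sketch of two
    readers for the same code points: one treats trailing subscript digits
    as a base, the other treats <U+2081, U+2080> as the decimal exponent
    marker that a single SUBSCRIPT TEN character would denote. Neither
    function comes from an existing library.

```python
# Subscript digits U+2080..U+2089 and a table mapping them to ASCII digits.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))
TO_ASCII = str.maketrans(SUBSCRIPT_DIGITS, "0123456789")
SUB_TEN_SEQ = "\u2081\u2080"   # SUBSCRIPT ONE, SUBSCRIPT ZERO


def read_as_radix(text):
    """Radix reading: trailing subscript digits give the base."""
    i = len(text)
    while i > 0 and text[i - 1] in SUBSCRIPT_DIGITS:
        i -= 1
    digits, base = text[:i], text[i:].translate(TO_ASCII)
    if not digits or not base:
        raise ValueError(f"no digits or no subscript base in {text!r}")
    return int(digits, int(base))


def read_as_exponent(text):
    """Exponent reading: the subscript ten separates mantissa and exponent."""
    mantissa, exponent = text.split(SUB_TEN_SEQ)
    return float(mantissa) * 10 ** int(exponent)


print(read_as_radix("101\u2082"))               # 5
print(read_as_radix("15" + SUB_TEN_SEQ))        # 15 (fifteen in base ten)
print(read_as_exponent("15" + SUB_TEN_SEQ + "2"))  # 1500.0
```

    The same two code points mean "base ten" to the first reader and
    "times ten to the power" to the second.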

    Does it look more convincing now?



    This archive was generated by hypermail 2.1.5 : Wed Jan 16 2008 - 18:43:57 CST