Re[2]: Four Punctuation Symbols

From: Alexander Savenkov (savenkov@xmlhack.ru)
Date: Sun Jan 23 2005 - 09:12:57 CST


    Hello,

    on 2005-01-15T03:22:36+03:00 Kenneth Whistler <kenw@sybase.com> wrote:

    > Alexander asked:

    >> could anyone tell me what the code points are for "?..",

    > <003F, 002E, 002E>

    ...

    I wonder what would happen if I asked for the code point of
    HORIZONTAL ELLIPSIS. Some people just feel the *need* to post
    sarcastic and, more importantly, unhelpful comments.

    >> I can't find them in the charts, and since things like "!!",
    >> "??", "?!", "!?" are encoded, I expected those punctuation symbols
    >> to be present as well.
    >>
    >> Using the sequences U+003F U+2025 and U+0021 U+2025 (QUESTION MARK,
    >> TWO DOT LEADER and EXCLAMATION MARK, TWO DOT LEADER), along with triple
    >> exclamation or question marks, seems excessive.

    > Well, there is no reason to use U+2025 for these, now is there?..

    What is the reason to use U+2026 for ellipsis then?
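For reference, a minimal Python sketch (standard-library unicodedata only; the helper name code_points is hypothetical) that prints the code points of "?.." typed as plain ASCII and spelled with U+2025, plus the decompositions the Unicode Character Database records for U+2025 and U+2026. Both leader characters turn out to be compatibility characters whose decompositions are plain runs of U+002E FULL STOP.

```python
import unicodedata


def code_points(s: str) -> str:
    """Return the code points of s in U+XXXX notation, space-separated."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# "?.." typed as plain ASCII: QUESTION MARK followed by two FULL STOPs.
print(code_points("?.."))
# The same text spelled with TWO DOT LEADER instead of two full stops.
print(code_points("?\u2025"))

# TWO DOT LEADER and HORIZONTAL ELLIPSIS both carry <compat> decompositions
# made only of FULL STOP (U+002E).
for ch in ("\u2025", "\u2026"):
    print(unicodedata.name(ch), "->", unicodedata.decomposition(ch))
```

The printed decompositions should read "<compat> 002E 002E" and "<compat> 002E 002E 002E" respectively.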

    ...

    > The compatibility doubled punctuation forms are *attested* in
    > East Asian typographic practice, turned vertically:

    > W
    > O
    > W
    > !!

    > In such contexts, they are being treated as units, and it is easier
    > to map them in East Asian fonts and map between East Asian character
    > sets and Unicode with the compatibility characters encoded in
    > the standard.

    I see. So they are present solely for compatibility reasons, and in
    non-vertical contexts these marks should be encoded as sequences of
    U+003F and U+0021?
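A minimal Python sketch (standard-library unicodedata only; the table name COMPAT_FORMS is hypothetical) of what "compatibility character" means concretely here: NFKC normalization folds each encoded doubled-punctuation character into the corresponding sequence of U+003F and U+0021, while NFC leaves it untouched because the mapping is compatibility-only. The same folding turns the U+003F U+2025 spelling into plain "?..".

```python
import unicodedata

# Doubled-punctuation compatibility characters and the ASCII sequences
# they fold to under NFKC normalization.
COMPAT_FORMS = {
    "\u203C": "!!",  # DOUBLE EXCLAMATION MARK
    "\u2047": "??",  # DOUBLE QUESTION MARK
    "\u2048": "?!",  # QUESTION EXCLAMATION MARK
    "\u2049": "!?",  # EXCLAMATION QUESTION MARK
}

for ch, expected in COMPAT_FORMS.items():
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {folded!r}")
    assert folded == expected

# NFC applies only canonical mappings, so the character survives as-is.
assert unicodedata.normalize("NFC", "\u203C") == "\u203C"

# QUESTION MARK + TWO DOT LEADER folds to QUESTION MARK + two FULL STOPs.
assert unicodedata.normalize("NFKC", "?\u2025") == "?.."
```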

    > There is no such requirement for arbitrary combinations of
    > ASCII punctuation, which users may multiply at will!!!!!!!...

    I wouldn't have bothered to ask about "arbitrary combinations of ASCII
    punctuation". These combinations are in wide use (unlike your
    "!!!!!!!...").

    > "?..", "!..", "!!!", and "???" have no particular status as
    > characters per se. Having them encoded as single character units
    > would simply create entry, processing, and equivalencing
    > difficulties for them, when there is no problem whatsoever
    > in simply dealing with them now as sequences of characters.

    I wish I could agree with you, but from what I've seen they are
    treated as single characters. For example, the two dots after the
    question mark are kerned differently, as if the whole mark were an
    ellipsis with a question mark placed over its first dot.

    I'm aware of the problems that accompany the encoding of any new
    character, yet they are not raised as an objection every time a new
    character is proposed. Please comment.

    Alexander

    -- 
      Alexander Savenkov                            http://www.xmlhack.ru/
      savenkov@xmlhack.ru             http://www.xmlhack.ru/authors/croll/
    

