Re[2]: Four Punctuation Symbols

From: Alexander Savenkov (savenkov@xmlhack.ru)
Date: Sun Jan 23 2005 - 09:12:57 CST

Next message: Doug Ewell: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Lokesh Joshi: "Need help for Arabic text processing"
In reply to: Kenneth Whistler: "Re: Four Punctuation Symbols"
Next in thread: Kenneth Whistler: "Re: Re[2]: Four Punctuation Symbols"
Maybe reply: Kenneth Whistler: "Re: Re[2]: Four Punctuation Symbols"

Hello,

on 2005-01-15T03:22:36+03:00 Kenneth Whistler <kenw@sybase.com> wrote:

> Alexander asked:

>> could anyone tell me what are the codepoints for "?..",

> <003F, 002E, 002E>

...
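For reference, each of the four sequences is an ordinary string of ASCII punctuation. A minimal Python sketch (standard library only; the helper name `code_points` is hypothetical) lists their code points:

```python
# List the code points of the four punctuation sequences discussed.
# Each one is simply a string of ordinary ASCII characters.
def code_points(s: str) -> list[str]:
    """Return the U+XXXX form of every code point in s."""
    return [f"U+{ord(ch):04X}" for ch in s]

for seq in ["?..", "!..", "!!!", "???"]:
    print(seq, code_points(seq))
```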

I wonder what would have happened if I had asked about the code point
for the horizontal ellipsis. Some people just feel the *need* to post
sarcastic and, more importantly, unhelpful comments.

>> I can't find them in the charts and since things like "!!",
>> "??", "?!", "!?" are encoded I expected those punctuation symbols
>> to be present as well.
>>
>> Using U+003F, U+2025 and U+0021, U+2025 sequences (QUESTION MARK,
>> TWO DOT LEADER and EXCLAMATION MARK, TWO DOT LEADER) along with triple
>> exclamation or question marks seems to be excessive.

> Well, there is no reason to use U+2025 for these, now is there?..

What is the reason to use U+2026 for ellipsis then?
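Both U+2025 and U+2026 are compatibility characters whose decompositions are just runs of U+002E FULL STOP. A short sketch using Python's standard `unicodedata` module (assuming a Python 3 build with current Unicode data) shows this, and shows that NFKC normalization folds U+003F U+2025 into plain "?..":

```python
import unicodedata

# Print name and compatibility decomposition for TWO DOT LEADER
# and HORIZONTAL ELLIPSIS.
for ch in ["\u2025", "\u2026"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# NFKC replaces the compatibility character with its ASCII decomposition.
print(unicodedata.normalize("NFKC", "?\u2025"))
```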

...

> The compatibility doubled punctuation forms are *attested* in
> East Asian typographic practice, turned vertically:

> W
> O
> W
> !!

> In such contexts, they are being treated as units, and it is easier
> to map them in East Asian fonts and map between East Asian character
> sets and Unicode with the compatibility characters encoded in
> the standard.
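The compatibility status of the doubled forms can be checked directly: U+203C, U+2047, U+2048 and U+2049 all carry <compat> decompositions to two ASCII characters, so NFKC normalization folds them back to plain sequences. A minimal sketch with Python's standard `unicodedata` module (the list name `DOUBLED` is hypothetical):

```python
import unicodedata

# Doubled-punctuation compatibility characters from the General
# Punctuation block; each decomposes to a pair of ASCII marks.
DOUBLED = ["\u203C", "\u2047", "\u2048", "\u2049"]

for ch in DOUBLED:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->",
          [f"U+{ord(c):04X}" for c in folded])
```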

I see. So they are present solely for compatibility reasons, and in
non-vertical contexts these should be encoded as sequences of U+003F
and U+0021?

> There is no such requirement for arbitrary combinations of
> ASCII punctuation, which users may multiply at will!!!!!!!...

I wouldn't have bothered to post about "arbitrary combinations of
ASCII punctuation". These character combinations are in wide use
(unlike your "!!!!!!!...").

> "?..", "!..", "!!!", and "???" have no particular status as
> characters per se. Having them encoded as single character units
> would simply create entry, processing, and equivalencing
> difficulties for them, when there is no problem whatsoever
> in simply dealing with them now as sequences of characters.

I wish I could agree with you, but from what I've seen they are
treated as single units. For example, the two dots after the
question mark are kerned differently, as if it were an ellipsis with a
question mark placed above the first dot.

I'm aware of the problems that accompany the encoding of any new
character, but they are not used as an excuse every time a new
character is proposed. Please comment.

Alexander

-- 
  Alexander Savenkov                            http://www.xmlhack.ru/
  savenkov@xmlhack.ru             http://www.xmlhack.ru/authors/croll/


This archive was generated by hypermail 2.1.5 : Mon Jan 24 2005 - 11:01:18 CST