Re: Re[2]: Four Punctuation Symbols

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 31 2005 - 19:41:32 CST

Previous message: Jon Hanna: "RE: Surrogate points"
Maybe in reply to: Alexander Savenkov: "Re[2]: Four Punctuation Symbols"
Next in thread: Jon Hanna: "RE: Four Punctuation Symbols"

Alexander continued:

> >> could anyone tell me what the codepoints are for "?..",
>
> > <003F, 002E, 002E>
>
> ...
>
> I wonder what could happen if I asked about the codepoint for
> horizontal ellipsis. Some people just feel the *need* to post some
> sarcastic and, what is more important, unhelpful comments.

You can take it as sarcastic, if you like, but it was, in fact,
accurate. There is no encoded character for "?.." taken as a
unit.
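
One way to check that "?.." is only a sequence of ordinary characters is to list its code points and character names. The following is a minimal sketch using only Python's standard library. The helper name `as_sequence` is illustrative and not part of any API. It prints the same <003F, 002E, 002E> notation used above.

```python
import unicodedata

def as_sequence(s: str) -> str:
    """Format the code points of s in the <XXXX, XXXX, ...> notation."""
    return "<" + ", ".join(f"{ord(ch):04X}" for ch in s) + ">"

for seq in ["?..", "!..", "!!!", "???"]:
    names = ", ".join(unicodedata.name(ch) for ch in seq)
    print(f"{seq!r}: {as_sequence(seq)}  ({names})")
```

Each of the four strings comes out as three separate characters, for example QUESTION MARK, FULL STOP, FULL STOP for "?..".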

> What is the reason to use U+2026 for ellipsis then?

For compatibility with existing legacy character encodings
that encode it as a unit, primarily.
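
The compatibility status of U+2026 is visible in the character data. In this minimal Python sketch, Windows-1252 (used here as one example of a legacy encoding) assigns the ellipsis a single byte. Unicode gives U+2026 a compatibility decomposition to three FULL STOPs. Plain string comparison therefore treats "…" and "..." as different strings unless the text is first normalized, for instance with NFKC:

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# A legacy single-byte encoding gives the ellipsis its own byte (0x85 in cp1252),
# which is why a single Unicode character is needed for round-trip mapping.
legacy = ELLIPSIS.encode("cp1252")
print(legacy, legacy.decode("cp1252") == ELLIPSIS)

# The Unicode character data marks it as a compatibility form of "...".
print(unicodedata.decomposition(ELLIPSIS))

# Without normalization the two spellings do not compare equal;
# NFKC folds the unit back into the three-character sequence.
print(ELLIPSIS == "...")
print(unicodedata.normalize("NFKC", ELLIPSIS) == "...")
```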

>
> > The compatibility doubled punctuation forms are *attested* in
> > East Asian typographic practice, turned vertically:
>
> > W
> > O
> > W
> > !!
>
> > In such contexts, they are being treated as units, and it is easier
> > to map them in East Asian fonts and map between East Asian character
> > sets and Unicode with the compatibility characters encoded in
> > the standard.
>
> I see. So, they're present for compatibility reasons solely, and
> should be encoded as combinations of U+003F and U+0021 in non-vertical
> context?

Yes.
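
The doubled-punctuation compatibility characters (U+203C, U+2047, U+2048, U+2049) each decompose to a plain two-character sequence. The following Python sketch prints that data and builds a small, targeted folding table from it, so that text entered with a compatibility character can be turned into the ordinary sequences recommended here. The table name `FOLD` is hypothetical. A targeted table is used instead of applying NFKC to the whole string because NFKC also rewrites many unrelated characters, such as full-width letters.

```python
import unicodedata

# Compatibility doubled-punctuation characters.
DOUBLED = ["\u203C", "\u2047", "\u2048", "\u2049"]

for ch in DOUBLED:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<26} {unicodedata.decomposition(ch)}")

# Map each compatibility character to its NFKC form (a plain "?"/"!" pair),
# leaving every other character in the text untouched.
FOLD = {ord(ch): unicodedata.normalize("NFKC", ch) for ch in DOUBLED}

print("WOW\u203C".translate(FOLD))  # ordinary horizontal text: two U+0021
```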

> > "?..", "!..", "!!!", and "???" have no particular status as
> > characters per se. Having them encoded as single character units
> > would simply create entry, processing, and equivalencing
> > difficulties for them, when there is no problem whatsoever
> > in simply dealing with them now as sequences of characters.
>
> I wish I could agree with you, but from what I've seen they are
> treated as solid characters.
^^^^^^^^^^

As text elements, perhaps, but as characters, no.
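
The difference between a text element and a character can be made concrete. An application is free to treat a run such as "?.." as one unit for its own purposes while the underlying text remains a sequence of ordinary characters. The following Python sketch shows this under an assumed, hypothetical grouping rule: a run of "?" and/or "!" optionally followed by full stops, or two or more full stops, counts as one punctuation unit. The names `PUNCT_RUN` and `punctuation_units` are illustrative only.

```python
import re

# Hypothetical rule: [?!]+ optionally followed by full stops, or "..", "...", etc.
PUNCT_RUN = re.compile(r"[?!]+\.*|\.{2,}")

def punctuation_units(text: str) -> list[tuple[str, int]]:
    """Return each punctuation run and the number of characters it spans."""
    return [(m.group(), len(m.group())) for m in PUNCT_RUN.finditer(text)]

print(punctuation_units("Really?.. No!!! Why???"))
```

Each unit found this way still spans three characters. The grouping lives in the application, not in the encoding.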

> For example, the two dots after the
> question mark use different kerning, as if it was an ellipsis with a
> question mark placed above the first dot.

Text rendering systems sophisticated enough to be concerned
about kerning of this type can kern sequences of U+002E just
as effectively as they could deal with preformed encodings of
punctuation sequences as a single character. In fact, you are
probably better off dealing with the marks separately.
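
Pair kerning is normally defined on adjacent pairs, for example in an OpenType font's GPOS table, so it applies naturally to the sequence <003F, 002E, 002E> without any precomposed character. The following Python sketch illustrates the idea. The advance widths and kerning values are invented for illustration, and `layout` is a hypothetical helper, not the API of any real shaping engine.

```python
# Hypothetical advance widths and pair-kerning adjustments, in font units.
ADVANCE = {"?": 500, "!": 300, ".": 250}
KERN = {("?", "."): -120, ("!", "."): -60, (".", "."): -40}

def layout(text: str) -> list[int]:
    """Return the x offset of each character after applying pair kerning."""
    offsets = []
    x = 0
    prev = None
    for ch in text:
        if prev is not None:
            x += KERN.get((prev, ch), 0)  # tighten or loosen this specific pair
        offsets.append(x)
        x += ADVANCE[ch]
        prev = ch
    return offsets

print(layout("?.."))
```

Because the adjustments are keyed on pairs, "?." and ".." can receive different spacing, which matches the effect described for "?..", while the text itself stays as three separate characters.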

--Ken

>
> I'm aware of the problems that accompany the encoding of any new
> character. They're not used as an excuse every time a new character
> is proposed. Please comment.
>
> Alexander
> --
> Alexander Savenkov http://www.xmlhack.ru/
> savenkov@xmlhack.ru http://www.xmlhack.ru/authors/croll/
>
>


This archive was generated by hypermail 2.1.5 : Mon Jan 31 2005 - 19:43:16 CST