Re: U+2018 is not RIGHT HIGH 6 from Asmus Freytag on 2012-05-03 (Unicode Mail List Archive)

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Thu, 03 May 2012 01:03:27 -0700

Sometimes you are not free to choose what you would like.

One thing that's off the table is a new character code.

The reason for that categorical statement is that there is too much data
and software out there that uses the existing character codes. Throwing a new
character into the mix will just create confusion. Text that should be
identical would acquire two alternate representations depending on
whether the new or the old character is used. That's not good.
Especially not for a situation that, while not ideal, has been tolerated
by tens of millions of users for decades - which means it's not one of
life's most urgent crises.

Sometimes, even when you are creating a "new" character encoding, you
are not actually free in your design of it. That happened to Unicode.
For characters that were in use (especially widespread use) at the time
Unicode was created, it was practically impossible to re-analyze them
based on some "ideal" precepts.

Where this was attempted, reality caught up with Unicode rather swiftly.
You can see traces of this in the naming of the quotation mark.
Unicode's principle had been the "semantic" encoding of characters, so
the distinction was made based on the presumed function or positioning
(opening or closing, left or right adjacent to the text).

At the same time, the actual set of these characters was based on
various other, then-existing character sets and collections. Although at
the time many standard character sets were still limited to straight
quotes, some did have the curly ones, including sets used for
typesetting systems.

The fact that the use of quotation marks is so radically language-dependent
was not understood from the beginning; otherwise they would have been
named not by function but by something else. The use of a Times-like
roman font for the representative glyphs further obscured the glyph
issue that you are trying to bring to our attention here.

All of this meant that Unicode took on board the characters as they were
then defined in legacy sets and half-heartedly named them by function,
ignoring that the function wasn't constant across languages.

In any case, trying to approach this from the semantic position has
issues. In Swedish, for example, you use the same quotation mark symbol
for both opening and closing. It would be more than bizarre to use two
different characters for that purpose.
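The mismatch between function-based names and actual usage can be checked with Python's standard `unicodedata` module. The sketch below is illustrative only: the `SAMPLES` strings are assumed typical typographic conventions (not taken from the original message), and `quote_names` is a hypothetical helper. It shows German closing quotes using characters named LEFT, and Swedish using the same RIGHT mark to both open and close.

```python
import unicodedata

# Assumed typical quotation conventions per language (illustrative samples).
SAMPLES = {
    "English": "\u201cquote\u201d",          # “quote”
    "German": "\u201eZitat\u201c",           # „Zitat“
    "German (single)": "\u201aZitat\u2018",  # ‚Zitat‘  (closes with U+2018)
    "Swedish": "\u201dcitat\u201d",          # ”citat”  (same mark both sides)
}


def quote_names(text):
    """Hypothetical helper: return the Unicode names of the first and last characters."""
    return unicodedata.name(text[0]), unicodedata.name(text[-1])


for lang, text in SAMPLES.items():
    opening, closing = quote_names(text)
    print(f"{lang:16} {text:10} open={opening} / close={closing}")
```

In the German single-quote sample, the closing character is U+2018 LEFT SINGLE QUOTATION MARK, which is exactly the case where the name describes one language's function rather than the character's use everywhere.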

So that defines the characters a bit more by appearance than semantics.
On the other hand, you are pointing out that some uses allow a wider
range of glyphic variation of the existing characters than other usages.

This is something that should be documented, but as guidance that helps
font designers provide the correct glyphs for each context. The time to
create a special character code for German quotation marks has passed.
The moment for that would have been the late 70s.

A./
Received on Thu May 03 2012 - 03:06:37 CDT