From: Doug Ewell (email@example.com)
Date: Tue Sep 21 2004 - 00:55:48 CDT
Jörg Knappen <knappen at uni dash mainz dot de> wrote:
> I see a precedent in Unicode to treat Copyright-like sign differently
> from simple encircled letters:
> Unicode takes precautions not to encode the same character twice.
> Therefore, superscript digits 2 and 3 are absent from the superscript
> block U+2070 ff.
> However, the block Enclosed Alphanumerics U+2460 ff. includes encircled
> capital Latin letters C, P, and R in addition to the copyright-like
> signs elsewhere.
OK, I guess I need some guidance from the Unicode elder statesmen.
I have been under the impression all along that what Jörg calls
"copyright-like signs," meaning U+00A9 and U+00AE and U+2117 and
possibly others, are encoded as separate entities primarily because
they were in pre-existing legacy character sets. Remember that a major
goal of Unicode at its inception was to make sure all such character
sets were covered.
Obviously U+00A9 and U+00AE were in ISO 8859-1, at those same code
points. They also appeared in MS-DOS code page 850, which also predated
Unicode. I don't know if U+2117 was in any existing standards; I just
know it's in my Unicode 1.0 book.
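
For anyone who wants to check the legacy mappings mentioned above, here is a
minimal sketch using the codecs that ship with Python. The helper name
`legacy_bytes` is made up for illustration, and the codec tables are Python's
view of ISO 8859-1 and code page 850, not a statement about what the original
standards committees considered.

```python
import unicodedata

# The three "copyright-like signs" discussed in this thread.
SIGNS = ["\u00A9", "\u00AE", "\u2117"]

def legacy_bytes(ch, codec):
    """Return the byte(s) for ch in the given legacy codec, or None if unmapped.

    Hypothetical helper for illustration only.
    """
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

for ch in SIGNS:
    name = unicodedata.name(ch)
    latin1 = legacy_bytes(ch, "latin-1")  # Python's ISO 8859-1 codec
    cp850 = legacy_bytes(ch, "cp850")     # Python's MS-DOS code page 850 codec
    print(f"U+{ord(ch):04X} {name}: latin-1={latin1!r} cp850={cp850!r}")
```

A `None` in the output only means that particular codec has no mapping for the
character; it says nothing about other legacy character sets.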
Jörg's comments imply that these symbols are in Unicode because of a
policy or "precedent" for treating such symbols specially, not (or not
only) because of the policy of encoding whatever was in the legacy
character sets of the time.
Let's suppose we were back in the mid-'90s, and for whatever reason, the
circled Latin letters in the U+24xx block were already encoded but the
three "copyright-like signs" were not. Suppose they weren't in any
legacy character sets either. (Use your imagination.)
Now suppose someone proposed that the circled-C copyright symbol
(picking the most widely used example) be encoded as a separate entity.
Suppose further that someone else pointed out that it could be
represented by one of the circled Latin letters in the U+24xx zone (Ⓒ or
ⓒ), and a debate ensued over whether those letters were of the correct
Finally, let's suppose that someone else suggested using the combination
U+0043 (or U+0063) plus U+20DD, the combining enclosing circle, and that
we then had a debate over whether fonts and rendering engines were up to
the task.
What would UTC and WG2 do? Would they choose to encode COPYRIGHT SIGN
on its own, recommend the existing circled Latin letters, or recommend
the combining sequence? Why? (Use a separate sheet of paper if necessary.)
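
To make the three candidates in the hypothetical concrete, here is a rough
sketch that lists each one and shows how it behaves under the standard
normalization forms. The dictionary keys and the function names `describe`
and `normalized` are illustrative only. The sketch says nothing about what
UTC or WG2 would have decided; it only shows that the candidates are distinct
strings as far as Python's `unicodedata` module is concerned.

```python
import unicodedata

# The candidate representations from the hypothetical (labels are illustrative).
CANDIDATES = {
    "dedicated sign": "\u00A9",               # COPYRIGHT SIGN
    "circled capital C": "\u24B8",            # from the U+24xx block
    "circled small c": "\u24D2",              # from the U+24xx block
    "C + enclosing circle": "\u0043\u20DD",   # combining sequence
}

def describe(s):
    """List the code point and character name of each character in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

def normalized(s):
    """Return s under each of the four Unicode normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, s) for form in forms}

for label, s in CANDIDATES.items():
    print(label, describe(s))
    for form, result in normalized(s).items():
        print("   ", form, [f"U+{ord(ch):04X}" for ch in result])
```

The circled letters carry a compatibility decomposition to the plain letter,
so NFKC and NFKD fold them to "C" or "c", while U+00A9 and the combining
sequence come through all four forms unchanged. No normalization form maps
any one of these candidates onto another.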