Re: Saudi-Arabian Copyright sign

From: Doug Ewell
Date: Tue Sep 21 2004 - 00:55:48 CDT


    Jörg Knappen <knappen at uni dash mainz dot de> wrote:

    > I see a precedent in Unicode to treat copyright-like signs differently
    > from simple encircled letters:
    > Unicode takes precautions not to encode the same character twice.
    > Therefore, superscript digits 2 and 3 are absent from the superscript
    > block U+2070 ff.
    > However, the block Enclosed Alphanumerics U+2460 ff. includes encircled
    > capital Latin letters C, P, and R in addition to the copyright-like
    > signs elsewhere.

    OK, I guess I need some guidance from the Unicode elder statesmen and
    greater experts.

    I have been under the impression all along that what Jörg calls
    "copyright-like signs," meaning U+00A9 and U+00AE and U+2117 and
    possibly others, are encoded as separate entities primarily because
    they were in pre-existing legacy character sets. Remember that a major
    goal of Unicode at its inception was to make sure all such character
    sets were covered.

    Obviously U+00A9 and U+00AE were in ISO 8859-1, at those same code
    points. They also appeared in MS-DOS code page 850, which also predated
    Unicode. I don't know if U+2117 was in any existing standards; I just
    know it's in my Unicode 1.0 book.
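As a rough check on the legacy-coverage point above, the following Python sketch asks the standard library's codecs whether each of the three signs can be encoded in ISO 8859-1 and code page 850. The codec names `latin-1` and `cp850` are Python's own; `legacy_coverage` is a hypothetical helper introduced only for illustration. It covers just these two character sets, so it cannot settle whether U+2117 appeared in some other pre-Unicode standard.

```python
import unicodedata

# The three "copyright-like signs" discussed in the message.
SIGNS = ["\u00A9", "\u00AE", "\u2117"]

# Python's codec names for ISO 8859-1 and MS-DOS code page 850.
LEGACY_CODECS = ["latin-1", "cp850"]


def legacy_coverage(ch):
    """Map each codec name to the byte value it assigns to ch,
    or to None if the codec has no mapping for that character."""
    result = {}
    for codec in LEGACY_CODECS:
        try:
            result[codec] = ch.encode(codec)[0]
        except UnicodeEncodeError:
            result[codec] = None
    return result


if __name__ == "__main__":
    for ch in SIGNS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {legacy_coverage(ch)}")
```

In ISO 8859-1 the byte value equals the Unicode code point, which is what "at those same code points" refers to; code page 850 carries the same two signs at different byte positions.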

    Jörg's comments imply that these symbols are in Unicode because of a
    policy or "precedent" for treating such symbols specially, not (or not
    only) because of the policy of encoding whatever was in the legacy
    character sets of the time.

    Let's suppose we were back in the mid-'90s, and for whatever reason, the
    circled Latin letters in the U+24xx block were already encoded but the
    three "copyright-like signs" were not. Suppose they weren't in any
    legacy character sets either. (Use your imagination.)

    Now suppose someone proposed that the circled-C copyright symbol
    (picking the most widely used example) be encoded as a separate entity.
    Suppose further that someone else pointed out that it could be
    represented by one of the circled Latin letters in the U+24xx zone (Ⓒ or
    ⓒ), and a debate ensued over whether those letters were of the correct

    Finally, let's suppose that someone else suggested using the combination
    U+0043 (or U+0063) plus U+20DD, the combining enclosing circle, and that
    we then had a debate over whether fonts and rendering engines were up to
    the task.
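The three candidate representations in this thought experiment can be compared with Python's `unicodedata` module. This is only a sketch of how the characters are defined in the Unicode data shipped with the interpreter, not a statement of what UTC or WG2 would decide; the `CANDIDATES` table and the helpers `describe` and `nfkc` are hypothetical names chosen for illustration.

```python
import unicodedata

# Candidate ways to write a "circled C", as discussed in the message.
CANDIDATES = {
    "copyright sign": "\u00A9",
    "circled capital C": "\u24B8",
    "circled small c": "\u24D2",
    "C + combining enclosing circle": "C\u20DD",
}


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in text]


def nfkc(text):
    """Apply compatibility normalization (NFKC) to text."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for label, text in CANDIDATES.items():
        folded = nfkc(text)
        print(label, describe(text), "-> NFKC:", describe(folded))
```

The circled letters carry a `<circle>` compatibility decomposition, so NFKC folds them to a plain C or c, while U+00A9 and the combining sequence pass through unchanged. That difference is one concrete way in which the encoded copyright sign behaves differently from the circled letters.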

    What would UTC and WG2 do? Would they choose to encode COPYRIGHT SIGN
    on its own, recommend the existing circled Latin letters, or recommend
    the combining sequence? Why? (Use a separate sheet of paper if
    necessary.)

    -Doug Ewell
     Fullerton, California
