RE: ISO 10646 & GB18030 repertoire

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Fri Jan 07 2005 - 13:05:02 CST

    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org] On Behalf Of Arcane Jill
    > Sent: Friday, January 07, 2005 1:44 AM

    > All sounds a bit pedantic to me.

            A bit?

    > Surely /no/ applications
    > "represent" LATIN SMALL LETTER A WITH ACUTE as
    > U+00E1, if by "represent" you mean export the representation to the
    > U+outside
      ^
      |
      spurious, yes?

    > world. (The internal representation of Unicode characters
    > within an application is private and opaque, and sometimes
    > not even known to the programmer if they use a library which
    > abstracts the concept).

            Correct.

    > Instead, Unicode defines LATIN SMALL LETTER A WITH ACUTE as
    > U+00E1, and applications export U+00E1 as either <0xC3 0xA1>
    > (UTF-8), <0x00E1> (UTF-16), <0x000000E1> (UTF-32) ... or
    > <0xA8 0xA2> (GB18030). In which case, surely GB18030 is an
    > encoding form of Unicode, just like the UTFs.
    >
    > No?

            No. First, although applications may export UTFs, if they are using
    those UTFs to carry Unicode character codes, then they are in fact
    *representing* LATIN SMALL LETTER A WITH ACUTE as U+00E1, even though the
    code points may not be obvious from looking at the byte stream (think of
    encryption, which does not change text, but transforms it into a highly
    nontrivial encoding form). The "U+" notation refers to the code points, not
    the bytes used to transmit them.
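
A minimal sketch of the distinction between code points and transmitted bytes, assuming Python 3.8 or later and its standard codecs ("utf-8", "utf-16-be", "utf-32-be", "gb18030"). The names CODE_POINT, CODECS and encode_all are hypothetical, chosen only for this illustration. It encodes U+00E1 under each codec and decodes the bytes again, showing that the byte streams differ while the recovered code point does not.

```python
# Illustration only: the code point is what "U+" names; the bytes are
# merely one way of carrying it.
CODE_POINT = 0x00E1  # LATIN SMALL LETTER A WITH ACUTE

# Big-endian variants are used so that no byte order mark is emitted.
CODECS = ("utf-8", "utf-16-be", "utf-32-be", "gb18030")


def encode_all(code_point):
    """Return a dict mapping codec name to the encoded bytes of code_point."""
    character = chr(code_point)
    return {name: character.encode(name) for name in CODECS}


encodings = encode_all(CODE_POINT)

for name, raw in encodings.items():
    # Decoding recovers the same code point regardless of the byte form.
    recovered = ord(raw.decode(name))
    print(f"{name:10} <{raw.hex(' ')}> -> U+{recovered:04X}")
```

Every line of output ends in U+00E1, although the bytes in angle brackets differ from codec to codec.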

            Second, and more importantly, since GB18030 does not encode all of
    Unicode, it cannot be considered a Unicode encoding form.

    /|/|ike

    This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 13:16:43 CST