Re: Every character code in the world

From: David Starner (starner@okstate.edu)
Date: Fri Nov 15 2002 - 13:29:11 EST


    On Fri, Nov 15, 2002 at 01:11:39PM -0500, John Cowan wrote:
    > David Starner scripsit:
    >
    > > Have you looked at the way Emacs 21 handles this? It's got something
    > > similar going on.
    >
    > I confess I remain in blissful ignorance of Emacs and all its works. Do
    > you have a pointer to this particular part of it?

    It's not as extensive as I remembered, and this is pretty old:

    Emacs-Unicode-990824
    ----------------------------------------------------------------------
    Internal Character code:

      00 0000 xxxxxxxx xxxxxxxx Unicode U+0000 - U+FFFF
      00 xxxx xxxxxxxx xxxxxxxx Unicode 20bit (via surrogate pair)
      01 0000 xxxxxxxx xxxxxxxx Unicode 20bit (via surrogate pair)
      01 0ppp xxxxxxxx xxxxxxxx 7 64kByte planes reserved for Emacs
      01 1ppp xxxxxxxx xxxxxxxx 8 64kByte planes for private use
      1x xxxx xxxxxxxx xxxxxxxx for private use, CNS 3-16, and CCCII

            Private area is 180000h - 3087FFh
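
    As a rough illustration of the table above, the 22-bit code space can be
    written down as a list of ranges and looked up directly. This is only a
    sketch: the names REGIONS and classify are made up for this example and
    are not taken from Emacs, and the boundaries are simply transcribed from
    the bit patterns shown.

```python
# Regions of the 22-bit internal code space, transcribed from the table.
# REGIONS and classify() are hypothetical names used for illustration only.
REGIONS = [
    (0x000000, 0x00FFFF, "Unicode U+0000 - U+FFFF"),
    (0x010000, 0x10FFFF, "Unicode 20bit (via surrogate pair)"),
    (0x110000, 0x17FFFF, "7 planes reserved for Emacs"),
    (0x180000, 0x1FFFFF, "8 planes for private use"),
    (0x200000, 0x3FFFFF, "private use, CNS 3-16, and CCCII"),
]


def classify(code: int) -> str:
    """Return the label of the region that contains `code`."""
    if not 0 <= code <= 0x3FFFFF:
        raise ValueError("code does not fit in 22 bits")
    for first, last, label in REGIONS:
        if first <= code <= last:
            return label
    raise AssertionError("regions cover the whole 22-bit space")


print(classify(0x0041))    # Unicode U+0000 - U+FFFF
print(classify(0x110000))  # 7 planes reserved for Emacs
```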

    ----------------------------------------------------------------------
    Multibyte sequence in buffer/string:

      1 byte: xxxxxxxx
        0xxxxxxx
            ASCII
        1xxxxxxx
            not used

      2 bytes: 110xxxxx 10xxxxxx where x... are:
        00000 000000 - 00001 111111 (0h - 7Fh)
            7 bits not used
            (or we may be able to use this area for holding 8-bit raw data
             in multibyte buffer/string)
        00010 000000 - 11111 111111 (80h - 7FFh)
            Unicode U+0080 - U+07FF

      3 bytes: 1110xxxx 10xxxxxx 10xxxxxx where x... are:
        0000 000000 000000 - 0000 011111 111111 (0h - 7FFh)
            11 bits not used
        0000 100000 000000 - 1111 111111 111111 (800h - FFFFh)
            Unicode U+0800 - U+FFFF

      4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx where x... are:
        000 000000 000000 000000 - 000 001111 111111 111111 (0h - FFFFh)
            16 bits not used
        000 010000 000000 000000 - 100 001111 111111 111111 (10000h - 10FFFFh)
            20 bits Unicode via surrogate pair
        100 010000 000000 000000 - 101 111111 111111 111111 (110000h - 17FFFFh)
            7 64kByte planes reserved for Emacs
            We may map Japanese Han characters here.
        110 000000 000000 000000 - 111 111111 111111 111111 (180000h - 1FFFFFh)
            8 64kByte planes reserved for private use

      5 bytes: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx where x... are:
        00 000000 000000 000000 000000 - 00 000111 111111 111111 111111
                                    0h - 1FFFFFh
            21 bits not used
        00 001000 000000 000000 000000 - 00 001100 001000 011111 111111
                               200000h - 3087FFh
            1083392 (about 1M) character code points for private use
        00 001100 001000 100000 000000 - 00 001100 100111 111111 111111
                               308800h - 327FFFh
            CNS Plane 3 to 16 (96*96*14)
        00 001100 101000 000000 000000 - 00 001111 111111 111111 111111
                               328000h - 3FFFFFh
            CCCII (96*96*96)
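
    The byte layouts above follow the UTF-8 pattern, extended to a fifth byte.
    A minimal sketch of an encoder and decoder, assuming the "not used" ranges
    are rejected on decoding (the note leaves open reusing the 2-byte one for
    raw 8-bit data) and that codes stop at 3FFFFFh. The names encode, decode
    and MIN_CODE are invented for this example; this is not Emacs's code.

```python
# Smallest code allowed for each sequence length; anything lower falls in a
# "not used" range of the table. MIN_CODE is a hypothetical helper name.
MIN_CODE = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000}


def encode(code: int) -> bytes:
    """Encode a 22-bit internal code as a 1- to 5-byte sequence."""
    if not 0 <= code <= 0x3FFFFF:
        raise ValueError("code does not fit in 22 bits")
    if code < 0x80:
        return bytes([code])
    if code < 0x800:
        length, lead = 2, 0xC0
    elif code < 0x10000:
        length, lead = 3, 0xE0
    elif code < 0x200000:
        length, lead = 4, 0xF0
    else:
        length, lead = 5, 0xF8
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code & 0x3F))  # 10xxxxxx trailing byte
        code >>= 6
    return bytes([lead | code] + tail[::-1])


def decode(data: bytes) -> int:
    """Decode exactly one sequence; reject malformed or unused forms."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 0x80:
        length, code = 1, first
    elif first >> 5 == 0b110:
        length, code = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        length, code = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        length, code = 4, first & 0x07
    elif first >> 2 == 0b111110:
        length, code = 5, first & 0x03
    else:
        raise ValueError("invalid leading byte")
    if len(data) != length:
        raise ValueError("wrong sequence length")
    for byte in data[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid trailing byte")
        code = (code << 6) | (byte & 0x3F)
    if code < MIN_CODE[length] or code > 0x3FFFFF:
        raise ValueError("code falls in a range the table does not use")
    return code


print(encode(0x200000).hex())  # f888808080
```

    For Unicode scalar values the bytes agree with standard UTF-8; only codes
    from 110000h upward use forms that UTF-8 itself does not define.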

    -- 
    David Starner - starner@okstate.edu
    Great is the battle-god, great, and his kingdom--
    A field where a thousand corpses lie. 
      -- Stephen Crane, "War is Kind"
    

