Re: Code Point -- What is the integer?

From: Sivakatirswami (
Date: Thu Apr 28 2005 - 14:29:04 CST


    OK, I've written this very, very tiny piece... and it's gone up to our
    editors, so I don't know how much will stand, but I would like you all
    to vet it for accuracy... I think I have "hedged" enough to cover the


    Caption under Unicode logo with some glyphs and code points.

    "A single list for every character in nearly all the world’s languages"

    Article key:



    The Future of Communication


    When Apple announced Unicode Tamil support in its latest Mac OS X
    “Tiger” system, we cheered at Hinduism Today. But what does it mean?
    It’s incredibly complicated and yet very simple. Unicode is a series
    of 1,114,112 code points, numbered 0 through 1,114,111, designed to
    hold the characters of all the world’s languages. Each character has a
    unique “code point,” no two ever the same. For the future of computing
    the implications are awesome: code point 2965 (U+0B95) will always and
    everywhere be the Tamil character ka, on any computer, in any operating
    system, on any keyboard, in any software, in any email, even in China
    or Botswana. For mankind, what an incredible achievement: a United
    Nations of the Mind! See


    That's it... that's all... Oh, is "unicode" always capitalized?

    I got pCalc off the web; I hope that conversion is correct: U+0B95 =
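    A quick way to check that conversion without pCalc, sketched in Python
    (our choice of language for illustration; nothing in the thread depends
    on it): int() with base 16 reads the hexadecimal digits, and the
    standard-library unicodedata module looks up the character's official
    name, assuming a reasonably recent Python 3.

```python
import unicodedata

# U+0B95 is just the hexadecimal number 0B95 with a "U+" in front.
cp = int("0B95", 16)
print(cp)                         # 2965
print(unicodedata.name(chr(cp)))  # TAMIL LETTER KA
```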


    Sannyasin Sivakatirswami
    Himalayan Academy Publications
    at Kauai's Hindu Monastery,
    On Apr 28, 2005, at 8:48 AM, Kenneth Whistler wrote:

    >> Namaskar and Aloha from the offices of Himalayan Academy Publications
    >> in Hawaii...
    > Welcome to the Unicode list!
    >> 1. The code points described as a simple series of integers from
    >> 1 to 1,123,000 (or whatever that last integer is that is equivalent
    >> to U+10FFFF)
    > The last decimal number is 1,114,111, FYI. (Easier to remember as
    > 1114111.)
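    The arithmetic behind that number, as a small Python sketch: hexadecimal
    10FFFF is decimal 1,114,111, and because the numbering starts at U+0000
    (zero), the code space holds one more position than that, 1,114,112 in
    all (17 planes of 65,536). sys.maxunicode is Python's own record of the
    same upper limit in Python 3.3 and later.

```python
import sys

LAST_CODE_POINT = 0x10FFFF  # written U+10FFFF
print(LAST_CODE_POINT)      # 1114111

# Numbering starts at 0 (U+0000), so there is one more position than the last number.
TOTAL_CODE_POINTS = LAST_CODE_POINT + 1
print(TOTAL_CODE_POINTS)    # 1114112
print(sys.maxunicode)       # 1114111 on Python 3.3+
```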
    >> "Unicode is this just a long series from One to over One Million and
    >> there is a character in each place and the whole list includes all the
    >> characters of all the languages known to man, past and present."
    > Well, the project hasn't been finished. There are characters still
    > not in, like Egyptian hieroglyphics. But that is the essence of
    > the project, yes.
    >> 2. But then we move on to: "Unicode characters may be encoded at any
    >> code point from U+0000 to U+10FFFF," and now we begin to slide into the
    >> "nerd realm".
    > Frank provided a nice summary of the justification for why engineers
    > prefer hexadecimal representations.
    >> I understand "004F" to be the hexadecimal representation for four
    >> separate, 4-bit sequences.
    > Well, actually one 16-bit sequence: 0000000001001111
    > But for readability, that is often broken up into a sequence of
    > 4-bit sequences called "nibbles": 0000 0000 0100 1111
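    The same 16-bit pattern and its nibble grouping can be produced in
    Python; a minimal sketch (the helper name to_nibbles is made up for
    illustration). It only covers code points up to U+FFFF, since higher
    ones need more than 16 bits.

```python
def to_nibbles(code_point: int) -> str:
    """Return a code point as 16 binary digits grouped into 4-bit nibbles."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles U+0000..U+FFFF (16 bits)")
    bits = format(code_point, "016b")  # binary, zero-padded to 16 digits
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

print(format(0x004F, "016b"))  # 0000000001001111
print(to_nibbles(0x004F))      # 0000 0000 0100 1111
```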
    >> For purposes of a diagram, I would like to translate any given such
    >> code point designation like A = U+0041 to its integer position in the
    >> series. (Aside question: what do you call that kind of "label" for
    >> the code point: "U+****"?)
    > The Unicode Standard just calls it the "code point".
    > The ISO/IEC 10646 international standard calls it the "short identifier
    > for code positions".
    > The two terms mean the same thing.
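    Going the other way, from an integer to the U+ label, is just the number
    written in uppercase hexadecimal, padded with leading zeros to at least
    four digits (U+0041, not U+41); values above FFFF simply use more digits.
    A Python sketch, with a hypothetical helper name to_label:

```python
def to_label(code_point: int) -> str:
    """Format an integer in U+ notation: uppercase hex, at least four digits."""
    return f"U+{code_point:04X}"

print(to_label(65))        # U+0041
print(to_label(2965))      # U+0B95
print(to_label(0x10FFFF))  # U+10FFFF
```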
    >> e.g. expressed verbally, if one were writing an article for "mom and pop":
    >> The capital letter A is number "65" in the series... but computer
    >> geeks like to express it in hexidecimal form like this, "U+0041", and if
    > ^^^^^^^^^^^
    > hexadecimal (often misspelled ;-) )
    >> you really need to describe it to the computer, then it is
    >> "0000 0000 0100 0001"
    >> or in a diagram simply
    >> A --> 65 --> U+0041 --> 0000 0000 0100 0001
    >> And ditto for one Tamil character and one Chinese character... but my
    >> problem is ascertaining the second, simple integer, segment...
    >> OK, so my questions are:
    >> 1) Is the decimal expression for the capital letter A as 65 exactly
    >> correspondent to its integer code point position in the total Unicode
    >> series expressed as a series of integers?
    > Yes.
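    Since the answer is yes, the whole diagram from the question can be
    generated mechanically. A Python sketch (the function name diagram is
    hypothetical) for the letter A, Tamil ka (U+0B95) and one Chinese
    character, U+4E2D, chosen here only as an example: ord() gives the
    integer position, and the binary is padded to a whole number of
    nibbles, at least 16 bits.

```python
def diagram(ch: str) -> str:
    """Build one line of the form: A --> 65 --> U+0041 --> 0000 0000 0100 0001."""
    cp = ord(ch)  # integer position of the character in the series
    # Pad the binary to a whole number of nibbles, and to at least 16 bits.
    width = max(16, -(-cp.bit_length() // 4) * 4)
    bits = format(cp, f"0{width}b")
    nibbles = " ".join(bits[i:i + 4] for i in range(0, width, 4))
    return f"{ch} --> {cp} --> U+{cp:04X} --> {nibbles}"

# Latin capital A, Tamil letter ka, and the Chinese character U+4E2D.
for ch in ["A", "\u0B95", "\u4E2D"]:
    print(diagram(ch))
# A --> 65 --> U+0041 --> 0000 0000 0100 0001
# க --> 2965 --> U+0B95 --> 0000 1011 1001 0101
# 中 --> 20013 --> U+4E2D --> 0100 1110 0010 1101
```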
    >> 2) How can one ascertain the integer number for a code point
    >> outside-above base ANSI?
    > The easiest way is to make use of the calculators that are
    > available as desk accessories on almost any computer. (Windows,
    > Mac, Solaris, Linux, etc., all have one.)
    > On Windows: Programs > Accessories > Calculator
    > Set it to "Scientific". Choose "Hex". Type in the hexadecimal
    > number (e.g. "BE6"). Hit the "Dec" button, and presto, it
    > changes to "3046", which is the decimal number equivalent of
    > hexadecimal 0x0BE6.
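    For anyone who prefers typing to clicking, the same two calculator steps
    in Python (an illustrative sketch; int() and format() are built-ins):

```python
# Hexadecimal text -> decimal integer (the calculator's "Hex", then "Dec").
decimal = int("BE6", 16)
print(decimal)   # 3046

# And back again: decimal -> hexadecimal, padded to four digits.
hex_text = format(decimal, "04X")
print(hex_text)  # 0BE6
```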
    >> So we want to be able to say, for the layman:
    >> "The entire Tamil alphabet is contained between characters 2560 and
    >> 2843 in the Unicode series." But one needs to
    > The block for Tamil is U+0B80..U+0BFF. So if you convert those
    > numbers to decimal, the range is: 2944..3071.
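    The block arithmetic as a Python sketch. Note that a block is a reserved
    range, not a count of letters: some of the 128 positions in
    U+0B80..U+0BFF are unassigned. The assigned ones are counted here with
    unicodedata (category "Cn" means unassigned), and the exact number
    depends on the Unicode version your Python ships with, so no figure is
    shown for it.

```python
import unicodedata

# Tamil block boundaries from the reply: U+0B80..U+0BFF.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF
print(TAMIL_START, TAMIL_END)       # 2944 3071
print(TAMIL_END - TAMIL_START + 1)  # 128 positions reserved for the block

# Category "Cn" marks an unassigned code point, so count everything else.
assigned = sum(
    1
    for cp in range(TAMIL_START, TAMIL_END + 1)
    if unicodedata.category(chr(cp)) != "Cn"
)
print(assigned)  # depends on the Unicode version bundled with Python
```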
    >> a) be able to find where those blocks are (where do you go to find the
    >> blocks' beginnings and endings for different languages?)
    > Go to the chart pages that other respondents already pointed
    > you to.
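    If the chart pages are not at hand, the Unicode Character Database also
    publishes the block ranges as a plain-text file, Blocks.txt, with lines
    of the form "0B80..0BFF; Tamil". A Python sketch that reads text in that
    layout; the three lines embedded below are a hand-typed excerpt for
    illustration, not the full file, and parse_blocks is a hypothetical
    helper.

```python
from typing import Dict, Tuple

# Hand-typed excerpt in the "start..end; Block Name" layout of Blocks.txt.
BLOCKS_EXCERPT = """\
0000..007F; Basic Latin
0B80..0BFF; Tamil
4E00..9FFF; CJK Unified Ideographs
"""

def parse_blocks(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each block name to its (first, last) code point as integers."""
    blocks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = (part.strip() for part in line.split(";", 1))
        start, end = span.split("..")
        blocks[name] = (int(start, 16), int(end, 16))
    return blocks

for name, (first, last) in parse_blocks(BLOCKS_EXCERPT).items():
    print(f"{name}: U+{first:04X}..U+{last:04X} = {first}..{last}")
# Basic Latin: U+0000..U+007F = 0..127
# Tamil: U+0B80..U+0BFF = 2944..3071
# CJK Unified Ideographs: U+4E00..U+9FFF = 19968..40959
```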
    >> b) be able to translate "U+0BE6" (which is a position in the Tamil
    >> set) back to a simple integer in the series. If I just "do the math"
    >> using the same correlation as for the letter A ["0041" = "65", therefore
    >> 0BE6 must equal ****] ... will it be correct?
    > Yes. And U+0BE6 --> decimal 3046.
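    Reading a whole "U+XXXX" label rather than bare hex digits takes one
    extra step, stripping the "U+" prefix. A Python sketch (parse_label is a
    hypothetical name) that also rejects values outside U+0000..U+10FFFF:

```python
def parse_label(label: str) -> int:
    """Turn a label like "U+0BE6" into its integer position in the series."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not in U+ notation: {label!r}")
    value = int(label[2:], 16)  # the rest is plain hexadecimal
    if not 0 <= value <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {label!r}")
    return value

print(parse_label("U+0041"))    # 65
print(parse_label("U+0BE6"))    # 3046
print(parse_label("U+10FFFF"))  # 1114111
```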
    >> I'm hoping I can go somewhere to find this info easily from some
    >> tables...
    > Just use the calculator accessories. It is easy.
    > --Ken

    This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 14:29:58 CST