Re: Code Point -- What is the integer?

From: Sivakatirswami (katir@hindu.org)
Date: Thu Apr 28 2005 - 13:08:26 CST

    Aloha, Ken:

    You are a jewel: all questions answered, exactly what's needed... thank
    you. Unicode will be featured on the back page of Hinduism Today as
    one of four picks for this page, which is called

    "Digital Dharma"

    July issue.

    Yes, I can do that hex to dec conversion... knowing that there is an
    exact correspondence between the integer position of the code point
    and the decimal number.
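
A minimal Python sketch of that hex-to-decimal conversion, offered as an illustration rather than anything from the thread itself. The function names code_point_to_int and int_to_code_point are hypothetical, chosen here for readability; the values checked are the ones quoted in the message below (U+0041, U+0BE6, U+10FFFF).

```python
def code_point_to_int(label):
    """Convert a label such as 'U+0BE6' (or bare hex 'BE6') to a decimal integer."""
    text = label.strip()
    # Drop the optional "U+" prefix; what remains is plain hexadecimal.
    if text.upper().startswith("U+"):
        text = text[2:]
    return int(text, 16)


def int_to_code_point(number):
    """Format an integer as a 'U+XXXX' label, padded to at least four hex digits."""
    return "U+{:04X}".format(number)


print(code_point_to_int("U+0041"))    # 65
print(code_point_to_int("U+0BE6"))    # 3046
print(code_point_to_int("U+10FFFF"))  # 1114111
print(int_to_code_point(3046))        # U+0BE6
```

The conversion is nothing more than reading the digits after "U+" as a base-16 number, which is why a desk calculator in Hex mode gives the same answers.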

    Of course, acknowledged that in actual usage the ordinal position in
    the series has little to do with real alpha collation except incidentally
    as some alphabets were entered in their "indigenous" ordinal sequence.

    [This happens to be one of the "pet peeves" of the Tamils, that the
    Unicode series for Tamil is out of order....]

    Thanks again.

    Sannyasin Sivakatirswami
    Himalayan Academy Publications
    at Kauai's Hindu Monastery
    katir@hindu.org

    www.HimalayanAcademy.com,
    www.HinduismToday.com
    www.Gurudeva.org
    www.Hindu.org

    On Apr 28, 2005, at 8:48 AM, Kenneth Whistler wrote:

    >
    >> Namaskar and Aloha from the offices of Himalayan Academy Publications
    >> in Hawaii...
    >
    > Welcome to the Unicode list!
    >
    >> 1. The code points described as a simple series of integers from
    >>
    >> 1 to 1,123,000 (or whatever that last integer is that is equivalent
    >> to:
    >> U+10FFFF)
    >
    > The last decimal number is 1,114,111, FYI. (Easier to remember as
    > 1114111.)
    >
    >
    >> "Unicode is this just a long series from One to over One Million and
    >> there is a character in each place and the whole list includes all the
    >> characters of all the languages known to man, past and present."
    >
    > Well, the project hasn't been finished. There are characters still
    > not in, like Egyptian hieroglyphics. But that is the essence of
    > the project, yes.
    >
    >> 2. but then we move on to: " Unicode characters may be encoded at any
    >> code point from U+0000 to U+10FFFF" and now we begin to slide into the
    >> "nerd realm"
    >
    > Frank provided a nice summary of the justification for why engineers
    > prefer hexadecimal representations.
    >
    >>
    >> I understand "004F" to be the hexadecimal representation for four
    >> separate, 4-bit sequences.
    >
    > Well, actually one 16-bit sequence: 0000000001001111
    >
    > But for readability, that is often broken up into a sequence of
    > 4-bit sequences called "nibbles": 0000 0000 0100 1111
    >
    >>
    >> for purposes of a diagram, I would like to translate any given such
    >> code point designation like A = U+0041 to its integer position in the
    >> series. (aside question: what do you call that kind of "label" for
    >> the
    >> code point: "U+****"?)
    >
    > The Unicode Standard just calls it the "code point".
    >
    > The ISO/IEC 10646 international standard calls it the "short identifier
    > for code positions".
    >
    > The two things mean the same.
    >
    >>
    >> e.g. expressed verbally, if one were writing an article for "mom and
    >> pop"
    >>
    >> The capital letter A is number "65" in the series... but computer
    >> geeks like to express it in hexidecimal form like this, "U+0041" and
    >> if
    > ^^^^^^^^^^^
    > hexadecimal (often misspelled ;-) )
    >> you really need to describe it to the computer then it is "0000 0000
    >> 0100 0001"
    >>
    >> or in a diagram simply
    >>
    >> A --> 65 --> U+0041 --> 0000 0000 0100 0001
    >>
    >> And ditto for one Tamil Char and one Chinese character... but my
    >> problem is ascertaining the second, simple integer, segment...
    >>
    >> OK, so my questions are:
    >>
    >> 1) is the decimal expression for the capital letter A as 65 exactly
    >> correspondent to its integer code point position in the total unicode
    >> series expressed as a series of integers?
    >
    > Yes.
    >
    >>
    >> 2) How can one ascertain the integer number for a code point
    >> outside-above base ANSI?
    >
    > The easiest way is to make use of the calculators that are
    > available as desk accessories on almost any computer. (Windows,
    > Mac, Solaris, Linux, etc., all have one.)
    >
    > On Windows: Programs > Accessories > Calculator
    >
    > Set it to "Scientific". Choose "Hex". Type in the hexadecimal
    > number (e.g. "BE6"). Hit the "Dec" button, and presto, it
    > changes to "3046", which is the decimal number equivalent of
    > hexadecimal 0x0BE6.
    >
    >
    >> So we want to be able to say, for the layman:
    >>
    >> "The entire Tamil alphabet is contained between characters 2560 and
    >> 2843 in the unicode series" But one needs to
    >
    > The block for Tamil is U+0B80..U+0BFF. So if you convert those
    > numbers to decimal, the range is: 2944..3071.
    >
    >> a) be able find where those blocks are (where do you go to find the
    >> blocks beginning and endings for different languages)
    >
    > Go to the chart pages that other respondents already pointed
    > you to.
    >
    >> b) be able to translate "U+0BE6" (which is a position in the Tamil set)
    >> back to a simple integer in the series. If I just "do the math" using
    >> the same correlation for the Letter A ["0041" = "65" therefore 0BE6 must
    >> equal **** ] ... will it be correct?
    >
    > Yes. And U+0BE6 --> decimal 3046.
    >
    >> I'm hoping I can go somewhere to find this info easily from some
    >> tables....
    >
    > Just use the calculator accessories. It is easy.
    >
    > --Ken
    >
    >
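
The diagram discussed above (character, decimal integer, U+ label, binary in 4-bit groups) can be sketched in Python as follows. This is an illustrative sketch only: the function name diagram_row is hypothetical, and the 24-bit width used for code points above U+FFFF is an assumption made here so that the 4-bit groups stay aligned.

```python
def diagram_row(char):
    """Build 'char --> decimal --> U+XXXX --> binary nibbles' for one character."""
    number = ord(char)                   # integer position of the code point
    label = "U+{:04X}".format(number)    # hexadecimal label, at least four digits
    # 16 bits cover U+0000..U+FFFF; wider values are padded to 24 bits (assumption).
    width = 16 if number <= 0xFFFF else 24
    bits = "{:0{}b}".format(number, width)
    nibbles = " ".join(bits[i:i + 4] for i in range(0, width, 4))
    return "{} --> {} --> {} --> {}".format(char, number, label, nibbles)


print(diagram_row("A"))  # A --> 65 --> U+0041 --> 0000 0000 0100 0001

# Decimal bounds of the Tamil block U+0B80..U+0BFF given in the reply above.
print(int("0B80", 16), int("0BFF", 16))  # 2944 3071
```

Passing the Tamil digit at U+0BE6 to the same function gives the decimal 3046 and the bit groups 0000 1011 1110 0110.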


