Re: Code Point -- What is the integer?

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Thu Apr 28 2005 - 17:19:39 CST

    The Tamil code chart at
    http://www.unicode.org/charts/PDF/U0B80.pdf
    (listed at
    http://www.unicode.org/charts/)
    might help.

    Think of the Roman way of counting.
    Think of the Arabic/Indic way of counting.
    Think of hexadecimal counting.

    This page uses hexadecimal: http://www.unicode.org/charts/PDF/U0B80.pdf
    You can translate it (convert it to decimal, i.e. Arabic/Indic numerals as the world knows them).

    Hope this makes sense.

    e.g. Tamil A (U+0B85) = hexadecimal "0B85" = decimal "2949"
    (Tamil Ka is U+0B95 = hexadecimal "0B95" = decimal "2965".)
    Hex is just another language for writing the same numbers.
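
    As a rough sketch of that conversion (Python is my own choice here, and the
    helper names code_point_to_decimal and diagram_row are made up for
    illustration), the hex digits after "U+" are simply read as a base-16 number:

```python
import unicodedata


def code_point_to_decimal(label):
    """Turn a label such as 'U+0B85' or '0b85' into its decimal integer."""
    text = label.strip()
    if text.upper().startswith("U+"):
        text = text[2:]  # drop the "U+" prefix, keep only the hex digits
    return int(text, 16)  # read the digits as a base-16 number


def diagram_row(ch):
    """Build one 'char --> decimal --> U+XXXX --> binary' row for a diagram."""
    cp = ord(ch)  # the character's code point as a plain integer
    # Use at least 16 bits, rounded up to a whole number of 4-bit groups.
    width = max(16, 4 * ((cp.bit_length() + 3) // 4))
    bits = format(cp, "0{}b".format(width))
    groups = " ".join(bits[i:i + 4] for i in range(0, width, 4))
    return "{} --> {} --> U+{:04X} --> {}".format(ch, cp, cp, groups)


if __name__ == "__main__":
    for label in ("U+0041", "U+0B85", "U+0BE6", "U+10FFFF"):
        print(label, "=", code_point_to_decimal(label))
    print(diagram_row("A"))
    print(unicodedata.name(chr(0x0B85)))  # official character name
```

    The same arithmetic puts the Tamil block (U+0B80 to U+0BFF) at 2944 to 3071
    in decimal.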

    Sinnathurai

    ----- Original Message -----
    From: "Sivakatirswami" <katir@hindu.org>
    To: <unicode@unicode.org>
    Sent: Thursday, April 28, 2005 5:43 AM
    Subject: Code Point -- What is the integer?

    > Namaskar and Aloha from the offices of Himalayan Academy Publications in
    > Hawaii...
    >
    > Where we are just slowly learning about Unicode in our publications work..
    >
    > I'm writing a short article on Unicode in a "public" magazine (Hinduism
    > Today) about Mac OSX Tiger (10.4) support for Tamil Unicode...
    >
    > I need to get down to a very layman's level and only have a very small
    > space allotment.
    >
    > Despite reading all the documents ( I downloaded *all* the PDF's for the
    > 4.0 standard book) I *still* have trouble getting my head around the
    > difference between
    >
    > 1. The code points described as a simple series of integers from
    >
    > 1 to 1,123,000 (or whatever that last integer is that is equivalent to:
    > U+10FFFF)
    >
    > This being the simplest way a layman can visualize it, albeit the latter
    > number is big... it is still easy to describe and visualize (roughly of
    > course) as in:
    >
    > "Unicode is just a long series from One to over One Million and
    > there is a character in each place and the whole list includes all the
    > characters of all the languages known to man, past and present."
    >
    > Which of course sounds at the very least "cool" for the glib-minded and
    > incredibly ground breaking for those who can see it for what it is... (if
    > true, which it seems to be...)
    >
    > 2. but then we move on to: " Unicode characters may be encoded at any
    > code point from U+0000 to U+10FFFF" and now we begin to slide into the
    > "nerd realm"
    >
    > I understand "004F" to be the hexadecimal representation for four
    > separate, 4-bit sequences.
    >
    > for purposes of a diagram, I would like to translate any given such code
    > point designation like A = U+0041 to its integer position in the series.
    > (aside question: what do you call that kind of "label" for the code point:
    > "U+****"?)
    >
    > e.g. expressed verbally, if one were writing an article for "mom and pop"
    >
    > The capital letter A is number "65" in the series... but computer geeks
    > like to express it in hexadecimal form like this, "U+0041" and if you
    > really need to describe it to the computer then it is "0000 0000 0100
    > 0001"
    >
    > or in a diagram simply
    >
    > A --> 65 --> U+0041 --> 0000 0000 0100 0001
    >
    > And ditto for one Tamil Char and one Chinese character... but my problem
    > is ascertaining the second, simple integer, segment...
    >
    > OK, so my questions are:
    >
    > 1) is the decimal expression for the capital letter A as 65 exactly
    > correspondent to its integer code point position in the total unicode
    > series expressed as a series of integers?
    >
    > 2) How can one ascertain the integer number for a code point outside-above
    > base ANSI?
    >
    > e.g. in the diagram I want to put an English character, a Tamil character and a
    > Chinese character...
    >
    > So we want to be able to say, for the layman:
    >
    > "The entire Tamil alphabet is contained between characters 2560 and 2843
    > in the unicode series" But one needs to
    >
    > a) be able find where those blocks are (where do you go to find the blocks
    > beginning and endings for different languages)
    > b) be able to translate "U+0BE6" (which is a position in the Tamil set)
    > back to a simple integer in the series. If I just "do the math" using the
    > same correlation as for the letter A ["0041" = "65", therefore "0BE6" must equal
    > **** ] ... will it be correct?
    >
    > I'm hoping I can go somewhere to find this info easily from some
    > tables....
    >
    > TIA!
    >
    > Sannyasin Sivakatirswami
    > Himalayan Academy Publications
    > at Kauai's Hindu Monastery
    > katir@hindu.org
    >
    > www.HimalayanAcademy.com,
    > www.HinduismToday.com
    > www.Gurudeva.org
    > www.Hindu.org
    >



    This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 17:22:22 CST