From: Sinnathurai Srivas (firstname.lastname@example.org)
Date: Thu Apr 28 2005 - 17:19:39 CST
Think of the Roman way of counting.
Think of the Arabic/Indic way of counting.
Think of hexadecimal counting.
This page uses hexadecimal: http://www.unicode.org/charts/PDF/U0B80.pdf
You can translate these numbers (convert them to decimal, i.e., the Arabic/Indic numbers the world knows).
Hope this makes sense.
e.g. Tamil A (அ) = hexadecimal "0B85" = decimal "2949"
Hexadecimal is just another way of writing numbers: base 16 instead of base 10.
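A small sketch of that conversion, assuming Python 3 and only its standard library. `int(text, 16)` reads a string as a base-16 number and `format(n, "04X")` writes it back; the helper names are illustrative, not part of any Unicode tool. Python's `unicodedata` module names U+0B85 TAMIL LETTER A.

```python
import unicodedata

def hex_to_decimal(hex_digits: str) -> int:
    """Read a hex string such as '0B85' as a base-16 number (hypothetical helper)."""
    return int(hex_digits, 16)

def decimal_to_hex(n: int) -> str:
    """Write a number back as at least four upper-case hex digits (hypothetical helper)."""
    return format(n, "04X")

# 0B85 = 0*4096 + 11*256 + 8*16 + 5*1 = 2949
tamil = hex_to_decimal("0B85")
print(tamil)                          # 2949
print(decimal_to_hex(tamil))          # 0B85
print(unicodedata.name(chr(tamil)))   # TAMIL LETTER A
print(hex_to_decimal("0041"))         # 65
```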
----- Original Message -----
From: "Sivakatirswami" <email@example.com>
Sent: Thursday, April 28, 2005 5:43 AM
Subject: Code Point -- What is the integer?
> Namaskar and Aloha from the offices of Himalayan Academy Publications in
> where we are just slowly learning about Unicode in our publications work.
> I'm writing a short article on Unicode in a "public" magazine (Hinduism
> Today) about Mac OS X Tiger (10.4) support for Tamil Unicode...
> I need to get down to a very layman's level and only have a very small
> space allotment.
> Despite reading all the documents (I downloaded *all* the PDFs for the
> 4.0 standard book), I *still* have trouble getting my head around the
> difference between:
> 1. The code points described as a simple series of integers from
> 1 to 1,123,000 (or whatever the last integer is that is equivalent to U+10FFFF).
> This is the simplest way for a layman to visualize it: although the last
> number is big, it is still easy to describe and visualize (roughly, of
> course), as in:
> "Unicode is just a long series from one to over one million, and
> there is a character in each place, and the whole list includes all the
> characters of all the languages known to man, past and present."
> This of course sounds at the very least "cool" to the glib-minded and
> incredibly groundbreaking to those who can see it for what it is (if
> true, which it seems to be).
> 2. But then we move on to: "Unicode characters may be encoded at any
> code point from U+0000 to U+10FFFF", and now we begin to slide into the
> "nerd realm".
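The two descriptions are the same range written in two number systems. A minimal Python sketch, assuming only the standard library: the first and last labels convert to 0 and 1,114,111, so there are 1,114,112 positions. These are positions, not assigned characters; many positions are still empty. Python's built-in `chr()` happens to accept exactly this range.

```python
FIRST_CODE_POINT = 0x0000
LAST_CODE_POINT = 0x10FFFF

def describe_code_space() -> str:
    """Show the first and last labels with their decimal equivalents."""
    count = LAST_CODE_POINT - FIRST_CODE_POINT + 1
    return (f"U+{FIRST_CODE_POINT:04X}..U+{LAST_CODE_POINT:04X} = "
            f"{FIRST_CODE_POINT}..{LAST_CODE_POINT} ({count:,} positions)")

print(describe_code_space())
# U+0000..U+10FFFF = 0..1114111 (1,114,112 positions)

# chr() accepts the last position and rejects the next number.
chr(LAST_CODE_POINT)
try:
    chr(LAST_CODE_POINT + 1)
except ValueError as err:
    print("out of range:", err)
```

Note that the series starts at 0, not 1.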
> I understand "004F" to be the hexadecimal representation of four
> separate 4-bit groups.
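That reading is right: each hex digit stands for exactly four bits. A short Python sketch (the helper name is hypothetical) that expands each digit separately. It shows the number itself in binary, not how a file stores the character. Storage depends on an encoding such as UTF-8 or UTF-16, which is a separate step.

```python
def hex_to_nibbles(hex_digits: str) -> list[str]:
    """Turn each hex digit into its own 4-bit group (hypothetical helper)."""
    return [format(int(digit, 16), "04b") for digit in hex_digits]

print(hex_to_nibbles("004F"))            # ['0000', '0000', '0100', '1111']
print(" ".join(hex_to_nibbles("0041")))  # 0000 0000 0100 0001
```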
> For the purposes of a diagram, I would like to translate any given code
> point designation, like A = U+0041, to its integer position in the series.
> (Aside question: what do you call that kind of "label" for the code point?)
> E.g., expressed verbally, if one were writing an article for "mom and pop":
> The capital letter A is number "65" in the series... but computer geeks
> like to express it in hexadecimal form like this, "U+0041", and if you
> really need to describe it to the computer, then it is "0000 0000 0100 0001";
> or in a diagram, simply:
> A --> 65 --> U+0041 --> 0000 0000 0100 0001
> And ditto for one Tamil character and one Chinese character... but my problem
> is ascertaining the second segment, the simple integer...
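Here is a sketch of how each diagram row could be generated, assuming Python 3 and its standard `unicodedata` module. The example characters are my choice: TAMIL LETTER KA (U+0B95) and the Chinese character 中 "middle" (U+4E2D). `ord()` supplies the integer segment. The binary column is the number padded to whole 4-bit groups; for these three characters it equals their UTF-16 form, but their UTF-8 bytes differ.

```python
import unicodedata

def diagram_row(ch: str) -> str:
    """Build one 'name: decimal --> U+label --> bits' line (hypothetical helper)."""
    n = ord(ch)                                      # integer position in the series
    label = f"U+{n:04X}"                             # conventional hex label
    width = max(16, (n.bit_length() + 3) // 4 * 4)  # whole 4-bit groups, at least 16 bits
    bits = format(n, f"0{width}b")
    groups = " ".join(bits[i:i + 4] for i in range(0, width, 4))
    return f"{unicodedata.name(ch)}: {n} --> {label} --> {groups}"

# English A, Tamil KA, and the Chinese character for "middle"
for ch in ["A", "\u0B95", "\u4E2D"]:
    print(diagram_row(ch))
```

Character names are printed instead of the glyphs so the output stays plain ASCII.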
> OK, so my questions are:
> 1) Is the decimal expression for the capital letter A, 65, exactly
> its integer code point position in the total Unicode
> series expressed as a series of integers?
> 2) How can one ascertain the integer number for a code point above
> base ANSI?
> E.g., in the diagram I want to put an English character, a Tamil character and a
> Chinese character...
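Both questions come down to one conversion. Here is a minimal Python sketch with hypothetical helper names. For question 1, `ord("A")` is 65, and 65 is A's position when the series is counted from U+0000 as 0. For question 2, reading the digits after "U+" as hexadecimal works the same way far above the ASCII/ANSI range.

```python
def label_to_int(label: str) -> int:
    """'U+0041' -> 65: the digits after 'U+' are an ordinary hex number."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a U+ label: {label!r}")
    return int(label[2:], 16)

def int_to_label(n: int) -> str:
    """65 -> 'U+0041': at least four hex digits, more above U+FFFF."""
    return f"U+{n:04X}"

# Question 1: ord() gives the position in the series, counted from U+0000 = 0.
print(ord("A"), label_to_int("U+0041"))     # 65 65
# Question 2: the same conversion works far above the ASCII/ANSI range.
print(label_to_int("U+4E2D"))               # 20013
print(int_to_label(1114111))                # U+10FFFF
```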
> So we want to be able to say, for the layman:
> "The entire Tamil alphabet is contained between characters 2944 and 3071
> in the Unicode series." But one needs to
> a) be able to find where those blocks are (where do you go to find the
> beginnings and endings of the blocks for different languages?)
> b) be able to translate "U+0BE6" (which is a position in the Tamil set)
> back to a simple integer in the series. If I just "do the math" using the
> same correlation as for the letter A ["0041" = "65", therefore 0BE6 must equal
> ****] ... will it be correct?
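Yes, the same arithmetic works for any label. Block start and end points are published both in the code charts (the Tamil chart linked above is U0B80.pdf) and in the file Blocks.txt of the Unicode Character Database. The Python sketch below uses a hypothetical hand-copied excerpt of three blocks in place of that file. A block is a reserved range, so some of its positions may still be unassigned.

```python
from typing import Optional

# Hypothetical hand-copied excerpt of the block list; the authoritative
# source is Blocks.txt in the Unicode Character Database.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Tamil": (0x0B80, 0x0BFF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

def block_of(code_point: int) -> Optional[str]:
    """Return the name of the block containing code_point, if it is in BLOCKS."""
    for name, (start, end) in BLOCKS.items():
        if start <= code_point <= end:
            return name
    return None

start, end = BLOCKS["Tamil"]
print(f"Tamil block: {start}..{end}")      # Tamil block: 2944..3071

cp = int("0BE6", 16)                       # same method as "0041" -> 65
print(cp, block_of(cp))                    # 3046 Tamil
```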
> I'm hoping I can go somewhere to find this info easily from some
> Sannyasin Sivakatirswami
> Himalayan Academy Publications
> at Kauai's Hindu Monastery
This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 17:22:22 CST