Re: Code Point -- What is the integer?

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Thu Apr 28 2005 - 17:19:39 CST

Next message: Mark Davis: "Re: Transliterator"

Previous message: Markus Scherer: "Re: Transliterator"
In reply to: Sivakatirswami: "Code Point -- What is the integer?"
Next in thread: Hans Aberg: "Re: Code Point -- What is the integer?"

This URL
http://www.unicode.org/charts/PDF/U0B80.pdf
at
http://www.unicode.org/charts/
might help.

Think of the Roman way of counting.
Think of the Arabic/Indic way of counting.
Think of hexadecimal counting.

This page uses hexadecimal: http://www.unicode.org/charts/PDF/U0B80.pdf
You can translate it (that is, convert it to decimal, the Arabic/Indic numerals as the world knows them).

Hope this makes sense.

Example: Tamil letter A = hexadecimal "0B85" = decimal "2949"
(Tamil Ka is hexadecimal "0B95" = decimal "2965").
Hex is just another language for writing the same number.
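
A minimal sketch of that conversion, in Python purely for illustration. The helper names code_point_to_decimal and to_binary_groups are made up for this example and are not part of any Unicode tool. It assumes the label is either bare hex digits or hex digits prefixed with "U+".

```python
def code_point_to_decimal(label: str) -> int:
    """Turn a label such as 'U+0B85' or '0b85' into its decimal integer."""
    text = label.strip()
    if text.upper().startswith("U+"):
        text = text[2:]          # drop the "U+" prefix, keep the hex digits
    return int(text, 16)         # base 16 = hexadecimal


def to_binary_groups(value: int) -> str:
    """Write an integer as groups of four bits, padded to at least 16 bits."""
    bits = format(value, "b")
    width = max(16, -(-len(bits) // 4) * 4)   # round up to a multiple of 4
    bits = bits.zfill(width)
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


if __name__ == "__main__":
    for label in ("U+0041", "0b85", "U+0BE6", "U+10FFFF"):
        number = code_point_to_decimal(label)
        print(label, "-->", number, "-->", to_binary_groups(number))
```

With these inputs, U+0041 comes out as 65, 0b85 as 2949, U+0BE6 as 3046, and U+10FFFF as 1114111. The decimal value is the code point itself; only the notation changes.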

Sinnathurai

----- Original Message -----
From: "Sivakatirswami" <katir@hindu.org>
To: <unicode@unicode.org>
Sent: Thursday, April 28, 2005 5:43 AM
Subject: Code Point -- What is the integer?

> Namaskar and Aloha from the offices of Himalayan Academy Publications in
> Hawaii...
>
> Where we are just slowly learning about Unicode in our publications work..
>
> I'm writing a short article on Unicode in a "public" magazine (Hinduism
> Today) about Mac OSX Tiger (10.4) support for Tamil Unicode...
>
> I need to get down to a very layman's level and only have a very small
> space allotment.
>
> Despite reading all the documents (I downloaded *all* the PDFs for the
> 4.0 standard book) I *still* have trouble getting my head around the
> difference between
>
> 1. The code points described as a simple series of integers from
>
> 1 to 1,123,000 (or whatever that last integer is that is equivalent to:
> U+10FFFF)
>
> This being the simplest way a layman can visualize it, albeit the latter
> number is big... it is still easy to describe and visualize (roughly of
> course) as in:
>
> "Unicode is just a long series from One to over One Million and
> there is a character in each place and the whole list includes all the
> characters of all the languages known to man, past and present."
>
> Which of course sounds at the very least "cool" for the glib-minded and
> incredibly ground breaking for those who can see it for what it is... (if
> true, which it seems to be...)
>
> 2. but then we move on to: " Unicode characters may be encoded at any
> code point from U+0000 to U+10FFFF" and now we begin to slide into the
> "nerd realm"
>
> I understand "004F" to be the hexadecimal representation for four
> separate, 4-bit sequences.
>
> for purposes of a diagram, I would like to translate any given such code
> point designation like A = U+0041 to its integer position in the series.
> (aside question: what do you call that kind of "label" for the code point:
> "U+****"?)
>
> e.g. expressed verbally, if one were writing an article for "mom and pop"
>
> The capital letter A is number "65" in the series... but computer geeks
> like to express it in hexadecimal form like this, "U+0041" and if you
> really need to describe it to the computer then it is "0000 0000 0100
> 0001"
>
> or in a diagram simply
>
> A --> 65 --> U+0041 --> 0000 0000 0100 0001
>
> And ditto for one Tamil char and one Chinese character... but my problem
> is ascertaining the second, simple integer, segment...
>
> OK, so my questions are:
>
> 1) Is the decimal expression for the capital letter A as 65 exactly
> correspondent to its integer code point position in the total Unicode
> series expressed as a series of integers?
>
> 2) How can one ascertain the integer number for a code point outside-above
> base ANSI?
>
> e.g. in the diagram I want to put an English char, a Tamil char and a
> Chinese character...
>
> So we want to be able to say, for the layman:
>
> "The entire Tamil alphabet is contained between characters 2560 and 2843
> in the Unicode series" But one needs to
>
> a) be able to find where those blocks are (where do you go to find the
> blocks' beginnings and endings for different languages)
> b) be able to translate "U+0BE6" (which is a position in the Tamil set)
> back to a simple integer in the series. If I just *do the math* using the
> same correlation for the letter A ["0041" = "65", therefore 0BE6 must equal
> **** ] ... will it be correct?
>
> I'm hoping I can go somewhere to find this info easily from some
> tables....
>
> TIA!
>
> Sannyasin Sivakatirswami
> Himalayan Academy Publications
> at Kauai's Hindu Monastery
> katir@hindu.org
>
> www.HimalayanAcademy.com,
> www.HinduismToday.com
> www.Gurudeva.org
> www.Hindu.org
>


This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 17:22:22 CST