Re: Code Point -- What is the integer?

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Apr 28 2005 - 12:03:38 CST

Next message: Kenneth Whistler: "Re: Code Point -- What is the integer?"

Previous message: Jukka K. Korpela: "Re: String name and Character Name"
In reply to: Sivakatirswami: "Code Point -- What is the integer?"
Next in thread: Kenneth Whistler: "Re: Code Point -- What is the integer?"

On Wed, 27 Apr 2005, Sivakatirswami wrote:

> "Unicode is this just a long series from One to over One Million and
> there is a character in each place and the whole list includes all the
> characters of all the languages known to man, past and present."

That sounds like a useful "visualization", but it is not quite correct.
It's a good starting point for an analysis:

Unicode is an evolving standard, and new characters are added to it.
It contains almost all characters used in living languages and writing
systems, but not all historic characters or characters used in
special notations (mathematics etc.). Besides, not all characters have a
code point of their own; some characters containing a diacritic mark can
only be written in decomposed form, i.e. as a base character followed by
one or more combining diacritic marks.
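The quoted picture of "a series from One to over One Million" can be made a little more precise, and the point about decomposed characters can be checked, with a minimal Python sketch using only the standard `unicodedata` module. It assumes the code space runs from U+0000 to U+10FFFF (so it starts at zero, not one, and has 1,114,112 positions), and it uses capital J with caron (U+004A U+030C) as an example of a letter with no precomposed code point. Results depend on the Unicode version bundled with your Python.

```python
import unicodedata

# The Unicode code space: integers from 0 through 0x10FFFF inclusive.
FIRST_CODE_POINT = 0x0000
LAST_CODE_POINT = 0x10FFFF
CODE_SPACE_SIZE = LAST_CODE_POINT - FIRST_CODE_POINT + 1  # 1,114,112


def nfc_length(text: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


# "e" + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
e_acute = "e\u0301"
# "J" + COMBINING CARON has no precomposed form, so it stays two code points.
j_caron = "J\u030C"

if __name__ == "__main__":
    print(f"code space size: {CODE_SPACE_SIZE}")
    for s in (e_acute, j_caron):
        composed = unicodedata.normalize("NFC", s)
        print([f"U+{ord(c):04X}" for c in composed])
```

Normalization to NFC merges a base letter and combining mark only when Unicode defines a precomposed character for the pair; otherwise the sequence is the only way to write the character.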

Not all places (code points) contain a character: most code points are
currently unassigned, and some are explicitly defined as noncharacters.
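A rough way to see which "places" hold a character is sketched below in Python. It relies on two facts: `unicodedata.category` returns "Cn" for code points with no assigned character (noncharacters included), and the noncharacters are exactly U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+xxFFFE and U+xxFFFF), 66 in all. The helper names `is_noncharacter` and `classify` are hypothetical, and the number of unassigned code points you get varies with the Unicode version shipped with your Python.

```python
import unicodedata
from collections import Counter


def is_noncharacter(cp: int) -> bool:
    """True for the 66 permanently reserved noncharacter code points."""
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Hypothetical helper: coarse status of one code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    if is_noncharacter(cp):
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "unassigned"
    if category == "Cs":
        return "surrogate"
    if category == "Co":
        return "private use"
    return "assigned character"


if __name__ == "__main__":
    counts = Counter(classify(cp) for cp in range(0x110000))
    for kind, n in counts.most_common():
        print(f"{kind:20} {n:>9,}")
```

Run against any recent Unicode version, "unassigned" comes out as the largest group, which is the point made above.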

> I understand "004F" to be the hexadecimal representation for four
> separate, 4-bit sequences.

No, it is just a different (base 16) notation for an integer, and it
postulates no particular implementation at bit level. It's simply a
numeral. Unicode (and other character standards) mostly use hexadecimal
notation for code points, partly due to the structure of the coding space.
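That "004F" is just a base-16 numeral, with no bit layout implied, can be checked in a few lines of Python. The sketch below parses the hexadecimal notation into an ordinary integer, then shows that one code point gets different byte sequences in different Unicode encoding forms (UTF-8, UTF-16, UTF-32). It also shows one piece of coding-space structure that hexadecimal makes visible: each plane spans 0x10000 code points, so the plane number is simply the digits above the last four. U+00E9 and U+1F600 are arbitrary extra examples, and `plane_of` and `encoded_forms` are hypothetical helper names.

```python
# A code point is an integer; "004F" is that integer written in base 16.
cp = int("004F", 16)
print(cp, hex(cp), chr(cp))  # 79 0x4f O


def plane_of(code: int) -> int:
    """Each plane spans 0x10000 code points, so the plane is code // 0x10000."""
    return code >> 16


def encoded_forms(ch: str) -> dict:
    """Byte sequences for one character in three Unicode encoding forms."""
    return {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


for ch in ("O", "\u00e9", "\U0001F600"):
    code = ord(ch)
    print(f"U+{code:04X} = {code} (plane {plane_of(code)})")
    # The bytes depend on the encoding form, not on how the number is written.
    for enc, data in encoded_forms(ch).items():
        print(f"  {enc:10} {data.hex(' ')}")
```

The same integer 0x1F600 becomes four bytes in UTF-8, a surrogate pair in UTF-16 and one 32-bit unit in UTF-32; none of these is "the" bit pattern of the code point.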

A word of warning: although characters are identified by their code
points, which are numbers (unsigned integers), the _numeric_ (arithmetic)
value is usually irrelevant. That is, we seldom operate on them as
numbers, with arithmetic operations. For most purposes, the numbers are
just indexes. For instance, if a character's code point is numerically
smaller than another character's code point, this in general implies
nothing about the relative order of the _characters_ in an alphabet or in
sorting. (It is more or less a coincidence that _some_ characters have code
points that correspond to their relative alphabetic order.)
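A short Python illustration of why code point order is not alphabetical order: Python compares strings code point by code point, so a plain `sorted()` puts every capital before every lowercase letter and places "Ä" (U+00C4) after "z" (U+007A). The `rough_key` function is a hypothetical, deliberately crude stand-in for collation that folds case and drops combining marks. Real sorting should use a language-aware collation such as the Unicode Collation Algorithm, and the rules differ by language; Swedish, for instance, sorts "ä" after "z".

```python
import unicodedata

words = ["zebra", "Äpfel", "apple", "Banana"]

# Default string ordering compares code points: 'B' (U+0042) < 'a' (U+0061)
# < 'z' (U+007A) < 'Ä' (U+00C4). That is numeric order, not alphabetical order.
by_code_point = sorted(words)


def rough_key(word: str) -> str:
    """Hypothetical, simplified sort key: case-fold, decompose, drop combining marks.
    Not a substitute for a real, language-aware collation."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


by_rough_key = sorted(words, key=rough_key)

if __name__ == "__main__":
    print(by_code_point)  # ['Banana', 'apple', 'zebra', 'Äpfel']
    print(by_rough_key)   # ['Äpfel', 'apple', 'Banana', 'zebra']
```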

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/


This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 12:04:27 CST