Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Wed Jan 19 2005 - 19:28:17 CST

    On 2005/01/19 20:49, Kenneth Whistler at kenw@sybase.com wrote:

    > It is *not* "a transformation dealing with characters", but a mapping
    > between Unicode scalar values (short hand for, and synonymous
    > to 0000..D7FF, E000..10FFFF) to code unit sequences (bytes in the
    > case of UTF-8, 16-bit units [wydes] in the case of UTF-16, and
    > 32-bit words in the case of UTF-32).

    One might give a purely mathematical definition of a Unicode character,
    freed from any computer representation, as a pair (k, s), where k is an
    integer and s is a string, i.e., a finite list of elements of the set
    S := {A, ..., Z, ' '} (an element of the free monoid on S); in practice,
    s is the character's name. UTF-8 then defines a function
    f: (k, s) |-> (b, s), where b is a finite sequence of bytes (math
    definition omitted) and k ranges over the Unicode scalar values in
    [0, 0x10FFFF] (less the surrogates D800..DFFF, per the definition quoted
    above). The transformation I spoke about is a function g: k |-> b with k
    in [0, 2^32-1], such that f(k, s(k)) = (g(k), s(k)) whenever k is a
    scalar value, s(k) denoting the name string attached to the code point k.
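    As a concrete (if hypothetical) sketch of such a g, one can follow the
    six-byte pattern of pre-2003 UTF-8 (RFC 2279), which reaches only
    2^31 - 1; covering the 32nd bit itself would require a further extension
    of the lead-byte scheme, which the sketch below does not attempt. The
    name g_encode and the error convention are illustrative only, not
    anything proposed in this thread.

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of g: k |-> b in the six-byte style of RFC 2279 UTF-8.
     * Writes the byte sequence g(k) into b and returns its length,
     * or 0 if the 32nd bit of k is set (not representable here). */
    static size_t g_encode(uint32_t k, unsigned char b[6])
    {
        if (k < 0x80) {                    /* 0xxxxxxx */
            b[0] = (unsigned char)k;
            return 1;
        } else if (k < 0x800) {            /* 110xxxxx 10xxxxxx */
            b[0] = (unsigned char)(0xC0 | (k >> 6));
            b[1] = (unsigned char)(0x80 | (k & 0x3F));
            return 2;
        } else if (k < 0x10000) {          /* 1110xxxx + 2 trailing */
            b[0] = (unsigned char)(0xE0 | (k >> 12));
            b[1] = (unsigned char)(0x80 | ((k >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (k & 0x3F));
            return 3;
        } else if (k < 0x200000) {         /* 11110xxx + 3 trailing */
            b[0] = (unsigned char)(0xF0 | (k >> 18));
            b[1] = (unsigned char)(0x80 | ((k >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((k >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (k & 0x3F));
            return 4;
        } else if (k < 0x4000000) {        /* 111110xx + 4 trailing */
            b[0] = (unsigned char)(0xF8 | (k >> 24));
            b[1] = (unsigned char)(0x80 | ((k >> 18) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((k >> 12) & 0x3F));
            b[3] = (unsigned char)(0x80 | ((k >> 6) & 0x3F));
            b[4] = (unsigned char)(0x80 | (k & 0x3F));
            return 5;
        } else if (k < 0x80000000u) {      /* 1111110x + 5 trailing */
            b[0] = (unsigned char)(0xFC | (k >> 30));
            b[1] = (unsigned char)(0x80 | ((k >> 24) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((k >> 18) & 0x3F));
            b[3] = (unsigned char)(0x80 | ((k >> 12) & 0x3F));
            b[4] = (unsigned char)(0x80 | ((k >> 6) & 0x3F));
            b[5] = (unsigned char)(0x80 | (k & 0x3F));
            return 6;
        }
        return 0;                          /* 32nd bit set */
    }

    On the scalar values, the first four branches coincide with standard
    UTF-8, so the compatibility condition f(k, s(k)) = (g(k), s(k)) holds
    there by construction.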

      Hans Aberg


