**From:** Kenneth Whistler (*kenw@sybase.com*)

**Date:** Wed Jan 19 2005 - 20:07:10 CST


*> One might give a purely mathematical definition of a Unicode character, freed from any computer representation, as a pair (k, s), where k is an integer, and s is string, or finite list, of elements from the set S := {A, ..., Z, ' '} (i.e., an element in the free monoid on the set S). Then, UTF-8 defines a function f: (k, s) |-> (b, s), where b is a finite sequence of bytes (math definition omitted), where k in [0, 0x10FFFF]. The transformation I spoke about is a function g: k |-> b, where k in [0, 2^32-1] such that f(k, s(k)) = (g(k), s(k)) when k in [0, 0x10FFFF].*

Which is just mathematical gobbledygook for saying you want to define an extension of UTF-8 which does the same mapping of code point to byte sequence for any code point in the Unicode scalar value range (which is [0, 0xD7FF] U [0xE000, 0x10FFFF], by the way, *not* [0, 0x10FFFF]), and which also gives you a mapping of integers to byte sequence for the entire range [0, 0xFFFFFFFF].

And which is just as objectionable stated in mathematics as stated in terms understood by character encoders.
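To make the range point concrete, here is a minimal Python sketch of the standard UTF-8 mapping restricted to Unicode scalar values. The function names `is_scalar_value` and `utf8_encode` are illustrative and not taken from any standard or library. The surrogate code points 0xD800 to 0xDFFF and every integer above 0x10FFFF are rejected, which is exactly where the proposed extension would differ.

```python
def is_scalar_value(k: int) -> bool:
    """True for Unicode scalar values: [0, 0xD7FF] U [0xE000, 0x10FFFF]."""
    return 0 <= k <= 0xD7FF or 0xE000 <= k <= 0x10FFFF


def utf8_encode(k: int) -> bytes:
    """Encode one scalar value as UTF-8; refuse every other integer."""
    if not is_scalar_value(k):
        raise ValueError(f"{k:#x} is not a Unicode scalar value")
    if k <= 0x7F:
        # One byte: 0xxxxxxx
        return bytes([k])
    if k <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (k >> 6), 0x80 | (k & 0x3F)])
    if k <= 0xFFFF:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (k >> 12),
                      0x80 | ((k >> 6) & 0x3F),
                      0x80 | (k & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (k >> 18),
                  0x80 | ((k >> 12) & 0x3F),
                  0x80 | ((k >> 6) & 0x3F),
                  0x80 | (k & 0x3F)])
```

For example, `utf8_encode(0x20AC)` yields the three bytes `E2 82 AC`, while `utf8_encode(0xD800)` and `utf8_encode(0xFFFFFFFF)` both raise `ValueError`.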

By the way, I believe your formulation to be bogus, because s(k) is not defined here in any meaningful way for a character encoding when k is not an element of the defined code space.

In other words, you *could* claim you have a pair (k, s) where k = 0x10101010 and s = "HANS ABERG", but that would not be a "Unicode character" in any sense acceptable to the standardizers or the implementers.
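The point that s(k) is undefined outside the code space can be seen in any implementation that exposes character names. The sketch below uses Python's standard `unicodedata` module; the helper `name_or_none` is a hypothetical name introduced only for illustration.

```python
import unicodedata


def name_or_none(k: int):
    """Return the Unicode character name for code point k, or None if it has none."""
    try:
        return unicodedata.name(chr(k))
    except ValueError:
        # chr() raises ValueError for integers outside range(0x110000), and
        # unicodedata.name() raises ValueError for code points without a name.
        return None
```

Here `name_or_none(0x41)` returns `'LATIN CAPITAL LETTER A'`, while `name_or_none(0x10101010)` returns `None`: there is no name to pair with that integer.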

Also, your set S is incorrectly defined.
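The message does not say what is wrong with S. One plausible reading, stated here as an assumption, is that Unicode character names also use the digits 0 to 9 and the hyphen-minus, so {A, ..., Z, ' '} is too small. A quick check in Python, where `S` and `chars_outside_s` are illustrative names:

```python
import string
import unicodedata

# The set S as given in the quoted message: uppercase A to Z plus space.
S = set(string.ascii_uppercase) | {" "}


def chars_outside_s(ch: str) -> set:
    """Return the characters of ch's Unicode name that are not in S."""
    return set(unicodedata.name(ch)) - S
```

For `'A'` (LATIN CAPITAL LETTER A) the result is empty, but for `'-'` (HYPHEN-MINUS) it contains the hyphen, and for U+4E00 (CJK UNIFIED IDEOGRAPH-4E00) it contains the hyphen and digits.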

--Ken

*> Hans Aberg*


*This archive was generated by hypermail 2.1.5: Wed Jan 19 2005 - 20:07:52 CST*