Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: John Cowan (jcowan@reutershealth.com)
Date: Mon Feb 05 2001 - 16:26:05 EST

Next message: Sebastian Hagedorn: "Re: Macintosh OS8.6, OS9"
Previous message: Tex Texin: "Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)"
Maybe in reply to: DougEwell2@cs.com: "Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)"
Next in thread: addison@inter-locale.com: "Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)"

Tex Texin wrote:

> I am not clear from your comments which is the bug, since the doc
> goes both ways. Are the doc bugs that they say
> it is UTF-8, or that they say it is modified UTF-8?

It uses modified UTF-8, modified in three ways:

1) U+0000 is encoded in two bytes as 0xc0 0x80;

2) values above U+FFFF are encoded in six bytes: the value is converted
to its UTF-16 surrogate pair, and each surrogate is then encoded
separately as a three-byte UTF-8 sequence;

3) the whole string is prefixed with a byte count represented
as a 2-byte big-endian binary integer.
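
The three rules above can be sketched in a few lines of Python. This is
only an illustration of the rules as listed, not the actual library code;
the function name encode_modified_utf8 and the error raised for
over-long strings are assumptions made for the example.

```python
import struct


def encode_modified_utf8(text: str) -> bytes:
    """Encode text using the three modifications listed above (sketch only)."""
    body = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # Rule 1: U+0000 becomes the two-byte sequence 0xC0 0x80.
            body += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Rule 2: convert to a UTF-16 surrogate pair, then encode each
            # surrogate as its own three-byte sequence (six bytes in total).
            v = cp - 0x10000
            for unit in (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)):
                body += bytes((
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ))
        else:
            # Everything else is encoded as in standard UTF-8; "surrogatepass"
            # lets a lone surrogate in the input through as three bytes.
            body += ch.encode("utf-8", "surrogatepass")
    if len(body) > 0xFFFF:
        # Assumption for this sketch: reject anything the 2-byte count cannot hold.
        raise ValueError("encoded form does not fit a 2-byte byte count")
    # Rule 3: prefix the byte count as a 2-byte big-endian integer.
    return struct.pack(">H", len(body)) + bytes(body)
```

For example, encode_modified_utf8("A\x00") gives the bytes 00 03 41 C0 80,
where standard UTF-8 would give just 41 00 with no prefix. A standard
decoder rejects both C0 80 and the encoded surrogates, which is why
ordinary UTF-8 library routines cannot read this form directly.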

> It would be great to learn that the functions are actually unmodified
> UTF-8, as I know of some interfaces that are writing non-Java
> code and are forced to deal with specialized handling of the modified
> UTF-8.
> It would be great to inform them they can use standard UTF-8 library
> routines.

*chomp* No such luck, Doc!

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT