Re: Unicode 3.0 Release

From: Mark Davis (mark@macchiato.com)
Date: Tue Sep 14 1999 - 00:10:52 EDT

Next message: Erland Sommarskog: "RE: Identifying file encoding scheme"
Previous message: Krebs, Mike: "RE: Identifying file encoding scheme"
Maybe in reply to: mark.davis@us.ibm.com: "Unicode 3.0 Release"

You are correct: there is a typo in that line. It should read 0x800 instead
of 0x400. Thank you for bringing it to our attention.

Mark

P.S. A nice way to remember the number of payload bits in each form of UTF-8
is that it is 5 bits per byte plus 1, plus 1 more in the case of the
single-byte form. That is, the 1-byte form gives you 7 bits, the 2-byte form
gives you 11 bits, the 3-byte form gives 16, and the 4-byte form gives 21.
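
The rule of thumb above can be checked against the actual bit patterns. The
following is only a minimal sketch in Python; the function names are
illustrative, and the layout assumed is the RFC 2279 one (a lead byte of n
one-bits and a zero, followed by 10xxxxxx continuation bytes).

```python
def payload_bits_by_rule(n_bytes):
    """Mnemonic: 5 bits per byte + 1, plus 1 more for the single-byte form."""
    return 5 * n_bytes + 1 + (1 if n_bytes == 1 else 0)


def payload_bits_by_layout(n_bytes):
    """Count the 'x' bits in the byte patterns themselves."""
    if n_bytes == 1:
        return 7                      # 0xxxxxxx
    lead_bits = 7 - n_bytes           # lead byte: n ones, one zero, rest payload
    return lead_bits + 6 * (n_bytes - 1)  # each 10xxxxxx byte carries 6 bits


for n in range(1, 5):
    bits = payload_bits_by_rule(n)
    assert bits == payload_bits_by_layout(n)
    # Largest value that fits in this many payload bits.
    print(n, bits, hex((1 << bits) - 1))
```

The largest value for the 2-byte form comes out as 0x7ff, which is why the
3-byte range has to start at 0x800.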

Masahiko Maedera wrote:

> Dear Mr. Mark Davis,
>
> I have found an error in Technical Report 17.
>
> http://www.unicode.org/unicode/reports/tr17/
>
> > UTF-8 provides a good example:
> > ...
> > 0x80..0x3FF ---> 2 bytes
> > 0x400..0xD7FF, 0xE000..0xFFFF ---> 3 bytes
> > ...
>
> But RFC 2279 (UTF-8) gives the following:
>
> > 0000 0080-0000 07FF   110xxxxx 10xxxxxx
> > 0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx (excluding surrogates)
>
> Should it be changed to the following?
>
> > 0x80..0x7FF ---> 2 bytes
> > 0x800..0xD7FF, 0xE000..0xFFFF ---> 3 bytes
>
> Best regards,
> Masahiko
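
The corrected boundaries can be compared with a real encoder. This is a
minimal sketch in Python, assuming the ranges quoted above; utf8_length is a
hypothetical helper, not something taken from TR 17 or RFC 2279, and the
4-byte range up to 0x10FFFF is added here only for completeness.

```python
def utf8_length(cp):
    """Bytes needed to encode code point cp, per the corrected ranges."""
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are excluded")
    if cp < 0x80:
        return 1
    if cp < 0x800:        # 0x80..0x7FF, not 0x80..0x3FF
        return 2
    if cp < 0x10000:      # 0x800..0xD7FF and 0xE000..0xFFFF
        return 3
    if cp <= 0x10FFFF:    # assumption: 4-byte range, not in the quoted lines
        return 4
    raise ValueError("out of range")


# Values on either side of the boundaries discussed in this thread.
for cp in (0x7F, 0x80, 0x3FF, 0x400, 0x7FF, 0x800, 0xD7FF, 0xE000, 0xFFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

Note that 0x400 through 0x7FF fall in the 2-byte range here, which is the
point of the correction.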


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT