Re: UTF8 vs AL32UTF8

From: Jianping Yang (Jianping.Yang@oracle.com)
Date: Tue Jun 12 2001 - 14:13:48 EDT


Peter_Constable@sil.org wrote:

> On 06/11/2001 10:45:46 PM Mark Davis wrote:
>
> [earlier]
> > - Oracle could probably make a case for their name for UTF8 simply being
> > an anachronism. After all, the original definition of UTF-8 did convert
> > surrogate pairs as they are doing in what they call UTF8.
>
> [now]
> >UTF-8 was defined before UTF-16. At the time it was first defined, there
> >were no surrogates, so there was no special handling of the D800..DFFF
> >code points.
>
> The critical thing, though, is that in UTF-8 as originally designed, there
> was no question about the meaning of < ED A0 80 ED B0 80 >, of
> < F0 90 80 80 >, and whether either could mean U-00010000. They definitely
> did not mean the same thing, and the former definitely did not mean
> U-00010000. So Oracle would fail utterly if being judged on that basis.
>

If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then? I
think it definitely means U-00010000.
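
For reference, the two readings in this exchange can be checked with a
minimal Python sketch. The helper names decode_raw and combine_surrogates
are illustrative only (they are not part of any library or of Oracle's
implementation), and decode_raw deliberately applies just the UTF-8 bit
patterns, without the surrogate or shortest-form checks a conforming
decoder would perform.

```python
def decode_raw(data: bytes) -> list[int]:
    """Decode UTF-8 bit patterns (1 to 4 bytes) into code values.

    Illustrative only: D800..DFFF and overlong forms are NOT rejected,
    so the result shows what the bytes encode at the bit level.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx
            n, cp = 0, b
        elif b >> 5 == 0b110:        # 110xxxxx + 1 trailing byte
            n, cp = 1, b & 0x1F
        elif b >> 4 == 0b1110:       # 1110xxxx + 2 trailing bytes
            n, cp = 2, b & 0x0F
        elif b >> 3 == 0b11110:      # 11110xxx + 3 trailing bytes
            n, cp = 3, b & 0x07
        else:
            raise ValueError(f"bad lead byte {b:#04x} at offset {i}")
        tail = data[i + 1:i + 1 + n]
        if len(tail) != n or any(t >> 6 != 0b10 for t in tail):
            raise ValueError(f"bad trailing bytes after offset {i}")
        for t in tail:               # each trailing byte carries 6 bits
            cp = (cp << 6) | (t & 0x3F)
        out.append(cp)
        i += 1 + n
    return out


def combine_surrogates(hi: int, lo: int) -> int:
    """Map a UTF-16 surrogate pair to the scalar value it represents."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


six_bytes = bytes.fromhex("ED A0 80 ED B0 80")
four_bytes = bytes.fromhex("F0 90 80 80")

# Bit-level reading: the six-byte form yields two code values,
# the four-byte form yields one.
print([hex(c) for c in decode_raw(six_bytes)])
print([hex(c) for c in decode_raw(four_bytes)])

# Reading the two values as UTF-16 code units and pairing them.
print(hex(combine_surrogates(*decode_raw(six_bytes))))
```

At the bit level the six-byte sequence yields the two values D800 and DC00,
while the four-byte sequence yields 10000 directly. Only when the two values
are then treated as a UTF-16 surrogate pair do they combine to 10000, which
is the extra step the question above relies on.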

Regards,
Jianping.

>
> - Peter
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <peter_constable@sil.org>




