Re: UTF-8N?

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jun 21 2000 - 14:56:31 EDT

Next message: John Hudson: "Re: Bengali: variants of same conjunct"
Previous message: Abdul Malik: "Re: Bengali: variants of same conjunct"
Maybe in reply to: Masahiko Maedera: "UTF-8N?"
Next in thread: Peter_Constable@sil.org: "Re: UTF-8N?"

> UTF-8 files both with and without a BOM serialize the character
> representations into bytes (octets) in exactly the same way. That's the
> basis for distinguishing between encoding schemes, and since there isn't a
> difference, there is only one encoding scheme involved in both cases.

I don't think so. One encoding scheme encodes U+0020 (a single space character) as
one byte (0x20), whereas the other one encodes it as four bytes
(0xEF 0xBB 0xBF 0x20).
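
The byte counts above can be checked with a minimal sketch. Python and its
"utf-8" and "utf-8-sig" codecs are assumptions of this illustration and are
not part of the original message; "utf-8-sig" is used here only as a stand-in
for "UTF-8 with a BOM", and show_bytes is a hypothetical helper name.

```python
def show_bytes(data: bytes) -> str:
    """Format a byte string as space-separated 0xNN values."""
    return " ".join(f"0x{b:02X}" for b in data)

text = "\u0020"  # a single space character

# Plain UTF-8: no signature is written.
without_bom = text.encode("utf-8")

# Python's "utf-8-sig" codec prepends the UTF-8 form of U+FEFF when encoding.
with_bom = text.encode("utf-8-sig")

print(show_bytes(without_bom))  # 0x20
print(show_bytes(with_bom))     # 0xEF 0xBB 0xBF 0x20
```

Decoding either byte sequence with "utf-8-sig" gives back the single space,
whereas decoding the four-byte form with plain "utf-8" yields U+FEFF followed
by the space, which is where the two treatments visibly part ways.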

-- 
Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT