Re: UTF-8N?

From: John Cowan ([email protected])
Date: Thu Jun 22 2000 - 14:08:15 EDT

Next message: Kenneth Whistler: "Re: UTF-8N?"
Previous message: Kenneth Whistler: "Re: UTF-8N?"
Maybe in reply to: Masahiko Maedera: "UTF-8N?"
Next in thread: Kenneth Whistler: "Re: UTF-8N?"

"Ayers, Mike" wrote:

> Am I reading this wrong? Here's what I get:
>
> I hand you a UTF-16 document. This document is:
>
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
>
> ...so it says "Hello". Then I say, "Oh, by the way, that's
> big-endian." *POOF* The content of the document has changed, and there is
> now a 'ZERO WIDTH NO-BREAK SPACE' at the beginning. Smells pretty skunky...

No, what you have said is that this document is in "UTF-16BE" encoding.
That's a name for an encoding that is known a priori to be BE, and does
not permit a BOM. It is not the name for an encoding that has a BOM but
just happens to be BE.

Since you have changed the encoding, the content has naturally
changed too, just as if you had declared an 8859-1 document
to be 8859-2.
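
To make the distinction concrete, here is a minimal sketch that decodes the
same twelve bytes under both labels. It assumes Python's built-in codec
names "utf-16" and "utf-16-be" behave as described above (BOM consumed vs.
BOM not permitted); the variable names are only illustrative.

```python
# The twelve bytes from the quoted message.
data = bytes.fromhex("FEFF00480065006C006C006F")

# "utf-16": byte order is read from the BOM, which is then consumed.
as_utf16 = data.decode("utf-16")

# "utf-16-be": byte order is fixed in advance, so FE FF is decoded as an
# ordinary character, U+FEFF, and stays in the content.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))
print(len(as_utf16), len(as_utf16be))
```

Under the first label the text is five characters long; under the second
it is six, because the leading U+FEFF is now part of the content.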

> BTW, what is a ZWNBSP anyway? From here it seems like a
> non-character. Is there an actual use for it?

Yes. It indicates that a line break may not be introduced at this point.
It is similar to the NO-BREAK SPACE (U+00A0) which you may be familiar
with under its HTML name of &nbsp;, except that it doesn't produce any actual
whitespace. ZWNBSP is useful in languages that don't use whitespace, and
in strings like "M.T.A." where a line breaker might be tempted to break after
a period.

Its opposite number is ZWSP (U+200B), which likewise doesn't generate any
actual whitespace, but indicates that line breaking *is* permitted here.
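
As a rough illustration of the three characters just mentioned, the sketch
below prints their code points and names and marks up "M.T.A." so that no
break is invited after an interior period. The helper name
forbid_breaks_after_periods is made up for this example, and whether a
given renderer honors the marks depends on its line breaker.

```python
import unicodedata

NBSP = "\u00a0"    # NO-BREAK SPACE: visible space, no break
ZWNBSP = "\ufeff"  # no visible space, no break
ZWSP = "\u200b"    # no visible space, break permitted

for ch in (NBSP, ZWNBSP, ZWSP):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def forbid_breaks_after_periods(text):
    """Insert ZWNBSP after each period that is followed by another character."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == "." and i + 1 < len(text):
            out.append(ZWNBSP)
    return "".join(out)


marked = forbid_breaks_after_periods("M.T.A.")
print(repr(marked))
```

The marked string looks identical to "M.T.A." when displayed, but it
carries two extra characters, one after each interior period.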

-- 
Schlingt dreifach einen Kreis um dies! || John Cowan <[email protected]>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT