Encoding of non-characters

From: Doug Ewell (dewell@compuserve.com)
Date: Sat Jul 29 2000 - 12:02:03 EDT

Next message: Mark Davis: "Re: Encoding of non-characters"
Previous message: Roozbeh Pournader: "Re: Euro"
Next in thread: Mark Davis: "Re: Encoding of non-characters"
Maybe reply: Mark Davis: "Re: Encoding of non-characters"
Maybe reply: Doug Ewell: "Re: Encoding of non-characters"
Maybe reply: Doug Ewell: "Re: Encoding of non-characters"

Did I read recently (in a message that I shortsightedly deleted)
something to the effect that a character encoding scheme (CES) or
transfer encoding syntax (TES) needs to be able to encode the non-
characters U+D800 through U+DFFF, and presumably U+xxFFFE and U+xxFFFF
as well?

I've been playing around with a TES (or maybe it's a CES; I'm still
having a little trouble knowing exactly where to draw the line). Don't
worry, I'm not going to propose it anywhere as Yet Another UTF. I'm
just experimenting with Unicode, and hopefully teaching myself
something along the way.

Anyway, my scheme encodes non-BMP characters not *as* surrogates, but
using the surrogate mechanism in a slightly modified way. Like UTF-16,
this makes it impossible to encode the BMP non-characters in the range
U+D800 through U+DFFF. Normally I wouldn't think this was a problem,
but I thought someone (Davis?) just said recently that it should be
possible to round-trip these thingies, for some reason.
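
For reference, here is a minimal sketch of the standard UTF-16 surrogate arithmetic, which shows why the range U+D800 through U+DFFF is unavailable for ordinary characters: every value in it is already spoken for as half of a pair. This is plain UTF-16, not the modified scheme described above (whose details are not given here), and the function names are made up for illustration.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into two 16-bit units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%04X" % cp)
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low pair into the supplementary code point it stands for."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Because a decoder has to read any 16-bit unit in U+D800..U+DFFF as part of a pair, a lone value from that range has no representation of its own in this form.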

The situation would be different in the case of U+xxFFFE and U+xxFFFF,
because while the surrogates occupy entire ranges that can be utilized
in a special way, you kind of have to *deliberately* exclude the FFFx
characters. Nonetheless, the same question applies: Must these bogus
code points be representable in a CES or TES, or can they be handled
conformantly by raising an error or mapping them to U+FFFD?
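
To make the two options concrete, here is a rough sketch of what "raising an error or mapping them to U+FFFD" could look like as a pre-encoding step. Whether either policy is actually conformant is exactly the open question, so this only illustrates the choice; the names `sanitize` and `policy` are hypothetical.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(cp):
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_fffe_or_ffff(cp):
    """True for U+xxFFFE and U+xxFFFF on any plane."""
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def sanitize(code_points, policy="replace"):
    """Return a new list with the problem code points handled per `policy`.

    policy="replace" maps them to U+FFFD; policy="error" raises ValueError.
    """
    out = []
    for cp in code_points:
        if is_surrogate(cp) or is_fffe_or_ffff(cp):
            if policy == "error":
                raise ValueError("cannot encode U+%04X" % cp)
            cp = REPLACEMENT
        out.append(cp)
    return out
```

Note the asymmetry mentioned above: the surrogate test is a single range check that falls out of the encoding itself, while the FFFE/FFFF test is an extra, deliberate exclusion.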

-Doug Ewell
Fullerton, California

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT