Re: Why "UTF-5" is not a UTF

From: Robert A. Rosenberg (bob.rosenberg@digitscorp.com)
Date: Fri Mar 03 2000 - 13:47:39 EST


At 06:53 PM 03/02/2000 -0800, Kenneth Whistler wrote:
> > At 07:30 AM 03/02/2000 -0800, Doug Ewell wrote:
> > >I've been working on implementing a UTF-5 encoder and decoder based on
> > >the specifications in the file
> > >
> > >http://ftp.univie.ac.at/netinfo/internet-drafts/draft-jseng-utf5-01.txt
> > >
> > >and I am running into problems with what I will call "UTF-5 mode,"
>
>Bob Rosenberg responded to Doug Ewell:
> >
> > You are not looking at the problem correctly. In the case of an Email
> > Address, the syntax is name@domain. In the example shown, the CONTENTS of
> > name and domain are rendered in UTF-5 NOT the full string. Thus you pass
> > the 3 sections of the address (which are delineated by the "@" and the ".")
> > through the converter SEPARATELY. IOW: You must parse the string based on
> > its format to extract the UTF-5 sections (as well as syntax validate it).
> >
>
>Actually, there is a much more serious problem represented by the
>UTF-5 Internet Draft. The very term "UTF-5" is seriously misleading,
>because "UTF-5" is not a Unicode Transformation Format at all,
>as defined by the standard, but instead represents a Transfer Encoding Syntax
>(TES) masquerading as a UTF.

Note: I was only responding to the gripe that the string does not transform
from the "UTF-5" scheme back to codepoints (due to the inclusion of
"invalid" codepoints in the string). I agree with Ken that it is really a
TES (and the encoding of the characters left untransformed, such as the "@"
and ".", is not defined).
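The per-section handling described above can be sketched in Python. This is a minimal illustration, not reference code from the draft. The function names are hypothetical. The UTF-5 layout used here is an assumption based on a reading of draft-jseng-utf5-01: each code point is written as hex digits with leading zeros dropped, and the first digit is shifted into 'G'..'V' to mark the start of a character. The address splitter also assumes exactly one "@".

```python
# Hypothetical sketch. The UTF-5 layout is an ASSUMED reading of
# draft-jseng-utf5-01: hex digits of the code point, leading zeros dropped,
# first digit written as 'G'..'V', remaining digits as '0'..'9','A'..'F'.
LEAD = "GHIJKLMNOPQRSTUV"   # first digit of a character (value 0..15)
CONT = "0123456789ABCDEF"   # following digits (value 0..15)


def utf5_encode(text: str) -> str:
    """Encode each code point of text as a UTF-5 digit run."""
    out = []
    for ch in text:
        digits = format(ord(ch), "X")          # uppercase hex, no leading zeros
        out.append(LEAD[int(digits[0], 16)] + digits[1:])
    return "".join(out)


def utf5_decode(s: str) -> str:
    """Decode a UTF-5 string; any non-UTF-5 character raises ValueError."""
    chars = []
    value = None
    for c in s:
        if c in LEAD:                           # start of a new character
            if value is not None:
                chars.append(chr(value))
            value = LEAD.index(c)
        elif c in CONT and value is not None:   # continuation digit
            value = value * 16 + CONT.index(c)
        else:                                   # e.g. "@" or "." in a full address
            raise ValueError(f"not a UTF-5 character: {c!r}")
    if value is not None:
        chars.append(chr(value))
    return "".join(chars)


def _split_address(addr: str) -> tuple[str, str]:
    """Split name@domain; assumes exactly one '@' (hypothetical helper)."""
    parts = addr.split("@")
    if len(parts) != 2:
        raise ValueError("expected exactly one '@'")
    return parts[0], parts[1]


def encode_address(addr: str) -> str:
    """Encode the name and each domain label separately, keeping '@' and '.'."""
    name, domain = _split_address(addr)
    labels = (utf5_encode(label) for label in domain.split("."))
    return utf5_encode(name) + "@" + ".".join(labels)


def decode_address(addr: str) -> str:
    """Parse the address first, then decode each UTF-5 section on its own."""
    name, domain = _split_address(addr)
    labels = (utf5_decode(label) for label in domain.split("."))
    return utf5_decode(name) + "@" + ".".join(labels)


if __name__ == "__main__":
    encoded = encode_address("ab@cd.ef")
    print(encoded)                    # M1M2@M3M4.M5M6
    print(decode_address(encoded))    # ab@cd.ef
```

Because every UTF-5 digit under this assumed layout is an ASCII letter or digit, "@" and "." can never occur inside an encoded section. That is why the address must be parsed before decoding: passing the whole string to `utf5_decode` stops with a `ValueError` at the first "@". The output is an ASCII-only rendering of code points, which is the behaviour of a TES rather than a UTF.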



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT