Re: Unicode forms for internal storage - BOCU-1 speed

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 22 2004 - 18:58:35 EST

Next message: Kenneth Whistler: "Re: Unicode forms for internal storage - BOCU-1 speed"

Previous message: Mike Ayers: "RE: Unicode forms for internal storage - BOCU-1 speed"
In reply to: Mike Ayers: "RE: Unicode forms for internal storage - BOCU-1 speed"
Next in thread: Marco Cimarosti: "RE: Unicode forms for internal storage - BOCU-1 speed"

From: Mike Ayers
> The author called it "UTF-9". Therefore we call it the same thing so anyone
> knows what we're talking about. It may not be ideal, but it's intelligible.
> Why should anyone assume that something is an international standard just
> because its name starts with "UTF-"?

You can't assume that everybody knows what is being discussed when they find
a reference to a name starting with "UTF-". The first questions that will
come up are why Unicode does not document it, and where it can be found.
I don't object to proposals to define new "UTF-*" forms, but until they are
approved these should remain proposals for a distinctly named encoding form,
with a name chosen by the proposal's author outside the "UTF-*" naming space.

Did Jerome Abela or Mark Crispin provide a reference name/symbol for their
encoding? They could have simply used their initials to reference it and to
say, for example in the case of Mark Crispin's encoding form:

"MC-UTF-9" is a Unicode-conforming encoding form used to represent any
valid Unicode string with 9-bit code units. It is proposed as a candidate
future encoding form that may be referenced later, if approved by a Unicode
official reference document or in an IETF/ISO/IEC 10646 published RFC, by
the name "UTF-9". Until then, this encoding form should never be referenced
by the informal acronym "UTF-9". "MC-UTF-9" then designates only the
encoding form specified by Mark Crispin in this document, and this name
as well as the term "UTF-9" should not be used for any other proposed 9-bit
encoding forms, except if approved by official Unicode or ISO/IEC 10646
publications.

Such a statement makes sense and avoids confusion later, notably when
several candidate encodings are being studied. It also allows multiple
encodings to survive and interoperate.

So let's not approve here the informal, abusive use of "UTF-" names for
non-standardized encoding schemes or forms... Unicode should ask IANA to
reject such registrations (which some MIME implementations need) by
reserving this prefix for itself (or for the IETF, if it wants to publish
RFCs related to ISO standards) for future use.

I have seen several other informal proposals for "UTF-*" forms/schemes.
All of this is just confusing, and their authors should come up with their
own names for reference. What do you think of this idea?


This archive was generated by hypermail 2.1.5 : Thu Jan 22 2004 - 19:39:50 EST