Re: Unicode forms for internal storage - BOCU-1 speed

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 22 2004 - 18:58:35 EST


    RE: Unicode forms for internal storage - BOCU-1 speed
    From: Mike Ayers
    > The author called it "UTF-9". Therefore we call it the same thing so anyone
    > knows what we're talking about. It may not be ideal, but it's intelligible.
    > Why should anyone assume that something is an international standard just
    > because its name starts with "UTF-"?

    You can't assume that everybody knows what is being discussed when they
    come across a name starting with "UTF-". The first questions that arise
    are why Unicode does not document it, and where it can be found.
    I don't object to proposals to define new "UTF-*" forms, but these should
    remain proposals for a distinctly named encoding form, with a name chosen
    by the proposal author outside the "UTF-*" naming space.

    Did Jerome Abela or Mark Crispin provide a reference name/symbol for their
    encoding? They could have simply used their initials to reference it and to
    say, for example in the case of Mark Crispin's encoding form:

    "MC-UTF-9" is a Unicode-conforming encoding form used to represent any
    valid Unicode string with 9-bit code units. It is proposed as a candidate
    future encoding form that may be referenced later, if approved by a Unicode
    official reference document or in an IETF/ISO/IEC 10646 published RFC, by
    the name "UTF-9". Until then, this encoding form should never be referenced
    by the informal acronym "UTF-9". "MC-UTF-9" then designates only the
    encoding form specified by Mark Crispin in this document, and this name
    as well as the term "UTF-9" should not be used for any other proposed 9-bit
    encoding forms, except if approved by official Unicode or ISO/IEC 10646
    publications.

    Such a statement makes sense and avoids confusion later, notably when
    several candidate encodings are being studied. It also allows multiple
    encodings to survive and interoperate.

    So let's not approve here the informal, abusive use of non-standardized
    "UTF-" encoding schemes or forms... Unicode should ask IANA to reject such
    registrations (which some MIME implementations would need), by reserving
    this prefix for itself (or for the IETF, if it wants to publish RFCs related
    to ISO standards) for future use.

    I have seen several other informal proposals for "UTF-*" forms/schemes.
    All this is just confusing, and their authors should come up with their
    own names for reference. What do you think of this idea?


