Re: Unicode and RFC 4690

From: Jefsey_Morfin
Date: Wed Oct 04 2006 - 09:45:57 CST

  • Next message: Marion Gunn: "Re: Should names for sexes be included into CLDR?"

    Dear Stephane,
    this kind of ad-hominem, together with the affirmation of points
    everyone agrees on, is not very productive. I respect your desire to
    concentrate on the English globalisation level; however, I question
    the technical interest of a layer violation. The homograph problem
    is patched at the globalisation level as we currently see it,
    without much scaling benefit. I think it is easier to address at the
    multilingualisation layer, where IMHO it belongs. I think you should
    reread RFC 4690 and try to bring another solution than the one a
    certain number of us read as implied in the text: this would be of
    interest. All the more so since the IAB shares with you the problem
    of disregarding the multilingualisation layer. This is IMHO the
    reason why it discusses the problems and hesitates to propose the
    solution its text seems to imply, and with which I agree.

    It is precisely because we all agree that the choice of Unicode in
    protocols is not to be revisited that we have a difficulty with the
    punycode process. John Klensin and the IAB have honestly considered
    the problem and studied the current Unicode response. They certainly
    overlooked the technical responsibility of ICANN, but that is not
    the point here. They have concentrated on the Unicode aspects and do
    not find the Unicode comments satisfactory enough. Today we have no
    proposed solution. Or maybe you see one, and I would be glad if you
    shared it with us, both on confusable characters and on version
    updates (this would already be great progress).
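    To make the confusable-character problem concrete, here is a
    minimal Python sketch (the mixed-script label is my own illustrative
    example, not one from RFC 4690): two labels that render identically
    in many fonts are nonetheless distinct code point sequences, so they
    punycode-encode to different names.

```python
# Latin "paypal" vs. a lookalike that uses Cyrillic small a (U+0430)
# in place of the Latin "a" -- visually near-identical in many fonts.
latin = "paypal"
lookalike = "p\u0430yp\u0430l"  # Cyrillic а at positions 1 and 4

# The two strings look the same but compare unequal...
assert latin != lookalike

# ...and therefore produce different encoded labels.  Python's stdlib
# 'punycode' codec implements RFC 3492; an all-ASCII label encodes to
# itself plus the trailing delimiter.
print(latin.encode("punycode"))      # b'paypal-'
print(lookalike.encode("punycode"))  # a different byte string
```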

    This is only related to punycode and Unicode. The initial patch to
    address this problem (the language tables) does not work: it is a
    partial external "add-on" to the punycode process. It seems that we
    need to use a single grapheme table (by Unicode or others) in
    punycoding. This actually means integrating the diversity of the
    language tables into one common, unique (and not unified, with
    several occurrences of a grapheme being supported) universal table.
    The Unicode process (each code having a character) was
    computer-inclusive (adding codes); we need a parallel,
    human-exclusive "unigraph" process (each graph having a code),
    keeping only one grapheme per homograph group, to address the
    homograph issue.
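    As a sketch of the "one grapheme per homograph group" idea: fold
    every member of a group to a single canonical grapheme before
    punycoding. The tiny folding table below is my own hand-picked
    illustration, not a real registry; an actual system would need a
    complete confusables table such as the one Unicode maintains.

```python
# Hand-picked illustration of a homograph-folding table; a real
# deployment would derive this from a full confusables data set.
HOMOGRAPH_FOLD = {
    "\u0430": "a",  # Cyrillic small a   -> Latin a
    "\u043e": "o",  # Cyrillic small o   -> Latin o
    "\u0455": "s",  # Cyrillic small dze -> Latin s
}

def fold_label(label: str) -> str:
    """Map each character to its homograph group's canonical grapheme."""
    return "".join(HOMOGRAPH_FOLD.get(ch, ch) for ch in label)

def encode_label(label: str) -> bytes:
    """Fold homographs first, then apply RFC 3492 punycode (stdlib codec)."""
    return fold_label(label).encode("punycode")

# Two visually identical spellings now encode to the same name,
# which is the property the paragraph above asks for.
assert encode_label("paypal") == encode_label("p\u0430yp\u0430l")
```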

    So, I repeat my question: has a unique grapheme system been tried somewhere?

    I look forward to your comments.

    On 09:59 04/10/2006, Stephane Bortzmeyer said:

    >On Tue, Oct 03, 2006 at 10:22:33PM +0200,
    > Jefsey_Morfin <> wrote
    > > RFC 4690 documents a certain number of difficulties resulting of the
    > > choice of Unicode as the reference table of the punycode process.
    >Not at all (I mention this for the people who still read Jefsey's
    >The RFC is available here:
    >and does not discuss the choice of Unicode. Quite the contrary:
    >4.1.6. Use of the Unicode Character Set in the IETF
    > Unicode and the closely-related ISO 10646 are the only coded
    > character sets that aspire to include all of the world's characters.
    > As such, they permit use of international characters without having
    > to identify particular character coding standards or tables. The
    > requirement for a single character set is particularly important for
    > use with the DNS since there is no place to put character set
    > identification. The decision to use Unicode as the base for IETF
    > protocols going forward is discussed in [RFC2277]. The IAB does not
    > see any reason to revisit the decision to use Unicode in IETF
    > protocols.

    This archive was generated by hypermail 2.1.5 : Wed Oct 04 2006 - 09:50:29 CST