RE: valid characters in user names- esp. compatibility characters

From: Addison Phillips [wM]
Date: Wed Aug 11 2004 - 22:42:37 CDT

    Hi Tex,

    webMethods has used a (slightly modified) version of Punycode to handle generated class names in Java in several products, very successfully, for several years now. The slight modification is to substitute the underscore for the dash character (since the dash is illegal in Java class names). Punycode has proven to be exceedingly robust for this type of application, although the algorithm is very arcane.
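To illustrate the dash-to-underscore idea, here is a minimal sketch in Python. It is not the webMethods implementation; it assumes Python's built-in `punycode` codec, and the helper names `to_identifier` and `from_identifier` are hypothetical. It also assumes the input name contains no dash, so the only dash in the encoded form is the Punycode delimiter and the substitution stays reversible.

```python
def to_identifier(name: str) -> str:
    """Encode a Unicode name as ASCII, using '_' where Punycode uses '-'."""
    if "-" in name:
        # Assumption of this sketch: a dash in the input would make the
        # underscore substitution ambiguous, so it is rejected up front.
        raise ValueError("input must not contain '-'")
    ace = name.encode("punycode").decode("ascii")
    return ace.replace("-", "_")


def from_identifier(ident: str) -> str:
    """Reverse to_identifier: restore the delimiter, then decode."""
    # The digits after the delimiter only use a-z and 0-9, so the last
    # underscore (if there is one) is always the delimiter.
    head, sep, tail = ident.rpartition("_")
    ace = head + "-" + tail if sep else ident
    return ace.encode("ascii").decode("punycode")


print(to_identifier("bücher"))       # bcher_kva
print(from_identifier("bcher_kva"))  # bücher
```

Note that a name with no ASCII characters encodes to a string that may begin with a digit, which is still not a legal Java identifier on its own, so a real generator would presumably add a fixed prefix as well.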

    Our ACE coder doesn't directly impose NFKC or any of the stringprep-type preparations. In our application of ACEs, users create objects visually and we generate Java code named after the objects in a process invisible to users. Although NFKC and stringprep are reasonable restrictions for IDN, with its peculiar requirements, it doesn't follow that they are good for all applications. Punycode (and every other ACE) is essentially a transfer encoding scheme for Unicode code points. The ASCII sequence it generates is unique to the particular Unicode scalar sequence it encodes.
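The "transfer encoding" point can be checked with a small sketch, again assuming Python's built-in `punycode` codec (the helpers `ace_encode` and `ace_decode` are hypothetical names). Without any normalization step, canonically or compatibility-equivalent strings stay distinct and round-trip exactly; NFKC is what would fold them together.

```python
import unicodedata


def ace_encode(text: str) -> str:
    """Raw Punycode, with no stringprep or normalization applied."""
    return text.encode("punycode").decode("ascii")


def ace_decode(ace: str) -> str:
    """Inverse of ace_encode."""
    return ace.encode("ascii").decode("punycode")


composed = "\u00e9"     # e with acute, precomposed
decomposed = "e\u0301"  # e followed by a combining acute accent
fullwidth_a = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A

for s in (composed, decomposed, fullwidth_a, "A"):
    # Every scalar sequence comes back exactly as it went in.
    assert ace_decode(ace_encode(s)) == s

# Different scalar sequences give different ASCII sequences ...
assert ace_encode(composed) != ace_encode(decomposed)
assert ace_encode(fullwidth_a) != ace_encode("A")

# ... whereas NFKC would have collapsed the compatibility character.
assert unicodedata.normalize("NFKC", fullwidth_a) == "A"
```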

    It's true that logins have many similarities to IDN in terms of requirements, though. Just note that there is no reason why an internal algorithm *has* to do both stringprep and punycode or has to do stringprep in the IDN way...
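One way to picture that separation is to make the preparation step a pluggable function in front of the encoder. The sketch below is only an assumption about how such a layering might look: `encode_login`, `prep_nfkc_casefold`, and `prep_nfc` are hypothetical names, and NFKC plus case folding is a rough stand-in for stringprep, not a full profile (no mapping tables or prohibited-character checks).

```python
import unicodedata
from typing import Callable


def prep_nfkc_casefold(name: str) -> str:
    """IDN-like preparation: fold compatibility characters and case."""
    return unicodedata.normalize("NFKC", name).casefold()


def prep_nfc(name: str) -> str:
    """Lighter preparation: canonical composition only."""
    return unicodedata.normalize("NFC", name)


def encode_login(name: str, prepare: Callable[[str], str]) -> str:
    """Apply the chosen preparation, then Punycode as a separate step."""
    return prepare(name).encode("punycode").decode("ascii")


fullwidth_tex = "\uff34\uff45\uff58"  # "Tex" written in fullwidth letters

# With the IDN-like preparation the fullwidth spelling becomes plain "tex".
print(encode_login(fullwidth_tex, prep_nfkc_casefold))  # tex-

# With NFC only, the user's compatibility characters survive the round trip.
ace = encode_login(fullwidth_tex, prep_nfc)
print(ace.encode("ascii").decode("punycode") == fullwidth_tex)  # True
```

Which preparation is right depends on whether two spellings should count as the same login, which is a policy decision independent of the encoding.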

    I have a whitepaper on the subject which expands (a tiny amount) on webMethods' use of ACEs. It was presented at IUCs twice, the last one being at Unicode 22, and is called "Four ACEs: A Survey of ASCII Compatible Encodings". The PDF is on my personal website. I can't remember for certain, but I think this one was a substitute paper at IUC22, so it probably isn't in the program proceedings.

    Hope this helps,


    Addison P. Phillips
    Director, Globalization Architecture
    webMethods | Delivering Global Business Visibility
    Chair, W3C Internationalization (I18N) Working Group
    Chair, W3C-I18N-WG, Web Services Task Force

    Internationalization is an architecture.
    It is not a feature.

    > -----Original Message-----
    > From:
    > []On Behalf Of Tex Texin
    > Sent: August 11, 2004 18:29
    > To: Unicoders
    > Subject: valid characters in user names- esp. compatibility characters
    > hi,
    > 1) I am looking at a set of legacy applications that would like to
    > extend user IDs to support international characters.
    > It is not possible to update all of the applications simultaneously
    > to fully support unicode, so I am considering an algorithmic mapping
    > of the international IDs to an ASCII-based encoding and a layering
    > similar to how domain names were extended to be international.
    > However, I am curious as to whether some Users might read/write their
    > names using compatibility characters (esp. in ideographic markets)
    > and object to the characters being normalized through nfkc. I thought
    > it might be like someone spelling their name incorrectly. I don't
    > know enough about ideographic names or the compat. characters to
    > evaluate if it would be perceived as a problem by users. If any CJK
    > experts would comment on this, it would be appreciated.
    > 2) I am also getting questions about the robustness and stability of
    > the GNU libidn implementations of stringprep and punycode which are
    > being considered. I would be glad to hear privately if you have used
    > them and what your experience was/is.
    > tia
    > tex
    > --
    > -------------------------------------------------------------
    > Tex Texin cell: +1 781 789 1898
    > Xen Master
    > XenCraft
    > Making e-Business Work Around the World
    > -------------------------------------------------------------
