Re: valid characters in user names- esp. compatibility characters

From: Tex Texin (tex@i18nguy.com)
Date: Thu Aug 12 2004 - 03:11:00 CDT

    thanks Addison.
    I agree that stringprep and NFKC aren't suitable for all apps, which is why I
    am being cautious with this usage. But it seems to fit the bill given the
    legacy constraints, provided that compatibility characters do not make
    distinctions that are important to users' names. I don't think they do, but I
    thought I would check with the experts.
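
To make the concern concrete, here is a minimal Python sketch of what NFKC does to a few compatibility characters. It uses only the standard `unicodedata` module; the helper names `nfkc` and `changed_by_nfkc` and the sample strings are illustrative assumptions, not part of any of the applications discussed here.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return the NFKC-normalized form of text."""
    return unicodedata.normalize("NFKC", text)


def changed_by_nfkc(text: str) -> bool:
    """True if NFKC would alter the string, i.e. the user would see a different spelling."""
    return nfkc(text) != text


# Hypothetical sample IDs, written with escapes so the code points are unambiguous.
samples = {
    "fullwidth Latin": "\uFF34\uFF45\uFF58",        # fullwidth T, e, x
    "halfwidth katakana": "\uFF83\uFF78\uFF7D",     # halfwidth te, ku, su
    "CJK compatibility ideograph": "\uF900",        # folds to unified U+8C48
    "plain ASCII": "Tex",
}

for label, value in samples.items():
    folded = nfkc(value)
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in folded)
    print(f"{label}: changed={changed_by_nfkc(value)} -> {codepoints}")
```

A check like `changed_by_nfkc` could be used to warn a user at registration time that the stored ID will differ from what was typed, rather than folding it silently.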

    thanks for the comments!
    tex

    "Addison Phillips [wM]" wrote:
    >
    > Hi Tex,
    >
    > webMethods has used a (slightly modified) version of punycode for handling generated class names in Java in several products very successfully for several years now. The slight modification is to substitute underscore for the dash character (since the dash is illegal in Java class names). Punycode has proven to be exceedingly robust for this type of application, although the algorithm is very arcane.
    >
    > Our ACE coder doesn't directly impose NFKC or any of the stringprep type preparations. In our application of ACEs, users create objects visually and we generate Java code named after the objects in a process invisible to users. Although NFKC and stringprep are reasonable restrictions for IDN, with its peculiar requirements, it doesn't follow that they are good for all applications. Punycode (and all other ACEs) are essentially transfer encoding schemes for Unicode code points. The ASCII sequences they generate are unique to any particular Unicode scalar sequence.
    >
    > It's true that logins have many similarities to IDN in terms of requirements, though. Just note that there is no reason why an internal algorithm *has* to do both stringprep and punycode or has to do stringprep in the IDN way...
    >
    > I have a whitepaper on the subject which expands (a tiny amount) on webMethods' use of ACEs that was presented at IUCs twice, the last one being at Unicode 22, called "Four ACEs: A Survey of ASCII Compatible Encodings". The PDF is on my personal website http://www.inter-locale.com. I can't remember, but I think this one was a substitute paper at IUC22, so it probably isn't in the program proceedings.
    >
    > Hope this helps,
    >
    > Addison
    >
    > Addison P. Phillips
    > Director, Globalization Architecture
    > webMethods | Delivering Global Business Visibility
    > http://www.webMethods.com
    > Chair, W3C Internationalization (I18N) Working Group
    > Chair, W3C-I18N-WG, Web Services Task Force
    > http://www.w3.org/International
    >
    > Internationalization is an architecture.
    > It is not a feature.
    >
    > > -----Original Message-----
    > > From: unicode-bounce@unicode.org
    > > [mailto:unicode-bounce@unicode.org]On Behalf Of Tex Texin
    > > Sent: August 11, 2004 18:29
    > > To: Unicoders
    > > Subject: valid characters in user names- esp. compatibility characters
    > >
    > >
    > > hi,
    > >
    > > 1) I am looking at a set of legacy applications that would like to extend user
    > > IDs to support international characters.
    > > It is not possible to update all of the applications simultaneously to fully
    > > support Unicode, so I am considering an algorithmic mapping of the
    > > international IDs to an ASCII-based encoding and a layering similar to how
    > > domain names were extended to be international.
    > >
    > > However, I am curious as to whether some users might read/write their names
    > > using compatibility characters (esp. in ideographic markets) and object to the
    > > characters being normalized through NFKC. I thought it might be like someone
    > > spelling their name incorrectly. I don't know enough about ideographic names or
    > > the compat. characters to evaluate if it would be perceived as a problem by
    > > users. If any CJK experts would comment on this, it would be appreciated.
    > >
    > > 2) I am also getting questions about the robustness and stability of the GNU
    > > libidn implementations of stringprep and punycode which are being considered. I
    > > would be glad to hear privately if you have used them and what your experience
    > > was/is.
    > >
    > > tia
    > > tex
    > >
    > > --
    > > -------------------------------------------------------------
    > > Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
    > > Xen Master http://www.i18nGuy.com
    > >
    > > XenCraft http://www.XenCraft.com
    > > Making e-Business Work Around the World
    > > -------------------------------------------------------------
    > >
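
The approach described in the quoted message, Punycode with the dash replaced by an underscore and no normalization step, can be sketched roughly as follows. This is not webMethods' code: it relies on Python's built-in `punycode` codec, the function names `to_ascii_id` and `from_ascii_id` are hypothetical, and it assumes the original ID contains no underscore so that the substitution stays reversible. It also does not address other identifier rules, such as names that begin with a digit.

```python
def to_ascii_id(name: str) -> str:
    """Encode a Unicode ID as ASCII using Punycode, with '_' in place of '-'.

    No NFKC or stringprep step is applied, so distinct code point
    sequences always produce distinct ASCII strings.
    """
    if "_" in name:
        # Assumption of this sketch: '_' is reserved for the substituted dash.
        raise ValueError("underscore is not allowed in the original ID")
    return name.encode("punycode").decode("ascii").replace("-", "_")


def from_ascii_id(encoded: str) -> str:
    """Reverse to_ascii_id: restore the dashes, then decode the Punycode."""
    return encoded.replace("_", "-").encode("ascii").decode("punycode")


if __name__ == "__main__":
    # Hypothetical IDs: accented Latin, ideographs, and a fullwidth/ASCII pair.
    for user_id in ["b\u00FCcher", "\u65E5\u672C\u8A9E", "\uFF34ex", "Tex"]:
        ascii_id = to_ascii_id(user_id)
        assert from_ascii_id(ascii_id) == user_id  # round trip is lossless
        print(ascii_id)
```

Because nothing is normalized here, the fullwidth and ASCII spellings in the last two sample IDs map to different ASCII strings. Whether that is wanted for logins is exactly the question raised in this thread; an NFKC step could be placed in front of `to_ascii_id` if folding is preferred.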

    -- 
    -------------------------------------------------------------
    Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
    Xen Master                          http://www.i18nGuy.com
                             
    XenCraft		            http://www.XenCraft.com
    Making e-Business Work Around the World
    -------------------------------------------------------------
    

