CLDR ExemplarCharacters Data for Identity Management Data Validation Rules

From: Thierry Moreau (
Date: Thu May 07 2009 - 15:08:22 CDT


    This post is a general question about the design of validation logic for
    an identity management application.

    This is somewhat related to IDN validity rules, but with slightly
    different application requirements.

    In UTR#36 (Unicode Security Considerations), Annex G (Language-Based
    Security) was published a few days after CLDR version 1.4 was released,
    in 2006-07. The annex recommends using Unicode scripts as the basis for
    name validation rules and, as a refinement, recommends basing them on
    writing systems rather than on languages.

    In the meantime, the CLDR project has moved on to version 1.6 (and 1.6.1)
    and has improved its "data on language and script usage" (presumably this covers

    The main question is whether the advice in UTR#36 / Annex G *against*
    using CLDR data for validation rules (e.g. in security-aware applications
    where identity spoofing is a threat) has been revisited by anyone.

    So far, my investigations along these lines indicate that it should be
    feasible to combine Unicode script information with CLDR
    exemplarCharacters data, after substantial adjustments (e.g. removing
    historic or phonetic scripts), to derive language-specific rules for
    what constitutes an acceptable identity in a given language (in
    practice, the rules would also apply to other personal identification
    data elements, such as place of birth). Such validation would, of
    course, be applied to normalized strings.
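    As an illustration of the exemplar-based idea, the following minimal
    Python sketch NFC-normalizes a string and accepts it only if every
    character is either in a per-language exemplar set or in a small set of
    allowed separators. Assumptions: FRENCH_EXEMPLARS is a hand-written,
    illustrative approximation rather than data loaded from CLDR, and
    NAME_PUNCTUATION and is_acceptable_name are hypothetical names chosen
    for this example.

```python
import unicodedata

# Hypothetical, hand-written approximation of a French exemplar set (all
# lowercase); a real system would load exemplarCharacters from CLDR and
# apply the adjustments discussed above.
FRENCH_EXEMPLARS = set("abcdefghijklmnopqrstuvwxyz" "àâæçéèêëîïôœùûüÿ")

# Hypothetical non-letter characters allowed in names and place names.
NAME_PUNCTUATION = set(" -'")


def is_acceptable_name(raw: str, exemplars: set[str]) -> bool:
    """Return True if the NFC-normalized, lowercased string consists only
    of exemplar characters and allowed separators, and is not blank."""
    normalized = unicodedata.normalize("NFC", raw)
    if not normalized.strip():
        return False
    for ch in normalized.lower():
        if ch not in exemplars and ch not in NAME_PUNCTUATION:
            return False
    return True
```

    Normalizing first means that "Montre\u0301al" (e plus a combining acute
    accent) and "Montréal" are treated identically, while a look-alike
    string containing Cyrillic "е" (U+0435) is rejected because that code
    point is not in the French set.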

    Any comment or suggestion?

    Thanks in advance.

    - Thierry Moreau
    CONNOTECH Experts-conseils inc.
    9130 Place de Montgolfier
    Montreal, Qc
    Canada   H2M 2A1
    Tel.: (514)385-5691
    Fax:  (514)385-5900
    web site:

    This archive was generated by hypermail 2.1.5 : Thu May 07 2009 - 15:17:42 CDT