Re: Specification for XID_Start and XID_Continue

From: Asmus Freytag
Date: Tue Aug 14 2007 - 14:30:30 CDT

    On 8/14/2007 9:50 AM, Martin v. Löwis wrote:
    > I'm trying to locate the precise specification for the
    > XID_Start and XID_Continue properties. According to
    > they are derived properties, so there should be an
    > algorithm somewhere describing how they are computed
    > (given other properties). The UCD says that the
    > specification is in UAX#31, which says I should
    > read
    > However, looking at 5.1, I cannot find a precise
    > specification of these properties. For example,
    > 5.1.2 says "Certain characters...", but does not
    > seem to provide a complete list of such characters.
    > It ends with "In particular, the following four
    > characters...". Again, that reads like an example -
    > is it meant as a complete specification?
    > Likewise, 5.1.3 talks about "certain Arabic presentation
    > forms", without giving a complete list which precisely
    > are excluded from XID_Start and XID_Continue.
    > Any insights appreciated,
    I think the algorithm you are looking for is given by the requirement that

    IsIdentifier(S) == IsIdentifier(NFKx(S))

    and the desire to not add characters to ID_CONTINUE that impact
    processing of identifiers, except where really necessary (middle dot).
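
The closure requirement above can be checked mechanically. What follows is only a sketch: is_closed and naive_is_identifier are hypothetical names, "NFKx" is read as "both NFKC and NFKD", and Python's built-in str.isidentifier (which is based on XID_Start and XID_Continue, plus underscore) stands in for IsIdentifier. The naive predicate is there to show how a category-based definition can fail the requirement.

```python
import unicodedata


def is_closed(s, is_identifier=str.isidentifier):
    """Return True if is_identifier gives the same answer for s and for
    both compatibility normalization forms of s (NFKC and NFKD)."""
    expected = is_identifier(s)
    return all(
        is_identifier(unicodedata.normalize(form, s)) == expected
        for form in ("NFKC", "NFKD")
    )


def naive_is_identifier(s):
    """Hypothetical predicate for contrast: a letter followed by letters
    or digits, judged by general category only."""
    return s[:1].isalpha() and all(c.isalnum() for c in s[1:])


# U+0E33 THAI CHARACTER SARA AM is a letter, but its compatibility
# decomposition starts with the combining mark U+0E4D, so the naive
# predicate accepts the original and rejects the normalized form.
print(is_closed("\u0e33", naive_is_identifier))
print(is_closed("\u0140"))  # LATIN SMALL LETTER L WITH MIDDLE DOT
```

Passing the predicate as a parameter keeps the check independent of any particular definition of IsIdentifier.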

    I glean this as the algorithm:

    Add middle dot to ID_CONTINUE

    If an ID_START or ID_CONTINUE character has a decomposition containing a
    character, other than middle dot, that is not in ID_CONTINUE, then remove
    the decomposable character from ID_START or ID_CONTINUE.

    If an ID_START character has a decomposition that begins with a character
    that is not in ID_START, remove it from ID_START.
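
The three steps above could be sketched roughly as follows. This is my reading of the steps, not the normative derivation in UAX #31: derive_xid is a hypothetical name, "decomposition" is assumed to mean the full NFKD form, the test is applied in a single pass against the input sets, and the caller supplies ID_START and ID_CONTINUE (Python's unicodedata module does not expose them).

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def derive_xid(id_start, id_continue):
    """Apply the three steps to caller-supplied sets of single characters.

    Returns (xid_start, xid_continue). Assumes 'decomposition' means the
    NFKD form of the character; an undecomposable character maps to itself.
    """
    start = set(id_start)
    # Step 1: add middle dot. ID_START is a subset of ID_CONTINUE by
    # definition, so the union with start is only a safety net.
    cont = set(id_continue) | start | {MIDDLE_DOT}

    def decompose(ch):
        return unicodedata.normalize("NFKD", ch)

    def stays(ch):
        # Step 2: every character of the decomposition must be middle dot
        # or a member of ID_CONTINUE.
        return all(c == MIDDLE_DOT or c in cont for c in decompose(ch))

    xid_continue = {ch for ch in cont if stays(ch)}
    # Step 3: the decomposition of a start character must itself begin
    # with a start character.
    xid_start = {ch for ch in start if stays(ch) and decompose(ch)[0] in start}
    return xid_start, xid_continue


# Toy input, not the real property data.
xs, xc = derive_xid(
    {"a", "l", "\u0140", "\u0e32", "\u0e33", "\u037a"},
    {"1", "\u0e4d"},
)
print(sorted(f"U+{ord(c):04X}" for c in xs))
print(sorted(f"U+{ord(c):04X}" for c in xc))
```

With this toy input, U+0E33 drops out of the start set only (its decomposition begins with the combining mark U+0E4D), while U+037A drops out of both sets (its decomposition contains a space).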


    > Martin
