RE: Specification for XID_Start and XID_Continue

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 14 2007 - 17:13:14 CDT


    This may be the algorithm used to generate the XID properties, but do you
    really have to hardcode it in your implementation instead of using the
    derived properties file, or at least some external source?

    In my opinion, your code should contain a function that loads an external
    resource at init time; the isXidStart() function would then use the
    set loaded during init.
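
    A minimal sketch of such a loader, assuming the external resource uses the
    line format of the UCD file DerivedCoreProperties.txt (for example
    "0041..005A ; XID_Start # comment"). The class XidTable and its methods
    load() and contains() are hypothetical names chosen for illustration; they
    are not part of the code discussed in this thread.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: a set of code point ranges loaded at init time. */
final class XidTable {
    private final int[] starts; // first code point of each range, ascending
    private final int[] ends;   // last code point of each range, inclusive

    private XidTable(int[] starts, int[] ends) {
        this.starts = starts;
        this.ends = ends;
    }

    /** Keeps only the lines whose second field equals the given property name. */
    static XidTable load(Reader source, String property) {
        List<String> lines = new BufferedReader(source).lines().collect(Collectors.toList());
        List<int[]> ranges = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip trailing comment
            String[] fields = line.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) continue;
            String[] bounds = fields[0].trim().split("\\.\\.");
            int first = Integer.parseInt(bounds[0], 16);
            int last = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : first;
            ranges.add(new int[] {first, last});
        }
        // The binary search below requires ranges ordered by their first code point.
        ranges.sort(Comparator.comparingInt((int[] r) -> r[0]));
        int[] starts = new int[ranges.size()];
        int[] ends = new int[ranges.size()];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = ranges.get(i)[0];
            ends[i] = ranges.get(i)[1];
        }
        return new XidTable(starts, ends);
    }

    /** Binary search for a range that contains the code point. */
    boolean contains(int codePoint) {
        int lo = 0, hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (codePoint < starts[mid]) hi = mid - 1;
            else if (codePoint > ends[mid]) lo = mid + 1;
            else return true;
        }
        return false;
    }
}
```

    Storing ranges rather than single code points keeps the table small, since
    the derived properties file lists most characters as ranges. An
    isXidStart() method could then delegate to a table created once, from a
    Reader opened on the properties file, with the property name "XID_Start".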

    For example, the Java code below assumes that:
    * characters that are XID_Start must also be XID_Continue;
    * characters that are NOT XID_Continue must also be NOT XID_Start.
    Note that the second statement is the contrapositive of the first.

    The same assumptions should also hold for ID_Start and ID_Continue.
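
    Because the two statements are logically equivalent, a single exhaustive
    check over all code points covers both. The sketch below is only an
    illustration; the class ImplicationCheck and the method firstViolation()
    are hypothetical names, and the two predicates stand for whatever
    isXidStart/isXidContinue (or isIdStart/isIdContinue) implementation is
    being tested.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: verifies that "start implies continue" for every code point. */
final class ImplicationCheck {
    /**
     * Returns the first code point for which start holds but cont does not,
     * or -1 when the implication holds over the whole code space.
     */
    static int firstViolation(IntPredicate start, IntPredicate cont) {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (start.test(cp) && !cont.test(cp)) {
                return cp;
            }
        }
        return -1;
    }
}
```

    A return value other than -1 pinpoints a code point that breaks the
    assumption, which is easier to debug than a plain true/false answer.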

    It does not yet load the vectors from an external source; instead, it
    initializes them statically in a separate block.
    public class UChar extends Number {
        /** holds the internal code point of the UChar. */
        protected int codePoint;
        //...
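        /** Documented UChar method. */
        public final boolean isXidContinue () {
            // XID_Start implies XID_Continue, so forced XID_Start entries count too.
            return this.isInSortedVect(UChar.sortedVectXidContinue) ||
                this.isInSortedVect(UChar.sortedVectXidStart) ||
                ! this.isInSortedVect(UChar.sortedVectXidNotContinue) &&
                this.isIdContinue ();
        }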
        /** Documented UChar method. */
        public final boolean isXidStart () {
            // Forced XID_Continue entries (e.g. U+00B7) do not imply XID_Start.
            return this.isInSortedVect(UChar.sortedVectXidStart) ||
                ! this.isInSortedVect(UChar.sortedVectXidNotContinue) &&
                ! this.isInSortedVect(UChar.sortedVectXidNotStart) &&
                this.isIdStart();
        }
        /** Utility method, dichotomic (binary) search in a sorted vector of
         * code points. */
        protected final boolean isInSortedVect(final int[] sortedVect) {
            int lookup = this.codePoint;
            int firstIndex = 0, lastIndex = sortedVect.length - 1;
            while (firstIndex <= lastIndex) {
                int middleIndex = (firstIndex + lastIndex) / 2;
                int check = sortedVect[middleIndex];
                if (check < lookup)
                    firstIndex = middleIndex + 1;
                else if (check == lookup)
                    return true;
                else
                    lastIndex = middleIndex - 1;
            }
            return false;
        }
        /** Internal sorted vectors for dichotomic lookups. */
        protected static int[] sortedVectXidStart;
        protected static int[] sortedVectXidContinue;
        protected static int[] sortedVectXidNotContinue;
        protected static int[] sortedVectXidNotStart;
        // static initialization code (should take an external resource)
        static {
            // the four vectors do not have any common code point.
            sortedVectXidStart = new int[] {
                // Empty for now.
                };
            sortedVectXidContinue = new int[] {
                // Forced XID_Continue; this does not imply XID_Start.
                0x00B7};
            sortedVectXidNotContinue = new int[] {
                // Must not contain any of the code points above.
                0x037A, 0x309B, 0x309C, 0xFC5E,
                0xFC5F, 0xFC60, 0xFC61, 0xFC62,
                0xFC63, 0xFDFA, 0xFDFB, 0xFE70,
                0xFE72, 0xFE74, 0xFE76, 0xFE78,
                0xFE7A, 0xFE7C, 0xFE7E};
            sortedVectXidNotStart = new int[] {
                // Excluded from XID_Start only (XID_NotContinue implies
                // XID_NotStart, not the reverse).
                0x0E33, 0x0EB3, 0xFF9E, 0xFF9F};
        }
    }
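
    The binary search relies on each vector being sorted, and the comment in
    the initializer requires the vectors to have no code point in common. Both
    conditions are easy to break when the tables are edited by hand or loaded
    from a file, so a small sanity check may help. This is only a sketch; the
    class VectorChecks and its methods are hypothetical names, not part of the
    UChar class above.

```java
/** Hypothetical helper: sanity checks for sorted code point vectors. */
final class VectorChecks {
    /** True when every element is strictly greater than the previous one. */
    static boolean isStrictlyAscending(int[] vect) {
        for (int i = 1; i < vect.length; i++) {
            if (vect[i - 1] >= vect[i]) {
                return false;
            }
        }
        return true;
    }

    /** Merge-style walk over two ascending vectors; true when they share no element. */
    static boolean areDisjoint(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                return false; // same code point found in both vectors
            }
        }
        return true;
    }
}
```

    Running these checks once at init time, right after the vectors are
    filled, costs little compared with the lookups that follow.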

    Philippe.

    (Sorry for repacking your code: line breaks were occurring in the wrong
    places, so I simply reformatted it following standard Java conventions.)

    Mike wrote:
    > I ran into the same problem, and didn't really find an answer. In my
    > code, I ended up with this:
    >
    > inline bool Char::IsXidStart () const {
    >     switch (u_Char_) { // u_Char_ holds the code point of the Char
    >     case 0x037A: case 0x0E33: case 0x0EB3: case 0x309B: case 0x309C:
    >     case 0xFC5E: case 0xFC5F: case 0xFC60: case 0xFC61: case 0xFC62:
    >     case 0xFC63: case 0xFDFA: case 0xFDFB: case 0xFE70: case 0xFE72:
    >     case 0xFE74: case 0xFE76: case 0xFE78: case 0xFE7A: case 0xFE7C:
    >     case 0xFE7E: case 0xFF9E: case 0xFF9F:
    >         return false;
    >     }
    >     return IsIdStart();
    > }
    > inline bool Char::IsXidContinue () const {
    >     switch (u_Char_) {
    >     case 0xB7:
    >         return true;
    >     case 0x037A: case 0x309B: case 0x309C: case 0xFC5E: case 0xFC5F:
    >     case 0xFC60: case 0xFC61: case 0xFC62: case 0xFC63: case 0xFDFA:
    >     case 0xFDFB: case 0xFE70: case 0xFE72: case 0xFE74: case 0xFE76:
    >     case 0xFE78: case 0xFE7A: case 0xFE7C: case 0xFE7E:
    >         return false;
    >     }
    >     return IsIdContinue();
    > }


