L2/04-083

The Script property for 4.0 characters

Eric Muller, Adobe Systems Inc.
February 2, 2004


In looking at the script property for the combining characters, I noticed a couple of strange things:

Looking a bit more carefully, I noticed the following pattern: all the combining characters with the script COMMON are new in Unicode 4.0; conversely, among the combining characters new in 4.0, the Gujarati and Limbu ones have their respective scripts while all the others have the COMMON script.

Add the fact that COMMON is the script for characters not listed explicitly in Scripts.txt, and I believe that we essentially forgot to assign the script property for most 4.0 combining characters. Checking a bit further, I believe that statement extends to base characters as well (although not every instance of COMMON is highly suspicious for those).
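The check described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this document: it assumes that age data is available in the same "range ; value" layout as Scripts.txt (as in the UCD file DerivedAge.txt), the function names are hypothetical, and the two sample strings are made-up excerpts rather than copies of the real files.

```python
import re

# UCD-style data line: "0350..0357 ; VALUE # comment" or "0ABC ; VALUE # comment"
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*([^#\s]+)")

def parse_ucd(text):
    """Return {code point: property value} for a UCD-style property file."""
    values = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # blank line or comment
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        for cp in range(first, last + 1):
            values[cp] = m.group(3)
    return values

def script_of(cp, scripts):
    """Characters not listed explicitly fall back to COMMON."""
    return scripts.get(cp, "COMMON")

def new_and_common(scripts, ages, version="4.0"):
    """Code points introduced in `version` whose script is COMMON."""
    return sorted(
        cp for cp, age in ages.items()
        if age == version and script_of(cp, scripts).upper() == "COMMON"
    )

# Illustrative excerpts only (assumed content, not the real files).
SCRIPTS_SAMPLE = """
0AE2..0AE3 ; GUJARATI # Mn
"""
AGES_SAMPLE = """
0350..0357 ; 4.0
0AE2..0AE3 ; 4.0
"""

suspects = new_and_common(parse_ucd(SCRIPTS_SAMPLE), parse_ucd(AGES_SAMPLE))
print(" ".join(f"{cp:04X}" for cp in suspects))
```

With these samples, the Gujarati range is listed explicitly and is therefore left out, while 0350..0357 has no entry and falls back to COMMON.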

I don’t know if we forgot to do the work, or if we forgot or lost the update to Scripts.txt. It may be worth tracking what did or did not happen, so as to fix our process.

Here is an attempt to repair this. These are all the 4.0 characters with the COMMON script, together with a proposed change if needed. I based the proposed assignments on similarity with pre-4.0 characters, as noted.

  1. 02EF Sk MODIFIER LETTER LOW DOWN ARROWHEAD
    02F0 Sk MODIFIER LETTER LOW UP ARROWHEAD
    02F1 Sk MODIFIER LETTER LOW LEFT ARROWHEAD
    02F2 Sk MODIFIER LETTER LOW RIGHT ARROWHEAD
    02F3 Sk MODIFIER LETTER LOW RING
    02F4 Sk MODIFIER LETTER MIDDLE GRAVE ACCENT
    02F5 Sk MODIFIER LETTER MIDDLE DOUBLE GRAVE ACCENT
    02F6 Sk MODIFIER LETTER MIDDLE DOUBLE ACUTE ACCENT
    02F7 Sk MODIFIER LETTER LOW TILDE
    02F8 Sk MODIFIER LETTER RAISED COLON
    02F9 Sk MODIFIER LETTER BEGIN HIGH TONE
    02FA Sk MODIFIER LETTER END HIGH TONE
    02FB Sk MODIFIER LETTER BEGIN LOW TONE
    02FC Sk MODIFIER LETTER END LOW TONE
    02FD Sk MODIFIER LETTER SHELF
    02FE Sk MODIFIER LETTER OPEN SHELF
    02FF Sk MODIFIER LETTER LOW LEFT ARROW
          

    Those are probably OK; they match the assignments for U+02B9..U+02DF.

  2. 0350 Mn COMBINING RIGHT ARROWHEAD ABOVE
    0351 Mn COMBINING LEFT HALF RING ABOVE
    0352 Mn COMBINING FERMATA
    0353 Mn COMBINING X BELOW
    0354 Mn COMBINING LEFT ARROWHEAD BELOW
    0355 Mn COMBINING RIGHT ARROWHEAD BELOW
    0356 Mn COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW
    0357 Mn COMBINING RIGHT HALF RING ABOVE
    035D Mn COMBINING DOUBLE BREVE
    035E Mn COMBINING DOUBLE MACRON
    035F Mn COMBINING DOUBLE MACRON BELOW
          

    INHERITED, to match the other characters in the Combining Diacritical Marks block.

  3. 0600 Cf ARABIC NUMBER SIGN
    0601 Cf ARABIC SIGN SANAH
    0602 Cf ARABIC FOOTNOTE MARKER
    0603 Cf ARABIC SIGN SAFHA
          

    INHERITED, to match U+06DD ARABIC END OF AYAH, the only pre-4.0 Cf character in the Arabic block.

  4. 060D Po ARABIC DATE SEPARATOR
    060E So ARABIC POETIC VERSE SIGN
    060F So ARABIC SIGN MISRA
          

    COMMON is probably ok, to match the other Po and So in the Arabic block.

  5. 0610 Mn ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM
    0611 Mn ARABIC SIGN ALAYHE ASSALLAM
    0612 Mn ARABIC SIGN RAHMATULLAH ALAYHE
    0613 Mn ARABIC SIGN RADI ALLAHOU ANHU
    0614 Mn ARABIC SIGN TAKHALLUS
    0615 Mn ARABIC SMALL HIGH TAH
    0656 Mn ARABIC SUBSCRIPT ALEF
    0657 Mn ARABIC INVERTED DAMMA
    0658 Mn ARABIC MARK NOON GHUNNA
          

    INHERITED to match the other combining characters in the Arabic block.

  6. 0A01 Mn GURMUKHI SIGN ADAK BINDI
          

    GURMUKHI to match the other GURMUKHI combining characters.

  7. 0AF1 Sc GUJARATI RUPEE SIGN
          

    COMMON to match U+09F3 BENGALI RUPEE SIGN

  8. 0BF9 Sc TAMIL RUPEE SIGN
          

    COMMON to match U+09F3 BENGALI RUPEE SIGN

  9. 0BF3 So TAMIL DAY SIGN
    0BF4 So TAMIL MONTH SIGN
    0BF5 So TAMIL YEAR SIGN
    0BF6 So TAMIL DEBIT SIGN
    0BF7 So TAMIL CREDIT SIGN
    0BF8 So TAMIL AS ABOVE SIGN
    0BFA So TAMIL NUMBER SIGN
          

    Not sure.

  10. 0CBC Mn KANNADA SIGN NUKTA
          

    KANNADA, to match the other Indic nuktas

  11. 17DD Mn KHMER SIGN ATTHACAN
          

    KHMER to match the other Khmer combining characters

  12. 17F0 No KHMER SYMBOL LEK ATTAK SON
    17F1 No KHMER SYMBOL LEK ATTAK MUOY
    17F2 No KHMER SYMBOL LEK ATTAK PII
    17F3 No KHMER SYMBOL LEK ATTAK BEI
    17F4 No KHMER SYMBOL LEK ATTAK BUON
    17F5 No KHMER SYMBOL LEK ATTAK PRAM
    17F6 No KHMER SYMBOL LEK ATTAK PRAM-MUOY
    17F7 No KHMER SYMBOL LEK ATTAK PRAM-PII
    17F8 No KHMER SYMBOL LEK ATTAK PRAM-BEI
    17F9 No KHMER SYMBOL LEK ATTAK PRAM-BUON
          

    COMMON to match U+17D7 KHMER SIGN LEK TOO.

  13. 1940 So LIMBU SIGN LOO
          

    Not sure

  14. 1944 Po LIMBU EXCLAMATION MARK
    1945 Po LIMBU QUESTION MARK
          

    COMMON to match the other xxx QUESTION/EXCLAMATION MARK

  15. 19E0 So KHMER SYMBOL PATHAMASAT
    19E1 So KHMER SYMBOL MUOY KOET
    19E2 So KHMER SYMBOL PII KOET
    19E3 So KHMER SYMBOL BEI KOET
    19E4 So KHMER SYMBOL BUON KOET
    19E5 So KHMER SYMBOL PRAM KOET
    19E6 So KHMER SYMBOL PRAM-MUOY KOET
    19E7 So KHMER SYMBOL PRAM-PII KOET
    19E8 So KHMER SYMBOL PRAM-BEI KOET
    19E9 So KHMER SYMBOL PRAM-BUON KOET
    19EA So KHMER SYMBOL DAP KOET
    19EB So KHMER SYMBOL DAP-MUOY KOET
    19EC So KHMER SYMBOL DAP-PII KOET
    19ED So KHMER SYMBOL DAP-BEI KOET
    19EE So KHMER SYMBOL DAP-BUON KOET
    19EF So KHMER SYMBOL DAP-PRAM KOET
    19F0 So KHMER SYMBOL TUTEYASAT
    19F1 So KHMER SYMBOL MUOY ROC
    19F2 So KHMER SYMBOL PII ROC
    19F3 So KHMER SYMBOL BEI ROC
    19F4 So KHMER SYMBOL BUON ROC
    19F5 So KHMER SYMBOL PRAM ROC
    19F6 So KHMER SYMBOL PRAM-MUOY ROC
    19F7 So KHMER SYMBOL PRAM-PII ROC
    19F8 So KHMER SYMBOL PRAM-BEI ROC
    19F9 So KHMER SYMBOL PRAM-BUON ROC
    19FA So KHMER SYMBOL DAP ROC
    19FB So KHMER SYMBOL DAP-MUOY ROC
    19FC So KHMER SYMBOL DAP-PII ROC
    19FD So KHMER SYMBOL DAP-BEI ROC
    19FE So KHMER SYMBOL DAP-BUON ROC
    19FF So KHMER SYMBOL DAP-PRAM ROC
          

    Not sure

  16. 2053 Po SWUNG DASH
    2054 Pc INVERTED UNDERTIE
    213B So FACSIMILE SIGN
    23CF So EJECT SYMBOL
    23D0 So VERTICAL LINE EXTENSION
    24FF No NEGATIVE CIRCLED DIGIT ZERO
    2614 So UMBRELLA WITH RAIN DROPS
    2615 So HOT BEVERAGE
          

    COMMON is fine

  17. 268A So MONOGRAM FOR YANG
    268B So MONOGRAM FOR YIN
    268C So DIGRAM FOR GREATER YANG
    268D So DIGRAM FOR LESSER YIN
    268E So DIGRAM FOR LESSER YANG
    268F So DIGRAM FOR GREATER YIN
          

    Not sure

  18. 2690 So WHITE FLAG
    2691 So BLACK FLAG
    26A0 So WARNING SIGN
    26A1 So HIGH VOLTAGE SIGN
    2B00 So NORTH EAST WHITE ARROW
    2B01 So NORTH WEST WHITE ARROW
    2B02 So SOUTH EAST WHITE ARROW
    2B03 So SOUTH WEST WHITE ARROW
    2B04 So LEFT RIGHT WHITE ARROW
    2B05 So LEFTWARDS BLACK ARROW
    2B06 So UPWARDS BLACK ARROW
    2B07 So DOWNWARDS BLACK ARROW
    2B08 So NORTH EAST BLACK ARROW
    2B09 So NORTH WEST BLACK ARROW
    2B0A So SOUTH EAST BLACK ARROW
    2B0B So SOUTH WEST BLACK ARROW
    2B0C So LEFT RIGHT BLACK ARROW
    2B0D So UP DOWN BLACK ARROW
          

    COMMON is fine

  19. 321D So PARENTHESIZED KOREAN CHARACTER OJEON
    321E So PARENTHESIZED KOREAN CHARACTER O HU
    3250 So PARTNERSHIP SIGN
    327C So CIRCLED KOREAN CHARACTER CHAMKO
    327D So CIRCLED KOREAN CHARACTER JUEUI
    32CC So SQUARE HG
    32CD So SQUARE ERG
    32CE So SQUARE EV
    32CF So LIMITED LIABILITY SIGN
          

    COMMON matches all the other characters in that block.

  20. 3377 So SQUARE DM
    3378 So SQUARE DM SQUARED
    3379 So SQUARE DM CUBED
    337A So SQUARE IU
    33DE So SQUARE V OVER M
    33DF So SQUARE A OVER M
    33FF So SQUARE GAL
          

    COMMON matches all the other characters in that block.

  21. 4DC0 So HEXAGRAM FOR THE CREATIVE HEAVEN
    ...
    4DFF So HEXAGRAM FOR BEFORE COMPLETION
          

    New script?

  22. FDFD So ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
          

    ARABIC, to match the other U+FDFx ligatures (except U+FDFC RIAL SIGN, which is COMMON like the other currencies)

  23. FE47 Ps PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
    FE48 Pe PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
          

    COMMON matches the rest of the block

  24. 10100 Po AEGEAN WORD SEPARATOR LINE
    ...
    1013F So AEGEAN MEASURE THIRD SUBUNIT
          

    New script?

  25. 1039F Po UGARITIC WORD DIVIDER
          

    COMMON matches other punctuation.

  26. 1D300 So MONOGRAM FOR EARTH
    ...
    1D356 So TETRAGRAM FOR FOSTERING
          

    New script?

  27. 1D4C1 Ll MATHEMATICAL SCRIPT SMALL L
          

    COMMON matches the other MATHEMATICAL characters.

  28. E0100 Mn VARIATION SELECTOR-17
    ...
    E01EF Mn VARIATION SELECTOR-256
          

    INHERITED, to match the BMP VARIATION SELECTOR-xx characters

Assuming that the overall assessment is correct, and given the scope of the changes, I would strongly recommend an independent verification (e.g., to make sure I did not drop some 4.0 character in preparing this document).
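One possible way to automate part of that verification is sketched below, again in Python. It assumes the listing format used in this document (an optional item number, a code point, a two-letter General Category, a name, and a line containing only "..." for an elided run) and an independently computed set of expected code points; the function names and the sample data are hypothetical.

```python
import re

# Listing line: optional item number ("12. "), code point, General Category, name.
_ENTRY = re.compile(r"^\s*(?:\d+\.\s+)?([0-9A-F]{4,6})\s+[A-Z][a-z]\s+\S")

def listed_code_points(listing):
    """Collect code points from a listing; a '...' line spans the entries around it."""
    found = set()
    previous = None
    elided = False
    for line in listing.splitlines():
        if line.strip() == "...":
            elided = True
            continue
        m = _ENTRY.match(line)
        if not m:
            elided = False  # rationale text or blank line ends any elision
            continue
        cp = int(m.group(1), 16)
        if elided and previous is not None:
            found.update(range(previous + 1, cp))  # fill the elided run
        found.add(cp)
        previous, elided = cp, False
    return found

def compare(expected, listed):
    """Return (missing from the listing, listed but not expected)."""
    return sorted(expected - listed), sorted(listed - expected)

# Made-up excerpt in the format of this document.
SAMPLE = """
  2. 0350 Mn COMBINING RIGHT ARROWHEAD ABOVE
    0351 Mn COMBINING LEFT HALF RING ABOVE

  28. E0100 Mn VARIATION SELECTOR-17
    ...
    E01EF Mn VARIATION SELECTOR-256
"""

listed = listed_code_points(SAMPLE)
expected = {0x350, 0x351, 0x352} | set(range(0xE0100, 0xE01F0))
missing, extra = compare(expected, listed)
print("missing:", [f"{cp:04X}" for cp in missing])
print("extra:  ", [f"{cp:04X}" for cp in extra])
```

In this sample the expected set deliberately contains one code point (0352) that the excerpt omits, so it is reported as missing; a real check would compute the expected set from the property data files.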


Document History

Author: Eric Muller

Revision  Date              Comments
1         February 2, 2004  First version