L2/04-352

Saurashtra Linebreaking and Other Properties

Source: Rick McGowan on behalf of UTC
Date: September 15, 2004

This document accompanies ISO/IEC JTC1/SC2/WG2 N2549 (L2/03-098), "Proposal to encode the Saurashtra script in the UCS", by Michael Everson and Jeyakumar Chinnakkonda Krishnamoorty.

The Saurashtra characters have properties similar to Devanagari and the other Indic scripts. These properties for UnicodeData.txt are covered in L2/03-225.

Linebreaking Property Values

The table below shows the values for the Linebreaking property, as they would be defined in the LineBreak.txt file of the UCD.

1x0000;CM # SAURASHTRA SIGN ANUSVARA
1x0001;CM # SAURASHTRA SIGN VISARGA
1x0002;AL # SAURASHTRA LETTER A
1x0003;AL # SAURASHTRA LETTER AA
1x0004;AL # SAURASHTRA LETTER I
1x0005;AL # SAURASHTRA LETTER II
1x0006;AL # SAURASHTRA LETTER U
1x0007;AL # SAURASHTRA LETTER UU
1x0008;AL # SAURASHTRA LETTER VOCALIC R
1x0009;AL # SAURASHTRA LETTER VOCALIC RR
1x000A;AL # SAURASHTRA LETTER VOCALIC L
1x000B;AL # SAURASHTRA LETTER VOCALIC LL
1x000C;AL # SAURASHTRA LETTER E
1x000D;AL # SAURASHTRA LETTER EE
1x000E;AL # SAURASHTRA LETTER AI
1x000F;AL # SAURASHTRA LETTER O
1x0010;AL # SAURASHTRA LETTER OO
1x0011;AL # SAURASHTRA LETTER AU
1x0012;AL # SAURASHTRA LETTER KA
1x0013;AL # SAURASHTRA LETTER KHA
1x0014;AL # SAURASHTRA LETTER GA
1x0015;AL # SAURASHTRA LETTER GHA
1x0016;AL # SAURASHTRA LETTER NGA
1x0017;AL # SAURASHTRA LETTER CA
1x0018;AL # SAURASHTRA LETTER CHA
1x0019;AL # SAURASHTRA LETTER JA
1x001A;AL # SAURASHTRA LETTER JHA
1x001B;AL # SAURASHTRA LETTER NYA
1x001C;AL # SAURASHTRA LETTER TTA
1x001D;AL # SAURASHTRA LETTER TTHA
1x001E;AL # SAURASHTRA LETTER DDA
1x001F;AL # SAURASHTRA LETTER DDHA
1x0020;AL # SAURASHTRA LETTER NNA
1x0021;AL # SAURASHTRA LETTER TA
1x0022;AL # SAURASHTRA LETTER THA
1x0023;AL # SAURASHTRA LETTER DA
1x0024;AL # SAURASHTRA LETTER DHA
1x0025;AL # SAURASHTRA LETTER NA
1x0026;AL # SAURASHTRA LETTER PA
1x0027;AL # SAURASHTRA LETTER PHA
1x0028;AL # SAURASHTRA LETTER BA
1x0029;AL # SAURASHTRA LETTER BHA
1x002A;AL # SAURASHTRA LETTER MA
1x002B;AL # SAURASHTRA LETTER YA
1x002C;AL # SAURASHTRA LETTER RA
1x002D;AL # SAURASHTRA LETTER LA
1x002E;AL # SAURASHTRA LETTER VA
1x002F;AL # SAURASHTRA LETTER SHA
1x0030;AL # SAURASHTRA LETTER SSA
1x0031;AL # SAURASHTRA LETTER SA
1x0032;AL # SAURASHTRA LETTER HA
1x0033;AL # SAURASHTRA LETTER LLA

1x0035;CM # SAURASHTRA VOWEL SIGN AA
1x0036;CM # SAURASHTRA VOWEL SIGN I
1x0037;CM # SAURASHTRA VOWEL SIGN II
1x0038;CM # SAURASHTRA VOWEL SIGN U
1x0039;CM # SAURASHTRA VOWEL SIGN UU
1x003A;CM # SAURASHTRA VOWEL SIGN VOCALIC R
1x003B;CM # SAURASHTRA VOWEL SIGN VOCALIC RR
1x003C;CM # SAURASHTRA VOWEL SIGN VOCALIC L
1x003D;CM # SAURASHTRA VOWEL SIGN VOCALIC LL
1x003E;CM # SAURASHTRA VOWEL SIGN E
1x003F;CM # SAURASHTRA VOWEL SIGN EE
1x0040;CM # SAURASHTRA VOWEL SIGN AI
1x0041;CM # SAURASHTRA VOWEL SIGN O
1x0042;CM # SAURASHTRA VOWEL SIGN OO
1x0043;CM # SAURASHTRA VOWEL SIGN AU
1x0044;CM # SAURASHTRA SIGN VIRAMA

1x0046;NU # SAURASHTRA DIGIT ZERO
1x0047;NU # SAURASHTRA DIGIT ONE
1x0048;NU # SAURASHTRA DIGIT TWO
1x0049;NU # SAURASHTRA DIGIT THREE
1x004A;NU # SAURASHTRA DIGIT FOUR
1x004B;NU # SAURASHTRA DIGIT FIVE
1x004C;NU # SAURASHTRA DIGIT SIX
1x004D;NU # SAURASHTRA DIGIT SEVEN
1x004E;NU # SAURASHTRA DIGIT EIGHT
1x004F;NU # SAURASHTRA DIGIT NINE
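
Entries in this "code;class # name" form can be read mechanically. The following is a minimal Python sketch, not part of the proposal, showing one way to parse such lines and group code points by class. The function names are hypothetical, and the "1x" prefix is treated as an opaque placeholder, so code points are kept as strings rather than integers.

```python
from collections import defaultdict

# A few entries copied from the table above. "1x" is the proposal's
# placeholder prefix, so the labels stay strings rather than integers.
SAMPLE = """\
1x0000;CM # SAURASHTRA SIGN ANUSVARA
1x0001;CM # SAURASHTRA SIGN VISARGA
1x0002;AL # SAURASHTRA LETTER A
1x0046;NU # SAURASHTRA DIGIT ZERO
"""


def parse_linebreak(text):
    """Return {code point label: class} for 'code;class # name' lines."""
    classes = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        code, value = (field.strip() for field in data.split(";"))
        classes[code] = value
    return classes


def group_by_class(classes):
    """Invert the mapping: {class: [code point labels in input order]}."""
    groups = defaultdict(list)
    for code, value in classes.items():
        groups[value].append(code)
    return dict(groups)


parsed = parse_linebreak(SAMPLE)
grouped = group_by_class(parsed)
```

Grouping by class makes it easy to confirm at a glance that only CM, AL and NU occur in the listing.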

If the character shown as proposed for 1x0050 in the Saurashtra proposal L2/03-098 were to be encoded, it would have the linebreaking property shown below:

1x0050;AL # unidentified Saurashtra letter

Other Property Values (PropList.txt)

The Script property value for all of the characters in the proposal (1x0000 through 1x004F) should be "Saurashtra", and that script value should be added to UAX #24.

1x0000..1x0001 should have the Other_Alphabetic property (PropList.txt), like U+0903.

1x0044 should have the Diacritic property (PropList.txt), like U+094D.
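
The two assignments above use the single-code-point and "first..last" range notation. As an illustration only, the Python sketch below expands such specifications into individual placeholder labels and formats them in a "code ; Property" layout similar to PropList.txt. The helper names and the column width are assumptions made for this example, not anything defined by the UCD.

```python
PLACEHOLDER = "1x"  # the proposal's stand-in for the unassigned block prefix


def expand(spec):
    """Expand '1x0000..1x0001' (or a single '1x0044') into individual labels."""
    first, _, last = spec.partition("..")
    last = last or first  # a single code point is a range of length one
    lo = int(first[len(PLACEHOLDER):], 16)
    hi = int(last[len(PLACEHOLDER):], 16)
    return [f"{PLACEHOLDER}{offset:04X}" for offset in range(lo, hi + 1)]


# Assignments exactly as stated in the text above.
ASSIGNMENTS = [
    ("1x0000..1x0001", "Other_Alphabetic"),
    ("1x0044", "Diacritic"),
]


def proplist_lines(assignments):
    """Format assignments as 'code ; Property' lines (illustrative layout)."""
    return [f"{spec:<15}; {prop}" for spec, prop in assignments]


# Per-code-point view: {label: set of property names}.
props = {}
for spec, prop in ASSIGNMENTS:
    for code in expand(spec):
        props.setdefault(code, set()).add(prop)
```

Keeping the per-code-point view as a set allows a character to carry more than one binary property if later revisions add any.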

No punctuation characters are specified in the proposal.

Derived Property Values

1x0002..1x0033 are all Alphabetic (Letter Other, Lo), and should end up with the derived property of Grapheme_Base.

1x0035..1x0044 are all CM and should end up with the derived property Grapheme_Extend.

There are 10 decimal digits 1x0046..1x004F, which should be checked to make sure they end up with the right numeric and/or math properties.
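
One way to carry out such a check is sketched below in Python. It assumes, as is conventional for decimal digits, that the ten characters form a contiguous run whose numeric values ascend from 0 to 9. The row layout (offset, name, value) and the function names are hypothetical and chosen only for this example.

```python
DIGIT_WORDS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]
FIRST_DIGIT_OFFSET = 0x46  # offset of SAURASHTRA DIGIT ZERO in the listing


def proposed_digits():
    """Build (offset, name, numeric value) rows for the ten proposed digits."""
    return [(FIRST_DIGIT_OFFSET + value, f"SAURASHTRA DIGIT {word}", value)
            for value, word in enumerate(DIGIT_WORDS)]


def check_decimal_digits(rows):
    """Return a list of problems; an empty list means the run looks consistent."""
    problems = []
    rows = sorted(rows)  # order by offset
    if len(rows) != 10:
        problems.append(f"expected 10 digits, found {len(rows)}")
    for position, (offset, name, value) in enumerate(rows):
        if value != position:
            problems.append(
                f"1x{offset:04X} ({name}) has value {value}, expected {position}")
        if offset != rows[0][0] + position:
            problems.append(f"1x{offset:04X} breaks the contiguous run")
    return problems


issues = check_decimal_digits(proposed_digits())
```

An empty result for the proposed rows indicates that the digits occupy 1x0046..1x004F in ascending order with values 0 through 9.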