Extender characters, Production 89 of XML 1.0

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Mon Jan 11 1999 - 10:46:14 EST

Production [89] of the XML 1.0 specification lists the characters that are
considered to be "extenders".

[89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 |
#x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

In order these characters are the middle dot, the modifier letter
triangular colon, the modifier letter half triangular colon, the Greek
middle dot, the Arabic tatweel, the Thai maiyamok, the Lao ko la, the
ideographic iteration mark, five Japanese kana repeat marks, the Japanese
Hiragana iteration mark and voiced iteration mark, and the
Katakana-Hiragana prolonged sound mark followed by the Katakana iteration
mark and voiced iteration mark. (#x0387, the Greek middle dot, has been
removed from the extender class in the latest
Unicode errata sheet, but this has not yet trickled down into XML.) In XML
names these characters can be used anywhere a base character or ideographic
character can be used, except as the first character of the name.
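
A minimal sketch of production [89] as a membership test, in Python. The
names EXTENDER_RANGES, is_extender and extender_chars are illustrative and do
not come from the XML specification; only the code points are taken from the
production quoted above.

```python
import unicodedata

# Code point ranges transcribed from production [89] of XML 1.0.
# Single characters are written as one-element ranges.
EXTENDER_RANGES = [
    (0x00B7, 0x00B7),
    (0x02D0, 0x02D0),
    (0x02D1, 0x02D1),
    (0x0387, 0x0387),
    (0x0640, 0x0640),
    (0x0E46, 0x0E46),
    (0x0EC6, 0x0EC6),
    (0x3005, 0x3005),
    (0x3031, 0x3035),
    (0x309D, 0x309E),
    (0x30FC, 0x30FE),
]


def is_extender(ch):
    """Return True if the single character ch matches production [89]."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTENDER_RANGES)


def extender_chars():
    """Return every character covered by production [89], in order."""
    return [chr(cp) for lo, hi in EXTENDER_RANGES for cp in range(lo, hi + 1)]


if __name__ == "__main__":
    # Print each code point with the name from the local Unicode database.
    for ch in extender_chars():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

Running the script prints the Unicode name of each code point as recorded in
the Python installation's character database, which gives a quick
cross-check of the prose list above.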

However, I have been unable to find in the Unicode book or on the Web site any
definition of what makes a character an extender. Can anyone clue me in on
why some Unicode characters have the extender property while others don't?
What's the logic behind this grouping of characters across languages?

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
| XML: Extensible Markup Language (IDG Books 1998) |
| http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/ |
| Read Cafe au Lait for Java news: http://sunsite.unc.edu/javafaq/ |
| Read Cafe con Leche for XML news: http://sunsite.unc.edu/xml/ |
