Word Dividers And The Terminal_Punctuation Property

From: Ernest Cline (ernestcline@mindspring.com)
Date: Tue Apr 13 2004 - 21:57:52 EDT

    I realize that Terminal_Punctuation is only an informative property,
    but I have a question concerning it and characters that the Line
    Breaking Algorithm identifies as being word dividers.

    In UAX #14, the list of characters of Line Break class BA includes
    the following entry:

      Other forms of visible word dividers that provide break opportunities.

      0F0B TIBETAN MARK INTERSYLLABIC TSHEG
      1361 ETHIOPIC WORDSPACE
      17D5 KHMER SIGN BARIYOOSAN
      10100 AEGEAN WORD SEPARATOR LINE
      10101 AEGEAN WORD SEPARATOR DOT
      10102 AEGEAN CHECK MARK
      1039F UGARITIC WORD DIVIDER

    Of these seven characters, only two, U+1361 and U+17D5, have
    the Terminal_Punctuation property. Of the other five, U+10102 is
    a symbol and thus is not punctuation, but what is the distinction
    that causes the remaining four to not also have the Terminal_Punctuation
    property? Is it because Terminal_Punctuation is informative
    that these four have slipped through the cracks, or is there
    a reason I should be noticing, but am not?


