From: Christopher Fynn (cfynn@gmx.net)
Date: Fri Jan 14 2005 - 17:49:53 CST
Peter Constable wrote:
> The block description in TUS4 Chapter 9 for Tibetan (p255) says,
> " Tibetan script has two break characters only. The primary break
> character is the standard interword tsek (tsheg), which is encoded at
> U+0F0B..."
> The description goes on to explain that the names are misleading, and
> how line-breaking processes should work. What it does *not* further
> clarify is where 0F0B would normally be entered in text. 
> The description of Tibetan in Daniels & Bright says (p. 435), "bar
> tsheg... serves to separate syllables..." That differs from the
> paragraph quoted above from TUS4 which suggests that it is used between
> words. Which is correct (or closer to correct)?
Hi Peter
In Tibetan there is no clear differentiation between "words" and 
"syllables" since every Tibetan syllable has lexical meaning (or
a grammatical function). There are of course multi-syllable words in 
Tibetan but these are like compound words in English where each 
component of the word has a meaning. So 0F0B is both a "word" and a 
"syllable" delimiter.
The confusion arises because the majority of works on Tibetan grammar in 
Western languages call what is between two tsheg characters (or a "tsheg 
bar") a "syllable" - but Tony Duff, one of the people responsible for 
writing the Tibetan block intro in TUS, was vehemently opposed to using 
that word - so the block intro ended up referring to "words" not 
"syllables".
The *primary* line break / word wrap opportunity in Tibetan occurs after
0F0B [except where immediately followed by space, 0F0D, 0F0E, 0F0F,
0F10 or 0F11 (in simple systems 0F0C can be used to inhibit line
breaking or word wrap in these situations)]. Spaces occurring within
blocks of Tibetan text should normally be treated as non-breaking. There
is also a line break opportunity *after* the sequence 0F0D <space> 0F0D
or 0F0E <space> 0F0E.
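To make the rules above concrete, here is a rough Python sketch. The
function name break_opportunities and the constants are my own
illustration, not taken from TUS or any library, and the code covers only
the cases listed above: it assumes the input is a run of Tibetan text and
treats every space as non-breaking.

```python
# Illustrative sketch only: names are hypothetical, rules are the ones
# described in this message, not a full line-breaking implementation.
TSHEG = "\u0F0B"  # ordinary (breaking) tsheg

# No break after a tsheg when it is immediately followed by one of these.
NO_BREAK_BEFORE = {" ", "\u0F0D", "\u0F0E", "\u0F0F", "\u0F10", "\u0F11"}

# A break is allowed after "0F0D <space> 0F0D" or "0F0E <space> 0F0E".
SHAD_PAIRS = ("\u0F0D \u0F0D", "\u0F0E \u0F0E")


def break_opportunities(text):
    """Return indices i where a line may break between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if prev == TSHEG and nxt not in NO_BREAK_BEFORE:
            # Primary opportunity: after 0F0B, subject to the exceptions.
            breaks.append(i)
        elif i >= 3 and text[i - 3:i] in SHAD_PAIRS:
            # Secondary opportunity: after a shad, space, shad sequence.
            breaks.append(i)
    return breaks


# ka, tsheg, kha, tsheg, shad, space, shad, ga, tsheg
sample = "\u0F40\u0F0B\u0F41\u0F0B\u0F0D \u0F0D\u0F42\u0F0B"
print(break_opportunities(sample))
```

Nothing special is needed for 0F0C: it is simply not in the set of
characters that allow a break, so no opportunity is reported after it.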
Since 0F0B is the most frequent character in Tibetan text there are 
plenty of line break opportunities.
As 0F0B is both a word and syllable delimiter, a Tibetan spell checker 
would need to do some grammatical parsing to be useful.
The punctuation marks 0F0D, 0F0E, 0F0F, 0F10, 0F11 and 0F14 indicate the
end of a phrase or a pause - they do not necessarily indicate the end of
a sentence. Tibetan sentence boundaries need to be determined grammatically.
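As a rough illustration of what these marks do and do not give you, the
Python sketch below cuts text at them. The name split_phrases is made up
for this example, and treating a run of marks and spaces as a single
delimiter is my assumption. The pieces are phrases or pause groups, not
sentences.

```python
import re

# Phrase / pause marks listed above: 0F0D, 0F0E, 0F0F, 0F10, 0F11, 0F14.
PHRASE_MARKS = "\u0F0D\u0F0E\u0F0F\u0F10\u0F11\u0F14"

# Assumption: one mark plus any following marks or spaces is one delimiter,
# so "0F0D <space> 0F0D" counts as a single phrase boundary.
_DELIMITER = re.compile("[" + PHRASE_MARKS + "][" + PHRASE_MARKS + " ]*")


def split_phrases(text):
    """Split text at phrase marks; the results are phrases, NOT sentences."""
    return [piece for piece in _DELIMITER.split(text) if piece.strip()]


# ka, tsheg, kha, shad, space, shad, ga, tsheg, shad
sample = "\u0F40\u0F0B\u0F41\u0F0D \u0F0D\u0F42\u0F0B\u0F0D"
print(split_phrases(sample))
```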
0F12 and 0F08 respectively indicate the start of a new section or a new 
topic. Some Tibetan texts have up to three (more or less elaborate) 
variants of each of these characters to indicate different levels of 
section or topic (similar to using different levels of numbering /
indenting in English documents [I., A., 1., a. etc.]).
In Tibetan text, paragraph breaks (hard CR) are rare. There may be
many, many pages of text without a single paragraph break. The start of
a new section or chapter may only be indicated by 0F12 occurring in the
middle of a paragraph.
  - Chris