From: Christopher Fynn (cfynn@gmx.net)
Date: Fri Jan 14 2005 - 17:49:53 CST
Peter Constable wrote:
> The block description in TUS4 Chapter 9 for Tibetan (p255) says,
> " Tibetan script has two break characters only. The primary break
> character is the standard interword tsek (tsheg), which is encoded at
> U+0F0B..."
> The description goes on to explain that the names are misleading, and
> how line-breaking processes should work. What it does *not* further
> clarify is where 0F0B would normally be entered in text.
> The description of Tibetan in Daniels & Bright says (p. 435), "bar
> tsheg... serves to separate syllables..." That differs from the
> paragraph quoted above from TUS4 which suggests that it is used between
> words. Which is correct (or closer to correct)?
Hi Peter
In Tibetan there is no clear differentiation between "words" and
"syllables" since every Tibetan syllable has lexical meaning (or
a grammatical function). There are of course multi-syllable words in
Tibetan but these are like compound words in English where each
component of the word has a meaning. So 0F0B is both a "word" and a
"syllable" delimiter.
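
As a rough illustration of 0F0B acting as the delimiter, here is a minimal Python sketch that splits a run of Tibetan text into the units found between tshegs. The function name split_tsheg_bars is hypothetical, and treating 0F0C as a delimiter as well is my assumption, not something stated above.

```python
import re

TSHEG = "\u0F0B"           # ordinary tsheg
NON_BREAKING_TSHEG = "\u0F0C"  # assumption: also treated as a delimiter here

_DELIMITER_RE = re.compile(f"[{TSHEG}{NON_BREAKING_TSHEG}]")


def split_tsheg_bars(text):
    """Split text into the units that sit between tsheg characters.

    Each unit may be a whole word or one component of a longer
    compound; this sketch makes no attempt to tell the two apart.
    """
    return [unit for unit in _DELIMITER_RE.split(text) if unit]


# Four units separated by three tshegs (U+0F0B).
units = split_tsheg_bars("བཀྲ་ཤིས་བདེ་ལེགས")
print(len(units))
```

Deciding which of these units belong together as one multi-syllable word is a dictionary or grammar problem that a split like this cannot solve.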
The confusion arises because the majority of works on Tibetan grammar in
Western languages call what is between two tsheg characters (or a "tsheg
bar") a "syllable" - but Tony Duff, one of the people responsible for
writing the Tibetan block intro in TUS, was vehemently opposed to using
that word - so the block intro ended up referring to "words" not
"syllables".
The *primary* line break / word wrap opportunity in Tibetan occurs after
0F0B [except where it is immediately followed by a space, 0F0D, 0F0E, 0F0F,
0F10 or 0F11 (in simple systems 0F0C can be used to inhibit line
breaking or word wrap in these situations)]. Spaces occurring within
blocks of Tibetan text should normally be treated as non-breaking. There
is also a line break opportunity *after* the sequence 0F0D <space> 0F0D
or 0F0E <space> 0F0E.
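
The rules above can be sketched in Python roughly as follows. This is only an illustration of the three rules as I have stated them, not a full line breaking implementation; the function name break_opportunities and the choice to report positions as string indices are my own assumptions.

```python
TSHEG = "\u0F0B"

# A tsheg immediately followed by one of these does not allow a break.
NO_BREAK_IF_NEXT = {" ", "\u0F0D", "\u0F0E", "\u0F0F", "\u0F10", "\u0F11"}

# A break is allowed after either of these three-character sequences.
SHAD_SPACE_SHAD = ("\u0F0D \u0F0D", "\u0F0E \u0F0E")


def break_opportunities(text):
    """Return each index i where a break between text[i-1] and text[i] is allowed.

    Spaces never produce a break on their own, so they are effectively
    treated as non-breaking. 0F0C never produces a break either, because
    only 0F0B is tested.
    """
    breaks = []
    for i in range(1, len(text)):
        if text[i - 1] == TSHEG and text[i] not in NO_BREAK_IF_NEXT:
            breaks.append(i)
        elif text[max(0, i - 3):i] in SHAD_SPACE_SHAD:
            breaks.append(i)
    return breaks


# KA, tsheg, KHA, tsheg, shad, space, shad, GA, tsheg
sample = "\u0F40\u0F0B\u0F41\u0F0B\u0F0D \u0F0D\u0F42\u0F0B"
print(break_opportunities(sample))  # [2, 7]
```

In the sample, the second tsheg gives no break because a shad follows it, and the only other opportunity comes after the shad <space> shad sequence.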
Since 0F0B is the most frequent character in Tibetan text there are
plenty of line break opportunities.
As 0F0B is both a word and syllable delimiter, a Tibetan spell checker
would need to do some grammatical parsing to be useful.
The punctuation marks 0F0D, 0F0E, 0F0F, 0F10, 0F11 and 0F14 indicate the
end of a phrase or a pause - they do not necessarily indicate the end of
a sentence. Tibetan sentence boundaries need to be determined grammatically.
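
To make the phrase / sentence distinction concrete, here is a small Python sketch that cuts text into phrase-level chunks at those six marks. The name split_phrases is hypothetical, and keeping a trailing "mark <space> mark" run attached to the preceding chunk is my assumption. The chunks it returns are phrases or pauses, not sentences.

```python
import re

# The six marks listed above: 0F0D, 0F0E, 0F0F, 0F10, 0F11, 0F14.
PHRASE_MARKS = "\u0F0D\u0F0E\u0F0F\u0F10\u0F11\u0F14"

# A run of non-mark characters, then any marks and spaces that follow it.
_PHRASE_RE = re.compile(f"[^{PHRASE_MARKS}]+[{PHRASE_MARKS} ]*")


def split_phrases(text):
    """Split text into chunks that each end at a phrase-level mark.

    This finds pauses only; finding sentence boundaries would still
    require grammatical analysis.
    """
    chunks = (m.group().strip() for m in _PHRASE_RE.finditer(text))
    return [chunk for chunk in chunks if chunk]


# KA tsheg KHA shad space shad, then GA tsheg NGA followed by 0F14
sample = "\u0F40\u0F0B\u0F41\u0F0D \u0F0D\u0F42\u0F0B\u0F44\u0F14"
print(len(split_phrases(sample)))  # 2
```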
0F12 and 0F08 respectively indicate the start of a new section or a new
topic. Some Tibetan texts have up to three (more or less elaborate)
variants of each of these characters to indicate different levels of
section or topic (similar to using different levels of numbering /
indenting in English documents [I., A., 1., a. etc.]).
In Tibetan text, paragraph breaks (hard CR) are rare. There may be
many, many pages of text without a single paragraph break. The start of
a new section or chapter may only be indicated by 0F12 occurring in the
middle of a paragraph.
- Chris