Re: Tibetan block description and "tsek"

From: Christopher Fynn
Date: Fri Jan 14 2005 - 17:49:53 CST

    Peter Constable wrote:

    > The block description in TUS4 Chapter 9 for Tibetan (p255) says,

    > " Tibetan script has two break characters only. The primary break
    > character is the standard interword tsek (tsheg), which is encoded at
    > U+0F0B..."

    > The description goes on to explain that the names are misleading, and
    > how line-breaking processes should work. What it does *not* further
    > clarify is where 0F0B would normally be entered in text.

    > The description of Tibetan in Daniels & Bright says (p. 435), "bar
    > tsheg... serves to separate syllables..." That differs from the
    > paragraph quoted above from TUS4 which suggests that it is used between
    > words. Which is correct (or closer to correct)?

    Hi Peter

    In Tibetan there is no clear differentiation between "words" and
    "syllables" since every Tibetan syllable has lexical meaning (or
    a grammatical function). There are, of course, multi-syllable words in
    Tibetan, but these are like compound words in English, where each
    component of the word has a meaning. So 0F0B is both a "word" and a
    "syllable" delimiter.

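    As a rough illustration, here is a minimal Python sketch (the function name `tsheg_bars` is hypothetical) that splits a run of Tibetan on the tsheg. It recovers the syllable-sized units, but nothing in the text itself marks which of them belong to one multi-syllable word. Treating 0F0C (the non-breaking tsheg) as a delimiter as well is an assumption of the sketch.

```python
import re
from typing import List

# U+0F0B is the ordinary (breaking) tsheg; U+0F0C is the non-breaking
# tsheg. Splitting on both is an assumption of this sketch.
_TSHEG = re.compile("[\u0F0B\u0F0C]")


def tsheg_bars(text: str) -> List[str]:
    """Return the units between tshegs ("tsheg bars") in a run of Tibetan.

    The result says nothing about word boundaries: a multi-syllable word
    simply comes back as several consecutive units. Punctuation such as
    0F0D (shad) stays attached to the unit it follows.
    """
    return [unit for unit in _TSHEG.split(text) if unit]


# "bkra shis bde legs" followed by a shad, written with escapes.
sample = "\u0F56\u0F40\u0FB2\u0F0B\u0F64\u0F72\u0F66\u0F0B\u0F56\u0F51\u0F7A\u0F0B\u0F63\u0F7A\u0F42\u0F66\u0F0D"
print(len(tsheg_bars(sample)))  # 4
```

    On the sample it finds four units, with the shad still attached to the last one. Whether those four units make one word, two, or four is a grammatical question the split cannot answer.
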
    The confusion arises because the majority of works on Tibetan grammar in
    Western languages call what is between two tsheg characters (a "tsheg
    bar") a "syllable". However, Tony Duff, one of the people responsible for
    writing the Tibetan block intro in TUS, was vehemently opposed to using
    that word, so the block intro ended up referring to "words" rather than
    "syllables".

    The *primary* line break / word wrap opportunity in Tibetan occurs after
    0F0B [except where it is immediately followed by a space, 0F0D, 0F0E,
    0F0F, 0F10 or 0F11 (in simple systems 0F0C can be used to inhibit line
    breaking or word wrap in these situations)]. Spaces occurring within
    blocks of Tibetan text should normally be treated as non-breaking. There
    is also a line break opportunity *after* the sequence 0F0D <space> 0F0D
    or 0F0E <space> 0F0E.
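
    The rules above can be sketched in Python as a scan that reports the offsets where a new line may start. This is a hypothetical helper (`break_opportunities`) that applies only the rules stated here; it is not an implementation of the full Unicode line breaking algorithm (UAX #14), and characters not named above get no special treatment.

```python
from typing import List

TSHEG = "\u0F0B"
SHAD = "\u0F0D"
NYIS_SHAD = "\u0F0E"
# No break after 0F0B when it is immediately followed by a space,
# 0F0D, 0F0E, 0F0F, 0F10 or 0F11.
NO_BREAK_AFTER_TSHEG = {" ", "\u0F0D", "\u0F0E", "\u0F0F", "\u0F10", "\u0F11"}


def break_opportunities(text: str) -> List[int]:
    """Return the offsets at which a new line may start.

    Only two rules are applied: a break after 0F0B (with the exceptions
    above) and a break after 0F0D <space> 0F0D or 0F0E <space> 0F0E.
    Spaces are never break points, and neither is 0F0C, which is how
    0F0C inhibits the break a 0F0B would otherwise allow.
    """
    breaks = []
    for i, ch in enumerate(text):
        nxt = i + 1
        if nxt >= len(text):
            break  # nothing to wrap after the last character
        if ch == TSHEG:
            if text[nxt] not in NO_BREAK_AFTER_TSHEG:
                breaks.append(nxt)
        elif ch in (SHAD, NYIS_SHAD) and i >= 2 and text[i - 1] == " " and text[i - 2] == ch:
            # Second mark of "0F0D <space> 0F0D" or "0F0E <space> 0F0E".
            breaks.append(nxt)
    return breaks


# "bkra shis bde legs" followed by a shad: one break after each tsheg.
sample = "\u0F56\u0F40\u0FB2\u0F0B\u0F64\u0F72\u0F66\u0F0B\u0F56\u0F51\u0F7A\u0F0B\u0F63\u0F7A\u0F42\u0F66\u0F0D"
print(break_opportunities(sample))  # [4, 8, 12]
```

    By contrast, nga + tsheg + shad ("\u0F44\u0F0B\u0F0D") yields no break at all, because the tsheg is followed by 0F0D.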

    Since 0F0B is the most frequent character in Tibetan text, there are
    plenty of line break opportunities.

    As 0F0B is both a word and syllable delimiter, a Tibetan spell checker
    would need to do some grammatical parsing to be useful.

    The punctuation marks 0F0D, 0F0E, 0F0F, 0F10, 0F11 and 0F14 indicate the
    end of a phrase or a pause - they do not necessarily indicate the end of
    a sentence. Tibetan sentence boundaries need to be determined grammatically.
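
    A rough Python illustration, assuming plain Unicode input: cutting text at these marks gives phrase-sized chunks, and the function deliberately does not call them sentences. Keeping a space-separated pair such as 0F0D <space> 0F0D together with the phrase it closes is a choice of this sketch, and `phrases` is a hypothetical name.

```python
import re
from typing import List

# A phrase-final mark (0F0D, 0F0E, 0F0F, 0F10, 0F11 or 0F14) plus any
# further marks or spaces right after it, e.g. "0F0D <space> 0F0D".
_PHRASE_END = re.compile("([\u0F0D-\u0F11\u0F14][\u0F0D-\u0F11\u0F14 ]*)")


def phrases(text: str) -> List[str]:
    """Split Tibetan text into phrase/pause-sized chunks.

    These chunks are NOT sentences: a sentence boundary can only be
    found by grammatical analysis, which this does not attempt.
    """
    parts = _PHRASE_END.split(text)
    # With one capturing group, re.split alternates text, marks, text, ...
    chunks = []
    for i in range(0, len(parts), 2):
        marks = parts[i + 1] if i + 1 < len(parts) else ""
        chunk = (parts[i] + marks).strip()
        if chunk:
            chunks.append(chunk)
    return chunks
```

    Each chunk ends at a pause; deciding which pauses also end a sentence is left to a grammatical parser.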

    0F12 and 0F08 respectively indicate the start of a new section or a new
    topic. Some Tibetan texts use up to three (more or less elaborate)
    variants of each of these characters to indicate different levels of
    section or topic (similar to using different levels of numbering /
    indenting in English documents: I., A., 1., a., etc.).

    In Tibetan text, paragraph breaks (hard CRs) tend to be rare. There may be
    many, many pages of text without a single paragraph break. The start of
    a new section or chapter may only be indicated by 0F12 occurring in the
    middle of a paragraph.
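
    So a tool that wants section boundaries cannot rely on paragraph breaks. A minimal Python sketch, assuming 0F12 alone marks a new section (0F08 could be handled the same way): it cuts before every 0F12 and never treats a newline as a boundary. `split_sections` is a hypothetical name.

```python
from typing import List

SECTION_MARK = "\u0F12"  # TIBETAN MARK RGYA GRAM SHAD


def split_sections(text: str) -> List[str]:
    """Cut text before every 0F12, wherever it occurs.

    Newlines (paragraph breaks) are not boundaries, so a section that
    starts mid-paragraph is still found. Text before the first 0F12, if
    any, comes back as its own leading chunk.
    """
    sections = []
    start = 0
    pos = text.find(SECTION_MARK)
    while pos != -1:
        if pos > start:
            sections.append(text[start:pos])
        start = pos
        pos = text.find(SECTION_MARK, pos + 1)
    sections.append(text[start:])
    return [s.strip() for s in sections if s.strip()]
```

    A section that begins partway through a line is split off correctly, while a newline inside a section is simply kept as part of its text.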

      - Chris

    This archive was generated by hypermail 2.1.5 : Fri Jan 14 2005 - 17:53:01 CST