RE: extracting words

From: Christopher John Fynn (
Date: Wed Feb 14 2001 - 05:02:38 EST


> -----Original Message-----
> From: []
> Sent: Tuesday, February 13, 2001 10:35 AM
> To:;
> Subject: RE: extracting words
> > BTW In traditional Tibetan orthography, a space is *not* a line break
> opportunity.
> What's the role of a space in there, then?

Spaces occur after any character marking the end of a phrase - but this is not a line break opportunity. You can break/wrap a line following any syllable/morpheme boundary [0F0B] in the middle of a phrase, but not in a space between two phrases!

In Tibetan, spaces may occur after characters 0F0D, 0F0E, 0F0F, 0F10, 0F11 & 0F14 - which are all used to mark the end of a phrase (each of these characters is used in slightly different situations). If you get the sequence 0F0D <SPACE(s)> 0F0D, there is a line break opportunity after the second 0F0D (before the first letter in the next phrase). If you get any one of the characters listed above followed by a space or spaces and then followed by a new phrase (without another 0F0D or 0F0E), the first break opportunity is following the first syllable (i.e. after the first 0F0B) in the next phrase.

In other words, in Tibetan script text, spaces should (nearly) always be non-breaking spaces. The only exceptions are in modern "western format" publications, where you sometimes get things like lists formatted as in European books, and newspapers with narrow columns of text (and of course in documents carelessly produced in text-processing applications that wrap on spaces, where the user hasn't bothered to fix the result by replacing these with non-breaking spaces).
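
As a rough illustration of the "replace these with non-breaking spaces" fix, here is a minimal Python sketch. It only touches runs of U+0020 that directly follow one of the phrase-end marks listed above; the names PHRASE_END and protect_spaces are made up for this example, and the "western format" exceptions are not handled.

```python
import re

# Phrase-end marks named in the message: 0F0D, 0F0E, 0F0F, 0F10, 0F11, 0F14.
PHRASE_END = "\u0F0D\u0F0E\u0F0F\u0F10\u0F11\u0F14"
NBSP = "\u00A0"  # NO-BREAK SPACE

# One or more ordinary spaces immediately preceded by a phrase-end mark.
_SPACES_AFTER_MARK = re.compile("(?<=[" + PHRASE_END + "]) +")


def protect_spaces(text: str) -> str:
    """Replace each U+0020 that follows a phrase-end mark with U+00A0.

    Hypothetical helper: spaces elsewhere in the text are left alone.
    """
    return _SPACES_AFTER_MARK.sub(lambda m: NBSP * len(m.group(0)), text)
```

The number of spaces is preserved, so the visual gap between phrases stays the same; only the break behaviour changes.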
The character 0F0B always provides a break opportunity (following the character) - except where it occurs immediately before 0F0D or 0F0E (in which case 0F0B should really be replaced by the non-breaking 0F0C). So for line breaking/wrapping purposes 0F0B more or less fulfils the function of an inter-word space.
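
The rules described in this message can be put together in a short Python sketch. This is only an illustration under the assumptions stated here, not a full line-breaking implementation: break_opportunities is a hypothetical name, it looks at nothing but 0F0B, 0F0D, 0F0E and spaces, and it treats both U+0020 and U+00A0 as the space between phrases.

```python
from typing import List

TSHEG = "\u0F0B"       # syllable/morpheme delimiter
SHAD = "\u0F0D"        # phrase-end mark
NYIS_SHAD = "\u0F0E"   # phrase-end mark
SPACES = " \u00A0"     # assumption: treat SPACE and NO-BREAK SPACE alike


def break_opportunities(text: str) -> List[int]:
    """Return every index i such that a line may be broken before text[i]."""
    breaks = []
    for i, ch in enumerate(text):
        nxt = i + 1
        if nxt >= len(text):
            continue  # the end of the text is not reported as a break
        if ch == TSHEG:
            # Break after 0F0B unless it sits immediately before 0F0D or 0F0E.
            if text[nxt] not in (SHAD, NYIS_SHAD):
                breaks.append(nxt)
        elif ch == SHAD:
            # 0F0D <SPACE(s)> 0F0D: break after the second 0F0D.
            j = i - 1
            while j >= 0 and text[j] in SPACES:
                j -= 1
            if j < i - 1 and j >= 0 and text[j] == SHAD:
                breaks.append(nxt)
    # Spaces themselves never produce a break opportunity.
    return breaks
```

For example, with the string 0F40 0F0D SPACE 0F41 0F0B 0F42 the only index returned is the one after the 0F0B, i.e. the first break falls after the first syllable of the new phrase rather than at the space.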

- Chris

> > - Chris
> >
> > --
> > Chris Fynn
> > DDC Dzongkha Computing Project
> > Thimphu, Bhutan.
