Minor flaw in rules for locating text element boundaries

From: Timothy Partridge (timpart@perdix.demon.co.uk)
Date: Mon May 15 2000 - 15:08:02 EDT

Next message: Timothy Partridge: "Re: Lithuanian (was Re: Transliteration of Arabic characters into"
Previous message: Michael Everson: "RE: dozenal and hexadecimal digits"
Next in thread: Mark Davis: "Re: Minor flaw in rules for locating text element boundaries"
Maybe reply: Mark Davis: "Re: Minor flaw in rules for locating text element boundaries"

On page 125 of Unicode 3.0, rule 4 says

No overlapping sets. [snip] A later character set definition will override a
previous one, removing its characters from the previous set.

In the Line Boundaries section a large number of sets are defined on pages
129-130. Unfortunately the last set to be defined is

All     All Unicode characters

Surely by strict interpretation of rule 4 this sucks all the characters out
of the previous sets? I know what you mean, but you don't mean what you say.
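
To make the literal reading concrete, here is a minimal sketch in Python. It assumes rule 4 means exactly what it says: each new set definition removes its characters from every set defined before it. The set names (Alpha, Digit, Space) and the five-character toy universe are made up for illustration and are not the classes listed on pages 129-130.

```python
def define_sets(definitions):
    """Apply definitions in order; a later set takes its characters
    away from every previously defined set (literal reading of rule 4)."""
    sets = {}
    for name, chars in definitions:
        chars = set(chars)  # copy so the caller's data is not mutated
        for earlier in sets.values():
            earlier -= chars  # the override step
        sets[name] = chars
    return sets


# Toy stand-in for "all Unicode characters" (hypothetical data).
UNIVERSE = set("ab1 ?")

definitions = [
    ("Alpha", {"a", "b"}),
    ("Digit", {"1"}),
    ("Space", {" "}),
    ("All", UNIVERSE),  # defined last, as in the book
]

# "All" last: every earlier set is emptied.
result = define_sets(definitions)

# "All" first: the specific sets carve their characters out of it.
fixed = define_sets([definitions[-1]] + definitions[:-1])
```

With "All" defined last, Alpha, Digit and Space all end up empty. Moving "All" to the front of the list leaves the specific sets intact and "All" holding only the leftover characters, which is presumably what was intended.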

Tim

P.S. This significantly increases the efficiency of implementations - line
breaks can occur before and after every character :-)

-- 
 
Tim Partridge. Any opinions expressed are mine only and not those of my employer

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT