Re: Rendering Raised FULL STOP between Digits

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 10 Mar 2013 02:52:25 +0000

On Sat, 09 Mar 2013 16:21:17 -0700
Karl Williamson <public_at_khwilliamson.com> wrote:

> Rendering is not the only consideration. Processing textual content
> for 0387 is broken because it is considered to be an ID_Continue
> character, whereas its Greek usage is equivalent to the English
> semicolon, something that would never occur in the middle of a word
> nor an identifier.

ID_Continue is for processing things like variable names. How does
allowing U+0387 in variable names cause problems in the processing of
text?
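A minimal Python illustration of the distinction, assuming CPython 3. Its str.isidentifier() follows the XID_Start/XID_Continue rules of PEP 3131, and U+0387 is ID_Continue through Other_ID_Continue. Its re module treats \w as alphanumerics plus underscore. The sample string, with no space after the ano teleia, is a made-up example chosen only to make the contrast visible. The same string passes as an identifier and is still split into two tokens by ordinary text processing:

```python
import re

# Hypothetical sample: two Greek words joined by U+0387 GREEK ANO TELEIA.
text = "\u03bb\u03cc\u03b3\u03bf\u03c2\u0387\u03ba\u03b1\u1f76"  # λόγος·καὶ

# Identifier rules: isidentifier() checks XID_Start for the first
# character and XID_Continue for the rest, so U+0387 is accepted.
print(text.isidentifier())        # True

# Text rules: \w matches only alphanumerics and "_"; U+0387 has
# General_Category Po (punctuation), so tokenising splits at it.
print(re.findall(r"\w+", text))   # ['λόγος', 'καὶ']
```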

How would ID_Continue allow you to process English «foc’s’le» or
«co-operate»? The default word boundary determination has been
designed to give you the right results, and should work for Greek unless
you are working with scripta continua, in which case you have massive
problems regardless.
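The default word boundaries are those of UAX #29. Its rules WB6 and WB7 keep a word together across a MidLetter or MidNumLet character, such as U+2019 or U+0387, when letters stand on both sides. A minimal Python sketch of just rules WB5 to WB7 follows, with WB999 (break everywhere else). It assumes str.isalpha() as a stand-in for the ALetter class and lists only a few of the relevant characters. The full algorithm has many more rules, covering digits, whitespace and Katakana among others, and this sketch ignores them:

```python
# Minimal sketch of UAX #29 rules WB5-WB7; NOT a full implementation.
# Only a hand-picked subset of MidLetter and MidNumLetQ is listed, and
# str.isalpha() approximates Word_Break=ALetter.
MID_LETTER = {"\u00b7", "\u0387"}               # MIDDLE DOT, ANO TELEIA
MID_NUM_LET_Q = {"'", ".", "\u2018", "\u2019"}  # apostrophes, full stop
MID = MID_LETTER | MID_NUM_LET_Q


def default_words(text: str) -> list[str]:
    """Return segments; every position not covered by WB5-WB7 breaks."""
    if not text:
        return []
    out, start = [], 0
    for i in range(1, len(text)):
        before = text[i - 2] if i >= 2 else ""
        prev, cur = text[i - 1], text[i]
        after = text[i + 1] if i + 1 < len(text) else ""
        keep = (
            (prev.isalpha() and cur.isalpha())                       # WB5
            or (prev.isalpha() and cur in MID and after.isalpha())   # WB6
            or (before.isalpha() and prev in MID and cur.isalpha())  # WB7
        )
        if not keep:
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out


print(default_words("foc\u2019s\u2019le"))  # ['foc’s’le']
print(default_words("co-operate"))          # ['co', '-', 'operate']
```

The hyphen belongs to none of the listed classes, so the sketch breaks on both sides of it. The apostrophes in «foc’s’le» sit between letters and are kept inside the word.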

Note also that word boundary determination is intended to be
tailorable, which would allow one to exclude U+00B7 and U+0387 from
words or deal with miscoded accents and breathings physically at the
start of a word beginning with a capitalised vowel. One should also be
able to tailor it to deal with word-final apostrophes - though doing
that in the CLDR style could be computationally excessive if the text
may contain quoting apostrophes. One might even tailor it to allow
Greek «ὅ,τι», depending on whether one wishes to count it as a word.
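Tailorings of the kind described here can be modelled as changing which characters may sit between two letters. A hedged Python sketch, again covering only WB5 to WB7 with str.isalpha() standing in for ALetter. The `mid` parameter and the two tailored sets are illustrative assumptions, not CLDR data:

```python
# Default subset of MidLetter plus MidNumLetQ; tailorings pass other sets.
DEFAULT_MID = frozenset({"\u00b7", "\u0387", "'", ".", "\u2018", "\u2019"})


def words(text: str, mid: frozenset = DEFAULT_MID) -> list[str]:
    """Segment with WB5-WB7 only; `mid` holds the characters allowed
    between two letters. Every other position is a break (WB999)."""
    if not text:
        return []
    out, start = [], 0
    for i in range(1, len(text)):
        before = text[i - 2] if i >= 2 else ""
        prev, cur = text[i - 1], text[i]
        after = text[i + 1] if i + 1 < len(text) else ""
        keep = (
            (prev.isalpha() and cur.isalpha())                       # WB5
            or (prev.isalpha() and cur in mid and after.isalpha())   # WB6
            or (before.isalpha() and prev in mid and cur.isalpha())  # WB7
        )
        if not keep:
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out


# Hypothetical tailoring 1: drop U+00B7 and U+0387, so they end a word.
NO_TELEIA = DEFAULT_MID - {"\u00b7", "\u0387"}
# Hypothetical tailoring 2: also allow a comma between letters.
WITH_COMMA = NO_TELEIA | {","}

greek = "\u03bb\u03cc\u03b3\u03bf\u03c2\u0387\u03ba\u03b1\u1f76"  # λόγος·καὶ
print(words(greek))                              # ['λόγος·καὶ']
print(words(greek, NO_TELEIA))                   # ['λόγος', '·', 'καὶ']
print(words("\u1f45,\u03c4\u03b9"))              # ['ὅ', ',', 'τι']
print(words("\u1f45,\u03c4\u03b9", WITH_COMMA))  # ['ὅ,τι']
```

Allowing the comma between letters joins any letter-comma-letter sequence, not only «ὅ,τι», so a practical tailoring along these lines would probably need to be narrower.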

Richard.
Received on Sat Mar 09 2013 - 20:54:19 CST