Re: NNBSP and Word Boundaries

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 4 Oct 2015 23:54:30 +0100

On Fri, 2 Oct 2015 09:25:01 +0200
Mark Davis ☕️ <mark_at_macchiato.com> wrote:

> We add:
>
> WB13c Mongolian_Letter × NNBSP
> WB13d NNBSP × Mongolian_Letter
>
> *If* we want to also change behavior on the other side of the NNBSP,
> whenever the Mongolian_Letter and NNBSP occur in sequence, we add 2
> additional rules (with the appropriate values for ..., like Numeric)
>
> WB13c Mongolian_Letter NNBSP × (...)
> WB13d (...) × NNBSP Mongolian_Letter

I'll assume the last two are meant to be WB13e and WB13f.

We can achieve the effects of everything up to and including the first
WB13d simply by changing the Word_Break property of NNBSP from XX
(Other) to MidNumLet. This would also provide a proper "espace fine"
for French use within numbers
( https://www.druide.com/enquetes/pour-des-espaces-ins%C3%A9cables-impeccables
) to separate groups of 3 digits. This needs *no* extra rules.
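
As a rough illustration of the reclassification idea, here is a minimal
Python sketch of a toy word segmenter that implements only WB5 to WB12
and breaks everywhere else. The classifier is an assumption made for the
example: it uses str.isalpha() and str.isdecimal() in place of the real
Word_Break property data, ignores MidLetter, MidNum, Extend and Format,
and the names word_break_class, no_break and split_words are
hypothetical. The overrides argument stands in for changing the property
value of NNBSP.

```python
NNBSP = "\u202f"
# A few MidNumLet characters; the real property data lists more.
MID_NUM_LET = {".", "\u2018", "\u2019", "\u2024", "\ufe52", "\uff07", "\uff0e"}
MID_Q = {"MidNumLet", "Single_Quote"}  # the MidNumLetQ macro


def word_break_class(ch, overrides=None):
    """Toy stand-in for the Word_Break property (assumption, not real data)."""
    if overrides and ch in overrides:
        return overrides[ch]
    if ch in MID_NUM_LET:
        return "MidNumLet"
    if ch == "'":
        return "Single_Quote"
    if ch.isdecimal():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return "XX"


def no_break(classes, i):
    """Return True if the rules forbid a boundary between i-1 and i."""
    prev, cur = classes[i - 1], classes[i]
    before = classes[i - 2] if i >= 2 else None
    after = classes[i + 1] if i + 1 < len(classes) else None
    if prev == "ALetter" and cur == "ALetter":                      # WB5
        return True
    if prev == "ALetter" and cur in MID_Q and after == "ALetter":   # WB6
        return True
    if before == "ALetter" and prev in MID_Q and cur == "ALetter":  # WB7
        return True
    if prev == "Numeric" and cur == "Numeric":                      # WB8
        return True
    if prev == "ALetter" and cur == "Numeric":                      # WB9
        return True
    if prev == "Numeric" and cur == "ALetter":                      # WB10
        return True
    if before == "Numeric" and prev in MID_Q and cur == "Numeric":  # WB11
        return True
    if prev == "Numeric" and cur in MID_Q and after == "Numeric":   # WB12
        return True
    return False


def split_words(text, overrides=None):
    """Split text at every position where no rule above forbids a break."""
    classes = [word_break_class(ch, overrides) for ch in text]
    segments, start = [], 0
    for i in range(1, len(text)):
        if not no_break(classes, i):
            segments.append(text[start:i])
            start = i
    if text:
        segments.append(text[start:])
    return segments


# Two Mongolian letters, NNBSP, two Mongolian letters.
mongolian = "\u182e\u1823" + NNBSP + "\u1824\u1828"
grouped_number = "12" + NNBSP + "345"
as_mid_num_let = {NNBSP: "MidNumLet"}

print(split_words(mongolian))                       # NNBSP is XX: three segments
print(split_words(mongolian, as_mid_num_let))       # one segment via WB6/WB7
print(split_words(grouped_number, as_mid_num_let))  # one segment via WB11/WB12
```

With NNBSP left as XX the toy segmenter isolates it; with the override,
the existing WB6/WB7 and WB11/WB12 keep it inside the word or number
without any new rule.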

Now for combined numbers and letters, we might consider adding the two
rules:

WB12a Numeric MidNumLet × AHLetter
WB12b Numeric × MidNumLet AHLetter

I think we should go the whole hog, and instead have

WB12c (Numeric|AHLetter) MidNumLetQ × (Numeric|AHLetter)
WB12d (Numeric|AHLetter) × MidNumLetQ (Numeric|AHLetter)

Perhaps there are good reasons against them - I'm not aware of any. (I
don't think it is wrong to treat "no.2" as a single word.) These rules
would make the abbreviated names of a good many Thai forms (e.g. คร.๒, a
marriage certificate) into a single word.
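
The effect of WB12c and WB12d on mixed letter and digit strings can be
sketched in the same toy fashion. This is only an illustration under
stated assumptions: classify is a hypothetical stand-in for the real
Word_Break data (it treats anything str.isalpha() accepts as ALetter,
which is not how the actual property is assigned for every script), and
split_words_proposed is a made-up name. The adjacency test covers WB5
and WB8 to WB10; the two three-character tests are WB12c and WB12d.

```python
ALNUM = {"ALetter", "Numeric"}         # stands in for (Numeric|AHLetter)
MID_Q = {"MidNumLet", "Single_Quote"}  # the MidNumLetQ macro


def classify(ch):
    """Toy stand-in for the Word_Break property (assumption, not real data)."""
    if ch in ".\u2018\u2019\u2024":
        return "MidNumLet"
    if ch == "'":
        return "Single_Quote"
    if ch.isdecimal():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return "XX"


def split_words_proposed(text):
    """Segment text using WB5, WB8-WB10 and the proposed WB12c/WB12d only."""
    cls = [classify(ch) for ch in text]
    words, start = [], 0
    for i in range(1, len(text)):
        prev, cur = cls[i - 1], cls[i]
        before = cls[i - 2] if i >= 2 else None
        after = cls[i + 1] if i + 1 < len(cls) else None
        joined = (
            (prev in ALNUM and cur in ALNUM)                            # WB5, WB8-WB10
            or (before in ALNUM and prev in MID_Q and cur in ALNUM)     # WB12c
            or (prev in ALNUM and cur in MID_Q and after in ALNUM)      # WB12d
        )
        if not joined:
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words


print(split_words_proposed("no.2"))          # kept together by WB12c/WB12d
print(split_words_proposed("see no.2 now"))  # spaces still separate words
print(split_words_proposed("end."))          # trailing full stop is split off
```

In this sketch "no.2" stays in one piece, while a full stop at the end
of a word is still separated because WB12d requires a letter or digit
after the MidNumLetQ character.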

WB12c and WB12d overlap with WB6, WB7, WB11 and WB12, which could be
slightly simplified.

Richard.
Received on Sun Oct 04 2015 - 17:55:58 CDT
