Re: Mark-up to Indicate Words

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Wed, 15 Jul 2015 20:18:09 +0900

Hello Richard,

On 2015/07/15 16:49, Richard Wordingham wrote:
> What mark-up schemes exist to show that a sequence of letters and
> combining marks constitutes a single word?
>
> Such mark-up would be useful when using spell checkers. At present, I
> use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
> (Systematic marking of boundaries using ZWSP is not popular with
> users, and is normally not used in Thai - it's not supported in
> their national or Windows 8-bit encodings.) However, it seems likely
> that when Unicode 8.0 is defined in August, WJ will suppress line
> breaks but not word breaks. There would still be the limitation that
> mark-up is not available in plain text.
>
> It appears that, for example, Open Document Format has no mark-up to
> indicate word boundaries, relying instead on overrides of the word
> boundary detection algorithms stored at the character level.

I'd suggest looking at higher-end formats such as DITA or TEI (Text
Encoding Initiative).
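
To make the suggestion concrete: TEI provides a <w> element for marking
a word. What follows is only a minimal sketch, assuming the word
segmentation has already been decided by hand or by some other tool.
The function name mark_words and the (text, is_word) input shape are
made up for illustration and are not part of TEI or of any library.

```python
from xml.sax.saxutils import escape


def mark_words(segments):
    """Wrap each word segment in a TEI-style <w> element.

    `segments` is a list of (text, is_word) pairs; this input shape is a
    hypothetical convention chosen for the sketch. Non-word segments
    (spaces, punctuation) are emitted unwrapped.
    """
    parts = []
    for text, is_word in segments:
        safe = escape(text)  # escape &, < and > for XML content
        parts.append(f"<w>{safe}</w>" if is_word else safe)
    return "".join(parts)


# Two Thai words written without any space between them.
print(mark_words([("กิน", True), ("ข้าว", True)]))
```

With the boundaries carried by the mark-up, the text itself needs neither
ZWSP nor WJ, although the limitation that none of this survives in plain
text still applies.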

Regards, Martin.

> Richard.
Received on Wed Jul 15 2015 - 06:19:19 CDT
