Re: Specification of Encoding of Plain Text

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 12 Jan 2017 21:26:02 +0000

On Thu, 12 Jan 2017 21:03:29 +0100
Mark Davis ☕️ <mark_at_macchiato.com> wrote:

> That was just an example off the top of my head of the format for
> using with regex; I don't pretend that it is vetted. Latin is not a
> complex script, so it was only an illustration. However, it was just
> brain freeze on my part to not also include Inherited or ZWJ. A more
> serious effort would look at some of the issues from
> http://unicode.org/reports/tr29/, for example. On the other hand, CGJ
> is not a problem: it is Mn
> <http://unicode.org/cldr/utility/character.jsp?a=034F>. And (say)
> U+064B ARABIC FATHATAN has scx=Arabic,Syriac, so wouldn't be included.

Ah, I had not appreciated that sc=Inherited does not imply
scx=Inherited. Using Script_Extensions to document the international
combining characters that are used, for example, with Thai bases could
have all sorts of undesirable knock-on effects.
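For illustration, here is a minimal sketch of why sc=Inherited does not imply scx=Inherited. Python's unicodedata module does not expose the Script or Script_Extensions properties. The two dictionaries below are therefore a hand-copied excerpt of Scripts.txt and ScriptExtensions.txt covering only the two characters discussed above. That is an assumption: check it against the UCD version actually in use. The helper scx() is hypothetical. It applies the UAX #24 rule that a code point not listed in ScriptExtensions.txt has a Script_Extensions value equal to its Script value.

```python
import unicodedata

# Hand-copied excerpt (assumed from the UCD; verify against your version).
# Script (sc) values for the two characters mentioned in the thread.
SCRIPT = {
    0x034F: "Inherited",  # COMBINING GRAPHEME JOINER
    0x064B: "Inherited",  # ARABIC FATHATAN
}

# Explicit Script_Extensions (scx) entries; CGJ is deliberately absent.
SCRIPT_EXTENSIONS = {
    0x064B: {"Arabic", "Syriac"},
}


def scx(cp):
    """Hypothetical helper: explicit scx list if present, else {sc}."""
    return SCRIPT_EXTENSIONS.get(cp, {SCRIPT[cp]})


for cp in sorted(SCRIPT):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"gc={unicodedata.category(ch)} sc={SCRIPT[cp]} "
          f"scx={sorted(scx(cp))}")
```

Both characters are gc=Mn and sc=Inherited. Only CGJ keeps scx=Inherited, so a pattern built on \p{scx=Inherited} would admit CGJ but not U+064B.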

Richard.
Received on Thu Jan 12 2017 - 15:26:28 CST
