From: Richard Wordingham <richard.wordingham_at_ntlworld.com>

Date: Thu, 15 Dec 2016 20:29:26 +0000

On Wed, 14 Dec 2016 18:44:39 +0100
Reini Urban <reini_at_cpanel.net> wrote:

> On Dec 5, 2016, at 3:31 PM, Richard Wordingham
> <richard.wordingham_at_ntlworld.com> wrote:
> > The choice with PHI includes:
> >
> > U+0278 LATIN SMALL LETTER PHI
> > U+03C6 GREEK SMALL LETTER PHI
> >
> > a Greek (!) script character with compatibility decomposition to
> > U+03C6
> >
> > U+03D5 GREEK PHI SYMBOL
> >
> > and a whole host of common script characters with compatibility
> > decomposition to U+03C6:
> >
> > U+1D6D7 MATHEMATICAL BOLD SMALL PHI
> > U+1D6DF MATHEMATICAL BOLD PHI SYMBOL
> > U+1D711 MATHEMATICAL ITALIC SMALL PHI
> > U+1D719 MATHEMATICAL ITALIC PHI SYMBOL
> > U+1D74B MATHEMATICAL BOLD ITALIC SMALL PHI
> > U+1D753 MATHEMATICAL BOLD ITALIC PHI SYMBOL
> > U+1D785 MATHEMATICAL SANS-SERIF BOLD SMALL PHI
> > U+1D78D MATHEMATICAL SANS-SERIF BOLD PHI SYMBOL
> > U+1D7BF MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL PHI
> > U+1D7C7 MATHEMATICAL SANS-SERIF BOLD ITALIC PHI SYMBOL
> >
> > They are all ID_Start.
>
> Oh my. Dragons beware. So I need to add some trie tables to add
> warnings with those rules also. I don’t want to error on some obscure
> confusables rule only yet. perl doesn’t even ship the security
> tables, so people are not aware of it.

Another solution would be to treat two identifiers as the same if they
have the same NFKC normalisation.
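As a rough illustration of the NFKC idea, here is a minimal Python sketch using the standard `unicodedata` module. The helper names `nfkc_key` and `same_identifier` are made up for this example, and only a few of the PHI characters listed above are included; it is not how perl or any other implementation actually compares identifiers.

```python
import unicodedata

# A handful of the PHI characters from the list quoted above.
PHI_VARIANTS = [
    "\u0278",      # LATIN SMALL LETTER PHI
    "\u03C6",      # GREEK SMALL LETTER PHI
    "\u03D5",      # GREEK PHI SYMBOL
    "\U0001D6D7",  # MATHEMATICAL BOLD SMALL PHI
    "\U0001D6DF",  # MATHEMATICAL BOLD PHI SYMBOL
]

def nfkc_key(identifier):
    """Return the NFKC form used as the comparison key (hypothetical helper)."""
    return unicodedata.normalize("NFKC", identifier)

def same_identifier(a, b):
    """Treat two identifiers as the same if their NFKC forms are equal."""
    return nfkc_key(a) == nfkc_key(b)

greek_phi = "\u03C6"
for ch in PHI_VARIANTS:
    verdict = same_identifier(ch, greek_phi)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: same as U+03C6? {verdict}")
```

Note that U+0278 has no decomposition, so a comparison of this kind still keeps it distinct from U+03C6; that pair remains a matter for confusable rules.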

> > You didn't mention the inherited script. Is that automatically
> > allowed, e.g. φ̈ᵣ <U+03C6, U+0308 COMBINING DIAERESIS, U+1D63 LATIN
> > SUBSCRIPT SMALL LETTER R> (scripts: Greek, inherited, Latin)? I
> > encountered that variable name in a radar specification last week.
>
> Inherited is allowed with ID_Continue, yes. Not in ID_Start position.
> Combiners are normalized to NFC.

<U+03C6, U+0308, U+1D63> is unchanged under normalisation to NFC and
NFD. NFKC and NFKD change only U+1D63, which has a compatibility
decomposition to U+0072.

> > There might be issues - it's possible that क̐ <U+0915 DEVANAGARI
> > LETTER KA, U+0310 COMBINING CANDRABINDU> might spoof कँ <U+0915,
> > U+0901 DEVANAGARI SIGN CANDRABINDU>.
> \x{915}\x{310} is legal Devanagari normalized to one char.

I don't know what you mean by this statement. <U+0915, U+0310> is
also unchanged under the 4 normalisations.
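A statement like this is easy to check mechanically. The following Python sketch (the function name `unchanged_forms` is just a label chosen for this example) lists the normalisation forms that leave a string as it is, using whatever Unicode data the Python build ships with.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def unchanged_forms(s):
    """Return the normalisation forms under which s comes back unchanged."""
    return [form for form in FORMS if unicodedata.normalize(form, s) == s]

# <U+0915 DEVANAGARI LETTER KA, U+0310 COMBINING CANDRABINDU>
ka_candrabindu = "\u0915\u0310"
print(unchanged_forms(ka_candrabindu))  # -> ['NFC', 'NFD', 'NFKC', 'NFKD']
```

Neither character has a decomposition and there is no precomposed form for the pair, so the sequence stays two code points in every form.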

> \x{915}\x{901} are two legal Devanagari characters.
> but they are confusables. This would need special confusable rules.

Additionally, U+0310 can be confused quite readily with the sequence
<U+0306 COMBINING BREVE, U+0307 COMBINING DOT ABOVE>.
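Since normalisation does not help with these cases, a confusable check needs its own mapping. Below is a small Python sketch in the spirit of a UTS #39 style "skeleton" comparison (decompose, map each character to a prototype, decompose again). The `PROTOTYPES` table is a hypothetical, hand-written one containing only the marks discussed in this thread; it is not taken from the real confusables.txt data, and the choice of prototype is an assumption made for illustration.

```python
import unicodedata

# Hypothetical prototype table built only from the pairs mentioned in this
# thread. It is NOT derived from the UTS #39 confusables.txt file.
PROTOTYPES = {
    "\u0310": "\u0306\u0307",  # COMBINING CANDRABINDU -> breve + dot above
    "\u0901": "\u0306\u0307",  # DEVANAGARI SIGN CANDRABINDU -> same prototype
}

def skeleton(s):
    """NFD, then map each character to its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a, b):
    """Two distinct strings are flagged when their skeletons are equal."""
    return a != b and skeleton(a) == skeleton(b)

print(confusable("\u0915\u0310", "\u0915\u0901"))        # KA + either candrabindu
print(confusable("\u0915\u0310", "\u0915\u0306\u0307"))  # KA + breve + dot above
```

With this toy table all three spellings share one skeleton, so an identifier check could warn on them rather than reject them outright.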

Richard.

Received on Thu Dec 15 2016 - 14:30:01 CST
