L2/05-065

Date: Tue,  8 Feb 2005 13:39:30 -0500
From: Mark Davis
Subject: UTR #36 issues

My notes from the last meeting for the document include the following:

Add description of in-script cases like U+0110 (Đ) Latin Capital Letter D With
Stroke versus U+00D0 (Ð) Latin Capital Letter Eth; also cases of visually
confusable punctuation.
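
For illustration, a minimal Python sketch (using only the standard unicodedata
module; the variable names are made up for this note) showing that the two
letters are distinct code points and that normalization does not unify them:

```python
import unicodedata

d_stroke = "\u0110"  # Latin Capital Letter D With Stroke
eth = "\u00D0"       # Latin Capital Letter Eth

# Print the code point and formal character name of each letter.
for ch in (d_stroke, eth):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Neither character has a decomposition, so even compatibility
# normalization (NFKC) leaves them as two different strings.
same_after_nfkc = (
    unicodedata.normalize("NFKC", d_stroke)
    == unicodedata.normalize("NFKC", eth)
)
print(same_after_nfkc)
```

Both characters are in the Latin script, so a mixed-script check would not
flag them either; such cases need a separate confusables mapping.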

Fix 2a/b to be parallel to other cases

Describe problems with number parsing, including cases like U+0B68 (୨) Oriya
Digit Two, which looks like a 9. Software that looks only at the numeric
value of a sequence of characters will interpret the number differently
than the user expects.
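
A small Python sketch of the problem, assuming a parser that accepts any
Unicode decimal digit (Python's built-in int() behaves this way). The helper
is_ascii_digits is a hypothetical name, shown as one possible restriction:

```python
oriya_two = "\u0B68"  # Oriya Digit Two, visually similar to an ASCII 9

# int() accepts any Unicode decimal digit and uses its numeric value.
value = int(oriya_two)
print(value)

# Mixed with an ASCII digit, the string may look like "19" to a reader,
# but it is parsed as the number twelve.
mixed = "1" + oriya_two
mixed_value = int(mixed)
print(mixed_value)


def is_ascii_digits(s: str) -> bool:
    """Return True only if s is non-empty and consists of ASCII 0-9."""
    return s != "" and all("0" <= ch <= "9" for ch in s)


print(is_ascii_digits(mixed))
```

Whether to reject non-ASCII digits outright or only reject digits drawn from
more than one script is a policy choice; the sketch shows the stricter option.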

Give more background as to why normalization fixes certain problems, and which
ones it does not fix. Describe how implementations of normalization can use a
small data set limited to only the supported characters. Describe the
recommended use of normalization in the non-domain part of a URL.
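
As a rough illustration in Python (standard unicodedata module; the variable
names are only for this sketch), normalization unifies canonically equivalent
sequences but does nothing for look-alike letters from different scripts:

```python
import unicodedata

composed = "\u00E9"     # e with acute as a single code point
decomposed = "e\u0301"  # e followed by combining acute accent

# The raw strings differ, but they are canonically equivalent,
# so NFC maps both to the same sequence.
raw_equal = composed == decomposed
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(raw_equal, nfc_equal)

latin_a = "a"
cyrillic_a = "\u0430"  # Cyrillic Small Letter A, looks like Latin a

# These are different characters with no equivalence between them,
# so no normalization form makes them compare equal.
confusable_equal = (
    unicodedata.normalize("NFKC", latin_a)
    == unicodedata.normalize("NFKC", cyrillic_a)
)
print(confusable_equal)
```
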

Describe how reverse-bidi (visual order -> storage order) can be used to detect
bidi spoofs. That is, one can apply bidi and then reverse bidi; if the result
does not match the original, reject the string.
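
The round-trip check itself needs a full implementation of the bidirectional
algorithm, which the Python standard library does not provide. The sketch
below is therefore NOT that check; it is a much weaker pre-filter, offered
only as an assumption about how one might narrow the candidates: it flags
strings that mix strong left-to-right and strong right-to-left characters.
The function name mixes_directions is hypothetical.

```python
import unicodedata

STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def mixes_directions(s: str) -> bool:
    """Return True if s has both strong LTR and strong RTL characters.

    This is only a pre-filter. It does not apply or reverse the
    bidirectional algorithm.
    """
    classes = {unicodedata.bidirectional(ch) for ch in s}
    return bool(classes & STRONG_LTR) and bool(classes & STRONG_RTL)


print(mixes_directions("example"))
print(mixes_directions("exa\u05D0mple"))  # contains Hebrew Letter Alef
```

Only strings that pass through this filter as mixed would need the more
expensive apply-then-reverse comparison.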

Explain that private use characters can cause security problems, and recommend
against their use.
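
A minimal Python sketch of how software could detect private use characters,
assuming the general category Co from the standard unicodedata module is the
test to apply. The function name contains_private_use is hypothetical.

```python
import unicodedata


def contains_private_use(s: str) -> bool:
    """Return True if any character in s has general category Co."""
    return any(unicodedata.category(ch) == "Co" for ch in s)


print(contains_private_use("example"))
print(contains_private_use("exam\uE000ple"))  # U+E000 is private use
```
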

Fonts: should follow the Unicode recommendations for missing glyphs, making
visible distinctions among them. Describe best practices for invisible glyphs.
Describe cases in complex languages (e.g. Indic) where the same visual
appearance may result from two different underlying character sequences -- in
the right context.

Add more description on the recommended use of tool-tips and other mechanisms
for alerting users.

If people have other items, I'd appreciate feedback (and text for inclusion!).