Re: Unicode and RFC 4690

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Oct 04 2006 - 12:49:44 CST

Next message: Paul Hastings: "Re: "Visually approximate" conversion from unicode to Windows-1251 (or similar code page)"

Previous message: Addison Phillips: "Re: "Visually approximate" conversion from unicode to Windows-1251 (or similar code page)"
Maybe in reply to: Jefsey_Morfin: "Unicode and RFC 4690"
Next in thread: Steve Summit: "Re: Unicode and RFC 4690"
Reply: Steve Summit: "Re: Unicode and RFC 4690"

Steve Summit wrote:

> I think what Jefsey was asking about was whether anyone has
> done any real work on what we might call the "next phase" of
> normalization, namely that which considers all pairs and sets
> of likely visually-similar glyphs (what RFC 4690 calls
> "confusables"), across all languages and scripts.

This is not actually the "next phase" of normalization, but
a rather different problem.

Normalization converts Unicode text into a known form that can
be compared reliably for equality under the terms of that
normalization.
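
As a minimal sketch of what "compared reliably for equality" means
in practice, the following uses Python's standard unicodedata module.
The sample strings and the helper name same_under are illustrative
assumptions, not anything taken from the thread.

```python
import unicodedata

# Two encodings of the same user-perceived character "é".
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b are equal after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ code point by code point ...
raw_equal = composed == decomposed

# ... but they compare equal once both are in the same normalization form.
nfc_equal = same_under("NFC", composed, decomposed)
nfd_equal = same_under("NFD", composed, decomposed)
```

The point is only that normalization picks one canonical spelling for
sequences that are already defined to be equivalent; it makes no
judgment about how anything looks.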

The issue of visual confusability in host names (and IRIs in
general) relates even to such ASCII-derived confusable pairs
as O/0 and I/l/1, which no normalization algorithm is going
to equate without destruction of the interpretation of the text.

>
> To cite the simplest example: everyone knows that U+0041 Latin
> Capital Letter A, U+0391 Greek Capital Letter Alpha, and U+0410
> Cyrillic Capital Letter A are likely to be visibly very similar,
> if not identical. But, as far as I know, the existing
> normalization algorithms don't touch them.

Nor should they.
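
A small check, again assuming Python's standard unicodedata module,
suggests that none of the four normalization forms folds these three
letters together, nor the ASCII pair O/0. The names LOOKALIKES and
distinct_after are hypothetical, chosen for this sketch only.

```python
import unicodedata

# The three capital letters from the quoted example.
LOOKALIKES = [
    "\u0041",  # LATIN CAPITAL LETTER A
    "\u0391",  # GREEK CAPITAL LETTER ALPHA
    "\u0410",  # CYRILLIC CAPITAL LETTER A
]

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def distinct_after(form: str, chars: list[str]) -> int:
    """Count how many distinct strings remain after normalizing each char."""
    return len({unicodedata.normalize(form, c) for c in chars})


# Number of distinct results per normalization form.
counts = {form: distinct_after(form, LOOKALIKES) for form in FORMS}

# The ASCII confusable pair O / 0 is likewise left alone.
ascii_pair_counts = {form: distinct_after(form, ["O", "0"]) for form in FORMS}
```

If the letters were canonically or compatibility equivalent, the counts
would drop below 3 (or below 2 for the ASCII pair); they do not, because
the characters carry different identities in different scripts.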

> Are there any
> established, respected, comprehensive repositories of such
> equivalences?

http://www.unicode.org/reports/tr39/

--Ken


This archive was generated by hypermail 2.1.5 : Wed Oct 04 2006 - 12:51:52 CST