Date: Wed Feb 16 2005 - 11:39:19 CST

    At 09:20 -0800 2005/02/14, Mark Davis wrote:
    > 3. The UTR had for some time recommended the development of data on visually
    > confusables, and we will be starting to collect data to test the feasibility
    > of different approaches.

    One way to handle confusables might be, instead of attempting to prohibit
    characters, to declare certain groups of characters (or character sequences)
    equivalent. Only one name in each equivalence class will be accepted. Then, if
    somebody tries to define a look-alike name, it will be viewed as already

    One way to define such equivalences might be to look for another character
    set C (which might be the full Unicode set or a subset of it), and then
    map the Unicode characters into the set of finite sequences from C. For
    example, homographs will be mapped to the same character. In order to check
    whether two Unicode sequences are equivalent, one only has to compute their
    mapped C-sequences and see if these are equal. This C-sequence will normally
    not be the preferred written one, even if C is the Unicode set: For example,
    if the upper and lower case forms of both Latin and Greek letters are
    defined as equivalent, and further, Latin and Greek "A" are viewed as
    homographs, then lower case "a" and "alpha" will also be equivalent, but
    one will still be able to use one form over the other in specific cases.
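
    The mapping described above might be sketched as follows in Python. This
    is only an illustration under stated assumptions: the table TO_C, the
    helper names and the Registry class are hypothetical, and the four table
    entries are made up for the Latin/Greek "A" example rather than taken
    from any real confusables data.

```python
# Hypothetical mapping from Unicode characters to sequences over a set C.
# The entries are illustrative assumptions, not real confusables data.
TO_C = {
    "A": "a",        # Latin capital A
    "a": "a",        # Latin small a (case equivalence)
    "\u0391": "a",   # Greek capital Alpha, treated as a homograph of Latin "A"
    "\u03b1": "a",   # Greek small alpha (case equivalence with capital Alpha)
}


def to_c_sequence(name):
    """Map each character into C; unmapped characters map to themselves."""
    return "".join(TO_C.get(ch, ch) for ch in name)


def equivalent(x, y):
    """Two names are equivalent when their mapped C-sequences are equal."""
    return to_c_sequence(x) == to_c_sequence(y)


class Registry:
    """Accepts at most one name per equivalence class."""

    def __init__(self):
        self._taken = {}  # C-sequence -> the accepted name as written

    def register(self, name):
        key = to_c_sequence(name)
        if key in self._taken:
            return False  # equivalent to a name that was already accepted
        self._taken[key] = name
        return True
```

    In this sketch the C-sequence serves only as a comparison key. The
    registry keeps the name in the form in which it was first written, so
    the preferred written form is still available for display.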

      Hans Aberg

