Re: Level of Unicode support required for various languages

From: Andrew West (andrewcwest@gmail.com)
Date: Wed Oct 31 2007 - 05:19:40 CST

Next message: vunzndi@vfemail.net: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"

Previous message: David Starner: "Re: Level of Unicode support required for various languages"
In reply to: Kenneth Whistler: "Re: Level of Unicode support required for various languages"
Next in thread: vunzndi@vfemail.net: "Re: Level of Unicode support required for various languages"
Reply: vunzndi@vfemail.net: "Re: Level of Unicode support required for various languages"

On 31/10/2007, Kenneth Whistler <kenw@sybase.com> wrote:
>
> O.k., challenge for the day:
>
> Which of the following IDS are encoded and which are not?
> Which are equal to which others?
> What do they mean?
>
> 2FF0 2FF3 4E36 6B79 706C 6534
> 2FF0 2FF3 4E36 6B79 706C 6535
> 2FF0 2FF3 4EA0 5915 706C 6534
> 2FF0 2FF3 4EA0 5915 706C 6535
> 2FF0 2FF1 2FF3 4E36 4E00 5915 706C 6534
> 2FF0 2FF1 2FF3 4E36 4E00 5915 706C 6535
> 2FF0 2FF1 4EA0 7CF9 6534
> 2FF0 2FF1 4EA0 7CF9 6535
> 2FF0 2FF1 2FF1 4E36 4E00 7CF9 6534
> 2FF0 2FF1 2FF1 4E36 4E00 7CF9 6535
> 2FF0 2FF1 4EA0 7CF8 6534
> 2FF0 2FF1 4EA0 7CF8 6535
> 2FF0 2FF1 2FF1 4E36 4E00 7CF8 6534
> 2FF0 2FF1 2FF1 4E36 4E00 7CF8 6535
> 2FF0 2FF3 4E36 4E00 7CF9 6534
> 2FF0 2FF3 4E36 4E00 7CF9 6535
> 2FF0 2FF3 4E36 4E00 7CF8 6534
> 2FF0 2FF3 4E36 4E00 7CF8 6535

According to Vunzndi's excellent IDS lookup tool
<http://www.l10n-support.com/cgi-bin/search.cgi?> only

2FF0 2FF1 4EA0 7CF8 6535 = U-22F7A

But clearly a number of the other IDS sequences you give are equivalent to this.

The glyph components <4E36 6B79 706C>, <4EA0 5915 706C> and <4E36 4E00
5915 706C> are not equivalent to <4EA0 7CF8>, and so none of the
IDS sequences with these glyph component sequences should be
considered alternate representations of U-22F7A.

U+6534 and U+6535 are non-unifiable components, so IDS sequences with
6534 should represent a different character than those sequences with
6535.

On the other hand, U+7CF8 and U+7CF9 are unifiable glyph variants, and
therefore which one is used in the IDS sequence is not significant for
character matching purposes.

And the sequence <2FF1 4E36 4E00> is a decomposition [s.l.] of 4EA0,
and so IDS sequences with either <2FF1 4E36 4E00> or 4EA0 are
equivalent.

Therefore, in my opinion the following are alternate representations
of U-22F7A, and the other sequences you give are not correct
representations of U-22F7A (I don't think they represent encoded
characters, but I may be wrong):

2FF0 2FF1 4EA0 7CF9 6535
2FF0 2FF1 2FF1 4E36 4E00 7CF9 6535
2FF0 2FF1 2FF1 4E36 4E00 7CF8 6535
2FF0 2FF3 4E36 4E00 7CF9 6535
2FF0 2FF3 4E36 4E00 7CF8 6535

I'm not quite sure what the point of the exercise is. We all know
that there may be multiple ways of representing the same character
using IDS sequences, but any process that is designed to work with IDS
sequences should normalize [s.l.] sequences so that alternate
representations are treated as identical, e.g. in this example
normalize 7CF9 to 7CF8 (unifiable glyph variants), and normalize <4E36
4E00> to 4EA0 (normalize to the shortest possible sequence).
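
A minimal sketch of that kind of normalization, in Python. It is only an illustration under the assumptions stated in this message: the function names (parse, normalize, ids_key) and the two tables are hypothetical, the variant table holds just the one pair 7CF9/7CF8, and the only decomposition handled is that of 4EA0. A real process would need full tables of unifiable variants and component decompositions. Note that the 4EA0 rule has to cover both <2FF1 4E36 4E00> and the three-part form <2FF3 4E36 4E00 X>.

```python
# Number of operands taken by each Ideographic Description Character
# that appears in this thread (2FF0 and 2FF1 are binary, 2FF3 is ternary).
ARITY = {0x2FF0: 2, 0x2FF1: 2, 0x2FF3: 3}

# Assumption taken from the message: 7CF9 is a unifiable variant of 7CF8.
VARIANTS = {0x7CF9: 0x7CF8}


def parse(tokens, i=0):
    """Parse prefix-notation IDS code points into a nested tuple.

    Returns (tree, index of the next unread token).
    """
    cp = tokens[i]
    i += 1
    kids = []
    for _ in range(ARITY.get(cp, 0)):
        kid, i = parse(tokens, i)
        kids.append(kid)
    return ((cp, *kids) if kids else cp), i


def normalize(tree):
    """Fold variants and recompose 4EA0, working from the leaves up."""
    if isinstance(tree, int):
        return VARIANTS.get(tree, tree)
    op, *kids = tree
    kids = [normalize(k) for k in kids]
    # <2FF1 4E36 4E00> is a decomposition of 4EA0.
    if op == 0x2FF1 and kids == [0x4E36, 0x4E00]:
        return 0x4EA0
    # <2FF3 4E36 4E00 X> is the same shape as <2FF1 4EA0 X>.
    if op == 0x2FF3 and kids[:2] == [0x4E36, 0x4E00]:
        return (0x2FF1, 0x4EA0, kids[2])
    return (op, *kids)


def ids_key(text):
    """Turn a space-separated hex IDS into a comparable normalized key."""
    tokens = [int(t, 16) for t in text.split()]
    tree, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing code points after a complete IDS")
    return normalize(tree)


target = ids_key("2FF0 2FF1 4EA0 7CF8 6535")  # the sequence found for U-22F7A
print(ids_key("2FF0 2FF3 4E36 4E00 7CF9 6535") == target)  # True
print(ids_key("2FF0 2FF1 4EA0 7CF8 6534") == target)       # False
```

With these two rules, the five sequences listed above and the original <2FF0 2FF1 4EA0 7CF8 6535> all produce the same key, and the 6534 sequences do not.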

Andrew

This archive was generated by hypermail 2.1.5 : Wed Oct 31 2007 - 05:32:48 CST