Understanding normalisation

From: Theodore H. Smith (delete@elfdata.com)
Date: Sun May 28 2006 - 09:18:18 CDT

Next message: Doug Ewell: "Re: Unicode, SMS, PDA/cellphones"

Previous message: Theodore H. Smith: "Re: Unicode, SMS, PDA/cellphones"
Next in thread: Richard Wordingham: "Re: Understanding normalisation"
Reply: Richard Wordingham: "Re: Understanding normalisation"

I've got some code that can do a multiple, parallel string replacement on
Unicode strings. I can use this successfully to decompose and compose
some Unicode characters.

But that's all it does: multiple parallel string replacement, with no
reordering or anything else.
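
The post doesn't show the replacement code, so here is a minimal sketch of what a "multiple parallel string replacement" might look like, assuming a single left-to-right pass that takes the longest matching table key at each position. The names `parallel_replace`, `COMPOSE` and `DECOMPOSE` and the two-entry table are hypothetical, chosen for illustration only.

```python
def parallel_replace(text: str, table: dict[str, str]) -> str:
    """Replace table keys in one pass, preferring the longest match at each position."""
    if not table:
        return text
    max_len = max(len(k) for k in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No key starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical composition table: decomposed sequence -> precomposed character.
COMPOSE = {
    "e\u0301": "\u00e9",  # e + COMBINING ACUTE ACCENT -> é
    "a\u0308": "\u00e4",  # a + COMBINING DIAERESIS -> ä
}

# Decomposition is just the inverted table.
DECOMPOSE = {v: k for k, v in COMPOSE.items()}

print(parallel_replace("cafe\u0301", COMPOSE) == "caf\u00e9")  # True
```

Because the pass never rescans its own output, replacements don't cascade; and, like the tool described above, it does no reordering.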

I'm wondering: what limitations would it have when used for
decomposition? And for composition?

Is it true that you can't successfully decompose a string without
doing a proper NFD operation on it? And, likewise, that you can't
compose a string without doing a proper NFC?
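
On the decomposition side, NFD does two things beyond a single table lookup: canonical decomposition mappings are applied recursively (a precomposed character can decompose into another precomposed character plus a mark), and each run of combining marks is then put into canonical order by combining class. The sketch below builds both steps from Python's standard `unicodedata` module and checks the result against `unicodedata.normalize("NFD", ...)`. The function names are made up for this sketch, which is an illustration rather than a replacement for a real normalizer; in particular it assumes the text contains no Hangul syllables, which NFD decomposes algorithmically rather than from the data file.

```python
import unicodedata


def canonical_decompose_char(ch: str) -> str:
    """Recursively apply canonical (not compatibility) decomposition mappings."""
    mapping = unicodedata.decomposition(ch)
    # Empty mapping: no decomposition. A mapping starting with "<tag>" is a
    # compatibility mapping, which NFD must not apply.
    if not mapping or mapping.startswith("<"):
        return ch
    parts = [chr(int(cp, 16)) for cp in mapping.split()]
    return "".join(canonical_decompose_char(p) for p in parts)


def canonical_reorder(text: str) -> str:
    """Stable-sort each run of non-starters (combining class > 0) by combining class."""
    out = []
    run = []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


def sketch_nfd(text: str) -> str:
    """Hypothetical NFD sketch: recursive decomposition, then canonical ordering."""
    decomposed = "".join(canonical_decompose_char(ch) for ch in text)
    return canonical_reorder(decomposed)


# U+1EC7 (ệ) decomposes to U+1EB9 + U+0302, and U+1EB9 decomposes further to
# e + U+0323, so one level of lookup is not enough. The second "e" carries its
# marks in non-canonical order, so it needs reordering.
sample = "Vi\u1ec7t e\u0302\u0323"
print(sketch_nfd(sample) == unicodedata.normalize("NFD", sample))  # True
```

A replacement table built from fully decomposed mappings already covers the recursive part, so for decomposition the missing piece is mainly the reordering step.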

For example, as I understand it, one problem that could occur when
doing a blind "composition" on a Unicode string is that a character
may have its combining marks in a different order than my composer
recognises, and so that character won't get composed.
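
The ordering problem can be reproduced with a tiny hypothetical table whose keys are written in canonical (NFD) order, as they would be if generated from the Unicode data. Vietnamese ệ is a convenient case: typed as e + circumflex + dot below, it is canonically equivalent to e + dot below + circumflex, but only the latter matches the table's three-character key. The table and `parallel_replace` are illustrative assumptions, not the poster's code.

```python
import unicodedata

# Hypothetical composition table, keys in canonical (NFD) order.
COMPOSE = {
    "e\u0323\u0302": "\u1ec7",  # e + dot below + circumflex -> ệ
    "e\u0323": "\u1eb9",        # e + dot below -> ẹ
    "e\u0302": "\u00ea",        # e + circumflex -> ê
}


def parallel_replace(text: str, table: dict[str, str]) -> str:
    """One left-to-right pass, longest table key first at each position."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in table:
                out.append(table[text[i:i + length]])
                i += length
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


canonical = "e\u0323\u0302"  # marks in canonical order
swapped = "e\u0302\u0323"    # same marks, circumflex typed first

# The two inputs are canonically equivalent...
print(unicodedata.normalize("NFD", canonical) == unicodedata.normalize("NFD", swapped))  # True

# ...but the blind composer treats them differently.
print(ascii(parallel_replace(canonical, COMPOSE)))  # '\u1ec7'
print(ascii(parallel_replace(swapped, COMPOSE)))    # '\xea\u0323'
print(ascii(unicodedata.normalize("NFC", swapped)))  # '\u1ec7'
```

The second result is still canonically equivalent to ệ and may well display identically; it simply isn't NFC, which matters wherever strings such as file names are compared code point by code point.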

Let's say I were to make a shell tool, or something like that, that
performed my "multiple parallel string replacement" on text files to
do composition or decomposition. What limitations should I write into
the documentation for the tool, to warn that given certain kinds of
text it won't produce correct output? Put the other way round: under
what conditions would this tool still produce correct output? Or would
it be better to add a simple extra processing step so that it produces
proper NFC or NFD output?
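
For the "simple additional processing step", one option is to hand the text to an existing normalizer instead of documenting limitations. Below is a minimal sketch of such a filter, assuming Python's standard `unicodedata.normalize` is acceptable as that normalizer; `normalize_stream` and `FORMS` are hypothetical names, and the filter is demonstrated on in-memory text rather than wired to a command line.

```python
import io
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_stream(src, dst, form="NFC"):
    """Copy text from src to dst, converting each line to the given form."""
    if form not in FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    for line in src:
        dst.write(unicodedata.normalize(form, line))


# As a shell filter (after `import sys`) this could be called as:
#   normalize_stream(sys.stdin, sys.stdout, "NFC")
# Here it runs on in-memory text instead.
out = io.StringIO()
normalize_stream(io.StringIO("e\u0302\u0323\n"), out, "NFC")
print(ascii(out.getvalue()))  # '\u1ec7\n'
```

Normalizing line by line gives the same result as normalizing the whole file at once, since a line feed has combining class 0 and takes part in no canonical composition.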

Is it true that if I performed proper combining character reordering
(as described in UTR #15) on some Unicode text, and then ran my
"parallel string replacement based composer" on the text, I'd
generate correct NFC?
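
Canonical reordering fixes the mark-order problem, but NFC's composition step can also combine a base with a mark that is not adjacent to it, as long as nothing in between blocks it (every mark in between must have a nonzero combining class lower than that mark's own). A table of contiguous sequences doesn't model that. The sketch below, using a hypothetical one-entry table and made-up helper names, reorders first and then replaces, and still differs from `unicodedata.normalize("NFC", ...)` for an (arguably unusual) sequence: a + U+0316 COMBINING GRAVE ACCENT BELOW + U+0301 COMBINING ACUTE ACCENT.

```python
import unicodedata

# Hypothetical contiguous composition table (keys in canonical order).
COMPOSE = {
    "a\u0301": "\u00e1",  # a + COMBINING ACUTE ACCENT -> á
}


def canonical_reorder(text: str) -> str:
    """Stable-sort each run of combining marks (class > 0) by combining class."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


def parallel_replace(text: str, table: dict[str, str]) -> str:
    """One left-to-right pass, longest table key first at each position."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in table:
                out.append(table[text[i:i + length]])
                i += length
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


text = "a\u0316\u0301"  # a + grave below (class 220) + acute (class 230)
reordered = canonical_reorder(text)  # already canonical, so unchanged

print(ascii(parallel_replace(reordered, COMPOSE)))  # 'a\u0316\u0301'
print(ascii(unicodedata.normalize("NFC", text)))    # '\xe1\u0316'
```

Matching NFC would mean following the canonical composition algorithm itself (compose each mark with the last starter unless something in between blocks it, and skip the composition exclusions), which a lookup of contiguous sequences doesn't capture.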

That question might make it seem as though I don't understand
normalisation. The problem I'm having is that Unicode.org's technical
information is a bit hard to understand at times. Surely it can be
explained more simply?

Thanks for any answers!

PS: Some users have been using this composer/decomposer I made to
convert file names between Windows and OS X, and it actually works
perfectly for them.

But then I read Unicode TR15 again and realised that it might only be
a matter of time before a situation came up where this
decomposer/composer failed because it does nothing about reordering
combining marks. Then again, I'm not sure it will fail at all, because
I'm having a hard time understanding the report.


This archive was generated by hypermail 2.1.5 : Mon May 29 2006 - 11:40:46 CDT