From: Theodore H. Smith (delete@elfdata.com)
Date: Sun May 28 2006 - 09:18:18 CDT
I've got some code that can do a multiple, parallel replacement upon
Unicode strings. I can use this successfully to decompose and compose
some Unicode glyphs.
But that's all that it does: multiple parallel string replacement. No
reordering or anything else.
I'm wondering, what limitations would it have for being useful for  
doing decomposition? And for doing composition?
Is it true that you can't successfully decompose a string without  
doing a proper NFD operation on it? And just the same for composing,  
is it true that you can't compose a string without doing a proper NFC?
For example: as I understand it, one problem that could occur
when doing a blind "composition" upon a Unicode string is that a
glyph may have its combiners in a different order than my composer
recognises, and thus this character won't get composed.
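
A minimal sketch of that failure case, assuming Python and its standard
unicodedata module. The COMPOSE table and the parallel_replace function
are hypothetical stand-ins for the real tool, which is not shown in this
message. The swapped input has the circumflex before the dot below, so
the three-character entry never matches and the result differs from what
a proper NFC gives.

```python
import re
import unicodedata

# Hypothetical composition table; a real one would be far larger.
COMPOSE = {
    "e\u0323\u0302": "\u1ec7",  # e + dot below + circumflex -> U+1EC7
    "e\u0302": "\u00ea",        # e + circumflex -> U+00EA
    "e\u0323": "\u1eb9",        # e + dot below -> U+1EB9
}

def parallel_replace(text, table):
    """Replace every key of `table` in one left-to-right pass."""
    # Longer keys come first so they win over their own prefixes.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

canonical = "e\u0323\u0302"  # marks in canonical order
swapped = "e\u0302\u0323"    # same marks, other order

naive_canonical = parallel_replace(canonical, COMPOSE)
naive_swapped = parallel_replace(swapped, COMPOSE)
nfc_swapped = unicodedata.normalize("NFC", swapped)

print(ascii(naive_canonical))  # '\u1ec7'
print(ascii(naive_swapped))    # '\xea\u0323'
print(ascii(nfc_swapped))      # '\u1ec7'
```

The naive output for the swapped input is canonically equivalent to the
NFC result, but it is not the same code point sequence, so a byte-wise
comparison of the two would fail.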
Let's say I were to make a shell tool or something like that, that
performed my "multiple parallel string replacement" upon text files,
to do composition or decomposition. What limitations should I write
into the documentation for the tool, to say that given certain kinds
of text, it won't produce correct output? Basically, under what
restrictions on the input would this tool still produce correct
output? Or would it be better to add a simple extra processing step
so that it produces proper NFC or NFD output?
Is it true that if I perform a proper combining character reordering
(as described by UTR15) upon some Unicode text, and then run my
"parallel string replacement based composer" upon the text, I'd
generate correct NFC?
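
One way to probe this question is sketched below, again assuming Python's
standard unicodedata module. The canonical_reorder function is an
illustrative stable sort of each run of combining marks by combining
class, and COMPOSE and parallel_replace are hypothetical stand-ins for
the real tool. It checks only two sample strings, so it is an experiment
and not a proof about the general case.

```python
import re
import unicodedata

# Hypothetical composition table, for illustration only.
COMPOSE = {
    "e\u0323\u0302": "\u1ec7",  # e + dot below + circumflex -> U+1EC7
    "e\u0302": "\u00ea",        # e + circumflex -> U+00EA
    "e\u0323": "\u1eb9",        # e + dot below -> U+1EB9
}

def parallel_replace(text, table):
    """Replace every key of `table` in one left-to-right pass."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

def canonical_reorder(text):
    """Stable-sort each run of combining marks by combining class."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

def reorder_then_compose(text):
    return parallel_replace(canonical_reorder(text), COMPOSE)

decomposed_swapped = "e\u0302\u0323"  # fully decomposed, marks swapped
partly_composed = "\u00ea\u0323"      # U+00EA followed by dot below

fixed = reorder_then_compose(decomposed_swapped)
stuck = reorder_then_compose(partly_composed)

print(fixed == unicodedata.normalize("NFC", decomposed_swapped))  # True
print(stuck == unicodedata.normalize("NFC", partly_composed))     # False
```

In this sample, reordering alone is enough when the input is already
fully decomposed. The partly composed input is left unchanged, because
reordering never looks inside U+00EA; NFC is defined as a full canonical
decomposition (including reordering) followed by composition, so the
decomposition step would have to run first.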
That question might make it sound like I don't understand
normalisation. The problem I'm having is that Unicode.org's technical
information is a bit hard to understand at times. I'm sure it can be
explained in a simpler manner?
Thanks for any answers!
PS: I've some users who have been using this composer/decomposer I've  
made, for converting file names between Windows and OSX, and it  
actually works perfectly for them.
But then I read the Unicode TR15 again, and realised that it may be
only a matter of time before a situation comes up where this
decomposer/composer fails because it does nothing about reordering
combiners. I'm not sure whether it will actually fail, though,
because I'm having a hard time understanding this report.