> From unicode@Unicode.ORG Wed Aug 28 09:52 PDT 1996
> At 16:30 1996-08-28, Martin J Duerst wrote:
> >Please be careful. To know whether an A is just only an A, you only have
> >to check the next position. If that next position is not a combining
> >character, you know it is an A, if it is a combining character, you
> >know it is "something else".
> Yes, but it's not a once-off look, is it? Because you can stack combining
> characters. So you know it's not an A, but you have to keep looking and
> looking and looking, don't you? Doesn't this make processing much more
> complex than Level 1 processing?
This discussion may be a little misleading regarding what the characters "are"
and what is required in processing.
U+0041 is ALWAYS an A, forever and forever.
The sequence U+0041 U+0301 is canonically equivalent to U+00C1 Á (LATIN
CAPITAL LETTER A WITH ACUTE, if your mailer trashes that). A conformant
process "shall not assume that the interpretation of two canonical-
equivalent sequences are distinct." This means that I cannot claim
that I had U+0041 U+0301, but you interpreted it as U+00C1, and you're
wrong. It DOES NOT mean that all processing is much more complex. It
depends entirely on what processing is going on.
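
As a rough illustration of canonical equivalence, here is a minimal sketch in Python. It assumes the standard-library unicodedata module and its normalization forms, which are later tooling and not something this message relies on; the helper name canonically_equal is made up for the example.

```python
import unicodedata

decomposed = "\u0041\u0301"   # A followed by COMBINING ACUTE ACCENT
precomposed = "\u00C1"        # LATIN CAPITAL LETTER A WITH ACUTE

def canonically_equal(a, b):
    # Hypothetical helper: bring both strings to the same normalization
    # form (NFD here) before comparing, so that canonically equivalent
    # sequences are not treated as distinct.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# A plain binary comparison sees two different code point sequences.
binary_equal = decomposed == precomposed

# A comparison that respects canonical equivalence sees the same text.
equivalent = canonically_equal(decomposed, precomposed)
```

The extra work appears only in the comparison that has to interpret the text. Storing or copying either string involves no normalization at all.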
If I am doing string copies into buffers, there is no difference whatsoever.
If I am doing text matches for other than exact binary matches, then some
table lookup is involved, which may require lookahead even in Level 1
implementations. Whether this table lookup is "much more complex" using
combining characters depends on your implementation of the lookup.
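
To make the "keep looking and looking" worry concrete, here is a sketch of the lookahead as one possible implementation, not the only one. It assumes Python's unicodedata module and treats any character whose general category starts with "M" as a combining mark, which is a simplification; the function name combining_sequences is made up for the example.

```python
import unicodedata

def combining_sequences(text):
    """Yield each base character together with the marks that follow it."""
    seq = ""
    for ch in text:
        # Assumption: general categories Mn, Mc and Me count as combining.
        if seq and unicodedata.category(ch).startswith("M"):
            seq += ch          # another mark stacked on the current base
        else:
            if seq:
                yield seq      # the previous sequence is complete
            seq = ch           # start a new sequence at this character
    if seq:
        yield seq

# A with two stacked marks (acute above, dot below), followed by a plain B.
groups = list(combining_sequences("A\u0301\u0323B"))
```

However many marks are stacked, this is still a single pass with one character of lookahead, so the cost grows with the length of the text and not with the depth of the stack.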
If I am doing rendering, the degree of complexity of the character to
glyph mapping depends a lot on the font model, and could be made either
more complex or simpler when using combining marks.
And so on. Depending on the processing and on how that processing is
implemented, combining marks have differing effects. It is not simply the
case that having to look ahead always makes processing more difficult.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT