Re: NFC

From: Mark Davis (mark.davis@icu-project.org)
Date: Wed Feb 01 2006 - 10:11:57 CST


    No, that's not sufficient; there are some edge cases. In ICU we
    precompute and store several pieces of data that are very useful for
    optimizing normalization, such as:
    a) which characters can't combine or reorder with anything before them;
    b) which characters can't combine or reorder with anything after them;
    c) if a character were decomposed, what the canonical combining class
    (ccc) of the first and of the last character of its decomposition
    would be;
    and so on.
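    As an illustration of item (c) only, here is a minimal Python sketch
    using the standard `unicodedata` module. It is not ICU's code or data
    layout; the helper names `first_ccc` and `last_ccc` are hypothetical.
    It derives the first and last ccc from the full canonical
    decomposition (NFD) of one character. U+0F73 TIBETAN VOWEL SIGN II
    shows why this is worth storing: its own ccc is 0, yet its
    decomposition <U+0F71, U+0F72> starts with ccc 129 and ends with
    ccc 130. Items (a) and (b) also need composition data and are not
    covered here.

```python
import unicodedata


def first_ccc(ch: str) -> int:
    """Hypothetical helper: ccc of the first code point of ch's NFD."""
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0])


def last_ccc(ch: str) -> int:
    """Hypothetical helper: ccc of the last code point of ch's NFD."""
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[-1])


if __name__ == "__main__":
    # Compare each character's own ccc with the first/last ccc of its
    # decomposition: 'a', U+01D5, U+0344, U+0F73.
    for ch in ["a", "\u01d5", "\u0344", "\u0f73"]:
        print(f"U+{ord(ch):04X}: own ccc={unicodedata.combining(ch)}, "
              f"first={first_ccc(ch)}, last={last_ccc(ch)}")
```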

    If you run into a Maybe character, you can use the above information,
    plus other UCD properties, to find the minimal span that you need to
    examine. (A character that is completely stable under NFC has both
    properties (a) and (b), but you can do a somewhat better job if you
    track the two pieces of information separately.)
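    A concrete case for Tim's question: "a" + U+0316 COMBINING GRAVE
    ACCENT BELOW (ccc 220) + U+0301 COMBINING ACUTE ACCENT (ccc 230). In
    DerivedNormalizationProps.txt, U+0301 is NFC_QC=Maybe while U+0316 is
    not listed, so it is Yes. The ccc-220 mark does not block the ccc-230
    mark, so NFC composes "a" + U+0301 into U+00E1; the analysis has to
    start at "a", before the preceding 'Yes'. The sketch below is a
    simplified stand-in, not ICU's algorithm, and the names `nfc_span`
    and `span_is_nfc` are hypothetical. It backs up from the Maybe
    character to the nearest character whose decomposition begins with a
    starter, extends forward over following non-starters, and checks only
    that span with `unicodedata.is_normalized` (Python 3.8+). It does not
    read NFC_QC (Python's standard library does not expose it) and it
    ignores cases such as Hangul jamo, where a starter is itself Maybe.

```python
from __future__ import annotations

import unicodedata


def first_ccc(ch: str) -> int:
    # ccc of the first code point of ch's canonical decomposition
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0])


def nfc_span(s: str, i: int) -> tuple[int, int]:
    """Return [start, end) around a Maybe character at index i.

    Simplification (an assumption, not ICU's logic): start at the nearest
    earlier character whose decomposition begins with a starter (ccc 0),
    and end after the run of non-starters that follows position i.
    """
    start = i - 1
    while start > 0 and first_ccc(s[start]) != 0:
        start -= 1
    start = max(start, 0)
    end = i + 1
    while end < len(s) and first_ccc(s[end]) != 0:
        end += 1
    return start, end


def span_is_nfc(s: str, i: int) -> bool:
    # Normalize only the span around position i and compare
    start, end = nfc_span(s, i)
    return unicodedata.is_normalized("NFC", s[start:end])


if __name__ == "__main__":
    s = "a\u0316\u0301"   # 'a', grave below (Yes), acute (Maybe)
    print(span_is_nfc(s, 2))                         # False
    print(unicodedata.is_normalized("NFC", s[1:]))   # True: started too late
    print(unicodedata.normalize("NFC", s) == "\u00e1\u0316")  # True
```

    The second check is the point: restarting at the previous 'Yes'
    character (U+0316) gives a span that is NFC on its own, even though
    the whole string is not.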

    Mark

    Tim Greenwood wrote:

    >Annex 8 of UAX #15 (Normalization Forms) describes the quick lookup
    >property of Yes/No/Maybe for determining if a string is NFC. When I
    >get a 'Maybe' is it sufficient to do the fuller analysis from the
    >previous 'Yes' character? In other words (I think), is the previous
    >'Yes' character a stable NFC code point? From the annex it seems not
    >to be, but I cannot think of an example.
    >
    >Can anyone provide an example where I would get a stream of 'Yes'
    >followed by a 'Maybe' where the fuller analysis needs to start before
    >the previous 'Yes'?
    >
    >Thanks
    >Tim



    This archive was generated by hypermail 2.1.5 : Wed Feb 01 2006 - 10:17:48 CST