RE: Implementing NFC

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Mar 17 2007 - 13:40:03 CST

    I think that normalizing to NFC is a good idea when it is time to store the text in a file (this could be made optional by adding an option to the save dialog) or to write it to some data stream for later interchange. Within the editor itself, even after the file has been saved from the editor's storage, you don't need to perform it, so you can keep the characters exactly as the keyboard driver generates them. This way, you don't need any special code for specific keyboard drivers.
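
    A minimal sketch of that save-time approach, assuming Python and its standard `unicodedata` module. The function name `encode_for_storage` and its `normalize` flag are hypothetical, standing in for whatever the editor's save path and save-dialog option would be:

```python
import unicodedata

def encode_for_storage(buffer: str, normalize: bool = True) -> bytes:
    """Return the bytes to write to a file or interchange stream.

    The editor buffer itself is never modified; normalization to NFC is
    applied only to the copy that leaves the editor. The `normalize`
    flag plays the role of the optional save-dialog setting.
    """
    text = unicodedata.normalize("NFC", buffer) if normalize else buffer
    return text.encode("utf-8")

# The buffer holds the characters as the keyboard produced them:
# U+00EA followed by U+0303.
buffer = "\u00ea\u0303"
stored = encode_for_storage(buffer)
print(len(buffer), stored.decode("utf-8") == "\u1ec5")  # 2 True
```

    The buffer keeps its two code points, so editing operations still line up with keystrokes, while the stored copy is in NFC.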

    Really, normalization is only needed for compatibility with other processes that do not recognize canonically equivalent forms (i.e. processes that are not Unicode-compliant, because all compliant processes should produce consistent results, i.e. canonically equivalent results from any canonically equivalent input), or that restrict their supported character set (for example, for increased security, as in IDN).
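
    As an illustration of what recognizing canonically equivalent forms can look like, here is a small Python sketch. The helper `canonically_equivalent` is a hypothetical name; it compares two strings after bringing both to NFD, which is one common way to test canonical equivalence:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent.

    Both inputs are normalized to NFD before comparison, so the
    caller does not need to store text in any particular form.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Three spellings of the same Vietnamese letter compare equal.
as_typed = "\u00ea\u0303"       # U+00EA + U+0303
precomposed = "\u1ec5"          # U+1EC5
decomposed = "e\u0302\u0303"    # U+0065 + U+0302 + U+0303

print(canonically_equivalent(as_typed, precomposed))    # True
print(canonically_equivalent(as_typed, decomposed))     # True
print(canonically_equivalent("e", "\u00ea"))            # False
```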

    > -----Original Message-----
    > On behalf of Eric Muller
    > Sent: Saturday, March 17, 2007 17:20
    > To: Daniel Ehrenberg
    > Cc: unicode@unicode.org
    > Subject: Re: Implementing NFC
    >
    > Daniel Ehrenberg wrote:
    > > I'm just wondering, are there any other programming languages that
    > > handle Unicode by storing strings in a consistently normalized form?
    > I don't know of any, but you should realize that this comes at a
    > functional cost.
    >
    > Consider writing a text editor and consider the Windows Vietnamese
    > keyboard. Because of the layout of this keyboard, data entered with it
    > is not in a normalized form; for example, ễ is entered by hitting two
    > keystrokes, the first generating U+00EA ê LATIN SMALL LETTER E WITH
    > CIRCUMFLEX, the second generating U+0303 ◌̃ COMBINING TILDE. Your
    > approach means that the stored text is either <U+1EC5 ễ LATIN SMALL
    > LETTER E WITH CIRCUMFLEX AND TILDE> (if you choose NFC) or <U+0065 e
    > LATIN SMALL LETTER E, U+0302 ◌̂ COMBINING CIRCUMFLEX ACCENT, U+0303 ◌̃
    > COMBINING TILDE> (if you choose NFD). In either case, the number of
    > characters seen by the editor and the number of keystrokes do not match.
    > If you want to build your editor so that <any key, delete> is a no-op,
    > then you need to compensate for this mismatch, and in fact you need to
    > have a detailed knowledge of the keyboard in your editor. This sounds a
    > bit much to me.
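
    The code point counts in the quoted example can be checked with a short Python sketch using the standard `unicodedata` module. The helper `code_points` is a hypothetical name used only for display:

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

# Two keystrokes on the Vietnamese keyboard, as described above.
typed = "\u00ea\u0303"
nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)

print(code_points(typed))  # ['U+00EA', 'U+0303']
print(code_points(nfc))    # ['U+1EC5']
print(code_points(nfd))    # ['U+0065', 'U+0302', 'U+0303']
```

    Two keystrokes become one code point under NFC and three under NFD, which is the mismatch an editor that stores normalized text would have to compensate for.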



    This archive was generated by hypermail 2.1.5 : Sat Mar 17 2007 - 13:42:19 CST