Re: Slots for Cyrillic Accented Vowels

From: Martin J. Dürst (duerst@it.aoyama.ac.jp)
Date: Mon May 23 2011 - 02:40:06 CDT

    On 2011/05/19 19:35, Christoph Päper wrote:
    > I believe it would help if input immediately was transformed to and text was saved in NFD, because this would make the need for uniform treatment more obvious.

    It might help in theory, but in practice, NFC is much, much closer to
    what's out there in the real world (in particular on the Web). So please
    use NFD for internal processing if you think that helps you, but please
    use NFC wherever the text may be seen by other programs.
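
    As a rough illustration of that split (NFD inside, NFC at the
    boundary), here is a minimal Python sketch using the standard
    unicodedata module. The function names to_internal and to_external
    are made up for this example and are not part of any library.

```python
import unicodedata


def to_internal(text: str) -> str:
    """Decompose text for internal processing (hypothetical helper)."""
    return unicodedata.normalize("NFD", text)


def to_external(text: str) -> str:
    """Compose text before other programs see it (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# Latin e with acute has a precomposed code point, so NFD and NFC differ.
latin_nfc = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
latin_nfd = to_internal(latin_nfc)

# Cyrillic a with acute has no precomposed code point, so NFC leaves the
# base letter followed by the combining mark.
cyrillic = "\u0430\u0301"   # CYRILLIC SMALL LETTER A + COMBINING ACUTE ACCENT
cyrillic_out = to_external(to_internal(cyrillic))

print(len(latin_nfd), len(to_external(latin_nfd)), len(cyrillic_out))
```

    Note that the accented Cyrillic vowel stays a two-code-point
    sequence in both forms, so software has to handle combining
    sequences whichever normalization form it picks.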

    > It would be cool if there was an ASCII-compatible encoding with variable length like UTF-8 that supported only NFD (or NFKD) and was optimized for a small storage footprint, e.g. from U+00C0–017F only a handful would have to be coded separately. Sadly, though, it is unrealistic to have a unique single byte code for each combining diacritic, because there are so many of them: even just ranges U+0300–036F and U+1DC0–1DFF are 176 positions together, although some are still unassigned; that is more than you can encode with 7 bits or less.
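
    For reference, the arithmetic in the quoted paragraph can be checked
    with a few lines of Python. This only counts code positions in the
    two ranges named above; it does not look at which positions are
    actually assigned, and the names used are just for this example.

```python
# Inclusive code point ranges mentioned in the quoted text.
RANGES = [
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
]


def range_size(first: int, last: int) -> int:
    """Number of code positions in an inclusive range."""
    return last - first + 1


sizes = [range_size(first, last) for first, last in RANGES]
total_positions = sum(sizes)
seven_bit_values = 2 ** 7  # distinct values representable in 7 bits

print(sizes, total_positions, total_positions > seven_bit_values)
```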

    We don't need any more character encodings. Unicode is about reducing
    them, not about inventing more. With current hardware, the storage
    savings matter far less than the reduction in confusion that comes
    from having fewer encodings.

    Regards, Martin.


