Overload (was Re: Text Editors and Canonical Equivalence (was Coloured diacritics))

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Dec 09 2003 - 13:01:46 EST


    > No, surely not. If the wcslen() function is fully Unicode conformant, it
    > should give the same output whatever the canonically equivalent form of
    > its input. That more or less implies that it should normalise its input.

    No, that is not a requirement of Unicode conformance.
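
[Illustrative sketch, not part of the original message.] The quoted question about wcslen() can be checked directly. The example below spells both forms of "café" with universal character names, so the result does not depend on how the source file was saved. It assumes, as the thread does, that wchar_t holds Unicode code units; the names kCafeNfc, kCafeNfd, code_unit_count and check_cafe_lengths are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Precomposed (NFC) spelling: ends in U+00E9 LATIN SMALL LETTER E WITH ACUTE.
const wchar_t* const kCafeNfc = L"caf\u00E9";

// Decomposed (NFD) spelling: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
const wchar_t* const kCafeNfd = L"cafe\u0301";

// wcslen() counts wchar_t elements before the terminating null.
// It does not normalise its argument and knows nothing about
// canonical equivalence.
inline std::size_t code_unit_count(const wchar_t* s) {
    return std::wcslen(s);
}

// Hypothetical check: the two canonically equivalent strings
// have different lengths as far as wcslen() is concerned.
inline void check_cafe_lengths() {
    assert(code_unit_count(kCafeNfc) == 4);
    assert(code_unit_count(kCafeNfd) == 5);
}
```

So the two spellings are canonically equivalent yet differ in length: wcslen() counts code units and leaves any normalisation to the caller.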

    BTW, I must confess to an inability to keep up with the level of mail on this
    list. There are so many things in these mails that are simply wrong, and
    insufficient time for knowledgeable people to correct them. I would just caution
    people to first consult the materials on the Unicode site (Standard, TRs, FAQs,
    etc.), and take much of what is on this list with a quite sizable grain of salt.

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Kirk" <peterkirk@qaya.org>
    To: "Arcane Jill" <arcanejill@ramonsky.com>
    Cc: <unicode@unicode.org>
    Sent: Tue, 2003 Dec 09 09:12
    Subject: Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

    > On 09/12/2003 07:00, Arcane Jill wrote:
    >
    > >
    > > Hmm. Now here's some C++ source code (syntax colored as Philippe
    > > suggests, to imply that the text editor understands C++ at least well
    > > enough to color it)
    > >
    > > int n = wcslen(L"café");
    > >
    > > (That's int n = wcslen(L"café"); for those without HTML email)
    > >
    > > The L prefix on a string literal makes it a wide-character string, and
    > > wcslen() is simply a wide-character version of strlen(). (There is no
    > > guarantee that "wide character" means "Unicode character", but let's
    > > just assume that it does, for the moment).
    > >
    > > So, should n equal four or five? The answer would appear to depend on
    > > whether or not the source file was saved in NFC or NFD format.
    > >
    > No, surely not. If the wcslen() function is fully Unicode conformant, it
    > should give the same output whatever the canonically equivalent form of
    > its input. That more or less implies that it should normalise its input.
    > (One can imagine a second parameter specifying whether NFC or NFD is
    > required.) This makes the issue one not for the text editor but for the
    > programming language or its string handling library.
    >
    > > There is more to consider than just how and whether a text editor
    > > normalizes. If a text editor is capable of dealing with Unicode text,
    > > perhaps it should also be able to explicitly DISPLAY the actual
    > > composition form of every glyph. The question I posed in the previous
    > > paragraph should ideally be obvious by sight - if you see four
    > > characters, there are four characters; if you see five characters,
    > > there are five characters. This implies that such a text editor should
    > > display NFD text as separate glyphs for each character.
    > >
    > > On the other hand, such a text editor must also acknowledge that "é"
    > > and "e + U+0301" are actually equivalent. The /intention/ of canonical
    > > equivalence is that the glyphs should display the same - otherwise
    > > we'd need precomposed versions of, well, everything. So in other
    > > contexts, it should display them the same.
    > >
    > The Unicode standard does allow for special display modes in which the
    > exact underlying string, including control characters, is made visible.
    >
    > > Yuk. That's a lot to think about for anyone considering writing a
    > > programmers' text editor with /serious/ Unicode support.
    > > Jill
    > >
    > >
    > Simply allow the text editor to save as either NFC or NFD, and let the
    > programming language sort out the rest.
    >
    > --
    > Peter Kirk
    > peter@qaya.org (personal)
    > peterkirk@qaya.org (work)
    > http://www.qaya.org/


