RE: MS Windows and Unicode 4.0 ?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 01 2003 - 18:03:19 EST


    Carl W. Brown wrote:
    > Doug writes:
    > > You might remember that I chided Microsoft for
    > > its definition of "Unicode" in
    > > Windows 2000 Help, where Unicode was described
    > > as a "16-bit standard" that was "developed between
    > > 1988 and 1991," implying that the work was
    > > finished. Even at the time Windows 2000 was being
    > > developed, there was quite a bit of room for
    > > improvement in this definition.
    >
    > You are right; however, Unicode was officially still 16-bit when
    > Win2000 was released to manufacturing. Even though they knew about
    > surrogates and new planes, it was not official and could have
    > been changed.

    Oh God... Surrogates were standardized long before they were
    first used, in Unicode 3.1, for new code point assignments outside
    the BMP...

    And Microsoft was already a full member of the UTC, and knew all
    about the required support for GB18030 in P.R.China starting in
    2000.

    Unicode 3.0.0 was released in September 1999,
    superseding Unicode 2.1.9, published in April 1999
    (UTR #8 version 3.0, see
    http://www.unicode.org/unicode/reports/tr8/).

    Note also that normalization had already been published at that
    time (see version 17.0 of UTR #15, September 1999, at
    http://www.unicode.org/unicode/reports/tr15/tr15-17.html).

    So had the encoding model for surrogates
    (see http://www.unicode.org/reports/tr17/tr17-2.html,
    dated 1998-10-14, which clearly states that the
    range of code points is 0..10FFFF and already references
    UTF-8 and UTF-16 as valid encoding forms for this range,
    with up to 4 bytes in UTF-8, or 2 words in UTF-16).
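
    To make the arithmetic concrete, here is a minimal sketch of how a
    code point in 0..10FFFF maps to one or two 16-bit words in UTF-16
    and to 1..4 bytes in UTF-8. The function names are my own, chosen
    for illustration; they are not taken from any of the reports cited.

```python
def to_utf16_units(cp):
    """Return the list of UTF-16 code units (1 or 2) for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point: %#x" % cp)
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit word
    v = cp - 0x10000                     # 20 bits left to distribute
    high = 0xD800 | (v >> 10)            # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)           # low 10 bits -> low surrogate
    return [high, low]


def utf8_length(cp):
    """Return how many bytes UTF-8 needs for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point: %#x" % cp)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4                             # everything outside the BMP
```

    For example, U+20000 (the first code point of plane 2) becomes the
    pair D840 DC00 in UTF-16 and takes 4 bytes in UTF-8.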

    The character model was already known, as was the general
    structure Unicode uses to handle characters outside the BMP.
    These new characters were not standardized magically from
    nothing: the Han working group was actively at work, and the
    GB18030 standard was already there, clearly demonstrating
    that mapping the required GB18030 repertoire into Unicode
    would be unavoidable. So there were already very active
    discussions between Unicode, ISO/IEC 10646, and the Han working
    group to integrate GB18030 within Unicode. It was clear that
    many new characters would become necessary in Unicode 3.0.0,
    even if only Unicode 2.1.9 had been published at that time.

    Microsoft should then have anticipated this by actively
    experimenting with the proposed models. Adding correct support
    for surrogates immediately was then a high priority, even if a
    complete charset mapping to Unicode was not available at
    that time to translate between GB18030 and Unicode.
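
    As an illustration of why the two go together, the sketch below
    uses the "gb18030" codec from Python's standard library to round-trip
    a character outside the BMP. This is only an assumption-laden
    demonstration with a modern codec; it says nothing about which
    mapping tables existed when Windows 2000 was developed. The helper
    name is hypothetical.

```python
def gb18030_roundtrip(cp):
    """Encode one code point to GB18030 and decode it back.

    Returns (encoded_bytes, utf16_word_count, roundtrip_ok).
    """
    ch = chr(cp)
    encoded = ch.encode("gb18030")               # stdlib codec
    decoded = encoded.decode("gb18030")
    # Number of 16-bit words the same character needs in UTF-16.
    utf16_words = len(ch.encode("utf-16-be")) // 2
    return encoded, utf16_words, decoded == ch


# U+20000 lies outside the BMP: GB18030 uses a four-byte sequence for
# it, and UTF-16 needs a surrogate pair (2 words) to carry it.
encoded, words, ok = gb18030_roundtrip(0x20000)
```

    Any UTF-16 system that wants to convert such GB18030 text without
    loss therefore has to handle surrogate pairs.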

    So Windows 2000 should have had full support for surrogates
    from the start, and should have correctly treated unpaired
    surrogates as invalid sequences in filenames as well as in its
    international support libraries, simply because this was needed
    for GB18030 support...
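
    The kind of check I mean can be sketched in a few lines. This is
    not how Windows does or did it; it is only a minimal example, with
    a function name of my own, that scans a sequence of 16-bit words
    and reports every surrogate that is not part of a well-formed
    high/low pair.

```python
def find_unpaired_surrogates(units):
    """Return the indexes of surrogate code units that are not part
    of a valid (high, low) pair in a sequence of UTF-16 code units."""
    bad = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            bad.append(i)
        i += 1
    return bad


def is_well_formed_utf16(units):
    """True when the sequence contains no unpaired surrogate."""
    return not find_unpaired_surrogates(units)
```

    A filename API built on such a check would reject the sequence
    0041 D840 0042 (lone high surrogate) but accept 0041 D840 DC00 0042.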






    This archive was generated by hypermail 2.1.5 : Mon Dec 01 2003 - 18:47:03 EST