Re: Version(s) of Unicode supported by various versions of Microsoft Windows

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Mar 05 2004 - 12:52:52 EST


    Hi Michael,

    Michael (michka) Kaplan wrote:

    > For sortkey.nls -- that file does not ever change in size, as it is
    > not a file that one adds characters to.

    Well, I do not think this is the most appropriate place to discuss this, but
    here is my view.

    Since NT 3.1 (and in fact in the NLS part of OLE 2 too), the NT sorting
    algorithm has used a big table of the weights assigned to each character.
    This table is even partly user-visible via the LCMapString APIs (with the
    LCMAP_SORTKEY flag), so you do not need to break any law about reverse
    engineering to understand all this; nor did I: this is pure black-box
    observation. OTOH, using the content of this table would be a clear
    copyright infringement, so do not do that.

    Internally, in (Unicode-enabled) NT, unlike the variants used with Windows
    3.1/9x and probably CE, the table is split into two parts: the
    locale-dependent tailoring (in SORTTBLS.NLS) and a common part (in
    SORTKEY.NLS). And since NT 3.1, SORTKEY.NLS has been 262144 bytes in size.
    One cannot miss that 262144 is 4 * 65536, and indeed the structure of the
    file confirms without doubt that each character is mapped, in Unicode
    order, to its 4 weights. (I personally did not miss it, because back in
    1994 a 256 KiB file was quite a bulky thing to deal with, particularly
    since I only had 16-bit, DOS-based tools.)
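    To make the layout concrete, here is a minimal sketch of a lookup into a
    table shaped like the one described: 65536 entries of 4 bytes, indexed by
    code point. One byte per weight and the order of the weights inside an
    entry are my assumptions, and the function name lookup_weights is
    hypothetical. It runs on a synthetic table with made-up weights, not on
    the real file.

```python
from typing import Tuple

# Hypothetical model of the SORTKEY.NLS layout inferred above: one 4-byte
# entry per BMP code point, stored in Unicode order. One byte per weight and
# the order of the weights inside an entry are assumptions.
WEIGHTS_PER_ENTRY = 4
BMP_CODE_POINTS = 0x10000                           # 65536
TABLE_SIZE = WEIGHTS_PER_ENTRY * BMP_CODE_POINTS    # 262144 bytes


def lookup_weights(table: bytes, code_point: int) -> Tuple[int, ...]:
    """Return the four weights stored for one BMP code point."""
    if len(table) != TABLE_SIZE:
        raise ValueError(f"expected {TABLE_SIZE} bytes, got {len(table)}")
    if not 0 <= code_point < BMP_CODE_POINTS:
        # A flat BMP table simply has no slot for U+10000 and above.
        raise ValueError(f"U+{code_point:04X} has no entry in a BMP-only table")
    offset = code_point * WEIGHTS_PER_ENTRY   # entries are in code point order
    return tuple(table[offset:offset + WEIGHTS_PER_ENTRY])


# Usage on a synthetic table (not the real file): give 'A' made-up weights.
demo = bytearray(TABLE_SIZE)
demo[0x41 * 4:0x41 * 4 + 4] = bytes([1, 2, 3, 4])
print(lookup_weights(bytes(demo), ord("A")))  # (1, 2, 3, 4)
```

    The point of the flat layout is that the offset is just code point * 4,
    which is also why such a table has no room for anything beyond U+FFFF.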

    Now, the file in XP is still exactly 262144 bytes in size. To me, this is
    evidence that only BMP characters receive weights in this file. And since
    SORTTBLS.NLS is still a tiny 20 KB or so, it cannot hold the weights for
    the other planes either.

    So I deduce that characters outside the BMP are probably not sorted in XP,
    or in other words that the Win32 API available with XP does not fully
    support Unicode 3.1. Furthermore, since Whistler (the XP codename) was
    developed around 2000, earlier rather than later, while Unicode 3.1 was
    released on 2001-05-16, it would be very surprising if it supported it.

    Now, what I do not know is:

     - whether the Win32 NLS API was fully upgraded to Unicode 3.0 for XP. I
    thought so when I researched it earlier today, since the sizes of the .NLS
    files increased accordingly, but since I did not find the relevant KB
    article I was not sure. Michael's vague answer (I beg your pardon if this
    was not the intent), which leaves it open whether the glass is half full
    or half empty, is not very good news.

     - what the status is with NT 5.2, a.k.a. Server 2003, since I do not
    currently have access to that version. A quick look at the size of
    SORTKEY.NLS would give some hints: 256 KiB would say it is still at the
    3.0 level; 768 KiB (Planes 0, 1 and 2, perhaps with some adjustment to
    cover the delightful Plane 14) would indicate it supports meaningful
    surrogates without heavy changes to the scheme; 4352 KiB (4.25 MiB,
    17 * 256 KiB) would say the programmer extended the table without even
    thinking about how to optimize it (I do not think that happened, but who
    knows); and a much smaller size would mean the algorithm was revised!

     - by the way, the same question can be asked about the beta releases of
    Longhorn. However, there is not much point in trying to nail down the
    level of Unicode support of a beta.
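    The size heuristic for Server 2003 can be written down as a small
    function, assuming the 4-bytes-per-code-point layout of the NT 3.1 to XP
    file still applies. The function name and the cut-off used for "much
    smaller" (anything under 256 KiB) are my own assumptions; this only
    encodes the guesses above, it does not read or interpret the real file.

```python
# Heuristic only: maps a SORTKEY.NLS size to the hypotheses listed above,
# assuming the 4-bytes-per-code-point layout of the NT 3.1 to XP file.
PLANE_TABLE = 4 * 0x10000  # 256 KiB: four weights for each of 65536 code points


def interpret_sortkey_size(size_bytes: int) -> str:
    if size_bytes == PLANE_TABLE:
        return "256 KiB: BMP only, still at the Unicode 3.0 level"
    if 3 * PLANE_TABLE <= size_bytes < 4 * PLANE_TABLE:
        # 768 KiB, possibly plus a little extra to cover Plane 14
        return "about 768 KiB: Planes 0-2, surrogates in the same scheme"
    if size_bytes == 17 * PLANE_TABLE:
        return "4352 KiB: all 17 planes in an unoptimized flat table"
    if size_bytes < PLANE_TABLE:
        # "much smaller": the exact threshold is an assumption
        return "smaller than 256 KiB: the algorithm was probably revised"
    return "unexpected size: no conclusion"


for size in (262144, 786432, 4456448, 20 * 1024):
    print(size, "->", interpret_sortkey_size(size))
```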

    Antoine



    This archive was generated by hypermail 2.1.5 : Fri Mar 05 2004 - 13:30:39 EST