From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Mar 05 2004 - 12:52:52 EST
Michael (michka) Kaplan wrote:
> For sortkey.nls -- that file does not ever change in size, as it is
> not a file that one adds characters to.
Well, I do not believe this is the most appropriate place to discuss this,
but here is my view about it.
The sorting algorithm of NT, since NT 3.1 (and in fact the NLS part of OLE 2
too), uses a big table of the weights attributed to each character. This
table is even partly user-visible via the LCMapString APIs (so you do not
need to infringe any law about reverse engineering to understand all this
stuff; nor did I: this is pure black-box observation; OTOH, using the
content of this table would be a clear copyright infringement, so do not do
it).
Internally, in (Unicode-enabled) NT, contrary to the variants used with
Windows 3.1/9x and probably CE, the table is split into two parts: the
locale-dependent tailoring (in SORTTBLS.NLS) and a common part in
SORTKEY.NLS. And, since NT 3.1, this SORTKEY.NLS file has been 262144 bytes
in size. One cannot miss that 262144 is 4*65536, and indeed the structure of
this file confirms without doubt that each character is mapped, in Unicode
order, to 4 weights (and I personally did not miss it, because back in
1994, a 256 Ki file was quite a bulky thing to deal with, particularly
since I only had 16-bit, DOS-based tools).
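To make the arithmetic concrete, here is a minimal sketch of how such a flat table could be indexed. It assumes one byte per weight and no file header, neither of which is stated above; the names BMP_CODE_POINTS, BYTES_PER_ENTRY and entry_offset are purely illustrative and are not taken from any Microsoft file format or API.

```python
# Illustrative model only: a flat table holding 4 weights per character,
# stored in Unicode order. ASSUMPTIONS: one byte per weight, no header.

BMP_CODE_POINTS = 0x10000   # 65536 code points in Plane 0 (the BMP)
BYTES_PER_ENTRY = 4         # assumed: 4 weights, 1 byte each

# Total size of a table covering exactly the BMP.
TABLE_SIZE = BMP_CODE_POINTS * BYTES_PER_ENTRY


def entry_offset(code_point: int) -> int:
    """Return the byte offset of a code point's 4-byte entry in the table."""
    if not 0 <= code_point < BMP_CODE_POINTS:
        # A 262144-byte table has no slot for anything beyond U+FFFF.
        raise ValueError("code point is outside the BMP")
    return code_point * BYTES_PER_ENTRY


if __name__ == "__main__":
    print(TABLE_SIZE)                 # size of the modelled table in bytes
    print(entry_offset(ord("A")))     # offset of the entry for U+0041
```

Under these assumptions the table has room for U+0000 through U+FFFF and nothing else, which is the point of the size observation.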
Now, the file in XP is still exactly 262144 bytes in size. To me, this is
evidence that only the BMP characters received weights in this file.
Since SORTTBLS.NLS is still a ridiculous 20 k in size, it does not hold the
So I deduce from it that Outer-Plane characters are probably not sorted in
XP, or in other words that the Win32 API available with XP does not fully
support Unicode 3.1 (furthermore, since Whistler was developed around the
year 2000, and earlier rather than later, while Unicode 3.1 was
issued 2001-05-16, it would be very surprising if it supported it).
Now, what I do not know is:
- if the Win32 NLS API has been fully upgraded to Unicode 3.0 for XP. I
thought so when I researched it earlier today, since the sizes of the
.NLS files did increase accordingly, but since I did not find the relevant
KB article I was not sure. Michael's approximate answer (I beg your pardon
if this was not the intent), which may lead one to think it is an
almost-full, almost-empty pot, is not very good news.
- what is the status with NT 5.2 a.k.a. Server 2003, since I do not have
access right now to this version. A quick look at the size of SORTKEY.NLS
would give some hints: 256 Ki would say it is still at the 3.0 level, 768 Ki
(Planes 0, 1 and 2, perhaps with some adjustment to cover the delightful
plane 14) would be an indication it supports meaningful surrogates without
heavy changes to the scheme, 4352 Ki (4.25 Mi, 17 * 256 Ki) would say the
programmer extended the table without even thinking about how to optimize
it (I do not think that happened, but who knows), and some much smaller size
would mean the algorithm was revised!
- by the way, the same question can be asked about the beta releases of
Longhorn. However, there is not much point trying to nail down the level of
Unicode support of a beta.
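The size heuristic in the Server 2003 item can be written down as a small sketch. It only restates the guesses made above (256 Ki, 768 Ki, 4352 Ki, or something much smaller); the function name interpret_sortkey_size and the wording of the verdicts are hypothetical, and any size not mentioned above is reported as unclassified rather than guessed at.

```python
# Sketch of the file-size heuristic. ASSUMPTION: 4 bytes per code point,
# so one full plane of 65536 code points takes 256 KiB.

PLANE_BYTES = 0x10000 * 4   # 262144 bytes per plane


def interpret_sortkey_size(size_bytes: int) -> str:
    """Map a SORTKEY.NLS size to the interpretation suggested in the text."""
    if size_bytes == PLANE_BYTES:
        return "BMP only: still at the Unicode 3.0 level"
    if size_bytes == 3 * PLANE_BYTES:
        return "planes 0-2: surrogates supported, same scheme"
    if size_bytes == 17 * PLANE_BYTES:
        return "all 17 planes: table extended without optimization"
    if size_bytes < PLANE_BYTES:
        return "smaller than one plane: algorithm probably revised"
    return "unclassified"


if __name__ == "__main__":
    for size in (262144, 786432, 4456448, 40000, 500000):
        print(size, "->", interpret_sortkey_size(size))
```

On a real system one would pass in the result of os.path.getsize() for wherever the file actually lives; that path is deliberately not guessed here.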