Hello Piotr,

I had written:
> For WWW pages, there is a standard way to announce the codepage used,
> cf. [...]

This statement was based on an oversight: indeed, there is a fundamental problem with declaring the UTF-16 charset in an HTML Meta tag. I will discuss this below.

On 2000-02-15 at 02:31 CET, you wrote:
> As I know, all my pages in utf-16 have a proper meta declaration.

I was referring to the page you had recommended previously in the current thread (some Polish-EDP related page from a university site). That one indeed has no META tag, and it contains a BOM (hence my oversight). Only this very morning, you have disclosed your own URL, viz. . Neither of these pages complies with current HTML standards.

I had further written:
> Another advice: check your pages with for
> compliance with HTML syntax.

You had further written:
> Excerpts from validating http://www.trzcionk.priv.pl/
...
> Error: Missing DOCTYPE declaration at start of document
> -------------------------------------------------------
> My page is in html v. 1.0 :-)

It is not. It contains elements of HTML 3.2 (which demands ISO 8859-1 encoding), such as a background image and font selection. Only HTML 4.0 allows both these elements and characters beyond Latin-1; hence, these pages must start with a DOCTYPE declaration.

> í¾ºí¶¬<html>
> ^
> Error: character data is not allowed here
> -----------------------------------------
> It is just BOM code. Without this code a browser will treat document as
> 8-bits stream wide, so it can't read my 16-bits (utf-16) documents.
...
> Error: non SGML character number 0
> -----------------------------------------
> My page hasn't in this place any ";" or #0 :-)

Now we are at the crucial point. The HTML validation service (an HTTP client like any other) expects 8-bit coded content, so it is not prepared for 16-bit information chunks.
Hence, it perceives the BOM as two characters preceding the Html tag (not allowed by HTML syntax), and every UTF-16 character as two characters, one of which is usually character number 0 (not allowed by SGML, hence a fortiori by HTML syntax).

Now the question arises: is this a legal way of announcing UTF-16 (so the validator should be mended), or is it ruled out by HTTP or HTML?

The HTML 4.01 standard says:
> The "charset" parameter identifies a character encoding, which is a method
> of converting a sequence of bytes into a sequence of characters.

Likewise, the HTTP 1.1 standard defines:
> The term "character set" is used in this document to refer to a
> method used with one or more tables to convert a sequence of octets
> into a sequence of characters.

Note: both say "bytes" or "octets", not "16-bit units". However, both the HTML and the HTTP standard refer to the IANA registry , which comprises both ISO-10646-UCS-2 and ISO-10646-UCS-4 with the proviso:
> this needs to specify network byte order

Now, I cannot see how this could work with the HTML Meta tag: to even recognize the Meta tag, the receiver already has to know the character encoding of the document. Note that the Meta tag scheme (though a kludge) still works with ASCII supersets, such as ISO 8859 and UTF-8, as the Meta tag is entirely in ASCII and hence can be recognized by the client.

The only way I can imagine this working is to announce UTF-16 (like any other encoding that is not an ASCII superset) out-of-band, i.e. in the HTTP entity header (Content-Type field, charset parameter). Indeed, the HTTP 1.1 standard stringently demands, in section 3.7.1:
> Data in character sets other than "ISO-8859-1" or its subsets MUST be
> labeled with an appropriate charset value.

You further wrote:
> It means, problem is in validator, not "html syntax". Authors of this
> validator can't imagine that someone wants utf-16.

No.
It rather means that the problem is in your non-standard way of announcing UTF-16, and that the authors of the validation service do not share your impression of how to achieve this goal. Try the service with a properly announced UTF-16 encoding; if it does not work properly, tell the author so via the e-mail address provided. Note that the HTML validation service at is offered and maintained by the W3C consortium, the most knowledgeable and authoritative source on WWW standards.

This does not preclude an HTTP client from trying to make the most of non-standard WWW pages. Hence, the Netscape® Communicator 4.7 on my Unix box actually displays your page all right. However, you do not offer your page to a particular browser, but rather to all HTTP clients, world-wide. This can only work if your page is standard-conforming in every respect. It is definitely not enough if your page displays all right on your own screen, and possibly on a few more screens in your neighbourhood. This is the raison d'être for a validation service such as the one mentioned above.

Best wishes,
Otto Stolz
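
P.S. To make the byte-level argument concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample page, the header value, and the helper name charset_from_header are all made up for this example, and it does not claim to reproduce what the validation service actually does. It shows what a client assuming 8-bit content finds in a UTF-16 document (a two-byte BOM, many zero bytes, and no recognizable ASCII "charset=" string), and how a charset announced in the Content-Type header can be used instead.

```python
# Illustrative sketch only: the page and the header value are invented.
from email.message import Message

# A small page carrying a META declaration; the body holds a few Polish letters.
PAGE = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-16"></head>'
        '<body>Za\u017c\u00f3\u0142\u0107</body></html>')

utf16 = PAGE.encode("utf-16")  # the plain "utf-16" codec prepends a BOM
utf8 = PAGE.encode("utf-8")

# What a client that assumes 8-bit content meets first:
bom = utf16[:2]                # two stray bytes in front of the <html> tag
zero_bytes = utf16.count(0)    # one zero byte per character below U+0100

# Scanning the raw bytes for the ASCII text of the META declaration
# works for an ASCII superset such as UTF-8, but not for UTF-16.
needle = b"charset="
found_in_utf8 = needle in utf8
found_in_utf16 = needle in utf16


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper; it relies on the standard library's header parser.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Out-of-band announcement: the encoding is known before the first byte
# of the document is interpreted.
declared = charset_from_header("text/html; charset=UTF-16")
text = utf16.decode(declared)  # the codec consumes the BOM to pick the byte order

print(bom, zero_bytes, found_in_utf8, found_in_utf16, declared)
```

With the charset taken from the header, the document decodes back to the original text, whereas the byte scan for the META declaration only succeeds for the UTF-8 copy.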