Re: Polish codepages

From: Piotr Trzcionkowski (ptrzcionkowski@famur.com.pl)
Date: Mon Feb 14 2000 - 20:31:22 EST


> > your own website uses both 8859-2 [...] and UTF-16 [...]
> > there are no 'warnings' that I have to expect the UTF-16 pages ;)
>
> > What warning do you expect? My pages in UTF-16 start with a BOM.
>
> For WWW pages, there is a standard way to announce the codepage used,
> cf. <http://www.w3.org/TR/REC-html40/charset.html>, particularly
> <http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2>. You would be well
> advised to comply with standards,

Check your browser. As far as I know, all my UTF-16 pages have a proper meta declaration.

> so every browser would interpret your
> pages as intended.

That's not possible! Only modern web browsers support Unicode.

> Another piece of advice: check your pages with <http://validator.w3.org/> for
> compliance with HTML syntax. Again, you should comply with standards,
> so every browser would interpret your pages as intended.

Excerpts from validating http://www.trzcionk.priv.pl/

--------------------------------------------------------------------------------

Below are the results of attempting to parse this document with an SGML parser.

  * Line 1, column 1:
   <html>
   ^
  Error: Missing DOCTYPE declaration at start of document (explanation...)

--------------------------------------------------------------------------------
My page is in HTML v. 1.0 :-)
--------------------------------------------------------------------------------

  * Line 1, column 0:
  t;html>
  ^
  Error: character data is not allowed here

--------------------------------------------------------------------------------

That is just the BOM. Without it, a browser will treat the document as a stream of 8-bit units, so it can't read my 16-bit (UTF-16) documents.
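
A minimal sketch in Python of what the BOM is used for. The helper name `sniff_utf16` and the sample string are assumptions made up for illustration, not taken from the pages discussed here, and real browsers apply more rules than this:

```python
import codecs

def sniff_utf16(data: bytes):
    """Return the UTF-16 codec name indicated by a leading BOM, or None."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None

# Hypothetical sample page, stored as little-endian UTF-16 with a BOM.
page = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")

encoding = sniff_utf16(page)
if encoding:
    # Skip the two BOM bytes, then decode 16-bit code units.
    text = page[2:].decode(encoding)
else:
    # No BOM: fall back to treating the data as an 8-bit stream.
    text = page.decode("latin-1")

print(encoding, repr(text))
```

Without the two leading bytes, this reader would take the fallback branch and see every second byte as a separate character.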

--------------------------------------------------------------------------------

  * Line 1, column 3:
  t;html>
     ^
  Error: non SGML character number 0

--------------------------------------------------------------------------------

My page doesn't have any ";" or #0 at this position :-)

--------------------------------------------------------------------------------

  * Line 1, column 5:
  t;html>
       ^
  Error: non SGML character number 0

--------------------------------------------------------------------------------

Really? :-)))))

--------------------------------------------------------------------------------

 
  * Line 1, column 7:
  t;html>
         ^
  Error: non SGML character number 0

(cut)

--------------------------------------------------------------------------------

This means the problem is in the validator, not in the "HTML syntax". The authors of this validator couldn't imagine that someone would want UTF-16.
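
A rough Python sketch of why a parser that reads one byte per character would report "character number 0" at those columns. It assumes the page is little-endian UTF-16 with a BOM and that columns are counted per byte starting from 0; neither is confirmed by the validator output above, and the sample string is hypothetical:

```python
import codecs

# Hypothetical first line of a page, saved as little-endian UTF-16 with a BOM.
raw = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")

# A parser that assumes 8-bit characters sees each byte as its own "character".
for column, byte in enumerate(raw):
    note = "  <- character number 0" if byte == 0 else ""
    print(f"column {column}: 0x{byte:02X}{note}")

# Columns at which such a parser would meet a zero byte.
nul_columns = [i for i, b in enumerate(raw) if b == 0]
print(nul_columns)
```

Under these assumptions the zero bytes fall on columns 3, 5, 7 and so on, which is the high byte of each ASCII character, while columns 0 and 1 hold the BOM itself.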



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT