Re: Using Unicode in XML

From: Markus Scherer (
Date: Mon Jul 17 2000 - 19:55:36 EDT

"Michael (michka) Kaplan" wrote:
> > there, it now also recommends (though it does not force) for xml clients
> to recognize u+feff for utf-8 (the bytes ef bb bf) and many other byte
> combinations.
> > there is a link to the errata at the beginning of the xml spec.
> Where do you see this? The list below has all the errata that mention the
> word encoding in them. E44 Substantive does say that EF BB BF is UTF-8, but
> it makes no comment even approaching a recommendation.

What more do you need?
Appendix F describes a way for conforming XML processors that support encodings other than UTF-8 and UTF-16 to autodetect the encoding family. E44 extends that appendix's list of initial bytes to include the UTF-8 signature (EF BB BF).

All of this is non-normative: an XML processor may choose to support only UTF-8 and UTF-16, and to recognize UTF-16 only when it begins with the signature (BOM). But there is a good way to recognize many more input streams, it is part of the XML spec, and good processors will use it.
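
To make the Appendix F idea concrete, here is a minimal sketch in Python. It assumes the caller passes at least the first four bytes of an XML entity; the function name detect_encoding_family and its returned labels are hypothetical, chosen only for illustration. It guesses the encoding family only; a real processor would then read the encoding declaration to pin down the exact encoding.

```python
def detect_encoding_family(head: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes.

    Modeled on the table in XML 1.0 Appendix F, including the UTF-8
    signature added by erratum E44. The labels are illustrative
    (hypothetical), not names defined by the spec.
    """
    b4 = bytes(head[:4])  # bytes() so bytearray input also works as a dict key

    # 1. Byte order marks / signatures. The UCS-4 patterns are checked
    #    first because FF FE 00 00 would otherwise match the UTF-16LE BOM.
    if b4 == b"\x00\x00\xfe\xff":
        return "UCS-4 big-endian (1234), with BOM"
    if b4 == b"\xff\xfe\x00\x00":
        return "UCS-4 little-endian (4321), with BOM"
    if b4 == b"\x00\x00\xff\xfe":
        return "UCS-4 unusual order (2143), with BOM"
    if b4 == b"\xfe\xff\x00\x00":
        return "UCS-4 unusual order (3412), with BOM"
    if b4[:2] == b"\xfe\xff":
        return "UTF-16 big-endian, with BOM"
    if b4[:2] == b"\xff\xfe":
        return "UTF-16 little-endian, with BOM"
    if b4[:3] == b"\xef\xbb\xbf":
        return "UTF-8, with signature"

    # 2. No BOM: look for how '<?' or '<?xm' is laid out in each family.
    no_bom = {
        b"\x00\x00\x00\x3c": "32-bit big-endian (1234)",
        b"\x3c\x00\x00\x00": "32-bit little-endian (4321)",
        b"\x00\x00\x3c\x00": "32-bit unusual order (2143)",
        b"\x00\x3c\x00\x00": "32-bit unusual order (3412)",
        b"\x00\x3c\x00\x3f": "16-bit big-endian (UTF-16BE or UCS-2)",
        b"\x3c\x00\x3f\x00": "16-bit little-endian (UTF-16LE or UCS-2)",
        b"\x3c\x3f\x78\x6d": "8-bit ASCII-compatible (UTF-8, ISO 8859, ...)",
        b"\x4c\x6f\xa7\x94": "EBCDIC",
    }
    # Anything else: UTF-8 without a declaration, or a broken entity.
    return no_bom.get(b4, "unknown; assume UTF-8 without declaration")


if __name__ == "__main__":
    for sample in (b"\xef\xbb\xbf<?xml", "<?xml".encode("utf-16-le"), b"<?xml"):
        print(detect_encoding_family(sample))
```

Note that the order of the checks matters: the four-byte UCS-4 patterns must be tested before the two-byte UTF-16 BOMs, since they share leading bytes.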


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT