RE: UTF-8 to UTF-16LE

From: Jon Hanna (jon@spin.ie)
Date: Tue Jul 08 2003 - 08:22:51 EDT

    > According to XML the default encoding scheme is UTF-8.

    Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or*
    UTF-16BE. It's trivial to tell which of these an XML document is in by
    looking at the first few bytes, as described in Appendix F of the XML spec
    <http://www.w3.org/TR/REC-xml#sec-guessing>. You MUST accept all of these to
    comply with the XML spec.
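
    A rough sketch of that first-few-bytes check, in Python. It covers only
    the UTF-8 / UTF-16 rows of the (non-normative) Appendix F table, not the
    UCS-4 or EBCDIC cases, and the function name guess_xml_encoding is just
    made up for illustration:

```python
def guess_xml_encoding(head: bytes) -> str:
    """Guess UTF-8 vs UTF-16LE vs UTF-16BE from the first bytes of an XML entity.

    Hypothetical helper: a subset of XML Appendix F (UTF-8/UTF-16 only).
    """
    # Byte order marks take priority.
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM: look for "<?" as it would appear in each UTF-16 byte order.
    if head[:4] == b"\x00<\x00?":
        return "utf-16-be"
    if head[:4] == b"<\x00?\x00":
        return "utf-16-le"
    # Otherwise fall back to UTF-8; an encoding declaration may refine this.
    return "utf-8"
```

    You'd then still read the encoding declaration (if any) to confirm.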

    > But I want to convert it into UTF-16LE.
    > Can anyone tell me how to convert UTF-8 to UTF-16LE?

    Funnily enough, that's just what I'm coding right now.
    The encodings are described in Chapter 3 of the Unicode Standard; UTF-8 is
    also described in RFC 2279 <http://www.ietf.org/rfc/rfc2279.txt> and UTF-16
    in RFC 2781 <http://www.ietf.org/rfc/rfc2781.txt>.
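
    For illustration only, here's a minimal Python sketch of doing it by hand
    from those bit patterns (utf8_to_utf16le is a hypothetical name, not any
    library's API). It decodes each UTF-8 sequence to a code point, rejecting
    malformed, overlong and surrogate forms, then writes the code point as one
    16-bit unit, or as a surrogate pair above U+FFFF, least significant byte
    first. It only accepts sequences of up to four bytes, since UTF-16 can't
    represent anything beyond U+10FFFF anyway:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Hand-rolled UTF-8 -> UTF-16LE conversion (illustrative sketch)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Lead byte gives sequence length, payload bits, and smallest legal value.
        if b0 < 0x80:
            cp, length, minimum = b0, 1, 0
        elif 0xC0 <= b0 < 0xE0:
            cp, length, minimum = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, length, minimum = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, length, minimum = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Continuation bytes must look like 10xxxxxx; each adds 6 bits.
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogate code points, and values > U+10FFFF.
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:X} at offset {i}")
        i += length
        # RFC 2781: BMP code points are one unit; others become a surrogate pair.
        if cp < 0x10000:
            out += cp.to_bytes(2, "little")
        else:
            cp -= 0x10000
            out += (0xD800 | (cp >> 10)).to_bytes(2, "little")
            out += (0xDC00 | (cp & 0x3FF)).to_bytes(2, "little")
    return bytes(out)
```

    Note there's no BOM written; prepend b"\xff\xfe" if whatever consumes
    the output expects one.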

    Sample code abounds in just about every internationalisation library.
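
    Python's standard codecs module is one example. The one-shot form is just
    data.decode("utf-8").encode("utf-16-le"); for large files you can stream
    it with incremental codecs, which carry a multi-byte sequence over when it
    is split across two reads. A sketch, where transcode_stream is a made-up
    name:

```python
import codecs
from typing import Iterable, Iterator


def transcode_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-encode UTF-8 chunks as UTF-16LE chunks (illustrative sketch)."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    encoder = codecs.getincrementalencoder("utf-16-le")()
    for chunk in chunks:
        # Incomplete trailing sequences are buffered until the next chunk.
        text = decoder.decode(chunk)
        if text:
            yield encoder.encode(text)
    # final=True raises UnicodeDecodeError if input stopped mid-sequence.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield encoder.encode(tail)
```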
