RE: represent pound sign ("£”) in UTF-8 encoding

From: Phillips, Addison (addison@amazon.com)
Date: Fri Jun 12 2009 - 10:21:34 CDT


    Hello Surya,

    UTF-8 is a character encoding, that is, it is a mapping between the IDs assigned to characters in a character set (such as Unicode) and the bytes actually used in files or in a computer's memory. UTF-8 is an encoding of the Unicode character set, so it maps the number assigned to each character in Unicode to some sequence of bytes.

    The "pound sign" character is U+00A3, that is, its "code point" or "Unicode scalar value" (the number assigned by Unicode) is hexadecimal A3 (decimal 163). The UTF-8 encoding maps that to the two-byte sequence C2-A3.
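    As a quick check of that mapping, here is a minimal Python sketch. It is only an illustration; the variable names are mine and nothing here is specific to XML.

```python
# Illustrative sketch: variable names are assumptions, not part of the original message.
pound = "\u00a3"  # the pound sign, U+00A3

code_point = ord(pound)              # the number Unicode assigns to the character
utf8_bytes = pound.encode("utf-8")   # the bytes UTF-8 uses for that code point

print(f"U+{code_point:04X}")                       # U+00A3
print(code_point)                                  # 163
print("-".join(f"{b:02X}" for b in utf8_bytes))    # C2-A3
```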

    Okay, now that I've frustrated everyone with that explanation, how do you get this mapping? In XML there are several ways to achieve this:

    1. The easiest way, if you're not familiar with character encodings or not sure whether your software can handle UTF-8 or other encodings, is to use a numeric character reference, which is a pure ASCII sequence. You can replace the character '£' with the string &#xA3; (that's ampersand, number sign, 'x', the hex value of the character, and then a semicolon). If you need characters other than the pound sign, you can look them up on the Unicode website or with a tool (such as the Windows "Character Map").

    2. If you use a text editor that supports it, type in the pound sign and save the file as UTF-8. Even very simple editors, such as Notepad, will do this.

    3. If you are using software that generates the XML, be sure that you set it up to generate UTF-8. If you are getting a single byte (A3) for each pound sign, your software is probably generating the file in the ISO 8859-1 (Latin-1) encoding. Assuming you do not need the full range of Unicode, you can work around this by changing the XML document to use Latin-1 as its encoding (by changing the "UTF-8" in the document's encoding declaration to "ISO-8859-1").
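    The three options above can be compared side by side with a rough Python sketch using the standard library's xml.etree.ElementTree parser. The <amount> element and the value 10 are made up for illustration and are not taken from any real invoice format.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice fragment; the <amount> element is an assumption for illustration.

# Option 1: pure ASCII bytes, pound sign written as a numeric character reference.
ncr_doc = b'<?xml version="1.0" encoding="UTF-8"?><amount>&#xA3;10</amount>'

# Option 2: pound sign typed directly and saved as UTF-8 (bytes C2 A3).
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><amount>\u00a310</amount>'.encode("utf-8")

# Option 3 workaround: single byte A3, with the declaration changed to ISO-8859-1.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><amount>\u00a310</amount>'.encode("iso-8859-1")

# All three parse to the same text content.
texts = [ET.fromstring(doc).text for doc in (ncr_doc, utf8_doc, latin1_doc)]
print(texts)

# The problem case: a single byte A3 in a document that still declares UTF-8.
mismatched_doc = b'<?xml version="1.0" encoding="UTF-8"?><amount>\xa310</amount>'
try:
    ET.fromstring(mismatched_doc)
    mismatch_rejected = False
except ET.ParseError as err:
    mismatch_rejected = True
    print("parser rejected the document:", err)
```

    The last case is the one to watch for: the bytes on disk and the encoding declaration have to agree, whichever of the three approaches you pick.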

    Hope this helps,

    Addison

    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Surya Narayana
    > Sent: Thursday, June 11, 2009 11:02 PM
    > To: unicode@unicode.org
    > Subject: represent pound sign ("£”) in UTF-8 encoding
    >
    > Hi All,
    >
    > If anybody know how to represent special characters ( pound sign -
    > "£” ) in XML files with UTF-8 encoding?
    >
    > Question: How to represent pound sign ("£”) in XML invoice with
    > encoding = UTF-8?
    >
    > With regards,
    >
    > Suryanarayana


