Re: wap and utf-8

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Mar 13 2003 - 20:18:56 EST


    Mary McCarter wrote:

    > Hi Friends,
    >
    > My phone (Motorola i550,i30sx,i85,i60c) doesn't show correctly the ó
    > neither ó and it shows the Ã³ instead of ó.

    Is that a LATIN CAPITAL LETTER A WITH TILDE and a SUPERSCRIPT THREE?
    ISO-8859-1 uses 0xC3 to encode LATIN CAPITAL LETTER A WITH TILDE.
    ISO-8859-1 uses 0xB3 to encode SUPERSCRIPT THREE.
    UTF-8 uses the byte sequence 0xC3 0xB3 to encode LATIN SMALL LETTER O WITH ACUTE.
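    The byte arithmetic above can be checked with a few lines of Python. This is only an illustration using the standard str.encode/bytes.decode methods and the unicodedata module; it is not code from the gateway.

```python
import unicodedata

# LATIN SMALL LETTER O WITH ACUTE, U+00F3
o_acute = "\u00f3"

latin1_bytes = o_acute.encode("iso-8859-1")  # one byte in ISO-8859-1
utf8_bytes = o_acute.encode("utf-8")         # two bytes in UTF-8
print(latin1_bytes, utf8_bytes)              # b'\xf3' b'\xc3\xb3'

# Decoding the UTF-8 bytes as ISO-8859-1 gives two characters instead of one
misread = utf8_bytes.decode("iso-8859-1")
for ch in misread:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00C3 LATIN CAPITAL LETTER A WITH TILDE
# U+00B3 SUPERSCRIPT THREE
```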

    So... it looks like some code treats your UTF-8 data as ISO-8859-1.
    This is case #4 in my paper:
    http://people.netscape.com/ftang/paper/textintegrity.html

    Why? Because your XML declaration
    <?xml version="1.0" encoding="ISO-8859-1"?>
    says "ISO-8859-1".

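    You say wml_binary has the \xc3\xb3.
    What is wml_binary?
    What encoding did you use to store the WML file, UTF-8 or ISO-8859-1?
    If you run od -x on that WML file, do you see \xf3 for that character,
    or \xc3\xb3? (od -t x1 prints single bytes, which avoids the byte
    swapping od -x shows on little-endian machines.)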
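    If od is not handy, the same check can be sketched in Python. classify_oacute and classify_file are hypothetical helper names, and the check is a heuristic that only looks for the two byte patterns discussed here (0xF3 can also occur as a lead byte inside other UTF-8 sequences).

```python
def classify_oacute(data: bytes) -> str:
    """Report how U+00F3 (o with acute) appears to be stored in raw bytes.

    Heuristic only: it checks for the UTF-8 pair first, then the single
    ISO-8859-1 byte, and does not recognize any other encoding.
    """
    if b"\xc3\xb3" in data:
        return "UTF-8 (0xC3 0xB3)"
    if b"\xf3" in data:
        return "ISO-8859-1 (0xF3)"
    return "not found"


def classify_file(path: str) -> str:
    # Open in binary mode so Python does no decoding of its own
    with open(path, "rb") as f:
        return classify_oacute(f.read())
```

    Usage, with a hypothetical file name: print(classify_file("yourfile.wml")).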

    One possibility is that you created the file in UTF-8 but labeled it as
    ISO-8859-1. Changing the first
    line from
    <?xml version="1.0" encoding="ISO-8859-1"?>
    to
    <?xml version="1.0" encoding="UTF-8"?>
    will fix that.
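    If many files need relabeling, the one-line change can be scripted. A minimal sketch, assuming each file is already UTF-8, starts with the XML declaration (no byte order mark), and uses a plain encoding="..." attribute; relabel_as_utf8 is a hypothetical name.

```python
import re

# The encoding pseudo-attribute of the XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
# Assumes the declaration is the very first thing in the file (no BOM).
_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])[A-Za-z0-9._-]+\2""")


def relabel_as_utf8(data: bytes) -> bytes:
    """Rewrite the declared encoding to UTF-8 without touching the content.

    Only correct when the bytes are ALREADY UTF-8 (the mislabeling case);
    a file that really is ISO-8859-1 must be converted instead.
    """
    return _DECL_ENCODING.sub(rb"\1\2UTF-8\2", data, count=1)
```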

    If you did store your data in ISO-8859-1, then it could be caused by
    the following:
    1. Some code reads your XML file and converts it to UTF-8 correctly;
    however, the encoding="ISO-8859-1" declaration is stored with it.
    2. That code passes the converted XML to the next module, but it does not
    remove the encoding="ISO-8859-1" declaration or change it to
    encoding="UTF-8", so the next module thinks the data is still
    stored in ISO-8859-1 and converts it a second time.
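    The two steps can be simulated to show what a double conversion does to the bytes. module_a and module_b are hypothetical stand-ins for the two pieces of code described above, not real gateway components.

```python
from typing import Tuple


def module_a(data: bytes, label: str) -> Tuple[bytes, str]:
    """Hypothetical first module: converts correctly to UTF-8,
    but returns the OLD label instead of "utf-8" (the bug)."""
    utf8 = data.decode(label).encode("utf-8")
    return utf8, label


def module_b(data: bytes, label: str) -> bytes:
    """Hypothetical next module: trusts the label and converts to UTF-8 again."""
    return data.decode(label).encode("utf-8")


original = "\u00f3".encode("iso-8859-1")         # b'\xf3', stored as ISO-8859-1
step1, label = module_a(original, "iso-8859-1")  # b'\xc3\xb3', label unchanged
step2 = module_b(step1, label)                   # converted a second time
print(step1, step2)  # b'\xc3\xb3' b'\xc3\x83\xc2\xb3'

# A correct UTF-8 reader now shows A WITH TILDE + SUPERSCRIPT THREE
print(step2.decode("utf-8") == "\u00c3\u00b3")  # True
```

    After a double conversion the stored bytes become 0xC3 0x83 0xC2 0xB3 rather than 0xC3 0xB3, so a byte dump of the converted data is one way to tell this case apart from a simple mislabel.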

    How to fix it?
    1. Fix the data: again, change it to encoding="UTF-8" and store the
    data in your WML file as UTF-8.
    2. Fix the code: make the code that performs the ISO-8859-1 to UTF-8
    conversion also remove the encoding declaration or change it to UTF-8.
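    Fix 2 can be sketched as a conversion step that updates the label in the same place it converts the bytes. latin1_xml_to_utf8 is a hypothetical name, and the sketch assumes the input really is ISO-8859-1 with an encoding="..." declaration at the very start. Dropping the declaration entirely would also work, because an XML document without an encoding declaration or byte order mark is read as UTF-8.

```python
import re

# The encoding pseudo-attribute of the XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])[A-Za-z0-9._-]+\2""")


def latin1_xml_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 XML bytes to UTF-8 and relabel the declaration.

    Hypothetical helper: assumes the input really is ISO-8859-1.
    """
    utf8 = data.decode("iso-8859-1").encode("utf-8")
    # The declaration is plain ASCII, so it has the same bytes in both
    # encodings and can be patched after the conversion.
    return _DECL_ENCODING.sub(rb"\1\2UTF-8\2", utf8, count=1)
```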

    This is a typical "double conversion" issue, mentioned as point 6 in my paper
    http://people.netscape.com/ftang/paper/textintegrity.html

    However, I don't believe that is the case, because if it were,
    then all your environments would display garbage, not just your
    Motorola phone and the 4.1 simulator.

    The real problem could be in two places:
    1. some code didn't remove or change the XML encoding declaration even though
    it performed a charset conversion,
    AND
    2. your Nokia phone and your 3.1 simulator (not your Motorola
    i550, i30sx, i85, i60c or UP.SDK 4.1 simulator) may always ASSUME the
    data is UTF-8 and ignore the mislabeled encoding="ISO-8859-1".

    The DOUBLE fault could let you see it correctly on the Nokia phone and the 3.1
    simulator. The SINGLE fault in 1, combined with the CORRECT behavior of your
    Motorola phone, will probably make you see the wrong thing :)
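    The two-fault explanation can be simulated with two decoders. Both functions are hypothetical models of the behavior guessed above (honoring the declared charset versus always assuming UTF-8), not actual device firmware.

```python
def honors_label(data: bytes, declared: str) -> str:
    """Model of a client that decodes with the declared charset
    (the behavior guessed for the Motorola phones and UP.SDK 4.1)."""
    return data.decode(declared)


def assumes_utf8(data: bytes, declared: str) -> str:
    """Model of a client that always uses UTF-8
    (the behavior guessed for the Nokia phone and toolkit 3.1).
    The declared parameter is deliberately ignored."""
    return data.decode("utf-8")


# Fault 1: the data is UTF-8, but the label still says ISO-8859-1
payload = "\u00f3".encode("utf-8")  # b'\xc3\xb3'
label = "iso-8859-1"

print(assumes_utf8(payload, label) == "\u00f3")        # True: fault 2 cancels fault 1
print(honors_label(payload, label) == "\u00c3\u00b3")  # True: the correct client shows garbage
```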

    >
    > The same happen with my up.sdk 4.1 simulator (connected through my
    > wap-gateway)
    > But a nokia phone shows the ó correctly! and my nokia toolkit 3.1
    > simulator show it well, too.
    >
    > I check the wap-gateway code, and I can realize that the wml_binary
    > has the \xc3\xb3 instead of ó and it is right, I think so.. because it
    > is the UTF-8 code, but why my phone can't show it correctly.
    >
    > Any idea?
    >
    > I will be grateful with any contribution,
    > Thanks a lot and Regards,
    > Mary


