From: pepe pepe (firstname.lastname@example.org)
Date: Mon Nov 17 2003 - 09:10:37 EST
My knowledge about encoding is very poor and you seem to know a lot about
this. Could you explain a bit more of what you have said? I have made the [...]
This is the problematic sequence: 11110011-01101110-00100000-01001101
(F3-6E-20-4D). If I follow the instructions that appear under the question "What
is UTF-8?" in the UTF-8 FAQ, I obtain
011101110100000001101 (hex 0EE80D) instead of 1EE80D (111101110100000001101).
Have I made a mistake? Applying the UTF-16 encoding to my result, everything
works out. So, finally, who do you think is responsible for this strange
situation: the client, for saying that the document is UTF-8, or the parser?
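
The arithmetic above can be checked with a minimal sketch. It assumes the four bytes are pushed through the 4-byte UTF-8 bit layout without verifying that the trailing bytes are real continuation bytes; the function name `naive_utf8_decode4` is hypothetical and only illustrates the calculation.

```python
def naive_utf8_decode4(data: bytes) -> int:
    """Apply the 4-byte UTF-8 layout (3 + 6 + 6 + 6 payload bits) without
    checking that the trailing bytes start with the bits 10."""
    b0, b1, b2, b3 = data
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


raw = bytes([0xF3, 0x6E, 0x20, 0x4D])
value = naive_utf8_decode4(raw)

print(f"{value:021b}")  # 011101110100000001101
print(f"{value:06X}")   # 0EE80D

# A valid continuation byte has the form 10xxxxxx; none of the three does.
continuation_ok = [(b & 0xC0) == 0x80 for b in raw[1:]]
print(continuation_ok)  # [False, False, False]
```

The 21-bit result matches the value computed by hand in this message, and the last line shows why a validating decoder would refuse the sequence.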
>From: Pim Blokland <email@example.com>
>To: Unicode mailing list <firstname.lastname@example.org>
>Subject: Re: Problems encoding the spanish o
>Date: Mon, 17 Nov 2003 13:26:19 +0100
>pepe pepe wrote:
> > We have the following sequence of characters: "...ización Map..."
> > (the same as "...ización Map..."), which after undergoing some
> > transformations becomes "...izaci&#56186;&#56333;ap....".
> > As you can see, the two characters 56186 and 56333 seem to
> > replace the sequence "ón M". Any idea?
>Yes, your input text obviously gets flagged as being in UTF-8
>format, even though it is Latin-1 (or any codepage that has an ó at index
>F3).
>Not only that, but the process making the mistake of thinking it is
>UTF-8 also makes the mistake of not generating an error for
>encountering malformed byte sequences, AND of outputting the result
>as two 16-bit numbers instead of one 21-bit number.
>If you take the byte sequence (hex) F3 6E 20 4D and treat it as
>UTF-8 and don't care it's not valid, this maps to the value
>(hex)1EE80D. Again, not caring this is not a valid codepoint,
>turning this into UTF-16 would yield U+DB7A U+DC0D, which is what
>you got in your output.
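
The steps described in the reply can be sketched as follows. This is an illustration under stated assumptions, not the code of the parser in question: the helper `to_surrogates` is hypothetical, and it keeps only 10 bits for each half, which is one plausible way a lenient converter could behave. With that masking, both 0EE80D (the 21-bit value 011101110100000001101 worked out by hand) and the 1EE80D quoted above produce the same pair.

```python
def to_surrogates(cp: int):
    """Split a value above 0xFFFF into a UTF-16 high/low surrogate pair,
    keeping only 10 bits for each half (no range checking)."""
    v = cp - 0x10000
    high = 0xD800 | ((v >> 10) & 0x3FF)
    low = 0xDC00 | (v & 0x3FF)
    return high, low


raw = bytes([0xF3, 0x6E, 0x20, 0x4D])

# Read as Latin-1, the bytes are simply the intended text.
print(raw.decode("latin-1"))  # ón M

# A strict UTF-8 decoder rejects the same bytes instead of inventing a value.
try:
    raw.decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
print(strict_rejected)  # True

# Unvalidated 4-byte reading of F3 6E 20 4D, then conversion to UTF-16.
high, low = to_surrogates(0xEE80D)
print(f"{high:04X} {low:04X}")  # DB7A DC0D
print(high, low)                # 56186 56333
```

The decimal values 56186 and 56333 are the two numbers that appeared in the damaged output.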
This archive was generated by hypermail 2.1.5 : Mon Nov 17 2003 - 10:10:36 EST