RE: How-To handle i18n when you don't know charset?

From: Leon Spencer (Leon.Spencer@brightware.com)
Date: Fri Jul 07 2000 - 20:39:35 EDT


When I store my ISO-8859-1 data in my MS SQL database,
I can query it and it shows up as expected. But when
another software component reads it, that component
gets '?' instead of 'ç'.

I can't figure out why. The software component
that reads from the database is a Java application
with a default locale of en_US and charset Cp1252.

Leon
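
For readers of the archive: the symptom above ('?' where 'ç' was stored) is what a lossy charset conversion typically produces. The sketch below is in Python rather than the poster's Java, and it uses US-ASCII as a hypothetical stand-in for whatever charset a driver or connection layer might be converting through; the message does not say which step is at fault, so this only illustrates the mechanism. It also shows that Cp1252 itself can represent 'ç', so the JVM default charset alone would not explain the loss.

```python
# 'ç' is U+00E7. ISO-8859-1 and Cp1252 both map it to the single byte 0xE7,
# so a clean round trip through either charset preserves the character.
text = "ç"
latin1_bytes = text.encode("iso-8859-1")
cp1252_bytes = text.encode("cp1252")

# Assumption for illustration: some layer converts through a charset that
# has no 'ç' (US-ASCII here). A lenient encoder substitutes '?' for any
# character it cannot map, and the original character is lost for good.
lossy_bytes = text.encode("ascii", errors="replace")

print(latin1_bytes, cp1252_bytes, lossy_bytes)
```

If the byte that reaches the reading component is already 0x3F ('?'), the substitution happened before the application decoded anything, which would point at the conversion between the database and the client rather than at the locale.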

> -----Original Message-----
> From: Mike Brown [mailto:mbrown@corp.webb.net]
> Sent: Friday, July 07, 2000 4:31 PM
> To: Unicode List
> Cc: 'unicode@unicode.org'
> Subject: RE: How-To handle i18n when you don't know charset?
>
>
> 11digitboy wrote, in a gratuitously quoted contribution:
> > > > > Now you take the case of my friend M. LebÅ"uf,
> > [...]
> > Over here, his name looks like garbage.
> > What is that? Ell ee bee something something you
> > eff.
>
> The message headers on his email included:
>
> Content-Type: text/plain; charset=UTF-8
>
> Apparently your email reading software decided to ignore this and it
> showed you the bytes of the message interpreted as ISO-8859-1,
> windows-1252, MacRoman, or some such single-byte encoding. The
> "something something" is, at a lower level, 2 bytes that, in UTF-8,
> mean the single character known in Unicode as LATIN SMALL LIGATURE OE,
> or U+0153. It should look like "o" joined to "e". In windows-1252 it's
> at 0x9C. In MacRoman, 0xCF.
>
> Interestingly, your own message's headers included this humorous line:
>
> X-Bloated-Content-Warning: Quotational content of 85% far exceeds recommended daily dosage
>
> - Mike
> ____________________________________________________________________
> Mike J. Brown, software engineer at       My XML/XSL resources:
> webb.net in Denver, Colorado, USA         http://www.skew.org/xml/
>
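
The byte-level explanation in the quoted message can be checked with a short sketch. It is written in Python purely for illustration; the variable names are made up here, and the codec names (`cp1252`, `mac_roman`, `latin-1`) are Python's own spellings of the encodings Mike mentions.

```python
# U+0153 is LATIN SMALL LIGATURE OE, the character in "Lebœuf".
oe = "\u0153"

# In UTF-8 it takes two bytes: 0xC5 0x93.
utf8_bytes = oe.encode("utf-8")

# A mail reader that ignores "charset=UTF-8" and decodes those bytes with a
# single-byte encoding shows two unrelated characters instead of one.
seen_as_cp1252 = utf8_bytes.decode("cp1252")   # 'Å' then U+201C (curly quote)
seen_as_latin1 = utf8_bytes.decode("latin-1")  # 'Å' then a C1 control char

# The same character in the single-byte encodings named in the message.
in_cp1252 = oe.encode("cp1252")        # one byte, 0x9C
in_macroman = oe.encode("mac_roman")   # one byte, 0xCF

print(utf8_bytes, repr(seen_as_cp1252), in_cp1252, in_macroman)
```

The two characters produced by the wrong decoding are the "something something" between "b" and "u" in the garbled name.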


