Re: Unicode in web pages

From: Mark Davis (markdavis@ispchannel.com)
Date: Mon Sep 04 2000 - 13:44:34 EDT


Sounds like somewhere in the process bytes are being interpreted in the wrong
character set. For example, if you take a Unicode source, convert it to cp1252,
and then convert that to UTF-8, you will get question marks on Windows or in
Java for the characters above U+00FF, while the ones below (including some
European ones) will be correct UTF-8 characters.
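
A minimal sketch of that effect in Java, assuming the JVM provides the
"windows-1252" charset (common, but not one of the charsets every JVM is
required to support). The class and method names are made up for illustration;
this only reproduces the conversion chain described above, not necessarily what
the JDBC driver or the server is actually doing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyRoundTrip {
    // Assumption: this JVM ships the "windows-1252" (cp1252) charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Narrow a string to cp1252 bytes and widen it again. String.getBytes
    // substitutes the charset's replacement byte ('?') for every character
    // that cp1252 cannot represent.
    static String throughCp1252(String text) {
        byte[] narrowed = text.getBytes(CP1252);
        return new String(narrowed, CP1252);
    }

    // Render bytes as hex so the output does not depend on the console encoding.
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "\u00E7\u65E5\u672C"; // c-cedilla followed by two kanji
        String damaged = throughCp1252(original);
        byte[] utf8 = damaged.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // prints "C3 A7 3F 3F"
    }
}
```

The c-cedilla survives and comes out as the correct two-byte UTF-8 sequence,
while each kanji has already been replaced by 3F ('?') before the UTF-8 step
ever runs, so nothing downstream can bring it back.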

Mark

BTW, there is a FAQ page on the Unicode site
(http://www.unicode.org/unicode/faq/) about web pages. I am wondering whether
you looked at it, and if so whether you found it useful. Feedback would help to
improve those pages.
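
On the question quoted below about whether UTF-8 can be stored as single-byte
characters and recombined on output: here is a rough sketch, with hypothetical
class and method names. It uses ISO-8859-1 as the single-byte encoding because
that maps all 256 byte values to characters; plain 7-bit ASCII has no room for
the bytes above 7F that UTF-8 uses for non-ASCII characters, so it would not
work. Whether a given database column and driver really pass the bytes through
unchanged is a separate assumption that needs checking.

```java
import java.nio.charset.StandardCharsets;

class Utf8AsSingleBytes {
    // View the UTF-8 bytes of a string as if each byte were one ISO-8859-1
    // character. This is what a byte-transparent single-byte store would show.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the step above: get the original bytes back and decode as UTF-8.
    static String recover(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E7\u65E5"; // c-cedilla and one kanji
        String stored = misread(original);
        // Two bytes for the c-cedilla plus three for the kanji.
        System.out.println(stored.length());                  // prints 5
        System.out.println(recover(stored).equals(original)); // prints true
    }
}
```

For the c-cedilla, misread() yields U+00C3 U+00A7, which is the two-symbol
form seen in the database. The round trip only works if nothing in between
replaces or drops bytes.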

Stephen Toner wrote:

> The character is posted in a form, and the receiving page opens a connection
> to a SQL Server 7.0 database using the Weblogic JDBC:ODBC driver which
> supports Unicode. The Java string is then passed to the database.
>
> I have now found that the symbols in the database were indeed the UTF-8
> version of the characters, e.g. ç=Ã§. This was for some European characters
> only.
> However many characters in languages such as Japanese (and the Euro symbol)
> reach the database not in their correct form but with question marks in
> them. I don't know where the problem is occurring. How does the character
> get converted into these UTF-8 sequences, and could there be a problem with
> this - possibly it doesn't recognise the character that it should be
> converting? (Just a mad stab in the dark.)
>
> Because UTF-8 is a sequence of bytes, does that mean that it could be
> treated and stored as ASCII, and that the sequence would be recombined to
> unicode on output if the encoding was set to UTF-8?
>
> >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> >To: "Unicode List" <unicode@unicode.org>
> >Subject: Re: Unicode in web pages
> >Date: Mon, 4 Sep 2000 05:04:08 -0800 (GMT-0800)
> >
> >Well, the client side is right if you are using UTF-8 and the browser does
> >indeed show UTF-8 as the encoding being used (how to check this depends on
> >your browser -- View|Encoding or Edit|Preferences), so there must be some
> >issue on the server side.
> >
> >You may need to post more detail on the database, how you are getting to
> >it, etc. so someone who knows more about the server config can comment.
> >
> >michka
> >
> >
> >----- Original Message -----
> >From: "Stephen Toner" <toners5@hotmail.com>
> >To: <michka@trigeminal.com>; <unicode@unicode.org>
> >Sent: Monday, September 04, 2000 7:12 AM
> >Subject: Re: Unicode in web pages
> >
> >
> > > I am using JSP on the server side, and am using the Tomcat server.
> > >
> > >
> > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > >Reply-To: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > >To: "Stephen Toner" <toners5@hotmail.com>, "Unicode List"
> > > ><unicode@unicode.org>
> > > >Subject: Re: Unicode in web pages
> > > >Date: Mon, 4 Sep 2000 04:57:18 -0700
> > > >
> > > > >UTF-8 is indeed the character set you want to use for the page encoding;
> > > > >although some browsers will support UTF-16, etc., not all will.
> > > > >
> > > > >But the real issue has to do with what technology you are using to
> > > > >connect to the db. Is it ASP on the server side? Or something else? And
> > > > >what is the server?
> > > >
> > > >michka
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: "Stephen Toner" <toners5@hotmail.com>
> > > >To: "Unicode List" <unicode@unicode.org>
> > > >Sent: Monday, September 04, 2000 4:21 AM
> > > >Subject: Unicode in web pages
> > > >
> > > >
> > > > > > Hi,
> > > > > > I'm fairly new to Unicode and have a few problems trying to input it
> > > > > > from a browser.
> > > > > > I need to take input from a web page, and store it in a database. Web
> > > > > > pages are then driven from this database. We want to use Unicode to
> > > > > > allow multi-lingual support. I was wondering if anyone could tell me
> > > > > > of any issues likely to be faced in this process.
> > > > > > Our database is capable of storing Unicode, but I'm not sure if what
> > > > > > is reaching the database is actually Unicode. Using IE 5.5, a textarea
> > > > > > in a form is submitted containing any entered text. I have tried
> > > > > > specifying the page's character set as UTF-8. What then reaches the
> > > > > > database is a series of ASCII values with foreign characters such as
> > > > > > Japanese, or accented characters, converted to a few symbols. I don't
> > > > > > know if this is Unicode, where when I look at it in the database the
> > > > > > multi-byte characters can be seen as a combination of single byte
> > > > > > (gibberish) characters.
> > > > > > If this isn't Unicode do I need to put in some sort of converter to
> > > > > > change to &#xxxx; format? Some web sites seem to say that for HTML,
> > > > > > Unicode must be changed to this numeric character reference format.
> > > > > > I would appreciate any advice.
> > > > > > Thanks in advance,
> > > > > > Stephen
> > > > >
> > > > > > _________________________________________________________________________
> > > > > > Get Your Private, Free E-mail from MSN Hotmail at
> > > > > > http://www.hotmail.com.
> > > > >
> > > > > Share information about yourself, create your own public profile at
> > > > > http://profiles.msn.com.
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT