Re: Unicode in web pages

From: Stephen Toner (toners5@hotmail.com)
Date: Tue Sep 05 2000 - 03:38:59 EDT


Does that mean that input from a web page must be converted from its
UTF-8 encoding to UCS-2 for storage in SQL Server? If so, are there any
converters out there?
Can UCS-2 be used as the encoding for a web page, or must conversion be done
between the two encodings?
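A minimal Java sketch of the conversion being asked about, assuming the form data arrives as UTF-8 bytes. Decoding those bytes into a Java String already produces UTF-16 code units (for characters in the Basic Multilingual Plane these are identical to UCS-2), so the JDK itself acts as the converter. The class and method names are hypothetical. Whether the JDBC driver then stores that String unchanged in a Unicode (NCHAR/NVARCHAR) column is an assumption, not something this thread confirms.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ToUtf16 {
    // Hypothetical helper: decode UTF-8 bytes, as a browser submits them,
    // into a Java String, whose chars are UTF-16 code units.
    static String decodeUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (two UTF-8 bytes) followed by U+65E5 (three UTF-8 bytes)
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9, (byte) 0xE6, (byte) 0x97, (byte) 0xA5};
        String s = decodeUtf8(utf8);

        // Each BMP character is one UTF-16 (UCS-2) code unit
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00E9, then U+65E5
        }

        // Re-encoded as UTF-16 without a byte order mark: 2 bytes per character
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 4
    }
}
```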

>From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
>To: "Unicode List" <unicode@unicode.org>
>Subject: Re: Unicode in web pages
>Date: Mon, 4 Sep 2000 10:48:29 -0800 (GMT-0800)
>
>Yep, a question mark is the character that Windows will substitute for any
>character that is not on the code page being used for conversion. Since
>you should be in UCS-2 most of the time (both SQL Server and Java use
>it, right?), the problem would be in the conversion that was supposed to be
>producing UTF-8. Perhaps some other code page is being used, like the
>server default?
>
>michka
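The substitution described above can be reproduced in Java, which likewise replaces characters that a target code page cannot represent with '?'. This sketch (class and method names are hypothetical) lists the characters of a sample string that a given code page would lose. The Euro sign exists in windows-1252 (as byte 0x80) but not in ISO-8859-1, so which characters turn into '?' can hint at which code page is actually in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class UnmappableCheck {
    // Hypothetical helper: return the characters of s that the named code page
    // cannot represent; these are the ones a conversion would turn into '?'.
    static String unmappable(String s, String codePage) {
        CharsetEncoder encoder = Charset.forName(codePage).newEncoder();
        StringBuilder lost = new StringBuilder();
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                lost.append(ch);
            }
        });
        return lost.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, the Euro sign, and two Japanese characters
        String sample = "caf\u00E9 \u20AC \u65E5\u672C";
        for (String codePage : new String[] {"windows-1252", "ISO-8859-1"}) {
            String lost = unmappable(sample, codePage);
            // windows-1252: 2 (the Japanese pair); ISO-8859-1: 3 (adds the Euro sign)
            System.out.println(codePage + ": " + lost.length() + " character(s) would become '?'");
        }
    }
}
```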
>
>
>----- Original Message -----
>From: "Mark Davis" <markdavis@ispchannel.com>
>To: "Unicode List" <unicode@unicode.org>
>Cc: "Unicode List" <unicode@unicode.org>
>Sent: Monday, September 04, 2000 10:32 AM
>Subject: Re: Unicode in web pages
>
>
> > Sounds like somewhere in the process bytes are getting interpreted as the
> > wrong character set. For example, if you take a Unicode source, convert to
> > cp1252, then convert to UTF-8, you will get question marks on Windows or in
> > Java for the characters above FF, while the ones below (including some
> > European ones) will be correct UTF-8 characters.
> >
> > Mark
> >
> > BTW, there is a FAQ page on the Unicode site
> > (http://www.unicode.org/unicode/faq/) about web pages. I am wondering whether
> > you looked at it, and if so whether you found it useful. Feedback would help
> > to improve those pages.
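The path Mark describes can be sketched in Java as two steps: narrow the text to cp1252 (Java's charset name is windows-1252), then re-encode the result as UTF-8. A character that cp1252 contains, such as U+00E9, comes out as correct UTF-8; a Japanese character is already '?' before UTF-8 is involved. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyPipeline {
    // Illustrative only: force text through cp1252, then encode it as UTF-8.
    static byte[] viaCp1252ThenUtf8(String s) {
        Charset cp1252 = Charset.forName("windows-1252");
        // getBytes() substitutes '?' for anything cp1252 cannot represent
        byte[] narrowed = s.getBytes(cp1252);
        String roundTripped = new String(narrowed, cp1252);
        return roundTripped.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "\u00E9\u65E5"; // U+00E9 survives, U+65E5 does not
        byte[] out = viaCp1252ThenUtf8(input);
        // Prints C3 A9 3F: valid UTF-8 for U+00E9, then a literal '?'
        for (byte b : out) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```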
> >
> > Stephen Toner wrote:
> >
> > > The character is posted in a form, and the receiving page opens a
> > > connection to a SQL Server 7.0 database using the WebLogic JDBC:ODBC
> > > driver, which supports Unicode. The Java string is then passed to the
> > > database.
> > >
> > > I have now found that the symbols in the database were indeed the UTF-8
> > > version of the characters, e.g. = . This was for some European
> > > characters only.
> > > However, many characters in languages such as Japanese (and the Euro
> > > symbol) reach the database not in their correct form but with question
> > > marks in them. I don't know where the problem is occurring. How does the
> > > character get converted into these UTF-8 sequences, and could there be a
> > > problem with this? Possibly it doesn't recognise the character that it
> > > should be converting (just a mad stab in the dark).
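One way to end up with "the UTF-8 version of the characters" in the database is for the server to decode the browser's UTF-8 bytes with a single-byte charset. This sketch assumes ISO-8859-1 as that charset (a frequent default in servlet containers, but an assumption here, not something the thread establishes) and shows that the damage is reversible as long as nothing else has altered the string. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical simulation: a server decodes UTF-8 form data as ISO-8859-1
    // (assumed default, not confirmed in this thread).
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undo it: ISO-8859-1 maps every byte 0x00-0xFF to exactly one char,
    // so the original bytes can be recovered and decoded correctly as UTF-8.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00E9");
        // One accented letter has become two Latin-1 characters, U+00C3 U+00A9
        System.out.println(garbled.equals("\u00C3\u00A9")); // true
        System.out.println(repair(garbled).equals("\u00E9")); // true
    }
}
```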
> > >
> > > Because UTF-8 is a sequence of bytes, does that mean that it could be
> > > treated and stored as ASCII, and that the sequence would be recombined
> > > into Unicode on output if the encoding was set to UTF-8?
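On storing UTF-8 "as ASCII": a byte-for-byte round trip works only if the storage keeps all 8 bits of every byte. This sketch (hypothetical names) treats UTF-8 bytes as text in a given storage charset, reads them back, and checks whether the same bytes return. Strict 7-bit US-ASCII fails for anything beyond plain ASCII, while an 8-bit single-byte charset such as ISO-8859-1 preserves every byte. Even when the bytes survive, the database still sees each multi-byte character as several single-byte characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteStorageCheck {
    // Hypothetical check: treat UTF-8 bytes as text in the storage charset,
    // read them back out, and compare with the original bytes.
    static boolean survives(String text, Charset storage) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String stored = new String(utf8, storage);  // what the "ASCII" column holds
        byte[] readBack = stored.getBytes(storage); // bytes handed back on output
        return Arrays.equals(utf8, readBack);
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C"; // two Japanese characters, six UTF-8 bytes
        // US-ASCII rejects bytes >= 0x80, so they come back as '?'
        System.out.println(survives(text, StandardCharsets.US_ASCII));   // false
        // ISO-8859-1 maps all 256 byte values, so the bytes come back intact
        System.out.println(survives(text, StandardCharsets.ISO_8859_1)); // true
    }
}
```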
> > >
> > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > >To: "Unicode List" <unicode@unicode.org>
> > > >Subject: Re: Unicode in web pages
> > > >Date: Mon, 4 Sep 2000 05:04:08 -0800 (GMT-0800)
> > > >
> > > >Well, the client side is right if you are using UTF-8 and the browser
> > > >does indeed show UTF-8 as the encoding being used (how to check this
> > > >depends on your browser: View|Encoding or Edit|Preferences), so there
> > > >must be some issue on the server side.
> > > >
> > > >You may need to post more detail on the database, how you are getting
> > > >to it, etc., so someone who knows more about the server config can
> > > >comment.
> > > >
> > > >michka
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: "Stephen Toner" <toners5@hotmail.com>
> > > >To: <michka@trigeminal.com>; <unicode@unicode.org>
> > > >Sent: Monday, September 04, 2000 7:12 AM
> > > >Subject: Re: Unicode in web pages
> > > >
> > > >
> > > > > I am using JSP on the server side, and am using the Tomcat server.
> > > > >
> > > > >
> > > > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > > > >Reply-To: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > > > >To: "Stephen Toner" <toners5@hotmail.com>, "Unicode List"
> > > > > ><unicode@unicode.org>
> > > > > >Subject: Re: Unicode in web pages
> > > > > >Date: Mon, 4 Sep 2000 04:57:18 -0700
> > > > > >
> > > > > >UTF-8 is indeed the character set you want to use for the page
> > > > > >encoding; although some browsers will support UTF-16, etc., not all
> > > > > >will.
> > > > > >
> > > > > >But the real issue has to do with what technology you are using to
> > > > > >connect to the db. Is it ASP on the server side? Or something else?
> > > > > >And what is the server?
> > > > > >
> > > > > >michka
> > > > > >
> > > > > >
> > > > > >----- Original Message -----
> > > > > >From: "Stephen Toner" <toners5@hotmail.com>
> > > > > >To: "Unicode List" <unicode@unicode.org>
> > > > > >Sent: Monday, September 04, 2000 4:21 AM
> > > > > >Subject: Unicode in web pages
> > > > > >
> > > > > >
> > > > > > > Hi,
> > > > > > > I'm fairly new to Unicode and have a few problems trying to input it
> > > > > > > from a browser.
> > > > > > > I need to take input from a web page and store it in a database.
> > > > > > > Web pages are then driven from this database. We want to use Unicode
> > > > > > > to allow multilingual support. I was wondering if anyone could tell
> > > > > > > me of any issues likely to be faced in this process.
> > > > > > > Our database is capable of storing Unicode, but I'm not sure if what
> > > > > > > is reaching the database is actually Unicode. Using IE 5.5, a
> > > > > > > textarea in a form is submitted containing any entered text. I have
> > > > > > > tried specifying the page's character set as UTF-8. What then
> > > > > > > reaches the database is a series of ASCII values, with foreign
> > > > > > > characters such as Japanese, or accented characters, converted to a
> > > > > > > few symbols. I don't know if this is Unicode; when I look at it in
> > > > > > > the database, the multi-byte characters can be seen as a
> > > > > > > combination of single-byte (gibberish) characters.
> > > > > > > If this isn't Unicode, do I need to put in some sort of converter to
> > > > > > > change it to &#xxxx; format? Some web sites seem to say that for
> > > > > > > HTML, Unicode must be changed to this numeric character reference
> > > > > > > format.
> > > > > > > I would appreciate any advice.
> > > > > > > Thanks in advance,
> > > > > > > Stephen
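Numeric character references are not required when the page itself is served as UTF-8, the page encoding recommended in the replies above, but they are one way to carry arbitrary characters in a page whose encoding cannot represent them. A minimal sketch of such a converter (class and method names are hypothetical): it replaces every non-ASCII code point with a hexadecimal reference. It leaves ASCII untouched, so it does not escape markup characters such as & or <; that still needs ordinary HTML escaping.

```java
public class NcrEncoder {
    // Replace every non-ASCII code point with a hex numeric character
    // reference such as &#x65E5;. Iterating by code point keeps characters
    // outside the BMP as one reference rather than two surrogate halves.
    static String toNcr(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp); // plain ASCII passes through unchanged
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNcr("caf\u00E9 \u65E5")); // caf&#xE9; &#x65E5;
    }
}
```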

_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.

Share information about yourself, create your own public profile at
http://profiles.msn.com.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT