Google is a major U+3070 U+304B (was: Re: Searchable web page ?!!)

From: 11digitboy@bolt.com
Date: Sat May 05 2001 - 07:16:23 EDT


*** JUUICHIKETAJIN ***



attached mail follows:



----- Original Message -----
From: <11digitboy@bolt.com>
To: "James Kass" <jameskass@worldnet.att.net>
Sent: Friday, May 04, 2001 4:46 PM
Subject: Re: Searchable web page ?!!

> I don't know about other search engines, but the way
> Google handles some charsets makes me think it is
> a
>
> U+3070 U+304B

ばか (baka) is Japanese for "fool" or "idiot". The Spanish
word "vaca", meaning "cow", is pronounced about the same.
Guess cows aren't very smart. On Google, the description
for a page it finds often says something like "this page
contains characters that can't be displayed in the current
character set...", which is kind of dumb, because all they
would have to do at Google is serve the results in a
Unicode character set such as UTF-8!

>
> and if it case folds, why not kana fold?
> Are search engines sensitive to characters, or only
> byte sequences? I mean, can it tell that -- OK, let's
> pick a good one -- U+304D and SJIS-82AB are the same
> thing?
>
> A big problem might be languages like Greek, whose 8-bit
> charsets put the letters in the upper half of the byte
> range (0x80-0xFF).
>
> Are the search engines smart enough to tell an alpha
> is an alpha is an alpha?
>
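
The questions quoted above can be made concrete with a small
sketch. It assumes the search engine knows each page's declared
charset and decodes to Unicode before comparing anything; that is
an assumption for illustration, not a description of what Google
actually does. The helper names to_text and kana_fold are
hypothetical, and kana_fold is only a rough analogue of case
folding (katakana to hiragana), not a standard Unicode operation.

```python
import unicodedata

# The same character, U+304D, as it might arrive from two different pages.
sjis_bytes = b"\x82\xab"                # Shift-JIS encoding of U+304D
utf8_bytes = "\u304d".encode("utf-8")   # UTF-8 encoding of U+304D


def to_text(raw: bytes, charset: str) -> str:
    """Decode raw page bytes using the charset the page declares (assumed known)."""
    return raw.decode(charset)


def kana_fold(text: str) -> str:
    """Hypothetical 'kana folding': map katakana onto hiragana."""
    # NFKC first, so half-width katakana become ordinary full-width katakana.
    text = unicodedata.normalize("NFKC", text)
    folded = []
    for ch in text:
        cp = ord(ch)
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
        if 0x30A1 <= cp <= 0x30F6:
            ch = chr(cp - 0x60)
        folded.append(ch)
    return "".join(folded)


# Different byte sequences, same character once decoded.
same_kana = to_text(sjis_bytes, "shift_jis") == to_text(utf8_bytes, "utf-8")

# "An alpha is an alpha": byte 0xE1 in ISO-8859-7 is U+03B1.
alpha = to_text(b"\xe1", "iso-8859-7")

# Katakana U+30D0 U+30AB folds to hiragana U+3070 U+304B.
folded = kana_fold("\u30d0\u30ab")
```

Compared as raw bytes, none of these would match; compared as
decoded (and, for kana, folded) characters, they do. Whether a
given search engine works this way is a separate question.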

I don't know very much about the search engines, and I wonder
if you meant to send this letter to the Unicode list?

With best regards,

James Kass.


