Google is a major U+3070 U+304B (was: Re: Searchable web page ?!!)

Date: Sat May 05 2001 - 07:16:23 EDT


> I don't know about other search engines, but the way
> Google seems to handle some charsets seems to make
> me think it is a
> U+3070 U+304B

ばか is Japanese for fool or idiot. "Vaca" is pronounced
about the same and is the Spanish word for "cow". Guess
cows aren't very smart. A lot of times on Google, the
description for the page found says something like
"this page contains characters that can't be displayed
in the current character set...", which is kind of dumb
because all they would have to do at Google is serve the
results in a Unicode encoding such as UTF-8!
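
As a rough illustration of why a Unicode encoding would avoid that
message, here is a minimal sketch assuming Python 3 and its standard
codecs. The snippet string and the helper name can_encode are made up
for the example, and none of this says anything about how Google
actually works. It only shows that a description mixing Japanese,
Greek and Spanish text fits in UTF-8 but in none of the legacy
character sets tried here.

```python
# Hypothetical description text: "ばか", Greek alpha, and "ñ" together.
snippet = "\u3070\u304b \u03b1 \u00f1"

def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Each legacy charset is missing at least one of the characters above;
# UTF-8 can represent all of them.
results = {
    charset: can_encode(snippet, charset)
    for charset in ("iso-8859-1", "iso-8859-7", "shift_jis", "utf-8")
}
print(results)
```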

> and if it case folds, why not kana fold?
> Are search engines sensitive to characters, or only
> byte sequences? I mean, can it tell that -- OK, let's
> pick a good one -- U+304D and SJIS-82AB are the same
> thing?
> A big problem might be languages like Greek which use
> the second half of the possible byte list.
> Are the search engines smart enough to tell an alpha
> is an alpha is an alpha?

I don't know very much about search engines. Did you mean
to send this letter to the Unicode list?
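
On the character-versus-byte question itself, here is a minimal
sketch, assuming Python 3 and its standard codecs. It only shows that
a program can tell SJIS 82AB and U+304D are the same character once
the bytes are decoded, that the same holds for a Greek alpha in
ISO-8859-7, and what kana folding next to case folding might look
like. The names kana_fold and search_key are made up for
illustration; I am not claiming any search engine does this.

```python
import unicodedata

# Shift_JIS bytes 0x82 0xAB decode to U+304D (HIRAGANA LETTER KI).
ki = b"\x82\xab".decode("shift_jis")
same_ki = (ki == "\u304d")

# ISO-8859-7 byte 0xE1 decodes to U+03B1 (GREEK SMALL LETTER ALPHA).
alpha = b"\xe1".decode("iso-8859-7")
same_alpha = (alpha == "\u03b1")

def kana_fold(text):
    """Map katakana U+30A1..U+30F6 onto hiragana U+3041..U+3096.

    Hypothetical helper: the two blocks are laid out in parallel,
    0x60 code points apart, so a fixed offset is enough.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def search_key(text):
    """Build a comparison key: NFKC, then case folding, then kana folding.

    NFKC turns half-width katakana into ordinary katakana, casefold()
    removes case differences, and kana_fold() removes the
    hiragana/katakana difference.
    """
    return kana_fold(unicodedata.normalize("NFKC", text).casefold())

print(same_ki, same_alpha)
print(search_key("\u30d0\u30ab") == search_key("\u3070\u304b"))
```

With keys like these, a query typed in katakana, half-width katakana
or hiragana would compare equal, the same way "Alpha" and "alpha" do
after case folding.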

With best regards,

James Kass.

