Re: Unicode Search Engines

From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Feb 20 2002 - 13:40:30 EST


Marco Cimarosti scripsit:

> How does this happen, in practice? Is it the author(ing tools) of HTML
> documents or web servers which do normalization?

Normalization (n11n) should be done as close to the creator as possible,
i.e. in authoring tools. None of this is implemented yet; it is all still
to come.
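
As a rough illustration of what "in the authoring tool" could mean, here is
a minimal sketch of a save step that normalizes text before writing it out.
It assumes NFC is the target form and UTF-8 the output encoding; the function
name save_normalized is made up for this example and does not come from any
real tool.

```python
import unicodedata


def save_normalized(text: str) -> bytes:
    """Hypothetical save hook: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, as an editor might produce it
decomposed = "Cimarosti e\u0301"
saved = save_normalized(decomposed)
```

Done at this point, nothing downstream (servers, search engines) has to
guess whether the text was already normalized.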

> And what about documents not in Unicode? Should they be converted into
> Unicode, normalized, and then converted back into the original encoding?

Documents not in UTF-* are normalized by definition, unless it is
*impossible* to convert them to normalized Unicode (typically
because they contain characters not yet present in Unicode).
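
One way to read "normalized by definition", assuming the model in which a
legacy-encoded document is judged by what a normalizing transcoder would
produce from it: the transcoder always emits normalized Unicode, so no
round trip back to the original encoding is needed. The sketch below is only
that reading, with NFC assumed as the target form; transcode_normalizing is
a hypothetical helper name, and cp1258 is picked merely because it encodes
combining marks.

```python
import unicodedata


def transcode_normalizing(raw: bytes, encoding: str) -> str:
    """Hypothetical normalizing transcoder: decode, then apply NFC."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


# A legacy-encoded document containing 'a' plus a combining acute accent
legacy = "a\u0301".encode("cp1258")
text = transcode_normalizing(legacy, "cp1258")
```

The legacy bytes themselves are left alone; only the Unicode view of them is
normalized.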

> I think
> that there is a list of Unicode characters which are not allowed (forbidden?
> deprecated?) in HTML specs.

Correct.

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_


