Re: Unicode Search Engines

From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Feb 20 2002 - 13:40:30 EST

Previous message: John Cowan: "Re: Unicode Search Engines"
Maybe in reply to: Stefan Probst: "Re: Unicode Search Engines"
Next in thread: Marco Cimarosti: "RE: Unicode Search Engines"

Marco Cimarosti scripsit:

> How does this happen, in practice? Is it the author(ing tools) of HTML
> documents or web servers which do normalization?

Normalization (n11n) should be done as close to the creator as possible,
i.e. in authoring tools. None of this is implemented yet; it is all still
in the future.
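
As a rough illustration of what normalizing "close to the creator" could
look like, here is a minimal sketch of a save hook in an authoring tool.
It assumes NFC is the target form (the message does not name one), and
the function name normalize_for_save is hypothetical; only Python's
standard unicodedata module is used.

```python
import unicodedata


def normalize_for_save(text: str) -> str:
    """Return text in NFC, as a hypothetical authoring tool might do on save."""
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "Cafe\u0301"
composed = normalize_for_save(decomposed)

# After NFC, the pair is replaced by the single precomposed U+00E9.
print(len(decomposed), len(composed))
```

Normalizing at this point means servers and search engines downstream
can compare strings without each having to normalize again.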

> And what about documents not in Unicode? Should they be converted into
> Unicode, normalized, and then converted back into the original encoding?

Documents in encodings other than UTF-* are normalized by definition,
unless it is *impossible* to convert them to normalized Unicode
(typically because they contain characters not yet present in Unicode).
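
For anyone who wants to look at the convert-and-check route raised in
the question, the following is a sketch only. It decodes bytes from a
legacy encoding and tests whether the resulting Unicode text is already
in NFC. NFC as the target form, the function name, and the ISO 8859-1
sample are all assumptions made for illustration.

```python
import unicodedata


def decodes_to_nfc(data: bytes, encoding: str) -> bool:
    """Decode legacy-encoded bytes and report whether the text is already NFC.

    Raises UnicodeDecodeError if the bytes cannot be converted at all.
    """
    text = data.decode(encoding)
    return unicodedata.normalize("NFC", text) == text


# "Café" in ISO 8859-1: the byte 0xE9 maps to the precomposed U+00E9.
latin1_bytes = b"Caf\xe9"
result = decodes_to_nfc(latin1_bytes, "latin-1")
print(result)
```

Because ISO 8859-1 has only precomposed letters, the decoded text
needs no further normalization, and no round trip back to the original
encoding is required.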

> I think
> that there is a list of Unicode characters which are not allowed (forbidden?
> deprecated?) in HTML specs.

Correct.

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_


This archive was generated by hypermail 2.1.2 : Wed Feb 20 2002 - 13:15:46 EST