Date: Fri Jan 02 2009 - 19:33:52 CST

    On Fri, Jan 2, 2009 at 7:42 PM, James Kass wrote:
    > What does a search engine do when it runs into a Tamil
    > web page encoded using non-standard PUA conventions,
    > such as TUNE?

    Nothing too smart. It doesn't know what language it is, or even how to
    separate words, so simple questions like whether "don" should match
    "don't", or "donut", or "@don#" (where @ and # are equally mysterious
    PUA code points) are impossible to answer. It could
    be Verdurian, for all Google knows. PUA is completely inscrutable in such a
    situation, until it becomes a case like U+0093 and U+0094, where
    everyone knows they're really quotes even though Unicode says
    otherwise ... which is a hideous pain for everyone.
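
To make the two points above concrete, here is a minimal Python sketch. It is not how any real search engine works; the helper names (is_private_use, naive_tokens, c1_as_cp1252) are hypothetical, and U+E000 and U+E001 are arbitrary stand-ins for the "@" and "#" in the example. It shows that private-use code points carry only the general category Co, so a generic word splitter has nothing to go on, and that U+0093 and U+0094 are C1 controls in Unicode yet turn into curly quotes when their byte values are read as Windows-1252.

```python
import re
import unicodedata


def is_private_use(ch):
    """Return True if ch is a private-use code point (general category Co)."""
    return unicodedata.category(ch) == "Co"


def naive_tokens(text):
    """Split on runs of \\w, as a generic tokenizer with no language data might."""
    # In Python's re module, \w covers characters that str.isalnum() accepts,
    # plus underscore. Private-use code points are not among them.
    return re.findall(r"\w+", text)


def c1_as_cp1252(ch):
    """Reinterpret a C1 control's byte value as Windows-1252 (assumed convention)."""
    # Only meaningful for byte values that Windows-1252 actually defines.
    return ch.encode("latin-1").decode("cp1252")


# Stand-in for "@don#": two arbitrary PUA code points around "don".
mystery = "\ue000don\ue001"

print([is_private_use(c) for c in mystery])
print(naive_tokens(mystery))
print(naive_tokens("don't"), naive_tokens("donut"))

# Unicode classifies U+0093 and U+0094 as control characters (Cc) ...
print(unicodedata.category("\u0093"), unicodedata.category("\u0094"))
# ... but the same byte values are curly quotes in Windows-1252.
print(ascii(c1_as_cp1252("\u0093")), ascii(c1_as_cp1252("\u0094")))
```

The splitter silently treats the PUA characters as separators and finds "don" inside the mystery string. That would be wrong if those code points are really letters of a longer word, and nothing in the character data says either way.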

