Re: WG: UTF-8 text files

From: Jon Hanna (
Date: Wed Jun 08 2005 - 05:15:12 CDT


    Dominikus Scherkl wrote:
    >>>Remember that Lasse's idea is to check _all_ the text; so
    >>>while NBSP certainly can occur after a capital accented
    >>>letter (or an eszet)
    >>But uppercase accented letters fortunately do not often
    >>occur at the end of words, do they? Only (eszet, U+00DF)
    >>is likely to occur before NBSP often, because it's a common
    >>word ending in German,

    They do if something is written all-uppercase, a situation where the use
    of U+00A0 rather than U+0020 is more likely (out of formatting concerns).

    Still a rare case, but enough to demonstrate that heuristics cannot be
    100% accurate. More likely cases may occur with other encodings.
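
    To make the failure concrete, here is a minimal sketch in Python. It
    assumes the text is really ISO 8859-1 and that the heuristic is simply
    "the whole text decodes as UTF-8"; the function name and the sample
    strings are made up for illustration and are not taken from Lasse's
    proposal. An uppercase accented letter (bytes 0xC0-0xDF) followed by
    NBSP (0xA0) happens to form a well-formed two-byte UTF-8 sequence, so
    the check is fooled.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Naive heuristic (an assumption): UTF-8 if the whole text decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# ISO 8859-1 text in all-uppercase, with U+00A0 right after U+00C9.
# The byte pair C9 A0 is also a valid UTF-8 encoding (of U+0260).
shouting = "CAF\u00c9\u00a0AU LAIT".encode("latin-1")

# Eszet (U+00DF) followed by NBSP: the byte pair DF A0 is valid UTF-8 too.
eszet = "Fu\u00df\u00a0ball".encode("latin-1")

# Ordinary lowercase text with a plain space: E9 20 is not valid UTF-8.
ordinary = "caf\u00e9 au lait".encode("latin-1")

for sample in (shouting, eszet, ordinary):
    print(sample, looks_like_utf8(sample))
```

    The first two samples are wrongly accepted as UTF-8 and only the third
    is rejected, so a whole-text validity check can at best say "probably".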

    Jon Hanna
    "If the wolves come out of the walls, it's all over."
    - Neil Gaiman, _The Wolves in the Walls_

    This archive was generated by hypermail 2.1.5 : Wed Jun 08 2005 - 05:15:51 CDT