From: Dominikus Scherkl (lyratelle@gmx.de)
Date: Wed Jun 08 2005 - 04:50:32 CDT
This message was intended to go to the whole list (my fault):
> > > consider the case of the non-breaking space (U+00A0) which may
> > > follow lots of uppercase ISO 8859-1 Letters (U+00C0..U+00DF).
> >
> > Remember that Lasse's idea is to check _all_ the text; so
> > while NBSP certainly can occur after a capital accented
> > letter (or an eszett)
>
> But uppercase accented letters fortunately do not often
> occur at the end of words, do they? Only ß (eszett, U+00DF)
> is likely to occur before NBSP often, because it's a common
> word ending in German, but DF A0 to DF BF in UTF-8 means
> U+07E0 to U+07FF, thus far unassigned code points (in the near
> future N'Ko letters), which are really unlikely to occur in
> the middle of German words.
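The byte arithmetic above can be checked with a small sketch. The helper name `reinterpret_as_utf8` is made up for this illustration; it only assumes that Python's standard "latin-1" and "utf-8" codecs are available.

```python
def reinterpret_as_utf8(latin1_text):
    """Encode text as ISO 8859-1, then decode the same bytes as UTF-8."""
    return latin1_text.encode("latin-1").decode("utf-8")

# 'ß' is byte DF in ISO 8859-1; DF is also a valid 2-byte UTF-8 lead byte.
first = reinterpret_as_utf8("\u00df\u00a0")  # bytes DF A0 (ß + NBSP)
last = reinterpret_as_utf8("\u00df\u00bf")   # bytes DF BF (ß + ¿)

# The same result by hand: 5 payload bits from the lead byte,
# 6 payload bits from the continuation byte.
by_hand = ((0xDF & 0x1F) << 6) | (0xA0 & 0x3F)

print(f"U+{ord(first):04X} .. U+{ord(last):04X}, by hand: U+{by_hand:04X}")
```

Both characters of each pair collapse into a single code point, which is why the pair passes a UTF-8 validity check without complaint.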
>
> More of a threat is 'Â' (C2) followed by some punctuation
> like NBSP (A0), '«' (AB), '»' (BB), '¿' (BF) or '¡' (A1),
> which stand for themselves in UTF-8. So words ending in 'Â'
> may be misinterpreted by simply swallowing the letter. This
> may be really hard to detect. But as stated above, uppercase
> accented letters are very uncommon word endings, and text
> containing accented letters is very, very unlikely to
> contain them _only_ in such uncommon positions.
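A minimal sketch of both halves of this argument, assuming the check is simply "does the whole text decode as strict UTF-8" (the helper `is_valid_utf8` and the sample strings are invented for illustration, not taken from the thread):

```python
def is_valid_utf8(raw):
    """Return True if the whole byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 'Â' (C2) followed by one of these Latin-1 characters forms a valid
# 2-byte UTF-8 sequence that decodes to the second character alone.
swallowed = {}
for ch in "\u00a0\u00ab\u00bb\u00bf\u00a1":
    raw = ("\u00c2" + ch).encode("latin-1")
    swallowed[ch] = raw.decode("utf-8")

# A Latin-1 text whose only accented letter is in such a position passes
# the check, and the 'Â' silently disappears.
ambiguous = "A\u00c2\u00bb".encode("latin-1")      # "AÂ»"
# One ordinary accented letter elsewhere breaks UTF-8 validity.
typical = "caf\u00e9 \u00c2\u00bb".encode("latin-1")  # "café Â»"

print(is_valid_utf8(ambiguous), is_valid_utf8(typical))
```

In the second sample the byte E9 would start a 3-byte UTF-8 sequence but is followed by a plain space, so checking all of the text exposes it as ISO 8859-1.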
>
> Best Regards.
>
> --
> Dominikus Scherkl
>