Re: PRI #186: Word-Joining Hyphen vs LEFT SINGLE QUOTATION MARK

From: Per Starbäck <starback_at_stp.lingfil.uu.se>
Date: Tue, 05 Jul 2011 16:03:05 +0200

Philippe Verdy <verdy_p_at_wanadoo.fr> writes:

> In all cases, you need knowledge of the language before trying to
> implement a word-breaker for that language. The solution in UAX#29
> will still provide some basic breaks to reduce the number of cases and
> to more easily detect exceptions, and it can be a good first
> processing step used in actually working word breakers for spell
> checkers, grammatical analysis, and automated translators, and for
> disambiguating leading and trailing apostrophes from leading and
> trailing quotation marks.

I’m off on a tangent here, but I can add that when my native Swedish
uses single quotation marks, it traditionally uses RIGHT SINGLE QUOTATION
MARK both before and after a quotation. This is yet another example of
why you need knowledge of the language, and of how hard word-breaking
can be.

Then if you read

        Jag såg ’na när hon spela’ piano.
      = Jag såg henne när hon spelade piano.
      = I saw her when she played the piano.

no simple algorithm would know that there isn’t a quote “na när hon
spela” in there.

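To make the point concrete, here is a minimal Python sketch of the kind of "simple algorithm" meant here. The function `naive_single_quotes` and its rule are hypothetical, chosen only for illustration: a U+2019 after whitespace and before a letter is taken to open a quotation, and one after a letter and before whitespace or punctuation is taken to close it. This is not UAX #29 and not any real spell checker's logic.

```python
RSQUO = "\u2019"  # RIGHT SINGLE QUOTATION MARK, used in Swedish at both ends


def naive_single_quotes(text: str) -> list[str]:
    """Hypothetical heuristic: pair up U+2019 marks purely by their context."""
    spans = []
    start = None
    for i, ch in enumerate(text):
        if ch != RSQUO:
            continue
        # Treat the edges of the text as whitespace.
        before = text[i - 1] if i > 0 else " "
        after = text[i + 1] if i + 1 < len(text) else " "
        if start is None and before.isspace() and after.isalpha():
            # Space before, letter after: looks like an opening mark.
            start = i + 1
        elif start is not None and before.isalpha() and not after.isalnum():
            # Letter before, space or punctuation after: looks like a closing mark.
            spans.append(text[start:i])
            start = None
    return spans


print(naive_single_quotes("Jag såg ’na när hon spela’ piano."))
# → ['na när hon spela']
```

Because both ends use the same character, context is all such a heuristic has to go on, and the elided forms ’na (henne) and spela’ (spelade) look exactly like the edges of a quotation. The same rule does find the genuine quotation in a sentence such as "Hon sa ’hej’ till mig."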
Received on Tue Jul 05 2011 - 09:06:16 CDT