Re: Chinese Word Breaking

From: Richard Wordingham <>
Date: Wed, 22 Jul 2015 00:33:34 +0100

On Tue, 21 Jul 2015 18:10:14 +0800
gfb hjjhjh <> wrote:

> When you write text in modern Chinese, there are no breaks
> between words, so if you segment the text into runs of
> ideographic characters, what gets grouped together will be
> a clause or a sentence, or even a whole paragraph if you
> are handling older text without punctuation.
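
To illustrate the quoted point, here is a minimal sketch that groups
text into runs of ideographic characters. The function name han_runs
and the sample sentence are made up for illustration, and the pattern
assumes only the basic CJK Unified Ideographs block (U+4E00..U+9FFF);
a real segmenter would need a dictionary or a statistical model.

```python
import re

# Assumption: treat only U+4E00..U+9FFF as ideographic. Extension
# blocks and compatibility ideographs are deliberately ignored here.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_runs(text: str) -> list[str]:
    """Return maximal runs of consecutive ideographic characters."""
    return HAN_RUN.findall(text)


if __name__ == "__main__":
    # Hypothetical sample: two clauses separated by a fullwidth comma.
    sample = "我今天去北京大学，明天回家。"
    for run in han_runs(sample):
        print(run)
```

Each run that comes back is a whole clause, not a word, because only
the punctuation interrupts the sequence of ideographs.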

I had another look at Chinese word breaking algorithms today and saw
that their practical purposes were mostly indexing and machine
translation. Consequently, I suspect that authors have little
incentive to mark word boundaries in the texts they originate. This
differs from the Thai situation where marking word boundaries improves
layout and spell-checking.

Received on Tue Jul 21 2015 - 18:34:53 CDT
