Re: extracting words

From: Jungshik Shin
Date: Tue Feb 13 2001 - 00:11:41 EST

On Sun, 11 Feb 2001, Mark Davis wrote:

> BTW, someone on this thread made this topic out to be even more complex than
> it is: that Devanagari and Korean are written without spaces. While that may
> have been the case historically, I believe that the modern text does use
> spaces. Chinese, Japanese and Thai are the main languages written without
> spaces.

As I wrote earlier and you correctly believe, spaces are used to separate
words in Korean text. That has been the case at least since the Korean
Linguistic Society - KLS: Hangul Hakhoe - published the unified rules of
Korean orthography in 1933. The practice of using spaces must have been
predominant well before that, because otherwise the Korean Linguistic
Society would probably not have codified it. The orthographic standards
of both North and South Korea agree on this point. More details are
available at <> in Korean only. The full text
of the various standards at the site - four orthographic standards (KLS:
1933, 1980; North Korea: 1987; South Korea MOE: 1988), transliteration of
foreign words in Hangul (South Korea MOE, 1985), transcription of Korean in
the Roman alphabet - is only available in HWP format (HWP is one of the most
popular word processors in Korea), which can be viewed with Namo HWP viewer
for MS-Windows at <>. People
in the US may find that the bottom of each page gets cropped if printed
directly from Namo HWP viewer, as the pages are made for A4 paper. A workaround
is to print to a file (using a PS printer driver) and use ghostscript to
print it (using PDFWriter may do the same trick). If interested, drop me
a line off-line and I'll send a copy either in PDF or PS (resized to
better fit US letter paper if necessary).

Jungshik Shin
