Re: issues storing ZWSP in docs, files and databases

From: Javier SOLA (
Date: Mon Aug 27 2007 - 23:04:41 CDT


    It is simpler. What you need is a keyboard that makes the ZWSP easy to
    enter. The standard Khmer Unicode keyboard has ZWSP on the spacebar (it
    is used 16% of the time), and the space is entered with SHIFT+SPACEBAR,
    as it is used much less. It is still used, though (it has the meaning of
    a comma). You can store ZWSP normally in UTF-8, as one more character,
    distinct from the space.
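
    As a minimal sketch of that last point (the Khmer sample string and the
    helper name utf8_bytes are only illustrative assumptions), ZWSP is
    U+200B and takes three bytes in UTF-8, while the space takes one, so the
    two never collide in a file or a database column stored as UTF-8:

```python
ZWSP = "\u200b"   # ZERO WIDTH SPACE
SPACE = "\u0020"  # ordinary SPACE


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a string (hypothetical helper)."""
    return ch.encode("utf-8")


# Two sample Khmer words joined by ZWSP (sample text is an assumption).
text = "ភាសា" + ZWSP + "ខ្មែរ"

encoded = text.encode("utf-8")     # what ends up in the file or database
decoded = encoded.decode("utf-8")  # what comes back when it is read

# Split explicitly on ZWSP to recover the words.
words = decoded.split(ZWSP)

print(utf8_bytes(ZWSP), utf8_bytes(SPACE))
print(words)
```

    The round trip leaves the ZWSP in place, so word boundaries survive
    storage as long as nothing in between strips or normalizes the
    character.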

    Editors like OpenOffice let you see the ZWSP if you turn on the display
    of hidden characters. MS Word allows the same, but it uses a very wide
    glyph to represent the ZWSP, and the whole text becomes very hard to
    read.
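
    Outside such editors, a rough way to inspect stored text is to swap each
    ZWSP for a narrow visible marker before displaying it. This is only a
    sketch: the function names and the "|" marker are my own assumptions,
    not what OpenOffice or MS Word do internally.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def reveal_zwsp(text: str, marker: str = "|") -> str:
    """Return a display copy of text with every ZWSP shown as marker.

    The stored text is not modified; only the returned copy differs.
    """
    return text.replace(ZWSP, marker)


def count_zwsp(text: str) -> int:
    """Count the ZWSP characters in text."""
    return text.count(ZWSP)


sample = "ភាសា" + ZWSP + "ខ្មែរ"  # sample text, an assumption
print(reveal_zwsp(sample))
print(count_zwsp(sample))
```

    A narrow marker keeps the line length close to the original, which
    avoids the readability problem of a very wide placeholder.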

    The second most used character in Khmer is the COENG character (an
    invisible, virama-like character), used to build subjoined characters
    (8%). It is also placed in a very easy-to-type position (the J key).


    Philippe Verdy wrote:
    > It seems that regular spaces are sometimes seen as word separators in
    > Burmese, simply because they are easier to enter and edit correctly (as they
    > are visible).
    > Text editors do not always automatically replace the regular space between
    > two Burmese letters with ZWSP, but this could be automated. Or the space may
    > not be replaced at all when saving the text, leaving that transformation
    > for the time when the text is used, in which case the spaces may even be
    > stripped out completely for rendering...
    > Well, I suppose the fonts that support Burmese correctly are so rare
    > that they will certainly contain an explicit mapping for ZWSP, so that
    > Burmese texts can be stored directly with ZWSP (using automatic replacement
    > of SPACE between Burmese letters when saving, and automatic replacement of
    > ZWSP between Burmese letters by regular SPACE when loading before editing
    > again).
    > A Burmese word processor could use two modes:
    > * one that eases editing, where ZWSP are made visible as if they were
    > SPACE instances; the SPACE bar still works and internally inserts a
    > SPACE that is automatically replaced by a ZWSP between two
    > Burmese letters. The internal backing buffer will then contain only ZWSP.
    > * one for the WYSIWYG mode (or "Print Preview" mode), where the same ZWSP
    > are invisible (but the SPACE bar still works the same way, only the
    > rendering is different)
    > * in both modes, the spell checker may automatically signal to the user that
    > some positions in the backing buffer still contain a regular SPACE
    > (still visible in both modes). There should also be a way to force the
    > input of a regular SPACE using a key sequence like <Ctrl+SPACE bar>, even
    > if this is incorrect. This disables the automatic substitution of the
    > SPACE on input and marks this SPACE instance as explicitly desired, using
    > some out-of-band style instruction if the document is saved in a rich-text
    > format. That information is lost if the document is saved in plain-text
    > format only and loaded again, at which point the spell checker will signal
    > these spaces.
    > A plain-text-only editor will just use the visible mode on screen, but will
    > try, when printing, to remove these regular SPACEs between Burmese letters
    > when there's no line break, or replace them with newline markers (this
    > replacement will not affect the edit back buffer, only what is sent to the
    > print processor when preparing the plain-text document for printing).
    >> -----Original Message-----
    >> From: [] On behalf of
    >> Doug Ewell
    >> Sent: Sunday, August 26, 2007 01:56
    >> To: Unicode Mailing List
    >> Cc: Ngwe Tun
    >> Subject: Re: issues storing ZWSP in docs, files and databases
    >> Ngwe Tun wrote:
    >>> We have to use ZWSP for word breaking in our language. So we need
    >>> to use ZWSP for line breaking purposes too. Every Burmese word might
    >>> follow ZWSP when automatically adding or operator.
    >>> Please let me have one last clarification. Do we need to store ZWSP in
    >>> documents, files and databases for the purpose of word
    >>> segmentation/breaking? Or is it possible to add it automatically in
    >>> another way?
    >> Burmese text will either have ZWSP between words, which means electronic
    >> processes can automatically determine word boundaries, or it will not,
    >> which means they cannot. Unicode does not tell you that you must use
    >> ZWSP in Burmese text, only that "if word boundary indications are
    >> desired" then ZWSP is the right character for the job.
    >> A program could probably be written to add ZWSP to existing Burmese
    >> text. Such a program would almost certainly be dictionary-based and
    >> would need to allow a human to review the text and fix any possible
    >> errors or ambiguities.
    >> --
    >> Doug Ewell · Fullerton, California, USA · RFC 4645 · UTN #14
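
    To make the dictionary-based idea above concrete, here is a minimal
    sketch using greedy longest-match lookup. The technique is my own choice
    for illustration, not something the thread specifies, and the function
    names and the two-word lexicon are hypothetical. Runs of text that match
    no dictionary entry are kept together and returned separately so that a
    human can review them, as the quoted message suggests.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def segment(text, lexicon):
    """Greedy longest-match segmentation against a set of known words.

    Returns (words, review), where review lists the runs of text that
    matched no lexicon entry and need a human decision.
    """
    max_len = max((len(w) for w in lexicon), default=0)
    words, review = [], []
    i = 0
    unknown_start = None  # start index of the current unmatched run
    while i < len(text):
        match_len = 0
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                match_len = length
                break
        if match_len:
            if unknown_start is not None:
                # Close the unmatched run that ended just before this word.
                words.append(text[unknown_start:i])
                review.append(text[unknown_start:i])
                unknown_start = None
            words.append(text[i:i + match_len])
            i += match_len
        else:
            if unknown_start is None:
                unknown_start = i
            i += 1
    if unknown_start is not None:
        words.append(text[unknown_start:])
        review.append(text[unknown_start:])
    return words, review


def insert_zwsp(text, lexicon):
    """Join the segmented words with ZWSP; also return the review list."""
    words, review = segment(text, lexicon)
    return ZWSP.join(words), review


# Tiny illustrative lexicon (an assumption, not a real dictionary).
lexicon = {"မြန်မာ", "စာ"}
marked, review = insert_zwsp("မြန်မာ" + "စာ", lexicon)
```

    Greedy matching picks the longest entry at each position, so it can
    choose the wrong split where two readings are possible. That is why the
    review list, or a full human pass over the output, still matters.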
