Re: No Invisible Character - NBSP at the start of a word

From: Peter Kirk
Date: Sun Nov 28 2004 - 12:27:33 CST

    On 28/11/2004 00:21, Mark E. Shoulson wrote:

    > ...
    > Well, that's the difference under discussion. The "plain text" would
    > seem to be either the qere or the ketiv (but not the combined
    > "blended" form), since each of those is somewhat sensible. Peter
    > Kirk's point is that the blended form is what is in fact written and
    > has been so for centuries, so he claims that *it* should be considered
    > the plain text.

    But who says the plain text has to be sensible? Unicode is concerned
    with representing the text as written, not with its meaning. The
    following string is meaningless, is not sensible at all, but it is still
    plain text: gxyfcwx bfzkgf ikxz bgcuyxukb kbcghjkshxcbnhjkc b bhb
    jksdfncfuhikc. (It's not a code, by the way, it comes from random typing.)

    Asmus basically agreed with me, but added:

    > In scripts with complex layout, of course, not all random character
    > soup would be rendered the same by all systems. Which, I think is the
    > point here. If this is a rather commonly used device, then in
    > principle it's possible to ask why can this not be part of plain text.
    > If the necessary mechanisms to do this are cheap and simple, the
    > answer is often to bring such things under the plain text umbrella. If
    > it's complicated, the answer should be to leave it to mechanisms such
    > as markup that deal well in (whatever required kind of) complexity.

    If there were in fact a need for complex mechanisms to support Ketiv/Qere
    blended forms in plain text, then I might agree that alternative markup
    mechanisms need to be looked at. But in fact in this case, as I see it,
    only two special mechanisms are required:

    1) Allowing multiple vowel points with a single base character. The
    issues concerning this one were discussed at some length on this list
    last year, concerning the form Yerushala(y)im which is the commonest
    such form. The solution which was agreed for this form works well with
    the other rare forms in this category.

    2) Allowing floating vowel points (and sometimes accents) with a blank
    base character. This usually, but not always, happens at the beginning
    of a word. The mechanism for doing this seems to have been clarified by
    the UTC: use NBSP as the base character.
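
    As a rough illustration of the two mechanisms above, here is a minimal
    Python sketch using only the standard unicodedata module. The sample
    characters (lamed, patah, hiriq) and the helper name describe are
    assumptions chosen for illustration, not taken from any particular
    text, and the sketch only inspects the character sequences; it says
    nothing about how a given font or renderer displays them.

```python
import unicodedata

NBSP = "\u00A0"   # NO-BREAK SPACE, the blank base character
LAMED = "\u05DC"  # HEBREW LETTER LAMED (sample base letter, an assumption)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ


def describe(text):
    """List (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


# Mechanism 1: a single base letter followed by two vowel points.
two_points = LAMED + PATAH + HIRIQ

# Mechanism 2: a vowel point with no letter of its own, carried by NBSP.
floating_point = NBSP + HIRIQ

for label, text in (("two points", two_points), ("floating point", floating_point)):
    print(label)
    for row in describe(text):
        print("   ", *row)

# Canonical normalization sorts adjacent marks by combining class, so the
# typed order of two vowel points on one base is not preserved by itself.
reordered = unicodedata.normalize("NFC", two_points)
print(reordered == LAMED + HIRIQ + PATAH)

# NBSP has combining class 0 and no canonical decomposition, so NFC leaves
# the NBSP + point sequence unchanged.
print(unicodedata.normalize("NFC", floating_point) == floating_point)
```

    The first comparison reflects the ordering issue that makes mechanism 1
    need care: hiriq has a lower combining class than patah, so
    normalization moves it ahead of patah. The second shows that the
    NBSP-based sequence survives canonical normalization as typed.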

    So can't we leave it that these mechanisms can be used for
    representation of these forms by those who wish to represent them in
    plain text, whereas those who want to use other mechanisms are free to
    do so?

    In answer to the possible objection that this leaves alternative ways to
    represent the same text, I note that the same alternatives already apply
    with e.g. superscript digits which may be represented either in plain
    text with the Unicode superscript digit characters, or as marked up text
    using superscript markup.
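
    The superscript case can be sketched the same way in Python. The
    string "x<sup>2</sup>" is a hypothetical HTML-style markup form used
    only for comparison; any other markup convention would serve equally
    well.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00B2"  # SUPERSCRIPT TWO

plain_text = "x" + SUPERSCRIPT_TWO  # superscript carried by the character itself
marked_up = "x<sup>2</sup>"         # hypothetical markup form of the same text

# The character records its relation to the ordinary digit as a
# compatibility decomposition tagged <super>.
print(unicodedata.decomposition(SUPERSCRIPT_TWO))

# Compatibility normalization (NFKC) folds the plain-text form to an
# ordinary digit, discarding the superscript distinction.
folded = unicodedata.normalize("NFKC", plain_text)
print(folded)
```

    Both forms represent the same written text; they differ only in
    whether the superscript is carried by the character or by the markup
    around it.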

    Peter Kirk
