Re: No Invisible Character - NBSP at the start of a word

From: Philippe Verdy
Date: Fri Nov 26 2004 - 18:28:24 CST


    From: "Jony Rosenne"
    > One of the problems in this context is the phrase "original meaning". What
    > we have is a juxtaposition of two words, which is indicated by writing the
    > letters of one with the vowels of the other. In many cases this does not
    > cause much of a problem, because the vowels fit the letters, but sometimes
    > they do not. Except for the most frequent cases, there normally is a note
    > in the margin with the alternate letters - I hope everyone agrees that
    > notes in the margin are not plain text.

    Are you drawing a parallel here with the annotations placed above or below
    ideographs in Asian texts, using the ruby notation (for example in HTML),
    which may also be represented in plain-text Unicode with the interlinear
    annotation characters?

    Are you arguing that interlinear annotations are not plain-text? If so why
    were they introduced in Unicode?
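
    To make the comparison concrete, here is a minimal sketch of how the
    interlinear annotation characters (U+FFF9 ANCHOR, U+FFFA SEPARATOR,
    U+FFFB TERMINATOR) can carry a base text and its annotation in plain
    text, and how a renderer might map them to HTML ruby markup. The helper
    names annotate and to_ruby_html are hypothetical, chosen only for this
    illustration, and the sketch assumes non-nested annotations with a single
    annotation per base text.

```python
import re
from html import escape

# Unicode interlinear annotation characters.
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

# Matches ANCHOR base SEPARATOR note TERMINATOR (no nesting handled here).
_CONTROLS = ANCHOR + SEPARATOR + TERMINATOR
_ANNOTATED = re.compile(
    f"{ANCHOR}([^{_CONTROLS}]*){SEPARATOR}([^{_CONTROLS}]*){TERMINATOR}"
)


def annotate(base: str, note: str) -> str:
    """Wrap a base text and its annotation in the plain-text controls."""
    return f"{ANCHOR}{base}{SEPARATOR}{note}{TERMINATOR}"


def to_ruby_html(text: str) -> str:
    """Convert annotated plain text to HTML, using <ruby>/<rt> markup."""
    out = []
    pos = 0
    for m in _ANNOTATED.finditer(text):
        out.append(escape(text[pos:m.start()]))  # text before the annotation
        out.append(
            f"<ruby>{escape(m.group(1))}<rt>{escape(m.group(2))}</rt></ruby>"
        )
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing text
    return "".join(out)


plain = "a " + annotate("base", "note") + " b"
print(to_ruby_html(plain))
```

    The point of the sketch is only that both levels, the base text and the
    annotation, travel together in one character stream; how they are laid
    out is left to the renderer.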

    The notations in question are not merely presentation features. They have
    their own semantics, which merit being treated as plain text, because their
    structure also resembles a linguistic grammar, not far from the other
    common annotations found in Latin text as phrases between parentheses
    or em-dashes.

    Plain text has always been widely used to embed several linguistic levels,
    which are often also represented in the spoken language by variations of
    tone. The content of these annotations is also plain text. The graphic
    representation itself is not that important; it is just there to show
    the relations that exist between one level of the written language and
    the annotation level.

    If a text mixes these levels, there is no reason not to represent
    that. These annotations are present in the text, so there must be a way to
    represent them in its encoding, even if that implies encoding mixed words
    belonging to different interpretation levels (such as Qere and Ketiv texts
    in Biblical Hebrew).

    You are arguing against millennia of written language practice, and you are
    too focused on common Latin usage, where many concessions to your
    intuitive model have already been integrated into Unicode (think of the
    various characters that have been added as symbols or special punctuation,
    or of other annotations added on top of Latin letters, such as
    mathematical arrows).

    I see fewer problems with the correct representation of Ketiv and Qere
    annotations mixed within plain text, and rendered as supplementary letters
    on top of or around the core Hebrew letters, than with the representations
    conceded to the Latin script for various usages (including technical
    annotations, punctuation, or formatting controls).
