RE: No Invisible Character - NBSP at the start of a word

From: Asmus Freytag
Date: Tue Dec 07 2004 - 05:43:32 CST

    At 11:52 PM 12/6/2004, Jony Rosenne wrote:
    >In chapter 8, regarding Hebrew, the standard says:
    >Positioning. Marks may combine with vowels and other points, and there are
    >complex typographic rules for positioning these combinations.
    >I understand that this sentence should be regarded as being normative.

    The aim of Unicode in making normative statements about layout is to allow
    a user (an author) to confidently select the correct character for the
    intended purpose at the correct location and (generally) rely on
    conforming implementations to render it in a way that's compatible with
    the intent.

    As long as that is the case, the actual rendering may be more or less
    'pretty', according to how sophisticated the layout engine is. After all,
    between a typewriterish rendition of a script and full-featured print there
    can be a wide gulf.

    Unicode was not intended to replace all typographical rules and customs.
    But where an author is in doubt whether a certain character should be
    placed before or after another character for a given representation,
    that is the kind of situation Unicode tends to clarify.

    This clarification is an ongoing process, since in many cases the need
    for a clarification was not apparent from the start.

    It is certainly also true that some effects are best left to markup (or
    other out-of-band information), for example something as complex as fully
    built-up mathematical equations. However, for layout effects in running
    text, the presumption should be that out-of-band information is not
    required if the effects are ordinary and can be described in predictable
    ways that harmonize with other, similar, already existing situations.

    Sometimes, though not in the case at hand, the same effect can
    appropriately be achieved either through markup or via special characters.
    I'm thinking of the letterlike forms used in mathematics and phonetics,
    many of which are identical in shape to what can be produced with markup.
    However, for a variety of reasons it was felt that requiring markup in
    each instance was too limiting.
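
    To illustrate the letterlike forms mentioned above, here is a minimal
    sketch in Python using the standard unicodedata module. The helper name
    describe() and the three sample characters are my own choices for
    illustration only. It shows that such characters carry their styling in
    the character itself, and that compatibility normalization (NFKC) folds
    them back to the plain letter that markup would otherwise have styled.

```python
import unicodedata


def describe(ch):
    """Return (character name, NFKC-folded form) for a single character.

    Hypothetical helper for illustration; not part of any standard API.
    """
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)


# Letterlike characters that need no markup to convey their styled shape.
samples = [
    "\u210E",      # PLANCK CONSTANT (an italic h)
    "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
    "\u211D",      # DOUBLE-STRUCK CAPITAL R
]

for ch in samples:
    name, folded = describe(ch)
    print(f"U+{ord(ch):04X} {name} folds to {folded!r}")
```

    The folding loses the distinction between the styled and the plain
    letter, which is one way to see why these forms were encoded as
    characters where that distinction carries meaning.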

    Therefore, all of what you quote is eminently true, but the conclusion
    doesn't follow. This is a case where UTC will have to come to (or sustain)
    an explicit judgement, to settle the controversy.


    PS: What I have written here, that Unicode does not primarily intend to
    be prescriptive about layout but is interested rather in establishing an
    agreement between authors and programmers on which sequence of characters
    is used to represent which construct, relates to the discussion of
    character identity and properties, most closely developed in TR#23 so far.
    There might be a reason to add some language clarifying these concepts
    further for 5.0. Suggestions are welcome.
