RE: Definitions

From: Philippe Verdy
Date: Wed Nov 26 2003 - 05:29:37 EST

    > Briefly, it's my opinion that applications which claim to support
    > and comply with Unicode should not 'step on' Unicode text. Any
    > loopholes in the 'letter of the law' which allow applications to
    > mung or reject Unicode text should be plugged.

    If this "plugging" must be done, the same should apply to HTML and XML.
    For now, combining characters can be encoded directly just after a quote
    character (single or double) that marks the beginning of an attribute
    value, or just after a tag-closing ">". HTML and XML parsers will parse
    these quotes or greater-than signs on their own, ignoring the combining
    sequence that follows and thereby creating defective sequences, and this
    is a problem.

    My opinion is that HTML and XML parsers should not take the quote or
    greater-than sign in isolation, without considering the whole combining
    sequence. This means that such occurrences should be treated as syntax
    errors. If one really wants to create a Unicode-compliant XML/HTML document
    containing defective sequences, these sequences should be encoded with
    numeric character references.
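
    To make the proposed check concrete, here is a minimal sketch in Python.
    It assumes that "combining character" means any code point whose general
    category starts with "M", and it looks at every quote or ">" in the raw
    text rather than only at real markup delimiters, so it is cruder than what
    a parser would do. The function name find_markup_combining is
    hypothetical.

```python
import unicodedata
from typing import List


def is_combining(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def find_markup_combining(text: str) -> List[int]:
    """Return the indices of combining marks that directly follow
    a single quote, a double quote, or a '>' in the raw text."""
    hits = []
    for i in range(1, len(text)):
        if text[i - 1] in "\"'>" and is_combining(text[i]):
            hits.append(i)
    return hits
```

    A strict parser following this proposal would report a syntax error
    whenever the returned list is not empty.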

    An XML/HTML code generator that produces a serialized document should then
    know the list of combining characters, and encode them with numeric entities
    when their use is defective (at the beginning of a CDATA section, of an
    attribute value, or of a text element...). This would completely "plug the
    loophole".
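
    On the generator side, the idea could look roughly like the following
    Python sketch. It is an illustration under assumptions, not a complete
    serializer: combining characters are again taken to be the code points of
    general category "M", the whole leading run of marks is escaped (so that
    the second mark does not attach to the ";" of the first reference), and
    only attribute values and text are covered, because character references
    are not recognized inside a CDATA section. The names
    escape_leading_combining and attribute are hypothetical.

```python
import unicodedata
from xml.sax.saxutils import escape


def escape_leading_combining(value: str, entities=None) -> str:
    """Escape markup characters, writing any leading combining marks
    as hexadecimal numeric character references."""
    i = 0
    while i < len(value) and unicodedata.category(value[i]).startswith("M"):
        i += 1
    refs = "".join("&#x{:04X};".format(ord(ch)) for ch in value[:i])
    return refs + escape(value[i:], entities or {})


def attribute(name: str, value: str) -> str:
    """Serialize one attribute with a double-quoted value."""
    return '{}="{}"'.format(
        name, escape_leading_combining(value, {'"': "&quot;"})
    )
```

    With this, a value that starts with U+0301 is written as "&#x0301;..."
    and no combining mark ever directly follows the opening quote.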

    This archive was generated by hypermail 2.1.5 : Wed Nov 26 2003 - 06:04:12 EST