RE: Several BOMs in the same file

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Mar 24 2003 - 03:35:40 EST

    Michael (michka) Kaplan wrote:
    > But if you do not, what is the harm of a character that you cannot see
    > and which does not even have width or cause line breaking behavior?
    > Realistically, what would the problem be?

    The fact that the 0xFEFF character does not affect the display does not
    mean that there is no problem with it. If 0xFEFF is treated as a no-break
    space, then it is not a whitespace character. I haven't checked the Unicode
    standard, but I believe this is correct. Microsoft also changed the
    behavior of the isspace() function not so long ago (it now returns false
    for 0xFEFF).
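
    As a rough illustration of the whitespace point, here is a minimal Python
    sketch. It uses Python's own Unicode tables rather than the Microsoft C
    runtime mentioned above, so it only shows how one other implementation
    classifies the character; the helper name describe() is made up for this
    example.

```python
import unicodedata

BOM = "\ufeff"   # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark
NBSP = "\u00a0"  # NO-BREAK SPACE, shown for comparison

def describe(ch):
    """Return (Unicode name, general category, str.isspace() result)."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

for ch in (BOM, NBSP, " "):
    print(describe(ch))

# strip() only removes characters that count as whitespace,
# so a leading U+FEFF survives it.
print(repr((BOM + "username").strip()))
```

    In Python, at least, U+FEFF is a format character rather than a space, so
    the usual whitespace-trimming calls leave it in place.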

    This is all correct, since a no-break space (regardless of its width) is a
    'character', not a space. In practice, however, it becomes a nuisance. If
    an application fails to remove the BOM before processing the contents of a
    file, or another application concatenates two files without removing the
    BOM of the second file, then parsing the file yields wrong results: 0xFEFF
    becomes part of a word, and that word no longer qualifies as a keyword or,
    if it is an identifier, no longer matches the original identifier (for
    example, a username).
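
    The concatenation case can be sketched in Python as follows. The file
    contents, the KEYWORDS set and the tokens() helper are all hypothetical,
    chosen only to make the failure visible. Dropping every U+FEFF is an
    assumption that suits this example; real data may contain the character
    deliberately as a zero width no-break space.

```python
import codecs

KEYWORDS = {"username", "password"}  # hypothetical keyword table

def tokens(data, strip_boms):
    """Decode UTF-8 bytes and split the text on whitespace.

    If strip_boms is true, every U+FEFF is removed first (an assumption
    made for this sketch, not a general recommendation).
    """
    text = data.decode("utf-8")
    if strip_boms:
        text = text.replace("\ufeff", "")
    return text.split()

# Two UTF-8 "files", each starting with its own BOM.
first = codecs.BOM_UTF8 + "password secret\n".encode("utf-8")
second = codecs.BOM_UTF8 + "username lars\n".encode("utf-8")
combined = first + second  # naive concatenation keeps both BOMs

naive = tokens(combined, strip_boms=False)
clean = tokens(combined, strip_boms=True)

print([t in KEYWORDS for t in naive])  # the BOM-prefixed words do not match
print([t in KEYWORDS for t in clean])
```

    Without the stripping step, the third token is the word "username" with
    an invisible U+FEFF glued to its front, so the keyword lookup fails even
    though the text looks identical on screen.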

    Regards,

    Lars


