Re: character entities in UTF-8 files

From: Chris Jacobs (chris.jacobs@freeler.nl)
Date: Tue Jul 12 2005 - 14:40:43 CDT


    ----- Original Message -----
    From: "Avraham Shapiro" <asha@loc.gov>
    To: <unicode@unicode.org>
    Sent: Tuesday, July 12, 2005 7:44 PM
    Subject: character entities in UTF-8 files

    > ** Low Priority **
    >
    > We have an XML-based application that specifies UTF-8 files as input.
    > Occasionally users will include numeric character entities, for example
    > &#233; for e acute, instead of the UTF-8 equivalent, C3 A9. My question
    > is: Is this legal UTF-8?

    Perfectly legal.

    Only it does not stand for e acute. As far as Unicode is concerned, it
    just stands for itself: the six characters &#233;.
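
    A minimal Python sketch of that point, in case it helps. The helper name
    is_valid_utf8 and the variable names are made up for illustration; only
    the built-in bytes.decode is assumed:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


entity_bytes = b"&#233;"      # the entity as typed: six ASCII bytes
utf8_e_acute = b"\xc3\xa9"    # e acute encoded directly in UTF-8 (C3 A9)
latin1_e_acute = b"\xe9"      # e acute in Latin-1: a truncated sequence in UTF-8

# Decoding as UTF-8 leaves the entity untouched: six characters, no e acute.
entity_text = entity_bytes.decode("utf-8")
```

    Here is_valid_utf8(entity_bytes) and is_valid_utf8(utf8_e_acute) are both
    true, is_valid_utf8(latin1_e_acute) is false, and entity_text is still the
    six-character string &#233;.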

    Of course you are free to agree with your users that &#233; is to be
    replaced by e acute, or by whatever else you want to replace it with,
    just as you could agree with them to convert lower case to capitals.

    > And are numeric or symbolic character entities valid for 7-bit ASCII
    > characters such as "<"? My guess is the first one is not legal, and the
    > second one is application defined, i.e. Unicode says nothing about it.
    > Am I right?

    No. For those the situation is just the same as for the chars above 7F.
    Entities in a UTF-8 file stand for the chars they are composed of, not
    for the chars they denote.
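
    As a rough illustration of the two layers, here is a Python sketch using
    the standard xml.etree.ElementTree parser. The element name <w> and the
    sample text are invented for the example. UTF-8 decoding leaves both the
    symbolic and the numeric form as they were typed; only the XML parser,
    one layer up, replaces them with "<":

```python
import xml.etree.ElementTree as ET

# Sample bytes as they might sit in a UTF-8 file (invented for illustration).
raw = b"<w>a &lt; b &#60; c</w>"

# UTF-8 layer: every byte decodes to the character it encodes, nothing more.
decoded = raw.decode("utf-8")

# XML layer: the parser resolves the entity and the character reference.
parsed = ET.fromstring(raw).text
```

    After this runs, decoded still contains &lt; and &#60; literally, while
    parsed is "a < b < c".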


