From: Chris Jacobs (chris.jacobs@freeler.nl)
Date: Tue Jul 12 2005 - 14:40:43 CDT
----- Original Message -----
From: "Avraham Shapiro" <asha@loc.gov>
To: <unicode@unicode.org>
Sent: Tuesday, July 12, 2005 7:44 PM
Subject: character entities in UTF-8 files
> ** Low Priority **
>
> We have an XML based application that specifies UTF-8 files as input.
> Occasionally users will include numeric character entities, for example
> &#233; for e acute, instead of the UTF-8 equivalent, C3 A9. My question
> is: Is this legal UTF-8?
Perfectly legal.
Only it does not stand for e acute; as far as Unicode is concerned it just
stands for itself, for the six characters &#233;.
Of course you are allowed to have agreements with your users about replacing
&#233; by e acute, or by whatever you want to replace it by,
just like you can agree with them to convert lower case to capitals.
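To make the point above concrete, here is a minimal sketch in Python. It assumes the entity is spelled as the decimal reference `&#233;` (the hexadecimal spelling would behave the same way), and the variable names are only illustrative. It checks that the reference is well-formed UTF-8 and that decoding yields six ASCII characters, not e acute.

```python
# Assumed spelling of the entity: the six ASCII characters & # 2 3 3 ;
reference_bytes = b"&#233;"

# e acute (U+00E9) encoded directly as UTF-8, for comparison.
literal_bytes = "\u00e9".encode("utf-8")

# Strict decoding raises UnicodeDecodeError on ill-formed UTF-8,
# so reaching the next line shows the reference is legal UTF-8.
decoded = reference_bytes.decode("utf-8")

print(literal_bytes.hex())      # c3a9
print(len(decoded), decoded)    # 6 &#233;
print(decoded == "\u00e9")      # False: the decoder does not interpret entities
```

Any mapping from the six characters to e acute has to come from a layer above the encoding, which is the "agreement with your users" mentioned above.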
> And are numeric or symbolic character entities valid for Ascii-7
> characters such as "<"? My guess is the first one is not legal, and the
> second one is application defined, i.e. Unicode says nothing about it.
> Am I right?
No. For those the situation is just the same as for the chars above 7F.
Entities in a UTF-8 file stand for the chars they are composed of, not
for the chars they denote.
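The difference between "composed of" and "denote" can be sketched the same way for "<". The sample document below is hypothetical, and Python's standard XML parser is used only as one example of a layer that sits above the encoding. It is an illustration under those assumptions, not a statement about how any particular application must behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a symbolic and a numeric reference for "<".
raw = b"<p>&lt; &#60;</p>"

# Encoding layer: UTF-8 decoding leaves each reference as the
# characters it is composed of.
text = raw.decode("utf-8")
print(text)            # <p>&lt; &#60;</p>

# XML layer: the parser replaces each reference with the
# character it denotes.
element = ET.fromstring(raw)
print(element.text)    # < <
```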