From: Peter Kirk (firstname.lastname@example.org)
Date: Wed Aug 06 2003 - 07:19:29 EDT
On 06/08/2003 03:54, Philippe Verdy wrote:
>On Wednesday, August 06, 2003 1:59 AM, Curtis Clark <email@example.com> wrote:
>>on 2003-08-05 15:31 Peter Kirk wrote:
>>>Thank you, Mark. This helps to clarify things, but still doesn't
>>>explicitly answer my question of how to encode "a sentence like "In
>>>this language the diacritic ^ may appear above the letters ...",
>>>but instead of ^ I want to use a combining character" and want to
>>>display exactly one space before the combining character - do I
>>>encode two spaces or one?
>>In this language the diacritic ̊ may appear above the letters...
>>Two spaces, at least in Thunderbird Mail.
>The NFD decomposition of spacing marks is already defined as a SPACE
>plus a non-spacing combining character. ...
Really? It looks to me as if U+00B4 and U+02D8 to U+02DD have only
compatibility equivalences to space plus diacritic, and U+005E and
U+0060 don't even have compatibility equivalences.
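
One way to check this is against the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module. The helper name decomposition_kind and the variable names are only illustrative, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def decomposition_kind(ch):
    """Classify a character's decomposition mapping as 'none', 'canonical' or 'compatibility'."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a formatting tag such as "<compat>".
    return "compatibility" if mapping.startswith("<") else "canonical"

# The spacing diacritics mentioned above: U+005E, U+0060, U+00B4, U+02D8..U+02DD.
spacing_marks = [0x005E, 0x0060, 0x00B4] + list(range(0x02D8, 0x02DE))
report = {cp: decomposition_kind(chr(cp)) for cp in spacing_marks}

for cp, kind in report.items():
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {kind} [{unicodedata.decomposition(ch)}]")

# NFD applies canonical mappings only, so U+00B4 is left alone;
# NFKD also applies the compatibility mapping to SPACE + U+0301.
acute_nfd = unicodedata.normalize("NFD", "\u00b4")
acute_nfkd = unicodedata.normalize("NFKD", "\u00b4")
```

If the data is as I read it, only the compatibility forms (NFKD, NFKC) turn these characters into space plus diacritic, while NFD and NFC leave them unchanged.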
>This means that an algorithm like normalization of whitespace sequences
>in XML or HTML should not include SPACEs that are used as base
>characters in a combining sequence, and so it should keep two spaces
>if the intent is to encode a logical space followed by a logical spacing
>diacritic. (This is not a problem for XML which processes strings in their
It is a problem, because very many combining marks have no spacing
equivalents (not even compatibility ones), and so with these the NFC
form will certainly be space plus diacritic.
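
To illustrate both points, here is a rough sketch in Python, not a description of what any XML or HTML processor actually does. The function names is_mark and collapse_spaces are hypothetical, only U+0020 is treated as whitespace, and a "combining mark" is assumed to mean any character whose general category starts with M.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def collapse_spaces(text):
    """Collapse runs of U+0020 to one space, but keep a space that
    serves as the base character of a following combining mark."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != " ":
            out.append(text[i])
            i += 1
            continue
        j = i
        while j < n and text[j] == " ":
            j += 1
        # The last space of the run is a base if a combining mark follows it.
        base_follows = j < n and is_mark(text[j])
        plain_spaces = (j - i) - (1 if base_follows else 0)
        if plain_spaces > 0:
            out.append(" ")   # one logical space for the rest of the run
        if base_follows:
            out.append(" ")   # the base space, kept separately
        i = j
    return "".join(out)

# SPACE + U+030A COMBINING RING ABOVE does not compose under NFC.
ring = " \u030a"
ring_nfc = unicodedata.normalize("NFC", ring)

sample = "the diacritic  \u030a may   appear"
collapsed = collapse_spaces(sample)
```

Here the two spaces before the ring survive (one logical space plus one base), while the run of three spaces before "appear" collapses to one.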
-- Peter Kirk firstname.lastname@example.org http://web.onetel.net.uk/~peterkirk/