From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Aug 06 2003 - 07:19:29 EDT
On 06/08/2003 03:54, Philippe Verdy wrote:
>On Wednesday, August 06, 2003 1:59 AM, Curtis Clark <jcclark@mockfont.com> wrote:
>
>>on 2003-08-05 15:31 Peter Kirk wrote:
>>
>>>Thank you, Mark. This helps to clarify things, but still doesn't
>>>explicitly answer my question of how to encode "a sentence like "In
>>>this language the diacritic ^ may appear above the letters ...",
>>>but instead of ^ I want to use a combining character" and want to
>>>display exactly one space before the combining character - do I
>>>encode two spaces or one?
>>>
>>In this language the diacritic  ̊ may appear above the letters...
>>
>>Two spaces, at least in Thunderbird Mail.
>
>The NFD decompositions of spacing marks are already defined as a SPACE
>plus a non-spacing combining character. ...
>
Really? It looks to me as if U+00B4 and U+02D8 to U+02DD have only
compatibility equivalences to space plus diacritic, and U+005E and
U+0060 don't even have compatibility equivalences.
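
For anyone who wants to check this against the character database, here is a minimal sketch using Python's standard unicodedata module. The helper name decomposition_kind is my own, and the results reflect whatever Unicode version the local Python ships with, so treat it as an illustration rather than an authoritative reading of the standard.

```python
import unicodedata

# Code points named in the paragraph above.
code_points = [0x00B4, *range(0x02D8, 0x02DE), 0x005E, 0x0060]

def decomposition_kind(cp):
    """Classify a code point's decomposition mapping (hypothetical helper).

    A mapping that starts with a <tag> such as <compat> is a compatibility
    decomposition; an untagged mapping is canonical; an empty string means
    the character has no decomposition at all.
    """
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping:
        return "none"
    return "compatibility" if mapping.startswith("<") else "canonical"

for cp in code_points:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"{decomposition_kind(cp)} {unicodedata.decomposition(ch)!r}")
```

Since NFD applies only canonical mappings, a character classified here as "compatibility" or "none" is left unchanged by NFD.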
>...
>This means that an algorithm like normalization of whitespace sequences
>in XML or HTML should not include SPACEs that are used as base
>characters in a combining sequence, and so it should keep two spaces
>if the intent is to encode a logical space followed by a logical spacing
>diacritic. (This is not a problem for XML which processes strings in their
>NFC form).
>
It is a problem, because there are very many combining marks which do
not have spacing equivalents (even for compatibility), and so with these
the NFC form will certainly be space plus diacritic.
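
To make the point concrete, here is a rough sketch in Python. The function collapse_spaces is hypothetical, not taken from any XML or HTML specification. It collapses runs of U+0020 but keeps a space that is immediately followed by a combining mark, on the assumption that such a space is acting as the base character of the sequence. The first lines also show that NFC leaves space plus combining mark as it is.

```python
import unicodedata

# NFC does not turn SPACE + COMBINING RING ABOVE into a spacing character:
# U+02DA has only a compatibility mapping, so nothing composes here.
sequence = " \u030A"
print(unicodedata.normalize("NFC", sequence) == sequence)

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def collapse_spaces(text):
    """Collapse runs of spaces, keeping any space that carries a mark.

    Hypothetical sketch: only U+0020 is handled, not tabs or newlines.
    """
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        is_base = ch == " " and nxt != "" and is_mark(nxt)
        # Drop a space only if it repeats a previous space and is not
        # itself the base character for a following combining mark.
        if ch == " " and not is_base and out and out[-1] == " ":
            continue
        out.append(ch)
    return "".join(out)

print(repr(collapse_spaces("the diacritic  \u030A may appear")))
print(repr(collapse_spaces("plain   run   of   spaces")))
```

With this rule the two spaces before the combining mark survive (one word separator, one base character), while ordinary runs of spaces still collapse to one.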
--
Peter Kirk
peter.r.kirk@ntlworld.com
http://web.onetel.net.uk/~peterkirk/