Re: [unicode] Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Fri, 1 Feb 2013 17:15:37 +0100

suzuki toshiya, Fri, 01 Feb 2013 23:39:56 +0900:
>> Do any programming languages output text in NFD? Does Java? Python?
>> C#? Perl? JavaScript?
>
> It might not be an example you want, recent Mac OS X stores
> the filenames in NFD-derived encoding.
> http://developer.apple.com/library/mac/#qa/qa1173/_index.html

And this can be a problem. E.g., if you create a file called "å.html" or
"й.html" on your Mac and link to that file, e.g. from an HTML page, then
the URI often ends up using normalized text. This typically works in
Apache on Mac OS X, but might not work online. The best approach is to
normalize the file name during upload.
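
A minimal sketch of what normalizing during upload could look like,
assuming Python and its standard unicodedata module. The helper name
nfc_filename is hypothetical, and the choice of NFC follows the W3C
recommendation quoted below. It also shows why the two forms do not
match in a URI: they percent-encode to different byte sequences.

```python
import unicodedata
from urllib.parse import quote

def nfc_filename(name: str) -> str:
    """Hypothetical helper: return the file name normalized to NFC."""
    return unicodedata.normalize("NFC", name)

# "å.html" written in decomposed form: 'a' followed by U+030A COMBINING RING ABOVE.
decomposed = "a\u030a.html"
composed = nfc_filename(decomposed)

print(len(decomposed), len(composed))  # 7 6
print(quote(decomposed))               # a%CC%8A.html
print(quote(composed))                 # %C3%A5.html
```

A server that compares file names byte for byte will treat these as two
different names, so the upload step is a convenient place to settle on one
form.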

Leif Halvard Silli

> Costello, Roger L. wrote:
>> Hi Folks,
>>
>> The W3C recommends [1] text sent out over the Internet be in
>> Normalized Form C (NFC):
>>
>> This document therefore chooses NFC as the
>> base for Web-related early normalization.
>>
>> So why would one ever generate text in decomposed form (NFD)?
>>
>> Do any programming languages output text in NFD? Does Java? Python?
>> C#? Perl? JavaScript?
>>
>> Do any tools produce text in NFD?
>>
>> Should I assume that any text my applications receive will always be
>> normalized to NFC form?
>>
>> Is NFD dead?
>>
>> /Roger
>>
>> [1] http://www.w3.org/TR/charmod-norm/#sec-ChoiceNFC
>>
>>
>
>
Received on Fri Feb 01 2013 - 10:19:55 CST