Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

From: Doug Ewell <doug_at_ewellic.org>
Date: Wed, 04 Jun 2014 11:00:50 -0700

How common is it to see any of the following in real-world Unicode text,
as opposed to code charts and test suites and the like?

1. Unpaired surrogates
2. Noncharacters (besides CLDR data)
3. U+FEFF at the beginning of a stream (note: not "packet" or arbitrary
cutoff point)

I'm not asking whether any of these are recommended or "prohibited" or
whether they are a good idea. I'm asking about actual usage.
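
One way to gather numbers on actual usage is to count the three cases directly in a corpus. The following is only a minimal sketch, assuming the text is already available as a sequence of 16-bit code units; the names `is_noncharacter` and `scan_utf16_units` are hypothetical and not taken from any library.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus the last two code points of each plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scan_utf16_units(units):
    """Count the three corner cases in a sequence of 16-bit code units.

    Assumption: `units` holds integers in the range 0..0xFFFF, already in
    native order (byte-order detection is outside the scope of this sketch).
    """
    counts = {"unpaired_surrogates": 0, "noncharacters": 0, "leading_feff": 0}
    n = len(units)

    # Case 3: U+FEFF at the very beginning of the stream only.
    if n > 0 and units[0] == 0xFEFF:
        counts["leading_feff"] = 1

    i = 0
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                counts["unpaired_surrogates"] += 1  # Case 1
                i += 1
                continue
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            counts["unpaired_surrogates"] += 1  # Case 1
            i += 1
            continue
        else:
            cp = u
            i += 1

        if is_noncharacter(cp):
            counts["noncharacters"] += 1  # Case 2

    return counts


sample = [0xFEFF, 0x0041, 0xD800, 0x0042, 0xFFFE, 0xD83F, 0xDFFF, 0xDC00]
result = scan_utf16_units(sample)
```

Running a scan like this over real-world text, rather than code charts or test suites, would give per-case counts. It does not exclude CLDR data or distinguish a stream start from an arbitrary cutoff point; that filtering would have to happen when the corpus is assembled.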

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode

Received on Wed Jun 04 2014 - 13:02:09 CDT
