Re: UTF-8 ill-formed question

From: Doug Ewell
Date: Sun, 16 Dec 2012 10:55:30 -0700

I remember Marco's original post in 2002. His intent was to give people
with an actual U+ code point that needed converting—like James Lin ten
years later—a quick way to do so without getting immersed in all the
bit-shifting math.

If this were a routine being run by a computer, or a tutorial on UTF-8,
I would agree that it should have taken loose surrogates into account.
But it's not. It's just a quick manual reference guide, and loose
surrogates are 0.0001% of the real-world problem for users like James.

While I note that Philippe's amended version seems straightforward and
in keeping with Marco's original intent (short and simple), I'd like to
suggest that neither Marco for creating the original guide, nor anyone
else for doing up UTF-16 and UTF-32 versions, nor Otto for reposting
them on the list this week, needs to be beaten up any further over this
edge case.

Doug Ewell | Thornton, Colorado, USA | @DougEwell
Received on Sun Dec 16 2012 - 11:59:51 CST
