Re: Best practice of using regex on identify none-ASCII email address

From: Mark Davis ☕ <>
Date: Fri, 1 Nov 2013 14:19:13 +0100

I'm not saying that what is sent to the server has to be those bytes; I'm
saying that if we use the convention that punctuation, whitespace, etc. get
escaped, it would allow us to recognize the boundaries of the local part in
plain text.
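
To make the convention concrete, here is a minimal Python sketch, not a proposal for any spec. It assumes the literal set is XID_Continue plus a few ASCII exceptions; the exception list and the helper names (LITERAL_EXTRAS, is_literal, escape_local_part, local_part_before) are hypothetical choices for illustration. Everything else is written as UTF-8 percent-escapes, so the local part can be found by scanning backwards from the "@".

```python
import string
from urllib.parse import unquote

# Assumption: these ASCII characters stay literal. The list is illustrative only.
LITERAL_EXTRAS = set(".-_+")


def is_literal(ch):
    # "a" + ch is a valid Python identifier exactly when ch is XID_Continue.
    return ch in LITERAL_EXTRAS or ("a" + ch).isidentifier()


def escape_local_part(local):
    # Percent-escape the UTF-8 bytes of every character outside the literal set.
    return "".join(
        ch if is_literal(ch) else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in local
    )


def local_part_before(text, at):
    # Walk left from the "@" at index `at` while we see literals or %XX escapes.
    start = at
    while start > 0:
        prev = text[start - 1]
        if is_literal(prev):
            start -= 1
        elif (prev == "%" and start + 2 <= at
              and all(c in string.hexdigits for c in text[start:start + 2])):
            start -= 1
        else:
            break
    return text[start:at]


escaped = escape_local_part("john doe")            # the space becomes %20
sentence = "please mail " + escaped + "@example.com today"
found = local_part_before(sentence, sentence.index("@"))
original = unquote(found)                          # decodes %XX as UTF-8
```

With this convention a letter such as ġ stays literal, while whitespace, quotes, "@" and "%" itself are escaped, which is what makes the left boundary detectable.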

I think what you mention is part of a more general problem. Let's suppose
that I have an email address where the bytes that the server recognizes for
the local part are <61 B3>. I convert that using Latin-14 to aġ@. I send it
in an email to you, and you receive it as UTF-8. You see
aġ, but underneath the covers it is bytes <61 C4 A1>. But then you
send to the server <61 C4 A1>, and it fails. Or worse yet, reaches
someone whose email is aġ (Ok, I could have poked around and
found a more compelling example, but you see the point).
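
The byte-level mismatch can be reproduced in a few lines of Python. This is only an illustration: it assumes "Latin-14" means ISO 8859-14, which Python exposes as the "iso8859_14" codec, and the variable names are mine.

```python
# Bytes the server recognizes for the local part.
server_bytes = bytes([0x61, 0xB3])

# The sender decodes them with ISO 8859-14 to get displayable text.
displayed = server_bytes.decode("iso8859_14")

# The recipient's mail client holds that same text as UTF-8.
utf8_bytes = displayed.encode("utf-8")

# Sending the UTF-8 bytes back to the server no longer matches the mailbox.
matches_mailbox = utf8_bytes == server_bytes

# The original bytes come back only if the recipient knows the source charset.
recovered = displayed.encode("iso8859_14")

print(displayed, utf8_bytes.hex(" "), matches_mailbox, recovered == server_bytes)
```

The text looks identical at every step; only the bytes differ, and nothing in the displayed address says which charset the server expects.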

If I really wanted to be absolutely certain that my email wouldn't be
munged by a conversion, I'd never convert from bytes: we'd never see "", we'd always see the equivalent of

Mark <>
*— Il meglio è l’inimico del bene —*

On Fri, Nov 1, 2013 at 1:36 PM, Philippe Verdy <> wrote:

> 2013/11/1 Mark Davis ☕ <>
>> These are two well-known serious flaws in EAI and URLs; there is no
>> useful syntactic limit on what is in the query part of a URL or on the
>> local part of an email address that would allow their boundaries to be
>> detected in plaintext.
>> No use complaining about them, because people are concerned with
>> backwards compatibility, and wouldn't change the underlying specs.
>> That being true, I wish that industry could come to consensus about
>> requiring everything outside of a well-defined, backwards-compatible set of
>> characters to be expressed as UTF-8 percent-escaped characters in these
>> fields when they are expressed as plaintext. (Something like XID_Continue ±
>> exceptions.) That would allow for unambiguous parsing in plaintext.
> Why "UTF-8" only? There already exist email accounts created with
> various ISO 8859-* or Windows code pages, or KOI8-R (or KOI8-U). And none of
> these addresses is aliased with a UTF-8 encoded account name reaching the same
> mailbox (creating these aliases would help users who have such accounts
> to protect their privacy; however, there may exist rare cases where these
> aliases would conflict with distinct mail accounts
Received on Fri Nov 01 2013 - 08:22:00 CDT
