Re: Best practice of using regex on identify none-ASCII email address from Philippe Verdy on 2013-11-01 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 1 Nov 2013 13:36:53 +0100

2013/11/1 Mark Davis ☕ <mark_at_macchiato.com>

> These are two well-known serious flaws in EAI and URLs; there is no useful
> syntactic limit on what is in the query part of a URL or on the local part
> of an email address that would allow their boundaries to be detected in
> plaintext.
>
> No use complaining about them, because people are concerned with backwards
> compatibility, and wouldn't change the underlying specs.
>
> That being true, I wish that industry could come to consensus about
> requiring everything outside of a well-defined, backwards-compatible set of
> characters to be expressed as UTF-8 percent-escaped characters in these
> fields when they are expressed as plaintext. (Something like XID_Continue ±
> exceptions.) That would allow for unambiguous parsing in plaintext.
>
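
A minimal sketch, in Python, of the escaping rule proposed in the quoted text. The allowed set used here (XID_Continue plus the ASCII characters `.`, `-` and `+`) and the names `is_allowed` and `escape_local_part` are assumptions made for illustration; the quoted message only says "something like XID_Continue ± exceptions". The XID_Continue test relies on the way CPython's `str.isidentifier()` follows XID_Start/XID_Continue.

```python
def is_allowed(ch: str) -> bool:
    """Hypothetical 'safe' set: XID_Continue plus a few ASCII exceptions."""
    # "a" + ch is accepted as an identifier only when ch is XID_Continue.
    return ch in ".-+" or ("a" + ch).isidentifier()


def escape_local_part(local: str) -> str:
    """Percent-escape, as UTF-8 bytes, every character outside the safe set."""
    out = []
    for ch in local:
        if is_allowed(ch):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_local_part("jöhn doe"))
print(escape_local_part("名、前"))
```

With such a rule, letters and digits of any script stay readable, while spaces and punctuation outside the safe set become escapes, so a plaintext scanner could stop at the first character that is neither in the set nor part of a `%XX` escape.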

Why "UTF-8" only? Email accounts already exist that were created with various
ISO 8859-* or Windows code pages, or KOI8-R (or KOI8-U). And none of these
addresses is aliased with a UTF-8 encoded account name reaching the same
mailbox (creating these aliases would help the users of such accounts
to protect their privacy; however, there may be rare cases where these
aliases would conflict with distinct mail accounts).
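
To make the encoding concern concrete, here is a small Python sketch showing that the same account name yields different bytes, and so different percent-escaped forms, depending on the charset it was created with. The helper names `percent_escape` and `is_valid_utf8` are hypothetical, and the sample characters are chosen only for illustration.

```python
def percent_escape(text: str, encoding: str) -> str:
    """Percent-escape every byte of text as encoded in the given charset."""
    return "".join(f"%{b:02X}" for b in text.encode(encoding))


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a byte string can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name, encodings in (("é", ("utf-8", "iso8859-1")),
                        ("ж", ("utf-8", "koi8-r"))):
    for enc in encodings:
        raw = name.encode(enc)
        print(name, enc, percent_escape(name, enc), is_valid_utf8(raw))
```

For "é" this gives `%C3%A9` under UTF-8 but `%E9` under ISO 8859-1, and the ISO 8859-1 byte is not valid UTF-8 on its own. A server that matches local parts byte for byte would therefore treat the two forms as different names unless an alias is set up explicitly.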
Received on Fri Nov 01 2013 - 07:40:35 CDT
