Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

From: Philippe Verdy via Unicode <unicode_at_unicode.org>
Date: Sun, 14 Oct 2018 23:50:52 +0200

It is also interesting to look at https://tools.ietf.org/html/rfc3501, which:
- defines (for IMAP4rev1) another "BASE64" encoding,
- also defines a "Modified UTF-7" encoding based on it, deviating from
Unicode's definition of UTF-7,
- and adds further requirements (forbidding the alternate encodings
permitted in UTF-7 and in all other Base64 variants, including those used in
MIME/RFC 2045 or SMTP, which are closely tied to IMAP!).
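
To make the IMAP variant concrete, here is a minimal Python sketch of the
"Modified UTF-7" mailbox-name encoding as I read RFC 3501, section 5.1.3:
printable US-ASCII stands for itself, "&" becomes "&-", and everything else
is UTF-16BE put through a Base64 that uses "," instead of "/" and drops the
"=" padding. The function name encode_modified_utf7 is my own, not something
the RFC or any library defines, and only the encoding direction is covered.

```python
import base64


def encode_modified_utf7(name: str) -> str:
    """Encode a mailbox name in IMAP "Modified UTF-7" (illustrative sketch)."""
    out = []
    pending = []  # non-ASCII characters waiting to be Base64-encoded

    def flush():
        # Emit the pending run as "&" + modified Base64 + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified BASE64: "," replaces "/", and "=" padding is dropped.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is escaped as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable US-ASCII represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

With the mailbox name used as an example in RFC 3501,
"~peter/mail/台北/日本語", this sketch produces
"~peter/mail/&U,BTFw-/&ZeVnLIqe-". Because printable ASCII must never be
shifted into Base64, each name has exactly one permitted encoded form, which
is the restriction mentioned in the last point above.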

And RFC 4648 does not make clear that it only covers the
encoding of "octet streams" and not "bit streams". Nor does it
discuss how "Base64" is adapted for transport and storage (as needed for
MIME and IMAP, but also in HTTP and in several file/data formats, including
XML and digital signatures).
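
The octet/bit distinction can be illustrated with a small Python sketch. One
octet is 8 bits, but it occupies two 6-bit Base64 characters, so 4 extra pad
bits ride along that belong to no octet. The helpers decode_lenient and
is_canonical below are hypothetical names of mine, written by hand so that
the behaviour does not depend on how any particular library treats those
bits; whitespace and line breaks are deliberately not handled.

```python
import base64
import string

# Standard Base64 alphabet (the MIME / RFC 4648 "base64" alphabet).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_lenient(text: str) -> bytes:
    """Decode Base64 text, silently discarding leftover pad bits."""
    bits = "".join(format(ALPHABET.index(c), "06b") for c in text.rstrip("="))
    usable = len(bits) - len(bits) % 8  # keep whole octets only
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def is_canonical(text: str) -> bool:
    """True if re-encoding the decoded octets reproduces the same text."""
    return base64.b64encode(decode_lenient(text)).decode("ascii") == text
```

Here decode_lenient("QQ==") and decode_lenient("QR==") both give the single
octet b"A", while only "QQ==" passes is_canonical. So distinct octet streams
always yield distinct Base64 texts, but distinct Base64 texts need not yield
distinct octet streams unless the decoder rejects non-zero pad bits.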

RFC 4648 is only superficial and does not cover everything (even
Unicode has its own definition of UTF-7 and also allows variations).

As we are on this Unicode list: the definition used by Unicode (more in
line with MIME) does not follow those in RFC 4648 at all.
Most uses of Base64 encodings are based on the original MIME definition,
and all of them make their own adaptations. (Even the definition of "Base16"
in RFC 4648 contradicts most other definitions.)

On Sun, 14 Oct 2018 at 21:21, Doug Ewell via Unicode <unicode_at_unicode.org>
wrote:

> Steffen Nurpmeso wrote:
>
> > Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions
> > (MIME) Part One: Format of Internet Message Bodies).
>
> Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data
> Encodings." RFC 2045 defines a particular implementation of base64,
> specific to transporting Internet mail in a 7-bit environment.
>
> RFC 4648 discusses many of the "higher-level protocol" topics that some
> people are focusing on, such as separating the base64-encoded output
> into lines of length 72 (or other), alternative target code unit sets or
> "alphabets," and padding characters. It would be helpful for everyone to
> read this particular RFC before concluding that these topics have not
> been considered, or that they compromise round-tripping or other
> characteristics of base64.
>
> I had assumed that when Roger asked about "base64 encoding," he was
> asking about the basic definition of base64.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>
Received on Sun Oct 14 2018 - 16:51:25 CDT