Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

From: Philippe Verdy via Unicode <unicode_at_unicode.org>
Date: Sat, 13 Oct 2018 16:51:50 +0200

In summary, two distinct implementations are allowed to return different
values t and t' of Base64_Encode(d) for the same message d, but
Base64_Decode(t) and Base64_Decode(t') MUST be equal and MUST both return
d exactly.

Base64_Encode() leaves some choices to the implementation, so
Base64_Decode() must be permissive/flexible enough to ensure that in all
cases
Base64_Decode[Base64_Encode[d]] = d, for every value of d.
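
As a rough illustration of that round-trip guarantee, here is a minimal
sketch using Python's standard base64 module. The three encoders stand in
for "different implementation choices" (standard alphabet, URL-safe
alphabet, MIME-style line wrapping); the variable names are my own and the
sample data is arbitrary.

```python
import base64

# Arbitrary binary data; it could equally be UTF-8 encoded Unicode text.
d = bytes(range(256))

# Three encoder choices for the same data.
t_std = base64.b64encode(d)           # standard alphabet, no line breaks
t_url = base64.urlsafe_b64encode(d)   # '-' and '_' instead of '+' and '/'
t_mime = base64.encodebytes(d)        # newline after every 76 output characters

# The three texts are pairwise different...
encodings_differ = len({t_std, t_url, t_mime}) == 3

# ...but each one decodes back to exactly the same data.
round_trips = (
    base64.b64decode(t_std) == d
    and base64.urlsafe_b64decode(t_url) == d
    and base64.decodebytes(t_mime) == d
)

print(encodings_differ, round_trips)
```

Note that each text is decoded with the decoder matching its variant; a
strict decoder for one variant is not required to accept another.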

The reverse is not true because of this flexibility (needed for various
transport protocols that have different requirements, notably on the
allowed set of characters, and on their maximum line lengths):
Base64_Encode[Base64_Decode[t]] = t may be false.
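
A small sketch of why the reverse direction can fail, again assuming
Python's standard base64 module: the text t below is line-wrapped, and
re-encoding the decoded data with an encoder that does not wrap gives a
different text, although the data itself is unchanged.

```python
import base64

d = b"x" * 100                    # arbitrary data, long enough to need wrapping

t = base64.encodebytes(d)         # line-wrapped Base64 text
decoded = base64.decodebytes(t)   # decoder ignores the line breaks
t2 = base64.b64encode(decoded)    # re-encode with a non-wrapping encoder

data_preserved = decoded == d     # Decode[Encode[d]] = d holds
text_preserved = t2 == t          # Encode[Decode[t]] = t does not hold here

# The two texts differ only in the line breaks.
same_modulo_newlines = t.replace(b"\n", b"") == t2

print(data_preserved, text_preserved, same_modulo_newlines)
```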

On Sat, 13 Oct 2018 at 16:45, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> You forget that Base64 (as used in MIME) does not follow these rules as it
> allows multiple different encodings for the same source binary. MIME
> actually splits a binary object into multiple fragments at random
> positions, and then encodes these fragments separately. Also MIME uses an
> extension of Base64 where it allows some variations in the encoding
> alphabet (so even the same fragment of the same length may have two distinct
> encodings).
>
> Base64 in MIME is different from standard Base64 (which never splits the
> binary object before encoding it, and uses a strict alphabet of 64 ASCII
> characters, allowing no variation). So MIME requires special handling: the
> assumption that a binary message is always encoded the same way is wrong,
> but MIME still requires that this non-unique Base64 encoding be decoded
> back to the same initial (unsplit) binary object (independently of its size
> and independently of the splitting boundaries used in the transport, which
> may change during the transport).
>
> This also applies to the Base64 encoding used in HTTP transport syntax,
> and notably in the HTTP/1.1 streaming feature where fragment sizes are also
> variable.
>
>
> On Sat, 13 Oct 2018 at 16:27, Costello, Roger L. via Unicode <
> unicode_at_unicode.org> wrote:
>
>> Hi Folks,
>>
>> Thank you for your outstanding responses!
>>
>> Below is a summary of what I learned. Are there any errors in the
>> summary? Is there anything you would add? Please let me know of anything
>> that is not clear. /Roger
>>
>> 1. While base64 encoding is usually applied to binary, it is also
>> sometimes applied to text, such as Unicode text.
>>
>> Note: Since base64 encoding may be applied to both binary and text, in
>> the following bullets I use the more generic term "data". For example,
>> "Data d is base64-encoded to yield ..."
>>
>> 2. Neither base64 encoding nor decoding should presume any special
>> knowledge of the meaning of the data or do anything extra based on that
>> presumption.
>>
>> For example, converting Unicode text to and from base64 should not
>> perform any sort of Unicode normalization, convert between UTFs, insert or
>> remove BOMs, etc. This is like saying that converting a JPEG image to and
>> from base64 should not resize or rescale the image, change its color depth,
>> convert it to another graphic format, etc.
>>
>> If you use base64 for encoding MIME content (e.g. emails), the base64
>> decoding will not transform the content. The email parser must ensure that
>> the content is valid, so the parser might have to transform the content
>> (possibly replacing some invalid sequences or truncating), and then apply
>> Unicode normalization to render the text. These transforms are part of the
>> MIME application and are independent of whether you use base64 or any
>> other encoding or transport syntax.
>>
>> 3. If data d is different than d', then the base64 text resulting from
>> encoding d is different than the base64 text resulting from encoding d'.
>>
>> 4. If base64 text t is different than t', then the data resulting from
>> decoding t is different than the data resulting from decoding t'.
>>
>> 5. For every data d there is exactly one base64 encoding t.
>>
>> 6. Every base64 text t is an encoding of exactly one data d.
>>
>> 7. For all data d, Base64_Decode[Base64_Encode[d]] = d
>>
>>
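
The fragment-wise scenario discussed in the quoted messages can be
sketched as follows. This assumes a hypothetical transport that cuts the
data at an arbitrary position and Base64-encodes each piece on its own; the
helper names encode_fragments and decode_fragments are made up for the
example and do not describe any particular MIME library.

```python
import base64

def encode_fragments(data: bytes, cut: int) -> list[bytes]:
    """Hypothetical helper: split data at `cut` and encode each piece separately."""
    return [base64.b64encode(data[:cut]), base64.b64encode(data[cut:])]

def decode_fragments(fragments: list[bytes]) -> bytes:
    """Decode each fragment on its own, then concatenate the decoded bytes."""
    return b"".join(base64.b64decode(f) for f in fragments)

d = b"Hello, Unicode!"            # 15 bytes, so the whole encoding needs no padding
whole = base64.b64encode(d)

# A cut that is not a multiple of 3 forces padding inside the text,
# so the concatenated fragments differ from the single-pass encoding.
unaligned = encode_fragments(d, 4)
unaligned_differs = b"".join(unaligned) != whole

# A cut on a 3-byte boundary happens to reproduce the single-pass text.
aligned = encode_fragments(d, 3)
aligned_same = b"".join(aligned) == whole

# Either way, decoding the fragments recovers the original data.
recovered = decode_fragments(unaligned) == d and decode_fragments(aligned) == d

print(unaligned_differs, aligned_same, recovered)
```

So under this assumption, statement 5 above ("exactly one base64 encoding")
only holds when the fragment boundaries and encoder variant are fixed,
while statement 7 holds regardless.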
Received on Sat Oct 13 2018 - 09:52:21 CDT