Re: Unicode password mapping for crypto standard

From: gfb hjjhjh <c933103_at_gmail.com>
Date: Tue, 5 Jan 2016 15:19:25 +0800

Hello, I don't know much about this topic, but: 1. Perhaps something like
the Punycode used for internationalized domain names might help? 2. I
don't think keyboard mapping is a good idea. For some less computer-savvy
Chinese-speaking users, handwriting is often the only way they can enter
Chinese on a computer, and handwriting input doesn't seem to be something
a keyboard mapping can cover.
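
To make point 1 concrete, here is a minimal sketch of what a Punycode-style
mapping would look like, assuming Python's built-in "punycode" codec (the
RFC 3492 encoding without the IDNA "xn--" prefix). The function names are
hypothetical and not part of any proposal. Note that Punycode only re-encodes
code points as ASCII; it does not normalize, so precomposed and decomposed
spellings of the same text still give different octets.

```python
def to_punycode_octets(password: str) -> bytes:
    """Encode a Unicode password as ASCII-only octets using Punycode."""
    return password.encode("punycode")


def from_punycode_octets(octets: bytes) -> str:
    """Recover the original code point sequence from Punycode octets."""
    return octets.decode("punycode")


if __name__ == "__main__":
    precomposed = "b\u00fccher"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
    decomposed = "bu\u0308cher"   # "u" followed by U+0308 COMBINING DIAERESIS
    print(to_punycode_octets(precomposed))
    print(to_punycode_octets(decomposed))
    # The two results differ: normalization would still have to be agreed on.
```

So Punycode would give an ASCII-safe octet string, but it would not by itself
settle the normalization question raised in the original message.
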
2016/01/05 13:33 "Sean Leonard" <lists+unicode_at_seantek.com>:

> Hi Unicode list, I am looking for feedback on this proposal, specifically
> a standard specification to map between (presumably) Unicode text strings
> and octet strings.
>
> A "password" is defined as an arbitrary octet string in a number of
> protocols and formats. This has worked for basic cases where the "password"
> is just ASCII, but there are interoperability issues when characters beyond
> ASCII get involved. My observation is that a lot of security folks get
> hand-wavy about the Unicode stuff, which is why there is little
> standardization in this area.
>
> Recently in the IETF, application/pkcs8-encrypted is proposed for the PKCS
> #8 EncryptedPrivateKeyInfo type. For purposes of our discussion, the format
> takes as input an opaque octet string (any octet in the range 00h-FFh, of
> any length), and executes various specified algorithms; the result is a
> decrypted private key. The most common algorithm is PBKDF2, but any
> algorithm can be used (including, for example, a raw symmetric encryption
> algorithm such as AES-256).
>
> PKCS #8 punts on the issue of character encoding. It says that ASCII or
> UTF-8 could be used, but doesn’t enforce anything in particular. PKCS #12
> specifies UTF-16LE with a terminating NULL character (00h 00h).
>
> In the application/pkcs8-encrypted registration, I thought it might be
> wise to allow senders and receivers to specify how input (whether user
> input or otherwise) gets mapped to the octet string, since it's not part of
> the format. Originally my concern at that time was to reflect IANA
> character sets, rather than profiles of Unicode.
>
> These days, however, most user agents are Unicode-enabled and will accept
> user input in Unicode. Therefore, the issue is less about legacy character
> sets, and more about how to take the Unicode input and get a consistent and
> reasonable stream of bits out on both ends. For example: should the
> password be case folded, converted to NFKC, encoded in UTF-8 vs. UTF-16BE,
> etc.? Constraining or transforming the input would be helpful for disparate
> systems to agree on these things.
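
As an illustration of why these choices matter, the following is a rough
sketch using only Python's standard unicodedata module. The function name and
the dictionary keys are made up for this example. It shows how one password,
typed in two canonically equivalent ways, produces different octet strings
unless both ends agree on normalization and encoding.

```python
import unicodedata


def candidate_octets(password: str) -> dict:
    """Return the octet strings produced by a few plausible mapping choices."""
    nfkc = unicodedata.normalize("NFKC", password)
    # Case folding can denormalize text, so normalize again afterwards.
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return {
        "raw utf-8": password.encode("utf-8"),
        "nfkc utf-8": nfkc.encode("utf-8"),
        "nfkc+casefold utf-8": folded.encode("utf-8"),
        "nfkc utf-16be": nfkc.encode("utf-16-be"),
    }


if __name__ == "__main__":
    for typed in ("Caf\u00e9", "Cafe\u0301"):  # precomposed vs. combining accent
        for label, octets in candidate_octets(typed).items():
            print(f"{typed!a:16} {label:22} {octets.hex()}")
```

Only the "raw" row differs between the two spellings; once both sides apply
the same normalization form and encoding, the octets agree.
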
>
>
> Thank you,
>
> Sean
>
> PS I read the "Unicode in passwords" thread. It's relevant. An alternative
> or addition to proposing a mapping to/from Unicode might be to have a
> "keyboard-mapping" or "keyboard-layout" parameter, that specifies the
> suggested layout of the keyboard (or input device) used for password input,
> preferably by deferring to some international standard on the topic. Such a
> parameter could influence the initial user input method, but it doesn't
> answer the question of how to turn the key presses into specific bits
> (Unicode-based or otherwise).
>
> **********
> The relevant part of the template (most recent proposal, today) is:
> ***
> Optional parameters:
>
> password-mapping:
> When the private key encryption algorithm incorporates a "password" that
> is an octet string, a mapping between user input and the octet string is
> desirable. PKCS #5 [RFC2898] Section 3 recommends "that applications follow
> some common text encoding rules"; it then suggests, but does not recommend,
> ASCII and UTF-8. This parameter specifies the charset that a recipient
> SHOULD attempt first when mapping user input to the octet string. It has
> similar semantics as the charset parameter from text/plain, except that it
> only applies to the user’s input of the password. There is no default value.
>
> The following special values are defined:
> *pkcs12 = UTF-16LE with U+0000 NULL terminator (PKCS #12-style)
> *precis = PRECIS password profile, i.e., OpaqueString from Section 4 of
> RFC 7613 (always UTF-8)
> *precis-XXX = PRECIS profile as named XXX in the IANA PRECIS Profiles
> Registry <https://www.iana.org/assignments/precis-parameters>
> *hex = hexadecimal input: the input is mapped to 0-9, A-F, and then
> converted directly to octets. If there is an odd number of hex digits, the
> final digit 0 is appended, or an error condition may be raised. Compare
> with Annex M.4 of IEEE 802.11-2012.
> *dtmf = The characters "0"-"9", "A"-"D", "*", and "#", which map to
> their corresponding ASCII codes. (This is to support restricted-input
> devices, i.e., telephones and telephone-like equipment.)
>
> Otherwise, the value of this parameter is a charset, from the Character
> Sets Registry <http://www.iana.org/assignments/character-sets>.
> ***
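
The special values in the template above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function name
map_password and the pad_odd_hex flag are hypothetical; "mapped to 0-9, A-F"
is read here as "drop whitespace and uppercase"; the *precis values are left
unimplemented; and Python codec names are assumed to stand in for IANA charset
names, which is not true for every registered charset.

```python
DTMF_CHARS = set("0123456789ABCD*#")
HEX_CHARS = set("0123456789ABCDEF")


def map_password(user_input: str, mapping: str, pad_odd_hex: bool = True) -> bytes:
    """Map user input to the password octet string for a password-mapping value."""
    if mapping == "*pkcs12":
        # UTF-16LE followed by a U+0000 terminator (two zero octets).
        return user_input.encode("utf-16-le") + b"\x00\x00"
    if mapping == "*hex":
        # Assumption: whitespace is dropped and letters are uppercased.
        digits = "".join(user_input.split()).upper()
        if not set(digits) <= HEX_CHARS:
            raise ValueError("non-hexadecimal character in input")
        if len(digits) % 2:
            # The template allows either padding with 0 or raising an error.
            if not pad_odd_hex:
                raise ValueError("odd number of hex digits")
            digits += "0"
        return bytes.fromhex(digits)
    if mapping == "*dtmf":
        if not set(user_input) <= DTMF_CHARS:
            raise ValueError("character outside the DTMF set")
        return user_input.encode("ascii")
    if mapping.startswith("*"):
        # *precis and *precis-XXX need a PRECIS implementation; not sketched here.
        raise NotImplementedError(mapping)
    # Otherwise treat the value as a charset name (assumed to be a Python codec).
    return user_input.encode(mapping)
```

For example, map_password("ab", "*pkcs12") yields the four UTF-16LE octets for
"ab" followed by two zero octets.
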
>
> The relevant part of the original template (proposed 2015-11-04) is:
> ***
> Optional parameters:
> charset: When the private key encryption algorithm incorporates a
> "password" that is an octet string, a mapping between user input and the
> octet string is desirable. PKCS #5 [RFC2898] Section 3 recommends "that
> applications follow some common text encoding rules"; it then suggests, but
> does not recommend, ASCII and UTF-8. This parameter specifies the charset
> that a recipient SHOULD attempt first when mapping user input to the octet
> string. It has the same semantics as the charset parameter from text/plain,
> except that it only applies to the user’s input of the password. There is
> no default value.
>
> ualg: When the charset is a Unicode-based encoding, this parameter is a
> space-delimited list of Unicode algorithms that a recipient SHOULD first
> attempt to apply to the Unicode user input in succession, in order to
> derive the octet string. The list of algorithm keywords is defined by
> [UNICODE]. “Tailored operations” are operations that are sensitive to
> language, which must be provided as an input parameter. If a tailored
> operation is called for, the exclamation mark followed by the [BCP47]
> language tag specifies the language. For example, "toNFD
> toNFKC_Casefold!tr" first applies Normalization Form D, followed by
> Normalization Form KC with Case Folding in the Turkish language, according
> to [UNICODE] and [UAX31]. The default value of this parameter is empty, and
> leaves the matter of whether to normalize, case fold, or apply other
> transformations unspecified.
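
A rough sketch of how a recipient might read the ualg value described above,
assuming the "name!language" token syntax from the example. The function names
are hypothetical. Only the four normalization forms and untailored case
folding are applied, because Python's standard library has no tailored
(language-sensitive) operations and no exact toNFKC_Casefold; those cases
raise an error rather than guess.

```python
import unicodedata

NORMALIZATION_FORMS = {
    "toNFC": "NFC",
    "toNFD": "NFD",
    "toNFKC": "NFKC",
    "toNFKD": "NFKD",
}


def parse_ualg(value: str) -> list:
    """Split a ualg value into (algorithm, language-or-None) steps, in order."""
    steps = []
    for token in value.split():
        name, _, lang = token.partition("!")
        steps.append((name, lang or None))
    return steps


def apply_ualg(text: str, value: str) -> str:
    """Apply the listed algorithms in succession to the user's input."""
    for name, lang in parse_ualg(value):
        if lang is not None:
            raise NotImplementedError(f"tailored operation {name}!{lang}")
        if name in NORMALIZATION_FORMS:
            text = unicodedata.normalize(NORMALIZATION_FORMS[name], text)
        elif name == "toCasefold":
            text = text.casefold()  # untailored full case folding
        else:
            raise NotImplementedError(name)
    return text
```

With an empty ualg value the input passes through unchanged, which matches the
"unspecified" default in the template.
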
>
>
> The latest template is here:
>
> http://mailarchive.ietf.org/arch/msg/precis/Qil9mc5AtqxXp8OXllp0lAwYts4
>
>
>
Received on Tue Jan 05 2016 - 01:20:40 CST
