Re: Unicode, SMS and year 2012

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 28 Apr 2012 12:53:17 -0600

Mark Davis 🚙 wrote:

>> I suspect the punycode goal is to take a wide character set into a
>> restricted character set, without caring much on resulting string
>> length; if the original string happens to be in other character set
>> than the target restricted character set, then the string length
>> increases too much to be of interest in the SMS discussion.
>
> That is not correct. One of the chief reasons that punycode was
> selected was the reduction in size.

Certainly the main motivation behind the development of Punycode, or any of the ACEs (ASCII-Compatible Encodings) that came before it, was to provide a compact encoding given the constraints of the set of characters allowed in domain names. The extensibility of the algorithm to target character sets of different sizes was definitely an advantage.

> Tests with the idnbrowser is not relevant. As I said:
>
>> In that form, it uses a smaller number of
>> bytes per character, but a parameterization allows use of all byte
>> values.
>
> That is, the parameterization of punycode for IDNA is restricted to
> the 36 IDNA values per byte, thus roughly 5 bits. When you
> parameterize punycode for a full 8 bits per byte, you get considerably
> different results.

Not to say this isn’t so, but can you point to a tool or site where a user can type a string and see the output with different parameterizations? Pretty much all of the “Convert to Punycode” pages I see are only able to convert to the IDNA target.
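
For the IDNA parameterization at least, the size comparison is easy to reproduce. The following is a minimal sketch, assuming Python 3 and its built-in "punycode" codec, which implements only the base-36 IDNA target; the function name encoded_sizes and the sample strings are illustrative choices, not part of any existing tool. It does not show the 8-bit parameterization, which is the case still in question.

```python
import math

def encoded_sizes(text):
    """Return the encoded length in bytes of `text` under three encodings.

    "punycode" here is Python's built-in codec, i.e. the IDNA
    parameterization (36 symbols), without the "xn--" prefix.
    """
    return {
        "punycode": len(text.encode("punycode")),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # big-endian, no BOM
    }

# With 36 usable values per byte, each output byte carries
# log2(36) bits of information, hence "roughly 5 bits".
BITS_PER_SYMBOL = math.log2(36)

if __name__ == "__main__":
    print(f"bits per Punycode symbol (base 36): {BITS_PER_SYMBOL:.2f}")
    # Sample strings are arbitrary; try your own.
    for sample in ("bücher", "Ελληνικά", "日本語"):
        print(sample, encoded_sizes(sample))
```

Running it on text in a single non-Latin script gives a rough feel for how the base-36 form compares with UTF-8 and UTF-16, but a parameterization using all byte values would need a separate implementation of the algorithm from RFC 3492.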

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Sat Apr 28 2012 - 13:54:06 CDT
