Re: Unicode, SMS and year 2012 from Mark Davis ☕ on 2012-04-27 (Unicode Mail List Archive)

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Fri, 27 Apr 2012 17:28:13 -0700

That is not correct. One of the chief reasons that punycode was selected
was the reduction in size. Tests with the idnbrowser are not relevant. As I
said:

> In that form, it uses a smaller number of
> bytes per character, but a parameterization allows use of all byte
> values.

That is, the parameterization of punycode for IDNA is restricted to the 36
values per byte that IDNA allows (a-z and 0-9), thus roughly 5 bits per
byte. When you parameterize punycode for a full 8 bits per byte, you get
considerably different results.
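
To make the size argument concrete, here is a rough sketch using the
punycode codec in the Python standard library, which implements the IDNA
(base-36) parameterization. The function name size_report and the sample
strings are assumptions for illustration only. The base-256 figure is just
an information-theoretic estimate (log2(36) bits of payload per output byte,
repacked into 8-bit bytes); an actual base-256 parameterization would also
change the variable-length integer thresholds, and ASCII characters that
punycode copies through literally are not accounted for, so treat the
number as indicative rather than exact.

```python
import math


def size_report(text: str) -> dict:
    """Compare encoded sizes of `text` under a few encodings.

    The last entry is only an estimate of what a full 8-bit
    parameterization might achieve; it is not produced by a real encoder.
    """
    # Standard-library codec: punycode as parameterized for IDNA (base 36).
    puny = text.encode("punycode")
    # Each base-36 digit carries log2(36), about 5.17 bits of information.
    bits_per_base36_byte = math.log2(36)
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-be")),
        "punycode36_bytes": len(puny),
        # Assumption: repack the same information into 8-bit bytes.
        "est_base256_bytes": math.ceil(len(puny) * bits_per_base36_byte / 8),
    }


if __name__ == "__main__":
    # Sample strings are arbitrary; any non-Latin text can be substituted.
    for sample in ("παράδειγμα", "пример"):
        print(sample, size_report(sample))
```

Even in the restricted base-36 form, the punycode output for a short
single-script string is usually no longer than its UTF-16 form, which is
the comparison that matters for SMS.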

------------------------------
Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*

2012/4/27 Cristian Secară <orice_at_secarica.ro>

> On Fri, 27 Apr 2012 12:26:25 -0700, Mark Davis ☕ wrote:
>
> > Actually, if the goal is to get as many characters in as possible,
> > Punycode might be the best solution. That is the encoding used for
> > internationalized domains. In that form, it uses a smaller number of
> > bytes per character, but a parameterization allows use of all byte
> > values.
>
> I suspect the goal of punycode is to map a wide character set onto a
> restricted character set, without much concern for the resulting string
> length; if the original string is in a character set other than the
> target restricted one, the string length increases too much to be of
> interest in the SMS discussion.
>
> Just do a test: write something in a non-Latin alphabetic script into
> this page here http://demo.icu-project.org/icu-bin/idnbrowser
>
> Cristi
>
> --
> Cristian Secară
> http://www.secarica.ro
>
>
Received on Fri Apr 27 2012 - 19:30:13 CDT
