UTF-12

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jun 21 2010 - 14:15:37 CDT


    Yes, this is smart, especially for its exact mapping to Base64, where
    it will even be more compact than UTF-8 in many cases (your page
    should include a table comparing the sizes of UTF-8, UTF-16 and
    UTF-12 texts in the Base64 transport encoding).
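
    To illustrate what "exact mapping" means here, a minimal sketch in
    Python, assuming only that a UTF-12 text is a sequence of 12-bit
    units (the function names are hypothetical, and nothing below
    depends on how UTF-12 assigns units to characters): each unit
    splits into two 6-bit halves, hence exactly two Base64 characters.

```python
import math

# Standard Base64 alphabet (RFC 4648).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def units12_to_base64(units):
    """Map each 12-bit unit to exactly two Base64 characters."""
    chars = []
    for u in units:
        if not 0 <= u <= 0xFFF:
            raise ValueError(f"not a 12-bit unit: {u!r}")
        chars.append(B64[u >> 6])    # high 6 bits
        chars.append(B64[u & 0x3F])  # low 6 bits
    return "".join(chars)


def base64_len_of_bytes(n):
    """Length of padded Base64 output for n input bytes (UTF-8 or UTF-16)."""
    return 4 * math.ceil(n / 3)


def base64_len_of_units12(k):
    """Length of Base64 output for k 12-bit units: never any padding."""
    return 2 * k
```

    Two 12-bit units cover the same 24 bits as three bytes, so
    units12_to_base64([0x616, 0x263]) gives "YWJj", the same text as
    the standard Base64 encoding of the bytes "abc". A size table would
    then compare base64_len_of_bytes() applied to the UTF-8 and UTF-16
    byte counts with base64_len_of_units12() applied to the UTF-12 unit
    count of the same text.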

    You should also add, somewhere in the last section of your web
    document, that Base64 is well suited not just to 7-bit-only
    environments, but also to many 7-bit and 8-bit environments that
    require MIME compatibility for controls and spaces (notably in
    emails). After all, Base64 was first designed and standardized exactly
    for that purpose.

    All the Base64 variants, as described in:
    http://en.wikipedia.org/wiki/Base64#Variants_summary_table
    will also be usable in the query string appended to URLs, even though
    HTML form data submitted in Base64 with the equivalent (default)
    GET method (or with the specified POST method) should only use one of
    these two variants:
    - 'Base64' encoding as standardized in RFC 3548 or RFC 4648 (with the
    explicit HTML form element attributes encoding="base64" and
    method="post")
    - Modified Base64 encoding for URL applications (with the explicit
    HTML form element attributes encoding="base64url" and the default
    method="get")

    This applies to:
    - all URL query parameters, which are enumerated in a query string,
    separated by ampersands (&), and represented as name=value pairs or
    just as unnamed values (there will be no conflict with the Base64
    variants that use the equal sign for padding, given that no Base64
    padding is necessary when transporting UTF-12 encoded texts)
    - as well as the other Base64 variants for filenames, XML Names,
    XML NmTokens, or program identifiers.
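
    The query-string case can be sketched as follows in Python, again
    assuming only that a UTF-12 text is a sequence of 12-bit units. The
    helper names and the parameter names "q" and "r" are hypothetical;
    urlencode and parse_qs are from the standard urllib.parse module.

```python
from urllib.parse import urlencode, parse_qs

# URL-safe Base64 alphabet (RFC 4648): "-" and "_" replace "+" and "/".
B64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def units12_to_base64url(units):
    """Two URL-safe Base64 characters per 12-bit unit, no '=' padding."""
    chars = []
    for u in units:
        if not 0 <= u <= 0xFFF:
            raise ValueError(f"not a 12-bit unit: {u!r}")
        chars.append(B64URL[u >> 6])
        chars.append(B64URL[u & 0x3F])
    return "".join(chars)


def base64url_to_units12(text):
    """Inverse mapping: every pair of characters gives one 12-bit unit."""
    if len(text) % 2:
        raise ValueError("expected an even number of Base64 characters")
    return [
        (B64URL.index(text[i]) << 6) | B64URL.index(text[i + 1])
        for i in range(0, len(text), 2)
    ]


def build_query(params):
    """Build name=value pairs from a dict of name -> list of 12-bit units."""
    return urlencode(
        {name: units12_to_base64url(units) for name, units in params.items()}
    )


query = build_query({"q": [0, 4095], "r": [65]})  # "q=AA__&r=BB"
decoded = {k: base64url_to_units12(v[0]) for k, v in parse_qs(query).items()}
```

    By contrast, a padded value such as "YWJjZA==" (the Base64 of the
    four bytes "abcd") has to be escaped inside a query string:
    urlencode({"q": "YWJjZA=="}) gives "q=YWJjZA%3D%3D". The two
    characters per unit mapping never produces such padding.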

    One more question:

    Your page is copyrighted and signed by you (with your email address as
    the contact); this is absolutely not a problem (in fact it is good
    practice for all publications on the web), but there does not seem to
    be any licence offered on your page, so the only way to get one
    would be to contact you via your displayed email address.

    Can you licence this specification page in an open or free way,
    directly on the page, possibly dual-licenced under Creative Commons
    (CC-BY-SA: author's attribution required, share-alike) or LGPL
    (because it describes an algorithm, comparable to library source code
    that will then be freely modifiable and implementable)?

    -- Philippe.

    On 2010-06-21 at 19:00 CEST, "Andrey V. Lukyanov" <land@long.yar.ru> wrote:
    > As you might guess, UTF-12 is a system for representing Unicode
    > characters with a stream of 12-bit units. It was invented recently by
    > me.
    >
    > Full description is here:
    >
    > http://tapemark.narod.ru/comp/utf12en.html
    >
    > UTF-12 may be of little use in practice, but it is very nice from the
    > theoretical point of view.


