Re: unicode format

From: John Cowan (cowan@ccil.org)
Date: Mon Feb 23 2004 - 07:50:34 EST


    steve scripsit:

    > Could someone please clarify the difference between UTF8 and UTF16
    > please? If it is possible to encode everything in UTF8 and it is more
    > efficient what is the need for UTF16?

    The short version is that in UTF-8, characters can occupy 1, 2, 3, or
    (very rarely) 4 bytes; in UTF-16, characters can occupy 2 or (very
    rarely) 4 bytes. Either encoding can be used with any textual content.
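To make those byte counts concrete, here is a minimal Python sketch (the helper names utf8_len and utf16_len are hypothetical, written for illustration). It derives the length of one code point from the range boundaries of each encoding and cross-checks the result against Python's built-in codecs. It assumes the input is a valid Unicode scalar value, not a lone surrogate.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed for one code point in UTF-8 (assumes a valid scalar value)."""
    if cp < 0x80:       # ASCII
        return 1
    if cp < 0x800:      # Latin supplements, Greek, Cyrillic, Armenian, Hebrew, Arabic...
        return 2
    if cp < 0x10000:    # rest of the Basic Multilingual Plane (e.g. CJK)
        return 3
    return 4            # supplementary planes


def utf16_len(cp: int) -> int:
    """Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair above the BMP."""
    return 2 if cp < 0x10000 else 4


for ch in ["A", "\u00e9", "\u0416", "\u4e2d", "\U00010348"]:
    cp = ord(ch)
    # Cross-check with the real codecs; "utf-16-le" omits the byte order mark.
    assert utf8_len(cp) == len(ch.encode("utf-8"))
    assert utf16_len(cp) == len(ch.encode("utf-16-le"))
    print(f"U+{cp:04X}  UTF-8: {utf8_len(cp)}  UTF-16: {utf16_len(cp)}")
```

For the Han character U+4E2D it reports 3 bytes in UTF-8 against 2 in UTF-16, while the supplementary character U+10348 takes 4 bytes in both.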

    UTF-8 is typically more compact than UTF-16 for English and other
    Latin-alphabet languages, slightly more compact for Greek, Cyrillic,
    Armenian, Hebrew, and Arabic alphabets, and up to almost 50% larger
    for everything else (CJK ideographs, for example, take 3 bytes in
    UTF-8 but 2 in UTF-16).
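The size comparison can be checked with a short Python sketch along the same lines. The sample strings and the helper name sizes are arbitrary illustrations, not a benchmark. Encoding as "utf-16-le" leaves out the 2-byte byte order mark that Python's plain "utf-16" codec would prepend.

```python
# Illustrative sample texts (hypothetical choices, one per script family).
samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Greek": "Γεια σου",
    "Russian": "Привет, мир!",
    "Japanese": "こんにちは世界",
}


def sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    u8, u16 = sizes(text)
    print(f"{name:8}  UTF-8: {u8:3}  UTF-16: {u16:3}  ratio: {u8 / u16:.2f}")
```

English comes out at exactly half the UTF-16 size, the Greek and Russian samples slightly smaller (their spaces and punctuation stay at 1 byte), and the Japanese sample at exactly 1.5 times. That is also the ceiling: no code point needs more than 3 UTF-8 bytes for every 2 UTF-16 bytes.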

    -- 
    John Cowan  jcowan@reutershealth.com  http://www.ccil.org/~cowan
    O beautiful for patriot's dream that sees beyond the years
    Thine alabaster cities gleam undimmed by human tears!
    America! America!  God mend thine every flaw,
    Confirm thy soul in self-control, thy liberty in law!
            -- one of the verses not usually taught in U.S. schools
    

