Re: Getting A Newb Started

From: William J Poser (wjposer@ldc.upenn.edu)
Date: Mon Jul 07 2008 - 16:19:15 CDT


    Unicode has far more than 256 characters, so there's no way to avoid
    using more than one byte for at least some of them. If you use UTF-32,
    every character is four bytes. If you use UTF-8, characters take from
    one to four bytes depending on the value of the corresponding code
    point. If you use UTF-16, every character in the BMP (the Basic
    Multilingual Plane, U+0000 to U+FFFF) is two bytes, and any character
    outside of the BMP takes four bytes.
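
    To make the sizes above concrete, here is a minimal Python sketch
    that encodes a few sample characters and counts the bytes. The
    helper name encoded_sizes and the sample characters are my own
    choices for illustration. The "-le" codec variants are used so that
    Python does not prepend a byte order mark to the output.

```python
# Encoded size, in bytes, of a single character in each encoding form.
# encoded_sizes is a hypothetical helper name chosen for this sketch.
def encoded_sizes(ch):
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),  # -le: no byte order mark
    }

# U+0041, U+00E9, U+20AC are in the BMP; U+1D11E is outside it.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

    Only the last sample lies outside the BMP, so it is the only one
    that needs four bytes in UTF-16.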

    The downside of UTF-16 and UTF-8 is that characters are not all the
    same length, which makes processing more complicated. With UTF-16,
    however, if you know that the text contains no characters outside
    the BMP, every character is exactly two bytes wide.
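
    As a rough sketch of that shortcut, again in Python: check that the
    text is BMP-only, and if so, find the Nth character in the UTF-16
    bytes by simple arithmetic. The names is_bmp_only and utf16_char_at
    are hypothetical, and "character" here means a single code point.

```python
def is_bmp_only(text):
    """Return True if every code point is in the BMP (U+0000 to U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

def utf16_char_at(data, index):
    """Return the character at `index` in UTF-16-LE bytes.

    Assumes the text is BMP-only, so each character is exactly two bytes.
    """
    return data[2 * index : 2 * index + 2].decode("utf-16-le")

text = "na\u00efve \u20ac"          # every character is in the BMP
data = text.encode("utf-16-le")      # -le: no byte order mark

if is_bmp_only(text):
    # Fixed width: byte length is exactly twice the character count.
    assert len(data) == 2 * len(text)
    print(utf16_char_at(data, 2))
```

    If is_bmp_only returns False, the fixed-width arithmetic no longer
    holds and the text has to be scanned for four-byte characters.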

    Bill


