Re: EUC-UTF8 is possible!

From: Doug Ewell (dewell@adelphia.net)
Date: Sat Mar 17 2007 - 15:15:22 CST


    Dan Kogai <dankogai at dan dot co dot jp> wrote:

    > I am really surprised to find that EUC and UTF-8 can be mashed up
    > easily.
    >
    > The secret is \xFF. This byte NEVER appears in EUC or UTF-8. So you
    > can define the combo character as follows:
    >
    > EUC_UTF8_CHAR = EUC_CHAR | \xFF + UTF8_CHAR
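    For illustration only, here is a minimal Python sketch of the quoted
    grammar. It assumes EUC-JP as the EUC flavor (the quote just says
    "EUC"), and the function names encode_euc_utf8 and decode_euc_utf8 are
    hypothetical, not part of any library. Any character EUC-JP cannot
    represent is written as 0xFF followed by its UTF-8 bytes.

```python
ESCAPE = 0xFF  # byte used neither by EUC-JP nor by UTF-8


def encode_euc_utf8(text: str) -> bytes:
    """Hypothetical encoder: EUC-JP where possible, else 0xFF + UTF-8."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("euc_jp")
        except UnicodeEncodeError:
            out.append(ESCAPE)
            out += ch.encode("utf-8")
    return bytes(out)


def _utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence given its lead byte (RFC 3629 ranges)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def _euc_jp_len(lead: int) -> int:
    """Length of an EUC-JP character given its first byte."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == 0x8E:
        return 2  # SS2: half-width katakana
    if lead == 0x8F:
        return 3  # SS3: JIS X 0212
    if 0xA1 <= lead <= 0xFE:
        return 2  # JIS X 0208
    raise ValueError(f"invalid EUC-JP lead byte 0x{lead:02X}")


def decode_euc_utf8(data: bytes) -> str:
    """Hypothetical decoder: must understand both EUC-JP and UTF-8."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESCAPE:
            if i + 1 >= len(data):
                raise ValueError("0xFF escape at end of input")
            size = _utf8_len(data[i + 1])
            out.append(data[i + 1:i + 1 + size].decode("utf-8"))
            i += 1 + size
        else:
            size = _euc_jp_len(b)
            out.append(data[i:i + size].decode("euc_jp"))
            i += size
    return "".join(out)
```

    Even this toy decoder has to carry both EUC-JP's and UTF-8's byte
    structure, while a standard EUC-JP or UTF-8 decoder simply rejects any
    stream containing 0xFF.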

    No no no no. Please don't do this. Nobody else will implement it and
    you will be effectively limited to using it internally within your own
    programs.

    Just use UTF-8, or if saving bytes is that important to you, use SCSU or
    a general-purpose compression technique. See UTN #14 for more on
    Unicode text compression.
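    A minimal sketch of that advice in Python, assuming the legacy data is
    EUC-JP. Python's standard library has no SCSU codec, so zlib (DEFLATE)
    stands in here for "a general-purpose compression technique"; the
    helper names are hypothetical.

```python
import zlib


def euc_jp_to_utf8(data: bytes) -> bytes:
    """Convert legacy EUC-JP bytes to plain UTF-8."""
    return data.decode("euc_jp").encode("utf-8")


def compress_utf8(text: str, level: int = 9) -> bytes:
    """Store text as UTF-8, then shrink it with a generic compressor."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_utf8(blob: bytes) -> str:
    """Inverse of compress_utf8."""
    return zlib.decompress(blob).decode("utf-8")
```

    The stored form stays ordinary UTF-8 (or ordinary DEFLATE around it),
    so any other tool can read it.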

    As someone who has created a number of alternative encoding schemes, I
    assure you that a scheme that "looks like" EUC or "looks like" UTF-8
    will cause you much more trouble than a completely new scheme that can't
    be confused for anything else.

    --
    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
    http://users.adelphia.net/~dewell/
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

