RE: Roundtripping Solved

From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 10:30:57 CST


    Arcane Jill wrote:
    > >> # for all possible octet sequences s:
    > >> # length of UTF-8(f(s)) <= length of s,
    >
    > >No, that is not the requirement. It is:
    > >bytelength(f(s)) <= 2*bytelength(s)
    >
    > You haven't understood. By definition, s is an octet stream, and f(s) is a
    > Unicode character stream - and therefore "bytelength(f(s))" is completely
    > meaningless.

    Sorry. My fault. How about:
    bytelength(UTF-16(f(s))) <= 2*bytelength(s)
    and
    bytelength(UTF-32(f(s))) <= 4*bytelength(s)
    ?

    And it is:
    bytelength(UTF-8(f(s))) <= 3*bytelength(s)
    right?

    That last bound is not very good, but mostly I can get away without that
    conversion: I simply keep s as-is. Which is, BTW, what Unicoders fear most,
    yet often do themselves.
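The three length bounds above can be checked with a small sketch. It rests on an assumption this message does not spell out: that f decodes valid UTF-8 normally and maps each invalid byte to exactly one BMP code point. The names `f`, `ESCAPE_BASE`, and `bounds_hold` are hypothetical, and the private-use range chosen here is purely illustrative, not the assignment proposed in the thread. The sketch only illustrates lengths; it makes no attempt to keep the mapping reversible.

```python
# Hypothetical escape range: invalid byte b (0x80..0xFF) -> U+E000 + b.
# Any single BMP code point at or above U+0800 would give the same lengths.
ESCAPE_BASE = 0xE000


def f(s: bytes) -> str:
    """Decode s as UTF-8, turning each invalid byte into one BMP code point."""
    # surrogateescape maps every undecodable byte b to the lone surrogate U+DC00 + b.
    text = s.decode("utf-8", errors="surrogateescape")
    # Move those lone surrogates into the (assumed) escape range so the
    # result can be encoded as ordinary UTF-8/16/32.
    return "".join(
        chr(ESCAPE_BASE + (ord(c) - 0xDC00)) if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in text
    )


def bounds_hold(s: bytes) -> bool:
    """Check the three bounds from the message for one octet sequence s."""
    u = f(s)
    return (
        len(u.encode("utf-16-le")) <= 2 * len(s)      # UTF-16 bound
        and len(u.encode("utf-32-le")) <= 4 * len(s)  # UTF-32 bound
        and len(u.encode("utf-8")) <= 3 * len(s)      # UTF-8 bound
    )
```

Under this assumption the worst case for UTF-8 is a lone invalid byte: one input byte becomes one escape code point, which takes three bytes when re-encoded as UTF-8. Valid input never grows under UTF-8, which is why the factor 3 comes entirely from the escaped bytes.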

    Lars


