From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 10:30:57 CST
Arcane Jill wrote:
> >> # for all possible octet sequences s:
> >> # length of UTF-8(f(s)) <= length of s,
>
> >No, that is not the requirement. It is:
> >bytelength(f(s)) <= 2*bytelength(s)
>
> You haven't understood. By definition, s is an octet stream, and f(s) is a
> Unicode character stream - and therefore "bytelength(f(s))" is completely
> meaningless.
Sorry. My fault. How about:
bytelength(UTF-16(f(s))) <= 2*bytelength(s)
and
bytelength(UTF-32(f(s))) <= 4*bytelength(s)
?
And for UTF-8 it is:
bytelength(UTF-8(f(s))) <= 3*bytelength(s)
right?
That factor of 3 is not very good, but mostly I can get away without that
conversion: I simply keep s as-is. Which is, BTW, what Unicoders fear most,
but often do themselves.
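
The three bounds above can be checked with a rough sketch. It assumes that f decodes s as UTF-8 and maps every byte that does not decode to exactly one BMP code point at or above U+0800; this message does not spell out f, so that reading is an assumption. The base U+E000, the handler name "escape_to_bmp", and check_bounds are all hypothetical names picked for illustration only.

```python
import codecs

# Hypothetical base: an undecodable byte b becomes U+E000 + b, a BMP
# private-use code point. The code points actually intended for f are
# not given in this message, so this choice is an assumption.
ESCAPE_BASE = 0xE000


def _escape_invalid(err):
    """Error handler: emit one escape code point per undecodable byte."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), err.end


codecs.register_error("escape_to_bmp", _escape_invalid)


def f(s: bytes) -> str:
    """Decode s as UTF-8, escaping invalid bytes instead of failing."""
    return s.decode("utf-8", errors="escape_to_bmp")


def check_bounds(s: bytes) -> bool:
    """Return True if all three byte-length bounds hold for s."""
    t = f(s)
    n = len(s)
    return (
        len(t.encode("utf-16-le")) <= 2 * n      # no BOM counted
        and len(t.encode("utf-32-le")) <= 4 * n  # no BOM counted
        and len(t.encode("utf-8")) <= 3 * n
    )
```

Under these assumptions a single invalid byte such as b"\xff" is the worst case for UTF-8: one input byte becomes one code point that needs 3 bytes. Plain ASCII input is the worst case for UTF-16 and UTF-32, at 2 and 4 bytes per input byte.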
Lars