Re: 32'nd bit & UTF-8

From: Mark E. Shoulson
Date: Tue Jan 18 2005 - 17:46:06 CST

    Philippe VERDY wrote:

    >>If one should philosophize on the question of general multi-byte encodings
    >>(or rather "transformation formats"), then UTF-BSS uses a leading byte in
    >>which the number of bytes is displayed in a unary number format, i.e. numbers
    >>of base 1. In fact, in a computer, it is more efficient to use binary
    >>numbers :-), so I would probably put a binary number there instead. One could
    >>still use the unary number idea in order to indicate the length of the
    >>binary numbers.
    >If I want to philosophize, the only UNARY number that exists is ZERO.
    >Unary number(s!) do not make an arithmetic.
    >I suppose you meant BINARY throughout... because numbers of base 1 DON'T EXIST!
    >(Just ask yourself what the definition of a base for numbers is, and think
    >about the powers of this base that scale each digit: 1^n equals 1 for every
    >digit position n, so all digits scale by the same factor. For the system to
    >give a unique representation of numbers, the only satisfying integer is
    >zero...)
    I've often heard "unary" used to refer to representing numbers in the
    most elementary way: 1 for one, 11 for two, 111 for three, 1111 for
    four, and so on. It truly *is* unary in the sense that there is only
    one digit (for zero you use the empty string). I remember seeing it
    treated as an exception in complexity theory, since the complexity
    analysis of an algorithm presumes that numbers are represented in some
    base *greater than one*, i.e. the length of the number is proportional
    to the logarithm of the number, whereas in unary the length of the
    number is proportional to the number itself.
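
    To make the length comparison concrete, here is a minimal sketch of
    the unary notation described above, next to the binary digit count.
    The function names are made up for illustration. The last helper
    shows, as I understand it, how a UTF-8 lead byte uses the same idea:
    the count of leading 1-bits gives the length of the sequence. It does
    no validation beyond rejecting continuation bytes.

```python
def to_unary(n: int) -> str:
    """Write a non-negative integer as n copies of the digit '1'."""
    if n < 0:
        raise ValueError("this notation covers non-negative integers only")
    return "1" * n  # zero comes out as the empty string


def from_unary(s: str) -> int:
    """Read a unary numeral back: its value is simply its length."""
    if set(s) - {"1"}:
        raise ValueError("not a unary numeral")
    return len(s)


def binary_length(n: int) -> int:
    """Number of binary digits needed for n (one digit for zero)."""
    return max(1, n.bit_length())


def utf8_sequence_length(lead_byte: int) -> int:
    """Count the leading 1-bits of a lead byte (a unary length prefix)."""
    ones = 0
    mask = 0x80
    while mask and lead_byte & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # 0xxxxxxx: a single-byte (ASCII) character
    if ones == 1:
        raise ValueError("10xxxxxx is a continuation byte, not a lead byte")
    return ones


if __name__ == "__main__":
    for n in (4, 1000, 1000000):
        # Unary length grows with n; binary length grows with log2(n).
        print(n, len(to_unary(n)), binary_length(n))
```

    The loop at the bottom prints, for each number, its unary length
    and its binary length side by side.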

    Further discussion on this is off-topic (like everything else); suffice
    it to say that Hans or whoever did not invent the term or this use of
    it, and it makes sense.

