RE: On the possibility of encoding a portable interpretable object code into Unicode

From: Philippe Verdy (
Date: Tue Mar 24 2009 - 22:59:23 CST


    As long as you use PUA code points to do this, there will be no problem at all.
    Then you can map the virtual byte code you want onto them, but don't expect that
    it will magically run. In fact, you could use this to encapsulate any binary
    data.

    But for such an application, you don't even have to use supplementary planes:
    you could just as well remap each byte from your byte code or binary file
    into one of 256 PUAs in the BMP.
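
    A minimal sketch of such a remapping, assuming a purely private convention
    in which byte value b becomes the code point U+E000 + b. The base offset and
    the function names are illustrative assumptions; nothing in Unicode assigns
    this meaning to these code points.

```python
# Hypothetical private convention: byte value b <-> code point U+E000 + b.
PUA_BASE = 0xE000  # first code point of the BMP Private Use Area (U+E000..U+F8FF)


def bytes_to_pua(data: bytes) -> str:
    """Map each byte to one of 256 consecutive BMP private-use characters."""
    return "".join(chr(PUA_BASE + b) for b in data)


def pua_to_bytes(text: str) -> bytes:
    """Reverse the mapping; reject any character outside the 256-code-point range."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - PUA_BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"not a mapped PUA code point: U+{ord(ch):04X}")
        out.append(offset)
    return bytes(out)
```

    The result is only a container: a receiver that does not share the same
    private convention sees 256 unassigned private-use characters and nothing
    more.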

    The cost would be one additional byte for the encoding in UTF-16 for each
    source byte (two bytes per source byte), i.e. exactly the same as an ASCII
    hexadecimal dump in a UTF-8 file, although the hexadecimal dump has no
    endianness issue; if you used UTF-8 for the PUAs, the cost would be three
    bytes per source byte, even worse than simple hexadecimal.
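
    The arithmetic can be checked with a short sketch. It assumes the same kind
    of private mapping (byte b to U+E000 + b, an illustrative choice) and
    measures UTF-16 without a byte order mark; the function name is hypothetical.

```python
PUA_BASE = 0xE000  # assumed private mapping: byte b -> U+E000 + b


def encoded_sizes(data: bytes) -> dict:
    """Return the size in bytes of several text encodings of the same binary data."""
    pua_text = "".join(chr(PUA_BASE + b) for b in data)
    return {
        "raw": len(data),
        "pua_utf16": len(pua_text.encode("utf-16-le")),  # 2 bytes per BMP character
        "hex_utf8": len(data.hex().encode("utf-8")),     # 2 ASCII digits per byte
        "pua_utf8": len(pua_text.encode("utf-8")),       # 3 bytes per U+E0xx character
    }


sizes = encoded_sizes(bytes(range(256)))
```

    For 256 source bytes this gives 512 bytes for the PUA text in UTF-16, 512
    bytes for the hexadecimal dump, and 768 bytes for the PUA text in UTF-8.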

    So I'm not sure this is of any interest to the general community. Byte
    codes are binary objects, and Unicode is not made to represent arbitrary
    binary data, but actual text characters. Binary objects can be transcoded
    and transported within text protocols (like XML or in MIME for emails and
    newsgroups or in PostScript files), using several existing and well-known
    transport syntaxes (hexadecimal, uuencoding, Base64, Base85...); then all
    you need is a private convention or an upper-layer protocol (out of scope of
    Unicode) to allow you to delimit the spans of reencoded binary sequences for
    arbitrary objects (images, applications...): this may be a syntax or the use
    of specific private delimiters (only two delimiters allocated as PUAs with
    your private convention would do the trick).
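
    As an illustration of the two-delimiter idea, here is a minimal sketch that
    wraps Base64 payloads between two private-use characters inside ordinary
    text. The choice of U+E000 and U+E001 and the function names are assumptions
    made for the example, i.e. exactly the kind of private convention that lies
    outside the scope of Unicode.

```python
import base64

# Hypothetical private convention: two PUA code points delimit a Base64 span.
BIN_START = "\uE000"
BIN_END = "\uE001"


def embed(data: bytes) -> str:
    """Wrap binary data as Base64 text between the two private delimiters."""
    return BIN_START + base64.b64encode(data).decode("ascii") + BIN_END


def extract(text: str) -> list:
    """Return every binary object embedded in the text, in order of appearance."""
    blobs = []
    pos = 0
    while True:
        start = text.find(BIN_START, pos)
        if start == -1:
            break
        end = text.find(BIN_END, start + 1)
        if end == -1:
            raise ValueError("unterminated binary span")
        blobs.append(base64.b64decode(text[start + 1:end]))
        pos = end + 1
    return blobs


document = "Hello " + embed(b"\x00\x01\xfe") + " world " + embed(b"abc")
objects = extract(document)
```

    Only two private-use characters are consumed, and the payload itself stays
    in plain ASCII, so the text survives any transport that preserves Unicode
    text.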

    If you really want to create a virtual machine with your own bytecode, this
    should be done separately in your own binary format; there's no reason to
    also remap it within Unicode, and no reason why Unicode would favor your
    bytecode for your virtual machine and not several other competing bytecodes
    for virtual machines (P-code, JVM, .Net/CLR, Python/.pyc, Parrot, ...); and
    no reason why this bytecode would remain viable over the very long term
    (even Java has changed its bytecode through several incompatible versions,
    though within JVMs that support upward compatibility; the same is true for
    the oldest P-Codes and for Python, and it will probably be true for Parrot):
    bytecodes need to evolve according to the evolution of VMs, notably with
    their architecture, security, verification mechanisms, deployment
    options, ....

    William_J_G Overington wrote:
    > Sent: Tuesday, March 24, 2009 16:31
    > To:
    > Cc:
    > Subject: On the possibility of encoding a portable
    > interpretable object code into Unicode
    > I am hoping that one day a portable interpretable object code
    > will be added to regular Unicode.
    > If that happens, then it would be added into a high plane of
    > Unicode, perhaps plane 12.
    > Each code point used in the portable interpretable object
    > code would represent a command to a virtual machine that
    > would be obeyed by the application program, such as a
    > document reader, that processed the Unicode characters as
    > software. Thus dynamic illustrations and indeed interactive
    > illustrations could be added into a text document using a
    > non-proprietary format that is also in Unicode plain text
    > format. This could perhaps have far-reaching implications
    > for the future of information technology.
    > Is anyone interested in such a development of encoding
    > non-text items as Unicode characters please?
    > Adding a portable interpretable object code to regular
    > Unicode will not be an easy goal to achieve and could take
    > quite a time to achieve, yet it is a goal that I feel would
    > be worthwhile to achieve.
