From: Philippe Verdy (firstname.lastname@example.org)
Date: Tue Mar 24 2009 - 22:59:23 CST
As long as you use PUA code points to do this, there will be no problem at
Then you can map the virtual byte code you want on it, but don't expect that
it will magically run. You could use this to encapsulate in fact any binary
But for such an application, you don't even have to use supplementary planes:
you could just as well remap each byte from your byte code or binary file
into one of 256 PUAs in the BMP.
The cost would be one additional byte in the UTF-16 encoding for each
source byte, i.e. exactly the same as an ASCII hexadecimal dump in a UTF-8
file (and the hexadecimal dump has no endianness issue); if you encoded the
PUA characters in UTF-8 instead, the cost would be three bytes per source
byte, which is even worse than simple hexadecimal.
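
To make the size comparison concrete, here is a minimal sketch in Python. The base code point U+E000 and the helper names bytes_to_pua and pua_to_bytes are assumptions chosen for illustration only; any 256 PUA code points in the BMP agreed on privately would do.

```python
# Hypothetical private convention: byte value b is mapped to U+E000 + b,
# i.e. the first 256 code points of the BMP Private Use Area.
PUA_BASE = 0xE000


def bytes_to_pua(data: bytes) -> str:
    """Map each source byte to one BMP PUA character."""
    return "".join(chr(PUA_BASE + b) for b in data)


def pua_to_bytes(text: str) -> bytes:
    """Reverse the mapping; raises ValueError for characters outside the range."""
    return bytes(ord(ch) - PUA_BASE for ch in text)


payload = bytes(range(256))          # every possible byte value once
encoded = bytes_to_pua(payload)

# Serialized sizes of the same 256 source bytes in each representation.
utf16_size = len(encoded.encode("utf-16-le"))    # 2 bytes per source byte
utf8_size = len(encoded.encode("utf-8"))         # 3 bytes per source byte
hex_size = len(payload.hex().encode("ascii"))    # 2 bytes per source byte

print(utf16_size, utf8_size, hex_size)
```

The round trip is lossless, but the only gain over a hexadecimal dump is one character per byte instead of two; the serialized size is no better.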
So I'm not sure this is of any interest to the general community. Byte
codes are binary objects, and Unicode is not made to represent arbitrary
binary data, but actual text characters. Binary objects can be transcoded
and transported within text protocols (like XML or in MIME for emails and
newsgroups or in PostScript files), using several existing and well-known
transport syntaxes (hexadecimal, uuencoding, Base64, Base85...); then all
you need is a private convention or an upper-layer protocol (out of scope of
Unicode) to allow you to delimit the spans of reencoded binary sequences for
arbitrary objects (images, applications...): this may be a syntax or the use
of specific private delimiters (only two delimiters allocated as PUAs with
your private convention would do the trick).
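
As a rough illustration of that approach, the sketch below wraps a Base64 payload between two PUA delimiters inside ordinary text. The delimiter code points U+E000 and U+E001 and the function names embed and extract are assumptions for this example, not part of any standard; the convention has to be agreed on privately by sender and receiver.

```python
import base64
from typing import List

# Hypothetical private convention: two PUA characters mark the start and
# the end of a re-encoded binary span.
START = "\uE000"
END = "\uE001"


def embed(data: bytes) -> str:
    """Return the binary object as Base64 text wrapped in the two delimiters."""
    return START + base64.b64encode(data).decode("ascii") + END


def extract(text: str) -> List[bytes]:
    """Find every delimited span in the text and decode it back to bytes."""
    blobs = []
    pos = 0
    while True:
        start = text.find(START, pos)
        if start == -1:
            break
        end = text.find(END, start + 1)
        if end == -1:
            break  # unterminated span: ignore it
        blobs.append(base64.b64decode(text[start + 1:end]))
        pos = end + 1
    return blobs


document = "Hello " + embed(b"\x00\xff\x10") + " world " + embed(b"abc")
recovered = extract(document)
```

Base64 costs about four characters for every three source bytes, so it is also more compact than a one-PUA-per-byte mapping in either UTF-8 or UTF-16.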
If you really want to create a virtual machine with your own bytecode, this
should be done separately in your own binary format; there's no reason to
also remap it within Unicode, and no reason why Unicode would favor your
bytecode for your virtual machine and not several other competing bytecodes
for virtual machines (P-code, JVM, .Net/CLR, Python/.pyc, Parrot, ...); and
no reason why this bytecode would remain viable in the very long term. Even
Java has changed its bytecode across several incompatible versions, though
within JVMs that support upward compatibility; the same is true for the
oldest P-codes and for Python, and it will probably be true for Parrot.
Bytecodes need to evolve along with the VMs themselves, notably their
architecture, security, verification mechanisms, deployment options, ...
William_J_G Overington wrote:
> Sent: Tuesday, March 24, 2009 16:31
> To: email@example.com
> Cc: firstname.lastname@example.org
> Subject: On the possibility of encoding a portable
> interpretable object code into Unicode
> I am hoping that one day a portable interpretable object code
> will be added to regular Unicode.
> If that happens, then it would be added into a high plane of
> Unicode, perhaps plane 12.
> Each code point used in the portable interpretable object
> code would represent a command to a virtual machine that
> would be obeyed by the application program, such as a
> document reader, that processed the Unicode characters as
> software. Thus dynamic illustrations and indeed interactive
> illustrations could be added into a text document using a
> non-proprietary format that is also in Unicode plain text
> format. This could perhaps have far-reaching implications
> for the future of information technology.
> Is anyone interested in such a development of encoding
> non-text items as Unicode characters please?
> Adding a portable interpretable object code to regular
> Unicode will not be an easy goal to achieve and could take
> quite a time to achieve, yet it is a goal that I feel would
> be worthwhile to achieve.