Re: Unicode & space in programming & l10n

From: Jefsey_Morfin (
Date: Wed Sep 27 2006 - 16:13:32 CST

    At 17:09 27/09/2006, wrote:
    >I don't claim to have an IQ of 0, but I have quite some difficulty
    >in understanding how this relates to any subject worthy of
    >discussion on this list.

    The point is the memory space and bandwidth used by different
    codes. As Mark Davis documented, the typographic level permits some
    compression or better space management. However, this is limited.
    Conceptual codes and processing make it possible to reduce that
    space drastically, in different ways. This is metacoms: one
    transmits the metainformation needed to restore the entered
    information in a form that is intelligible to each reader. A simple
    example: if you receive the Bible in Chinese and it is to be sent to
    a French and a Spanish reader, you can use an OPES (open pluggable
    edge service) to identify the text at the entry, transmit the URNs
    of its French and Spanish equivalents, and have them printed from a
    local library. If you run a diff, you can even transmit just the
    changes to apply, keeping the traffic low. You see that you have
    dramatically reduced the load/memory and avoided duplicating an
    existing file.
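
    As a rough illustration of the "identifier plus diff" idea, here is
    a minimal Python sketch. It assumes the receiver already holds a
    local copy of the reference text. The URN, the toy text and the
    helper names make_delta/apply_delta are hypothetical, and a real
    OPES service would of course involve much more than this.

```python
import difflib

def make_delta(reference, edited):
    """Return only the operations needed to turn reference into edited."""
    matcher = difflib.SequenceMatcher(a=reference, b=edited, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the affected range of the reference and the new lines.
            delta.append((i1, i2, edited[j1:j2]))
    return delta

def apply_delta(reference, delta):
    """Rebuild the edited text from a local reference copy plus the delta."""
    result = list(reference)
    # Apply from the end so that earlier indices stay valid.
    for i1, i2, new_lines in reversed(delta):
        result[i1:i2] = new_lines
    return result

# Hypothetical URN and toy text; a real service would resolve the URN
# against the local library instead of using an in-memory list.
urn = "urn:example:bible:fr"
local_copy = ["Au commencement", "Dieu créa les cieux", "et la terre."]
annotated = ["Au commencement", "Dieu créa le ciel", "et la terre."]

# What travels over the network: the identifier and the changes only.
message = {"urn": urn, "delta": make_delta(local_copy, annotated)}

# What the receiving edge does with it.
rebuilt = apply_delta(local_copy, message["delta"])
```

    Only the URN and one changed line are transmitted here; the
    unchanged lines never leave the local library.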

    The problem is to identify the file initially as the Bible, and as
    being in Chinese (this is a very rigid example; in practice you will
    go by quotes or concepts). So you need a language recognition system
    that is transparent to the different possible character encodings.
    You will have many other problems, such as font recognition. But one
    of the interesting basic problems is supporting a diff between
    languages that use different upper case management systems. You
    cannot go by Unicode tables. You need the full range of 12 grapheme
    options. You need basic locale elements such as a grapheme sorting
    order and a way to describe their usage equivalence in different
    cultures/styles.
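
    To make the upper case point concrete, here is a small Python
    sketch. Python's str.upper() applies the default, language-neutral
    Unicode case mapping, which does not give the result a Turkish
    reader expects for "i". The override table TURKIC_UPPER and the
    function upper_for are hypothetical and only stand in for the kind
    of locale element described above.

```python
# Hypothetical per-language override: Turkish and Azerbaijani pair
# dotted i with dotted capital I, and dotless i with plain capital I.
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}

def upper_for(text, lang):
    """Upper-case text, applying a language-specific override first."""
    if lang in ("tr", "az"):
        text = "".join(TURKIC_UPPER.get(ch, ch) for ch in text)
    # Everything else falls back to the default Unicode mapping.
    return text.upper()

default_result = "istanbul".upper()          # language-neutral mapping
turkish_result = upper_for("istanbul", "tr")  # with the override applied
```

    A diff that folds case before comparing would treat these two
    results as different strings unless it knows which language it is
    dealing with.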

    Metacoms extended services are certainly new to most people as an
    architectural layer in a network model and in language modes. But
    they are used all the time, without designers and developers
    noticing that this is a general, fundamental communication process.
    This is typically what a CVS or a code is about. RFC 4646 says that
    if I want to say "this text is in American English", I just write
    "en-us", which represents a compression of 5/32 (5 characters
    instead of 32).
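
    The 5/32 figure can be checked by counting characters. This is only
    a character count of the two strings quoted above, not a measure of
    encoded bytes on the wire:

```python
from fractions import Fraction

phrase = "this text is in American English"
tag = "en-us"

# Ratio of the tag length to the length of the phrase it stands for.
ratio = Fraction(len(tag), len(phrase))
print(ratio)  # 5/32
```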


    This archive was generated by hypermail 2.1.5 : Wed Sep 27 2006 - 16:15:13 CST