RE: Encoding of invented items (from RE: Assigning a plane for mapping digits for many different bases)

From: Shawn Steele
Date: Thu Mar 10 2011 - 11:33:50 CST


    > > Nothing prevents anyone from _applying_ to have a character encoded. But
    > > not every request will get accepted. If it doesn't have established usage
    > > in text processing and interchange, it won't get encoded.

    > In this day and age, it is impossible for any character to have ANY usage, let
    > alone established, until it is encoded at least somewhere.

    Private fonts, private use of the PUA, and conscripts are all ways (besides working for a Japanese TelCo, which most of us don't do) that actual characters can get usage.

    Several of the more creative not-a-character suggestions would each use up a huge amount of the remaining Unicode space (like an entire plane!!!), so clearly, even if it were desirable, and even if they were characters, they couldn't all be implemented. Many of the suggestions belong in a different protocol, as has been pointed out numerous times.

    To the more creative suggestions: HTML and JavaScript didn't require encoding in Unicode to get going and, last I heard, they have already achieved a decent following. I can't think of any "object code" that is more portable and interoperable. Unless you want to go down to C#- or Java-like opcodes, which are also portable and interoperable, and which also succeeded without being in Unicode.

    Several suggestions were made to solve the "localizable sentences" problem, which is effectively a resource format. For example, an XML format like:
    <sentence ID="12345" language="en">Localizable sentences should be a higher level protocol</sentence>
    has been suggested, and would be far more effective at getting traction and solving the problem.
    In fact, Microsoft (and every other software manufacturer) effectively already uses "Localizable Sentences", which are packaged in higher level protocols known as "resource files". Look in %windir%\en-US and you'll find several examples. There are also several efforts to build shared resources, particularly for open source; Bing for "translated phrases database" and you'll find a few. Interestingly, these are higher level protocols, yet they still use Unicode to exchange CHARACTER information. Assigning names/IDs to actual phrases/sentences is the job of the higher level protocol.
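
    To make the "higher level protocol" point concrete, here is a rough sketch of how such a resource format could be consumed. Only the <sentence ID="..." language="..."> shape comes from the example above; the <sentences> wrapper element, the French entry, and the load_sentences/lookup helper names are assumptions made up for illustration, not any existing resource format.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource file. The <sentences> wrapper and the "fr" entry are
# assumptions for illustration; only the <sentence> element shape is from the post.
RESOURCES = """
<sentences>
  <sentence ID="12345" language="en">Localizable sentences should be a higher level protocol</sentence>
  <sentence ID="12345" language="fr">Les phrases localisables devraient relever d'un protocole de plus haut niveau</sentence>
</sentences>
""".strip()


def load_sentences(xml_text):
    """Build a {(sentence_id, language): text} table from the resource XML."""
    root = ET.fromstring(xml_text)
    return {
        (s.get("ID"), s.get("language")): s.text
        for s in root.iter("sentence")
    }


def lookup(table, sentence_id, language, fallback="en"):
    """Return the sentence in the requested language, else in the fallback language."""
    return table.get((sentence_id, language)) or table.get((sentence_id, fallback))


if __name__ == "__main__":
    table = load_sentences(RESOURCES)
    print(lookup(table, "12345", "fr"))
    print(lookup(table, "12345", "de"))  # no German entry, so this falls back to English
```

    The sentence ID lives entirely in the resource format; the only thing that has to be Unicode is the character data of each translation.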

    As far as I can tell, Mr. Overington hasn't even investigated the other options, any of which would be far more palatable to nearly anyone in the software industry. It's possible that he need not even devise his own system; perhaps one of the existing repositories would be willing to accept his strings.

    Unicode isn't a hammer, and these aren't nails.

