RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Tue Dec 14 2004 - 05:19:22 CST

    Kenneth Whistler wrote:
    > Lars Kristan stated:
    >
    > > I said, the choice is yours. My proposal does not prevent you from
    > > doing it your way. You don't need to change anything and it will
    > > still work the way it worked before. OK? I just want 128 codepoints
    > > so I can make my own choice.
    >
    > You have them: U+EE80..U+EEFF, which are yours to use (or abuse)
    > in an application as you see fit. Just don't expect others outside
    > your application to interpret them as you do.

    Well, I DO want someone to interpret them the way I do. And display them.
    And let them be entered. And not risk a clash with someone else; we are
    talking about the PUA, right?

    >
    > > And once and for all, you can treat those 128 codepoints just as you
    > > do today.
    >
    > A number of people on the list have patiently explained why what
    > you are proposing to do fundamentally breaks UTF-8 and its
    > relationship to other Unicode encoding forms.

    It does not. I may have suggested at some point that the conversion from
    codepoints to UTF-8 should be changed. But I am no longer proposing that. The
    conversion to and from UTF-8 remains EXACTLY as it is today. I will use my
    own conversion as I see fit and deal with all the consequences. But I need
    128 VALID codepoints. Not in the PUA, not in just any plane, but in the BMP.
    And just because I say 'I' need does not mean I am the only one.

    One might judge who is right and who is not by the number of responses. But
    that is definitely not a valid measure. A couple of people keep responding,
    and they have more or less the same theme, because it has been rehearsed
    time and time again. I believe there are people who have long since realized
    that my claims are correct but are just afraid to speak up. Also, whenever I
    win an argument, it is just dropped. In the end all that remains is a
    'feeling' by a few people that 'this is not good'.

    >
    > The chances that you will get the standard extended to incorporate
    > these 128 code points and define their mapping to invalid byte
    > values in UTF-8 are somewhere between zilch, nada, and nil.

    No, not UTF-8. UTF-8 remains as it is. What I will do with them is my
    business. I am only telling you about it so you cannot dismiss it as
    'encapsulating arbitrary binary data in Unicode'.
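
For illustration only, here is a minimal Python sketch of the kind of private conversion discussed above: bytes that are not valid UTF-8 are mapped one-to-one onto 128 codepoints and mapped back on output, while the standard UTF-8 conversion itself is left untouched. It assumes the U+EE80..U+EEFF range mentioned earlier in this message as a stand-in for the 128 codepoints. The names PUA_BASE, decode_lossless, encode_lossless and the "pua_escape" handler are hypothetical and not part of any proposal or standard.

```python
import codecs

# Assumption: byte 0x80..0xFF is represented as U+EE80..U+EEFF.
PUA_BASE = 0xEE00


def _escape_invalid(err):
    """Decoder error handler: replace each undecodable byte with one codepoint."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Bytes rejected by the UTF-8 decoder are always >= 0x80, so each one
    # lands in U+EE80..U+EEFF.
    return "".join(chr(PUA_BASE + b) for b in bad), err.end


# "pua_escape" is a handler name invented for this sketch.
codecs.register_error("pua_escape", _escape_invalid)


def decode_lossless(data: bytes) -> str:
    """Decode as UTF-8, keeping invalid bytes as escape codepoints."""
    return data.decode("utf-8", errors="pua_escape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: escape codepoints become raw bytes again."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xEE80 <= cp <= 0xEEFF:
            out.append(cp - PUA_BASE)   # restore the original invalid byte
        else:
            out += ch.encode("utf-8")   # ordinary, unchanged UTF-8 encoding
    return bytes(out)
```

With these definitions, decode_lossless(b"abc\xff") yields "abc" followed by U+EEFF, and encode_lossless turns that string back into the original bytes. The sketch also shows the clash concern: input that already contains a validly encoded U+EE80..U+EEFF character would come back out as a single raw byte, so it does not round-trip.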

    Lars


