Re: UAX #9: applicability of higher-level protocols to bidi plaintext

From: Shai Berger via Unicode <unicode_at_unicode.org>
Date: Fri, 20 Jul 2018 12:45:17 +0300

Hi Ken (and all),

Thanks for your time and patience with this.

On Thu, 19 Jul 2018 18:10:49 -0700
Ken Whistler via Unicode <unicode_at_unicode.org> wrote:

> On 7/19/2018 12:38 AM, Shai Berger via Unicode wrote:
> > If I cannot trust that
> > people I communicate with make the same choices I make, plain text
> > cannot be used.
>
> Here is a counterexample [a table rendered in plain text, which is
> only truly legible using a fixed-width font].
>
> It isn't that "plain text cannot be used" to convey this content. The
> content is certainly "legible" in the minimal sense required by the
> Unicode Standard, and it is interchangeable without data corruption.
> The problem is that for optimal display and interpretation as
> intended, I also need to convey (and/or have the reader guess) the
> higher-level protocol requirement that this particular plain text
> needs to be displayed with a monowidth font.
>

If I understand correctly, you are rejecting my claim that
directionality is an issue of content, and claiming that, just like
the breakdown of your table in a proportional font, it is an issue of
display. But that argument is clearly disproved by the mere presence
of the directionality-setting characters (RLM, LRE, etc.) in the
Unicode character set. In other words, your example would be
convincing if Unicode included characters like "start table row" and
"close table cell", and there were an annex saying that your lines
(for whatever reason) are to be treated as table rows unless a
higher-level protocol said otherwise. I believe this is not the case.
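
To make the point about RLM, LRE and friends concrete, here is a small
sketch using Python's standard unicodedata module. The choice of
characters and the helper name describe() are mine, for illustration
only; the sketch just prints what the Unicode Character Database
records for each of them.

```python
import unicodedata

# Directional formatting characters: RLM and LRE are named in the text
# above; LRM, RLE and PDF are added here as closely related examples.
MARKS = {
    "LRM": "\u200e",
    "RLM": "\u200f",
    "LRE": "\u202a",
    "RLE": "\u202b",
    "PDF": "\u202c",
}

def describe(ch):
    """Return (character name, bidi class, general category) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        unicodedata.category(ch),
    )

for abbr, ch in MARKS.items():
    name, bidi_class, category = describe(ch)
    print(f"{abbr}: U+{ord(ch):04X} {name} bidi={bidi_class} category={category}")
```

Each of these is an encoded character that travels inside the text
itself, which is the sense in which directionality is treated as
content here.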

> > If the Unicode standard does not impose a
> > universal default, it does not define interchangeable plain text.
>
> And that is simply not the case. If your text is <a, b, c, !>
> (<L, L, L, ON>), that will display as {abc!} in a LTR paragraph
> directional context and as {!abc} in a RTL paragraph directional
> context.
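
The <a, b, c, !> example can be reproduced with a toy reordering
function. This is only a sketch under strong assumptions: it handles
text made of class L, ON and WS characters in a single paragraph, with
no explicit embeddings, numbers or mirroring, so it is not an
implementation of UAX #9. The function name display_order() is
hypothetical.

```python
import unicodedata

def display_order(text, paragraph_rtl):
    """Toy visual reordering for L/ON/WS-only text (not full UAX #9)."""
    para_dir = "R" if paragraph_rtl else "L"
    classes = [unicodedata.bidirectional(ch) for ch in text]
    for c in classes:
        if c not in ("L", "ON", "WS"):
            raise ValueError(f"bidi class {c} is outside this toy model")

    # Resolve neutrals: a neutral takes direction L only when the strong
    # context on both sides is L; the paragraph direction stands in for
    # the start and end of the text. Otherwise it takes the paragraph
    # direction.
    resolved = []
    for i, c in enumerate(classes):
        if c == "L":
            resolved.append("L")
            continue
        before = "L" if "L" in classes[:i] else para_dir
        after = "L" if "L" in classes[i + 1:] else para_dir
        resolved.append("L" if before == after == "L" else para_dir)

    # Levels: in an LTR paragraph everything sits at level 0; in an RTL
    # paragraph (level 1) left-to-right characters are raised to level 2.
    levels = [(2 if d == "L" else 1) if paragraph_rtl else 0 for d in resolved]

    # Reorder: from the highest level down to 1, reverse each contiguous
    # run of characters at that level or higher.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            levels[i:j] = reversed(levels[i:j])
            i = j
    return "".join(chars)

print(display_order("abc!", paragraph_rtl=False))
print(display_order("abc!", paragraph_rtl=True))
```

The same four code points come out in a different visual order
depending only on the paragraph direction handed in from outside.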

> [...] if plain text doesn't forcefully carry with it and
> require how it must be displayed, well, then it isn't really
> interchangeable.
>
> But that isn't what the Unicode Standard means by plain text. And
> isn't what it requires for interchangeability of plain text.

If I understood your argument correctly, it amounts to a claim that
Unicode defines plain text as a component of a data format, not as
something to be used as a full document. If that is correct, then
there is much to fix -- I think that quite a lot of existing
technology assumes the opposite (e.g. the use of
"Content-Type: text/plain; charset=UTF-8" in MIME should be strongly
discouraged, if the people who designed Unicode and UTF-8 think it is
not appropriate for full documents).

If I misunderstood, please correct me.
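
As an illustration of the MIME case, here is a minimal sketch using
Python's standard email package. The subject line, the body text and
the helper name build_plain_text_message() are made up for the
example. It builds a text/plain message whose body contains an RLM and
then lists the headers that were generated.

```python
from email.message import EmailMessage

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, carried as an ordinary character

def build_plain_text_message(body):
    """Build a message whose whole document is one text/plain body."""
    msg = EmailMessage()
    msg["Subject"] = "bidi plain text"  # hypothetical subject
    msg.set_content(body)  # defaults to text/plain with a utf-8 charset
    return msg

# Hypothetical body: a Hebrew word, ASCII punctuation, then an RLM.
msg = build_plain_text_message("\u05e9\u05dc\u05d5\u05dd!" + RLM + "\n")

print(msg["Content-Type"])
print(list(msg.keys()))
print(RLM in msg.get_content())
```

In this sketch the RLM survives inside the body, while the generated
headers describe the media type, charset and transfer encoding; none of
them states a paragraph direction.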

> >
> > My main point, whose rejection baffles me to no end, is that it
> > should.
>
> Well, I'm not expecting that I can make you feel good about the
> situation. ;-) But perhaps the UTC position will seem a little less
> baffling.

As I hope I've shown above, there's plenty of reason for bafflement.
The UTC defines code points to encode directionality, but then refuses
to treat directionality as content when it comes to paragraph
directionality; it defines a higher-level protocol as an agreement, and
then turns around and says the word "agreement" actually means
"decision".

I can guess at reasons why things are the way they are, but not at
justifications. I remain baffled.

Thanks,
        Shai.
Received on Fri Jul 20 2018 - 04:45:52 CDT
