Re: Private Use areas from Asmus Freytag via Unicode on 2018-08-28 (Unicode Mail List Archive)

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Tue, 28 Aug 2018 03:27:28 -0700

On 8/27/2018 2:20 PM, Rebecca Bettencourt via Unicode wrote:

> That sounds like a non-conformant use of characters in the U+24xx block.

Well, you are an expert on these things, and I do not understand what it would be non-conformant with.

A conformant process must interpret ⓅⓊⒶⒹⒶⓉⒶ as the characters ⓅⓊⒶⒹⒶⓉⒶ and not as a signal to process what follows as anything other than plain text.

Not correct.

If that were literally true, then all HTML, XML, CSS, C, C#, Java, and Python source files, along with their compilers and interpreters, would be non-conformant.

It's more like, "if a process treats a sequence of bytes as Unicode plain text, then the bytes corresponding to the codes assigned to ⓅⓊⒶⒹⒶⓉⒶ just stand for ⓅⓊⒶⒹⒶⓉⒶ. Any meaning is imparted by the (human) reader."

However, if the process treats the file as a source file in a markup language, there's nothing that prevents it from assigning particular interpretations to ⓅⓊⒶⒹⒶⓉⒶ, including, but not limited to, not displaying these code points as characters.
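The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the rule that everything after the marker is a payload is invented for the example, not taken from the proposal under discussion.

```python
from typing import Optional, Tuple

# The marker sequence discussed in this thread (circled letters, U+24xx).
MARKER = "ⓅⓊⒶⒹⒶⓉⒶ"


def render_plain_text(data: str) -> str:
    """Plain-text process: every code point just stands for itself."""
    return data


def render_as_markup(data: str) -> Tuple[str, Optional[str]]:
    """Hypothetical markup process: the marker is syntax, not content.

    Assumption for this sketch: text before the first marker is displayed,
    and everything after it is a payload that is interpreted, not displayed.
    Returns (visible_text, payload); payload is None if no marker occurs.
    """
    visible, separator, payload = data.partition(MARKER)
    return visible, (payload if separator else None)


sample = "Some text. " + MARKER + " U+E000 = hypothetical glyph description"
as_text = render_plain_text(sample)
visible, payload = render_as_markup(sample)
```

The same sequence of code points is handled by both functions. Only the second one steps outside plain text, and it does so by declaring itself a processor for a higher-level protocol.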

The interpretation of the remainder of the file may well be conformant to the Unicode Standard, just as the display of the contents of many HTML elements is usually conformant to the Unicode Standard.

What you are proposing is a higher-level protocol, whether you realize it or not.

Correct. The rub here is that all these schemes that treat characters as both syntax and text, depending on context, amount to markup languages and are therefore ipso facto no longer plain text (except when displayed as source code, although even applying syntax coloring would no longer be treating the data purely as plain text).

In-band markup thus has a dual nature as plain text and rich text, depending on how it is processed.

Unfortunately your higher-level protocol has a serious flaw in that it cannot represent the string "ⓅⓊⒶⒹⒶⓉⒶ".

That could probably be remedied by the usual techniques.
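One such technique is an escape character. The following is a minimal sketch, assuming a backslash as the escape; the names `escape_literal` and `tokenize`, and the choice of backslash, are hypothetical and not part of any actual proposal.

```python
# Marker from this thread, plus an assumed (hypothetical) escape character.
MARKER = "ⓅⓊⒶⒹⒶⓉⒶ"
ESCAPE = "\\"


def escape_literal(text: str) -> str:
    """Encode text so that a reader of the protocol treats it all as content."""
    # Escape existing escape characters first, then any literal marker.
    return text.replace(ESCAPE, ESCAPE + ESCAPE).replace(MARKER, ESCAPE + MARKER)


def tokenize(source: str) -> list:
    """Split source into ("text", s) and ("marker", MARKER) tokens."""
    tokens, buf, i = [], [], 0
    while i < len(source):
        if source.startswith(ESCAPE, i):
            rest = source[i + 1:]
            if rest.startswith(MARKER):      # escaped marker: literal text
                buf.append(MARKER)
                i += 1 + len(MARKER)
            elif rest.startswith(ESCAPE):    # escaped escape character
                buf.append(ESCAPE)
                i += 2
            else:                            # lone escape: keep it as-is
                buf.append(ESCAPE)
                i += 1
        elif source.startswith(MARKER, i):   # unescaped marker: syntax
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            tokens.append(("marker", MARKER))
            i += len(MARKER)
        else:
            buf.append(source[i])
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


literal = "The string " + MARKER + " itself"
round_trip = tokenize(escape_literal(literal))
```

With this convention an escaped marker comes back as ordinary text, while an unescaped one is still reported as syntax. Doubling the marker or using a numeric character reference would serve equally well.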

Also, seeing a bunch of circled alphanumeric characters in a document ⓘⓢ◯ⓕⓐⓡ◯ⓕⓡⓞⓜ◯ⓤⓝⓞⓑⓣⓡⓤⓢⓘⓥⓔ.

There are plenty of already-existing higher-level protocols (you mentioned one: XML) that could be used to provide information about PUA characters, and they are all much better suited to that purpose than what you are proposing.

There are situations where an ad-hoc markup language seems to fulfill a need that is not well served by the existing full-fledged markup languages. You find them on internet "bulletin boards" or services like GitHub, where pure plain text is too restrictive but the required text styles are purposefully limited, which makes the syntactic overhead of a full-featured markup language burdensome.

Too bad that there's been no "winner" among these, and therefore no universally accepted one. Had there been one, it might have presented an obvious target for a PUA extension.

A./

Received on Tue Aug 28 2018 - 05:27:47 CDT
