From: John H. Jenkins (firstname.lastname@example.org)
Date: Mon Aug 09 2010 - 11:54:59 CDT
On Aug 7, 2010, at 10:40 AM, Doug Ewell wrote:
> I'd like to see an FAQ page on "What is Plain Text?" written primarily by UTC officers. That might go a long way toward resolving the differences between William's interpretation of what plain text is, which people like me think is too broad, and mine, which some people have said is too narrow.
Well, we do have <http://www.unicode.org/faq/ligature_digraph.html#10> and related FAQs?
The basic idea is that "plain text" is the minimum amount of information to process the given language in a "normal" way. FOR EXAMPLE, ALTHOUGH ENGLISH CAN BE WRITTEN IN ALL-CAPS, IT USUALLY ISN'T, AND DOING IT LOOKS WRONG. We therefore have both upper- and lower-case letters for English. On the other hand, although English *is* usually written with some facility to provide emphasis, different media have different ways of providing that facility (asterisks, underlining, italicizing), and English written without any of these looks perfectly fine.
Arabic, on the other hand, absolutely must have some way of allowing for different letter shapes in different contexts, or it looks just wrong. Arabic "plain text" must therefore have a facility to allow for that, either by explicitly having different characters for the different shapes the letters take, or by providing a default layout algorithm that defines them.
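As a rough illustration of the first option (separate characters for the separate shapes), here is a minimal sketch using Python's standard unicodedata module to look at the four encoded presentation forms of the Arabic letter beh. The choice of beh and the helper name base_letter are assumptions made for the example, not anything taken from the FAQ.

```python
import unicodedata

# The four encoded presentation forms of ARABIC LETTER BEH (U+0628):
# isolated, final, initial, medial.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def base_letter(ch: str) -> str:
    """Map a presentation form back to its nominal letter via NFKC.

    Hypothetical helper for this sketch only.
    """
    return unicodedata.normalize("NFKC", ch)


for form in BEH_FORMS:
    # decomposition() returns a shape tag plus the nominal letter's code point.
    print(
        f"U+{ord(form):04X}",
        unicodedata.name(form),
        unicodedata.decomposition(form),
        f"-> U+{ord(base_letter(form)):04X}",
    )
```

All four forms map back to the single nominal letter U+0628, which is the character a default layout algorithm would start from under the second option.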
Beyond rendering, there are also considerations as to the minimal amount of information necessary for other text-based processes, such as sorting, searching, and text-to-speech.
Yes, there are issues which end up being judgment calls, and it's easy to come up with cases where you can't really capture the full semantic intent of the author without what Unicode calls "rich text." My favorite example is "The Mouse's Tale" in _Alice in Wonderland_. Plain text isn't intended to capture all the nuances of the original's semantics, but to provide at least a very close approximation.
Variation selectors are intended to cover cases where more information is needed for rendering than is required for other processes such as searching (Mongolian), or cases where different user communities disagree on whether two forms must be unified or must be deunified.
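To make the rendering-versus-searching split concrete, here is a minimal sketch of a search comparison that ignores variation selectors. The function names and the explicit list of code point ranges are assumptions of the example; a real implementation would more likely rely on a collation library than on hand-written ranges.

```python
# Code point ranges treated as variation selectors in this sketch.
VS_RANGES = [
    (0x180B, 0x180D),    # Mongolian free variation selectors 1-3
    (0xFE00, 0xFE0F),    # VS1-VS16
    (0xE0100, 0xE01EF),  # VS17-VS256
]


def is_variation_selector(ch: str) -> bool:
    """Return True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors, keeping every other character."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def search_equal(a: str, b: str) -> bool:
    """Compare two strings for searching, ignoring variation selectors."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)


# A Mongolian letter with and without a free variation selector:
# different for rendering, the same for searching.
print(search_equal("\u1820\u180B", "\u1820"))
```

The selector stays in the stored text, so a renderer can still pick the requested form; only the comparison throws it away.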
Hoani H. Tinikini
John H. Jenkins