Re: Encoding italic (was: A last missing link)

From: Victor Gaultney via Unicode <unicode_at_unicode.org>
Date: Wed, 16 Jan 2019 11:23:59 +0000

James Kass wrote:
> Concerns about statefulness in plain-text exist.  Treating "italic" as
> an opening/closing "punctuation" may help get around such concerns.
> IIRC, it was proposed that the Egyptian cartouche be handled that way.

I do appreciate the technical issues surrounding statefulness and user
expectation when they select, copy, and paste. However, that has always
been an issue. The Latin script (and many others) already has 'states',
and that is reflected in the encoding of the markers that indicate the
beginning and end of those states (parens, quotes, etc.). In the Latin
script those markers are visually represented as separate glyphs,
although sometimes enterprising font makers will use OpenType or
Graphite to adjust those glyphs in context.

Encoding 'begin italic' and 'end italic' would introduce difficulties
when partial strings are moved, etc. But that's no different from
current punctuation. If you select the second half of a string that
includes an end quote character you end up with a mismatched pair, with
the same problems of interpretation as selecting the second half of a
string including an 'end italic' character. Apps have to deal with it,
and do, as in code editors.
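
To make the comparison concrete, here is a minimal sketch in Python. No
'begin italic' or 'end italic' characters exist in Unicode, so the two
private-use code points below are purely hypothetical stand-ins, and the
unmatched() helper is an illustrative name, not part of any proposal or
library. It shows that cutting a string inside a marked run leaves the same
kind of orphaned closer whether the markers are quotes or italic markers.

```python
# Hypothetical stand-ins: Unicode has no 'begin italic' / 'end italic'
# characters, so two private-use code points are used for illustration.
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"

# Opening marker -> closing marker, for a few paired characters.
PAIRS = {"\u201C": "\u201D", "(": ")", BEGIN_ITALIC: END_ITALIC}

def unmatched(text):
    """Return the opening/closing markers in text that have no partner."""
    closers = {close: open_ for open_, close in PAIRS.items()}
    stack = []    # openers still waiting for their closer
    orphans = []  # closers seen without a matching opener
    for ch in text:
        if ch in PAIRS:
            stack.append(ch)
        elif ch in closers:
            if stack and stack[-1] == closers[ch]:
                stack.pop()
            else:
                orphans.append(ch)
    return orphans + stack

quoted = "She said \u201Cnot now\u201D and left"
italic = "She said " + BEGIN_ITALIC + "not now" + END_ITALIC + " and left"

# Select the second half of each string, starting inside the marked run.
start = quoted.index("now")
print(unmatched(quoted[start:]) == ["\u201D"])     # orphaned end quote
print(unmatched(italic[start:]) == [END_ITALIC])   # orphaned end italic
```

Both selections produce exactly one orphaned closing marker, so an app
that already copes with a stray end quote faces the same problem, not a
new one.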

Apps (and font makers) can also choose how to deal with presenting
strings of text that are marked as italic. They can choose to present
visual symbols to indicate begin/end, such as /this/. Or they can
present it using the italic variant of the font, if available. Yes, that
brings up the issue of what to do if no italic counterpart is there. But
that's already an issue with people using math characters for
pseudo-italic. I'd guess that far, far more fonts in the world have
italic counterparts than contain math chars, and the trend toward always
having roman/italic matched pairs is something I've established in my
research interviews.
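
As a rough sketch of that presentation choice (again in Python, again
with hypothetical private-use code points standing in for the markers):
the render() function and its has_italic_font flag are assumptions made
for illustration. A terminal's ANSI italic attribute stands in here for
"the italic variant of the font"; a real app would switch font faces
instead.

```python
# Hypothetical stand-ins for the proposed markers (private-use code points).
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"

def render(text, has_italic_font):
    """Present marked text either with real italics or with visible symbols."""
    if has_italic_font:
        # ANSI SGR 3 turns italic on, SGR 23 turns it off.
        return (text.replace(BEGIN_ITALIC, "\x1b[3m")
                    .replace(END_ITALIC, "\x1b[23m"))
    # Fallback when no italic counterpart is available: show /this/.
    return text.replace(BEGIN_ITALIC, "/").replace(END_ITALIC, "/")

sample = "I " + BEGIN_ITALIC + "really" + END_ITALIC + " mean it"
print(render(sample, has_italic_font=False))
```

Either way the stored text is the same; only the presentation differs.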

Treating italic like punctuation is a win for a lot of people:

- Users get their italic content preserved in plain text

- Those who develop plain text apps (social media in particular) don't
have to build a whole markup/markdown layer into their apps

- Misuse of math chars for pseudo-italic would likely disappear

- The text runs between markers remain intact, so they need no special
treatment in searching, selecting, etc.

- It would finally, and conclusively, end the decades-long mess in
HTML that surrounds <em> and <i>.
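
The point about searching can be illustrated with a small Python sketch.
The marker code points are hypothetical private-use stand-ins, and
pseudo_italic() is an illustrative helper that maps ASCII lowercase
letters onto the Mathematical Sans-Serif Italic block, which is one of
the math alphabets people borrow for pseudo-italic.

```python
# Hypothetical stand-ins for the proposed markers (private-use code points).
BEGIN_ITALIC = "\uE000"
END_ITALIC = "\uE001"

def pseudo_italic(word):
    """Map ASCII lowercase letters to Mathematical Sans-Serif Italic letters."""
    base = 0x1D622  # MATHEMATICAL SANS-SERIF ITALIC SMALL A
    return "".join(
        chr(base + ord(c) - ord("a")) if "a" <= c <= "z" else c
        for c in word
    )

with_markers = "an " + BEGIN_ITALIC + "important" + END_ITALIC + " point"
with_math = "an " + pseudo_italic("important") + " point"

# A plain substring search finds the word between the markers...
print("important" in with_markers)  # True
# ...but not when the letters were replaced by math characters.
print("important" in with_math)     # False
```

With markers, the letters themselves are untouched, so an ordinary
search needs no extra normalization step.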

My main point in suggesting that Unicode needs these characters is that
italic has been used to indicate specific meaning - this text is somehow
special - for over 400 years, and that content should be preserved in
plain text.
Received on Wed Jan 16 2019 - 05:41:45 CST
