Re: New Character Property for Prepended Concatenation Marks from Philippe Verdy on 2015-11-26 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 26 Nov 2015 11:41:47 +0100

The root sign is much more complex than just prepending to specific sequences
of characters (in a limited set): when it embeds some "text", it can do so
recursively and, unless you use additional parentheses for the linear
presentation, it highly depends on the 2D layout of its operand
(additionally, it could itself be prefixed by a superscripted radix value).
Leave it alone: the 2D layout (even in the linear presentation using
parentheses where needed) will be mapped using an additional mathematical
presentation layer and notation.
For the basic plain-text, the root sign will just stay alone without using
any complex layout, and its operand will simply follow it (using
parentheses where needed) without specific rendering.

----
However, the proposal for these prepended concatenation marks does not give
any hint about how to compute the extent of the following clusters
above/over/below/around which they will apply (do they extend over only
letters/digits, but not whitespace or punctuation signs, including
abbreviation marks?).
For me, this kind of visual interaction should be more explicitly delimited
using special marks (working like invisible parentheses): the absence of
these special marks immediately after the prepended concatenation mark
should mean that the mark will not extend beyond the next (non-whitespace)
cluster.
So:
- <ARABIC NUMBER SIGN, SPACE, ARABIC DIGIT ONE> will display the isolated
number sign WITHOUT extending to the following space and digit
- <ARABIC NUMBER SIGN, ARABIC DIGIT ONE, ARABIC DIGIT TWO> will apply the
number sign ONLY to the first digit
- <ARABIC NUMBER SIGN, START OF SEQUENCE, ARABIC DIGIT ONE, ARABIC DIGIT
TWO, END OF SEQUENCE> will apply the number sign to the two digits
- <ARABIC NUMBER SIGN, START OF SEQUENCE, ARABIC DIGIT ONE, FULL STOP, ARABIC
DIGIT TWO, END OF SEQUENCE> will apply the number sign to the two digits
and the separating full stop
- <ARABIC NUMBER SIGN, START OF SEQUENCE, ARABIC DIGIT ONE, SPACE, ARABIC
DIGIT TWO, END OF SEQUENCE> will apply the number sign to the two digits
and the separating space
- <ARABIC NUMBER SIGN, START OF SEQUENCE, ARABIC DIGIT ONE, NEWLINE, ARABIC
DIGIT TWO, END OF SEQUENCE> will apply the number sign only to the first
digit, before the newline control; the second digit will appear on the next
line, outside the number sign's complex cluster, and the END OF SEQUENCE
control will be ignored (or would display with a "visible control glyph").
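
The cases listed above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: START OF SEQUENCE and END OF
SEQUENCE are not encoded characters, so two private-use code points stand in
for them here; the function name is hypothetical; and a "cluster" is
approximated as one base character plus any following combining marks, not
the full grapheme cluster rules.

```python
import unicodedata

# Hypothetical stand-ins: the proposed delimiters are not encoded,
# so private-use code points are used purely for illustration.
START_OF_SEQUENCE = "\uE000"
END_OF_SEQUENCE = "\uE001"
LINE_BREAKS = "\n\r\u2028\u2029"


def covered_text(text: str, mark_index: int) -> str:
    """Return the characters the prepended mark at mark_index applies to."""
    i = mark_index + 1
    if i >= len(text):
        return ""
    if text[i] == START_OF_SEQUENCE:
        # Delimited case: everything up to END OF SEQUENCE, but never
        # across a line break (assumption: an unterminated sequence also
        # stops at the end of the text).
        start = end = i + 1
        while (end < len(text)
               and text[end] != END_OF_SEQUENCE
               and text[end] not in LINE_BREAKS):
            end += 1
        return text[start:end]
    if text[i].isspace():
        # Whitespace right after the mark: the mark stays isolated.
        return ""
    # Undelimited case: only the next cluster, approximated as one base
    # character followed by its combining marks (general category M*).
    end = i + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[i:end]
```

For instance, calling covered_text on U+0600 followed directly by the
Arabic-Indic digits one and two yields only the first digit, while wrapping
the two digits in the stand-in delimiters yields both.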
Without the <START OF SEQUENCE> and <END OF SEQUENCE> special controls, it
will be necessary anyway to define specific enumerations of characters that
can be part of the sequence on which the prepended mark will apply.
Another complication: when such prepended sequences are recognized, there
are specific tunings to apply in line-breaking algorithms.
Word breaking algorithms may not need specific changes if the enumerations
of characters that can be part of the prepended sequence cannot contain any
word-breaking character. That's why I suggested that, by default, such
enumerations should include only letters and digits but not whitespace (and
probably not punctuation signs such as the dot), plus their additional
combining marks.
- For Arabic U+0600, U+0601 and U+0605 (TUS-9.2, page 374), the enumeration
is supposed to contain only Arabic-Indic or extended Arabic-Indic digits,
but I wonder whether it should also include number separators, or even
Arabic-European digits.
- Same remark for the Kaithi number sign U+110BD.
- For Syriac U+070F (TUS-9.3, pages 390-391), the enumeration is not so
obvious (all Syriac "letter-numbers"?)
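
As a rough illustration of the enumeration-based alternative (no explicit
delimiters), here is a minimal Python sketch. All names are hypothetical.
The default rule is approximated with Unicode general categories (letters,
decimal digits, combining marks), and the Arabic set covers only the digit
ranges mentioned above, with the separators (taken here to be U+066B and
U+066C) and the European digits (taken here to be ASCII 0-9) as optional
extras, since whether they belong is exactly the open question.

```python
import unicodedata

ARABIC_INDIC_DIGITS = {chr(c) for c in range(0x0660, 0x066A)}
EXTENDED_ARABIC_INDIC_DIGITS = {chr(c) for c in range(0x06F0, 0x06FA)}
# ARABIC DECIMAL SEPARATOR and ARABIC THOUSANDS SEPARATOR
ARABIC_SEPARATORS = {"\u066B", "\u066C"}
# Assumption: "Arabic-European digits" means ASCII 0-9.
EUROPEAN_DIGITS = set("0123456789")


def arabic_enumeration(with_separators=False, with_european_digits=False):
    """Build the set of characters an Arabic number sign may extend over."""
    allowed = ARABIC_INDIC_DIGITS | EXTENDED_ARABIC_INDIC_DIGITS
    if with_separators:
        allowed = allowed | ARABIC_SEPARATORS
    if with_european_digits:
        allowed = allowed | EUROPEAN_DIGITS
    return allowed


def in_default_enumeration(ch):
    """Default rule: letters, decimal digits and combining marks only."""
    category = unicodedata.category(ch)
    return category[0] in "LM" or category == "Nd"


def enumerated_extent(text, mark_index, allowed=None):
    """Return the run of characters after the mark that is in its enumeration."""
    if allowed is None:
        member = in_default_enumeration
    else:
        member = allowed.__contains__
    end = mark_index + 1
    while end < len(text) and member(text[end]):
        end += 1
    return text[mark_index + 1:end]
```

With the default rule the extent stops at the first space or punctuation
sign, so no word-breaking character can end up inside it; passing
arabic_enumeration(with_separators=True) lets a decimal separator between
two digits stay inside the extent.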
There are also similar characters in other scripts that are not listed: one
example is the Cyrillic hundred-thousands/millions marks U+0488..U+0489,
which possibly enclose more than one digit (currently encoded as combining
marks applicable to only one digit?); another is the Egyptian hieroglyphic
honorific "cartouche", which encloses the name of a king; other examples
are possible as well in other Asian scripts for honorific marks.
The system using explicitly delimited sequences would work as well with the
Latin script for some honorific "decorators" which are not just ligatures,
e.g. for the name of God or Jesus Christ (which may themselves also be
abbreviated), including in Quranic transcriptions.
-- Philippe.
2015-11-26 9:10 GMT+01:00 "Jörg Knappen" <jknappen_at_web.de>:
> I wonder how this concept relates to mathematical notation, especially the
> root sign.
>
> --Jörg Knappen
>
> *Sent:* Wednesday, 25 November 2015 at 23:34
> *From:* announcements_at_unicode.org
> *To:* announcements_at_unicode.org
> *Subject:* New Character Property for Prepended Concatenation Marks
>
> The Unicode Technical Committee is seeking feedback on a proposal to
> define a new character property for the class of *prepended concatenation
> marks*, also referred to as *prefixed format control characters* or, more
> generically, as subtending marks. Characters in that class include U+0600
> ARABIC NUMBER SIGN and U+06DD ARABIC END OF AYAH. The new property, named
> Prepended_Concatenation_Mark and targeted for Unicode 9.0, would provide a
> mechanism to handle subtending marks collectively via properties rather
> than by hardcoded enumeration. A detailed description of the issue and how
> to provide feedback are given in Public Review Issue #310
> <http://www.unicode.org/review/pri310/>.
>
> http://blog.unicode.org/2015/11/new-character-property-for-prepended.html
>
>

Received on Thu Nov 26 2015 - 04:43:19 CST
