RE: Character proposal: SUBSCRIPT TEN

From: Philippe Verdy (
Date: Fri Jan 18 2008 - 15:46:14 CST

    Mark Davis wrote:
    > We only encode new characters when there is no way to represent
    > the characters otherwise in Unicode. In some cases, a single
    > character in the source set maps to a sequence in Unicode.
    > For this particular case, it is unclear to me why the sequence
    > U+2081 ( ₁ ) SUBSCRIPT ONE
    > U+2080 ( ₀ ) SUBSCRIPT ZERO
    > is not sufficient.

    I do agree. But authors must also accept that bugs or limitations in the fonts currently used to render these characters are not enough to justify a fork.

    For example, the subscript one digit appears on my system with a large extra space on its right (this does not occur with subscript zero), because the glyph is found only in an ideographic font, where the subscript is drawn within an ideographic square.

    This is a limitation of my installed fonts. If I had better fonts, the subscript one would be drawn like subscript zero (and there's no reason why the existing font that currently supports subscript zero correctly will not support subscript one in some later version, given that there's no reason these characters would have to be bound to ideographic squares).
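
    As a rough sketch of why the two-character sequence quoted above already carries the needed information, independently of how any font draws it, the following Python snippet looks up the character names and applies compatibility normalization. The variable names (SUBSCRIPT_TEN, names, plain) are only illustrative, and this is a check of the existing encoding, not part of any proposal.

```python
import unicodedata

# The sequence quoted above: SUBSCRIPT ONE followed by SUBSCRIPT ZERO.
SUBSCRIPT_TEN = "\u2081\u2080"

# Look up the standard character name of each code point in the sequence.
names = [unicodedata.name(ch) for ch in SUBSCRIPT_TEN]
for ch, name in zip(SUBSCRIPT_TEN, names):
    print(f"U+{ord(ch):04X} {name}")

# Compatibility normalization (NFKC) folds each subscript digit to its
# plain ASCII digit, so the sequence can still be read as the number ten.
plain = unicodedata.normalize("NFKC", SUBSCRIPT_TEN)
print(plain)
```

    How wide the glyphs are, or which font supplies them, does not enter into any of this; only the encoded code points do.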

    The same can be said about the vertical tick diacritic in Yoruba: some Yoruba fonts draw it detached, but it is also correct to display it attached. Drawing the diacritic detached is therefore a design choice of a specific font made for Yoruba, whereas some German users would want it attached. As currently encoded, both representations are possible, and the same encoded character fits both usages.

    To demonstrate the need to encode a separate character, you need to demonstrate the existence of a semantic distinction between the attached and detached versions.

    Indeed, the combining acute and grave accents may also be rendered attached, for the same words in the same languages and the same orthography, even though they are preferably detached (this is only a typographic difference). This does not change the semantics of the accented character, and there's no reason not to encode both the same way.

    This archive was generated by hypermail 2.1.5 : Fri Jan 18 2008 - 19:12:53 CST