Re: Proposal to add standardized variation sequences for chess notation from Michael Everson on 2017-04-03 (Unicode Mail List Archive)

From: Michael Everson <everson_at_evertype.com>
Date: Mon, 3 Apr 2017 14:12:52 +0200

On 2 Apr 2017, at 18:27, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> We seem to agree that it should be a graphic modification, rather than a semantic modification.

Yes, we do.

> The question I pose is, "Is it just a graphic modification in this case?".

Yes, it is.

> I'm not convinced that it is. A player starts with two non-interchangeable bishops. <U+2657, U+FE01> could only refer to the white bishop that is restricted to black squares. That's a semantic difference.

Surely not. If it were, we would encode WHITE BISHOP THAT STAYS ON THE WHITE SQUARES and WHITE BISHOP THAT STAYS ON BLACK SQUARES and we would encode WHITE KNIGHT THAT MOVES FROM WHITE SQUARES TO BLACK SQUARES and WHITE KNIGHT THAT MOVES FROM BLACK SQUARES TO WHITE SQUARES.

> The immediate parallel that comes to mind is the ideographic square. A sequence of CJK ideographs should be a monospace sequence - and that is the major point of most of the ASCII clones with 'IDEOGRAPHIC' or 'FULLWIDTH' in their names. The uniform width is a key part of the semantic of the sequences being discussed.

I think you are seriously going the wrong way with this thinking. The immediate parallel that comes to mind is something like:

1000 MYANMAR LETTER KA
⁓ 1000 FE00 dotted form

where the character can still be read if the variation selector’s glyph can’t be shown. Uniform width is a feature of CJK, sure, but that’s the nature of the writing system. Chess pieces for setting within ordinary text do NOT have to be an em wide, and in fonts they aren’t. Chess pieces on a white square or on a black square do have to have a uniform width in order to produce the board matrix.
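
A minimal sketch of that fallback, in Python: if a renderer ignores the variation selectors, what remains is the plain base character, which is still readable. The function name is hypothetical, and the chess sequence uses the selector value from the triplet listing quoted further down, which is an assumption rather than anything standardized.

```python
# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
VS_FIRST, VS_LAST = 0xFE00, 0xFE0F


def strip_variation_selectors(text: str) -> str:
    """Return text with VS1..VS16 removed.

    This models what a reader still gets when a renderer cannot show
    the variant glyph and draws only the base characters.
    """
    return "".join(ch for ch in text if not VS_FIRST <= ord(ch) <= VS_LAST)


# MYANMAR LETTER KA followed by VS1 (the dotted form mentioned above).
dotted_ka = "\u1000\uFE00"

# Assumption for illustration: WHITE CHESS KING + U+FE02 = "on black".
king_on_black = "\u2654\uFE02"

print(strip_variation_selectors(dotted_ka))
print(strip_variation_selectors(king_on_black))
```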

> U+00A0 makes a lot of sense as the base character.

What? NBSP and SP are whitespace characters, with complex behaviours, and chessboards, whether set in lead type or digitally, are sets of simple symbol glyphs. NBSP glues two things together. SP separates things. Chessboards are not collections of black squares glued together by white spaces with white spaces at the alternating ends of lines. I reject this analysis.

> Also having variants of U+25A1 and U+25A8 that match the game square filter modifiers seems quite legitimate.

Um, wait… What are you proposing NBSP for? I'm confused now. If you like these two characters (and I am glad you do) there’s no need for U+00A0 at all.

> Possible lack of OpenType support is supposed not to be an admissible justification.

Well, I addressed this in the proposal. OpenType support for the symbol + VS sequences gives the desired result. A board prepared using this encoding proposal is legible, even if not beautiful, and is nevertheless parseable. In my view it is a robust and convenient higher-level protocol, certainly superior to the chaos that currently besets the chess community, who can’t even reliably interchange chessboard data using their ASCII fonts because of the plethora of encodings still in use. (None of the chess fonts I have examined use the Unicode chess characters at all.)
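
To illustrate what "parseable" can mean here, the following Python sketch writes a position as piece + VS sequences and reads it back. Everything specific in it is an assumption made for the example: the selector values (U+FE01 for "on white", U+FE02 for "on black") follow the triplet listing quoted below, empty squares are written as plain U+25A1 and U+25A8, ranks are separated by newlines, and the function names are hypothetical.

```python
# Assumed selector values, taken from the triplet listing quoted in this thread.
VS_ON_WHITE = "\uFE01"
VS_ON_BLACK = "\uFE02"
# Assumed empty-square characters (the two squares mentioned in this thread).
EMPTY_WHITE = "\u25A1"  # WHITE SQUARE
EMPTY_BLACK = "\u25A8"  # SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL


def encode_board(rows):
    """Encode 8 strings of 8 cells (rank 8 first; '.' means empty)."""
    lines = []
    for r, row in enumerate(rows):
        cells = []
        for c, piece in enumerate(row):
            light = (r + c) % 2 == 0  # a8 (row 0, column 0) is a light square
            if piece == ".":
                cells.append(EMPTY_WHITE if light else EMPTY_BLACK)
            else:
                cells.append(piece + (VS_ON_WHITE if light else VS_ON_BLACK))
        lines.append("".join(cells))
    return "\n".join(lines)


def decode_board(text):
    """Recover the piece placement, ignoring the square-colour selectors."""
    rows = []
    for line in text.split("\n"):
        row = []
        for ch in line:
            if ch in (VS_ON_WHITE, VS_ON_BLACK):
                continue  # presentation only; carries no placement data
            row.append("." if ch in (EMPTY_WHITE, EMPTY_BLACK) else ch)
        rows.append("".join(row))
    return rows


START = [
    "\u265C\u265E\u265D\u265B\u265A\u265D\u265E\u265C",  # black back rank
    "\u265F" * 8,                                          # black pawns
    "." * 8, "." * 8, "." * 8, "." * 8,
    "\u2659" * 8,                                          # white pawns
    "\u2656\u2658\u2657\u2655\u2654\u2657\u2658\u2656",  # white back rank
]

print(encode_board(START))
```

The decoder never needs the selectors to recover the position, which is the sense in which they are a graphic rather than a semantic modification in this sketch.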

>> Your suggestion is not going to alter the burden on the font with regard to display.
>
> My suggestion actually increases it. I suggested it because it seems to be the proper thing to do.

I can’t agree.

> Variation sequences seem to be the easier solution - provided they are supported in the first place.

It is understood that not all environments may display such ligatures, but that’s true for every character that uses a variation sequence.

>> 2654 FE00; Unqualified chesspiece; # WHITE CHESS KING
>> 2654 FE01; Chesspiece on white; # WHITE CHESS KING
>> 2654 FE02; Chesspiece on black; # WHITE CHESS KING
>>
>> (that is:
>>
>> sub uni2654 uniFE00 by uni2654 ;
>> sub uni2654 uniFE01 by uni2654FE02 ;
>> sub uni2654 uniFE02 by uni2654FE01 ;)
>>
>> But I didn’t see any need for that, since 2654 is already the
>> unqualified chesspiece. If there’s a formal need for triplets rather
>> than couplets here, I’ll conform to it, but that seems to be
>> incidental to the robustness of the proposal.
>
> It's an incidental detail, but if needed someone will have to attend to it. U+2654 is simply the chesspiece; a font that only had variants for white and 'black' backgrounds could nominate either as the glyph for U+2654 on its own.

No, again, it’s not right to say that chess pieces on their own have to be the width of an em square, and this would disrupt their use in ordinary text. Here are the metrics for the pieces in Ludus:

>> If a font doesn’t support a glyph or a sequence, then operating systems substitute other glyphs or the .notdef glyph or whatever, no?
>
> No.
>
> First of all, the substitution mechanism is usually above the operating system layer, with varying degrees of application control.

Well, yes, OpenType is handled by the font and by the app knowing that the OpenType tables are there.

> Secondly, the mechanism can only look for a substitute if it knows that the glyph is missing.

macOS does this quite reliably. If Baskerville has no chess piece, but Ludus does, then a text in Baskerville will usually display the Ludus glyph. You can override this by selecting the Ludus glyph and forcing it back to Baskerville, and then you get a box or other substitution glyph.

> If it's looking for an OpenType font for a glyph of the family <U+82A6, U+E0100>,

Or any OpenType substitution string.

> the obvious mechanism is to consult the cmap format 14 subtable. The font gives no indication of what glyph families the font's default rendering of U+82A6 is supposed to belong to.

I don’t really find us in disagreement….
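
For readers unfamiliar with the cmap format 14 subtable mentioned above, here is a toy Python model of the lookup. It is a sketch of the table's logic only: the data is invented, the glyph names are hypothetical, and nothing is read from a real font. Per variation selector, the subtable has a "default" list (base characters for which the sequence is supported and the ordinary cmap glyph is used) and a "non-default" mapping (base character to a specific variant glyph).

```python
from typing import Optional

# Hypothetical toy font data, not taken from any real font file.
CMAP = {0x82A6: "glyph_82A6_default"}  # ordinary cmap: code point -> glyph

UVS = {
    # selector: default set and non-default mapping, as in cmap format 14
    0xE0100: {"default": {0x82A6}, "non_default": {}},
    0xE0101: {"default": set(), "non_default": {0x82A6: "glyph_82A6_alt"}},
}


def lookup_sequence(base: int, selector: int) -> Optional[str]:
    """Return the glyph for <base, selector>, or None if unsupported."""
    record = UVS.get(selector)
    if record is None:
        return None  # the font knows nothing about this selector
    if base in record["non_default"]:
        return record["non_default"][base]
    if base in record["default"]:
        # Supported, but the font only says "use the ordinary cmap glyph".
        return CMAP.get(base)
    return None


print(lookup_sequence(0x82A6, 0xE0100))
print(lookup_sequence(0x82A6, 0xE0101))
print(lookup_sequence(0x82A6, 0xE0102))
```

A result of None is the only signal a fallback mechanism gets that it should look in another font for the sequence.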

Michael Everson
Received on Mon Apr 03 2017 - 07:16:08 CDT
