Re: Standardized variation sequences for the Deseret alphabet? from Martin J. Dürst on 2017-03-28 (Unicode Mail List Archive)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Tue, 28 Mar 2017 19:39:13 +0900

Hello Michael, others,

On 2017/03/27 21:07, Michael Everson wrote:
> On 27 Mar 2017, at 06:42, Martin J. Dürst <duerst_at_it.aoyama.ac.jp> wrote:
>
>>> The characters in question have different and undisputed origins, undisputed.
>>
>> If you change that to the somewhat more neutral "the shapes in question have different and undisputed origins", then I'm with you. I actually have said as much (in different words) in an earlier post.
>
> And what would the value of this be? Why should I (who have been doing this for two decades) not be able to use the word “character” when I believe it correct? Sometimes you people who have been here for a long time behave as though we had no precedent, as though every time a character were proposed for encoding it’s as though nothing had ever been encoded before.

I didn't say that you have to change words. I just said that I could
agree to a slightly differently worded phrase.

And as for precedent, the fact that we have encoded a lot of characters
in Unicode doesn't mean that we can encode more characters without
checking each and every single case very carefully, as we are doing in
this discussion.

> The sharp s analogy wasn’t useful because whether ſs or ſz users can’t tell either and don’t care.

Sorry, but that was exactly the point of this analogy. As to "can't
tell", it's easy to ask somebody to look at an actual ß letter and say
whether the right part looks more like an s or like a z. On the other
hand, users of Deseret may or may not ignore the difference between the
1855 and 1859 shapes when they read. Of course they will easily see
that the shapes differ, but what's important isn't the shapes, it's what
users associate them with. If for them these are just two shapes for one
and the same 40th letter of the Deseret alphabet, then that strongly
suggests not encoding them separately, even if the shapes look really
different.

> No Fraktur fonts, for instance, offer a shape for U+00DF that looks like an ſs. And what Antiqua fonts do, well, you get this:
>
> https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg

Yes. And we are just starting to collect evidence for Deseret fonts.

> And there’s nothing unrecognizable about the ſɜ (< ſꝫ (= ſz)) ligature there.

Well, not to somebody used to it. But non-German users quite often use a
Greek β where they should use a ß, so it's no surprise people don't
distinguish the ſs and ſz derived glyphs.
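
As a rough illustration of the character/glyph distinction at stake
here (a minimal sketch using Python's standard unicodedata module; the
helper name describe is only for this example): the ſs and ſz derived
shapes share the single code point U+00DF, which has no decomposition
recording either origin, while the Greek β that gets substituted for it
is a separate character altogether.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


sharp_s = "\u00DF"  # ß, whether the font draws it from ſs or from ſz
beta = "\u03B2"     # β, the Greek letter sometimes typed in its place

print(describe(sharp_s))
print(describe(beta))

# U+00DF carries no decomposition mapping, so the encoding itself does not
# record whether the letter derives from ſs or ſz; that is left to the font.
print(repr(unicodedata.decomposition(sharp_s)))
```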

> The situation in Deseret is different.

The graphic difference is definitely bigger, so for an outsider it's
quite impossible to match up the pairs of shapes. But that in no way
means that these have to be seen as different characters
(rather than just different glyphs) by insiders (actual users).

To use another analogy, many people these days (me included) would have
difficulties identifying Fraktur letters, in particular if they show up
just as individual letters. The same goes for many fantasy fonts, and for
people not very familiar with the Latin script.

> Underlying ligature difference is indicative of character identity. Particularly when two resulting ligatures are SO different from one another as to be unrecognizable. And that is the case with EW on the left and OI on the right here:
> https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
>
> The lower two letterforms are in no way “glyph variants” of the upper two letterforms. Apart from the stroke of the SHORT I 𐐆 they share nothing in common — because they come from different sources and are therefore different characters.

The range of what can be a glyph variant is quite wide across scripts
and font styles. The mere fact that the shapes differ widely, or that
their origins differ, is not conclusive.

> Character origin is intimately related to character identity.

In most cases, yes. But it's not a foregone conclusion.

> I don’t think that ANY user of Deseret is all that “average”. Certainly some users of Deseret are experts interested in the script origin, dating, variation, and so on — just as we have medievalists who do the same kind of work. I’m about to publish a volume full of characters from Latin Extended-D. My work would have been impossible had we not encoded those characters.

No, your work wouldn't be impossible. It might be quite a bit more
difficult, but not impossible. I have written papers about Han
ideographs and Japanese text processing where I had to create my own
fonts (8-bit, with mostly random assignments of characters because these
were one-off jobs), or fake things with inline bitmap images (trying to
get information on the final printer resolution and how many black
pixels wide a stem or crossbar would have to be to avoid dropouts, and
not being very successful).

I have heard the argument that some character variant is needed for
research, history, and so on quite a few times. If a character has indeed
been historically used in a contrasting way, this is definitely a good
argument for encoding. But if a character just looked somewhat different
a few (hundred) years ago, that doesn't make such a good argument.
Otherwise, somebody may want to propose new codepoints for Bodoni and
Helvetica, ...

Regards, Martin.
Received on Tue Mar 28 2017 - 05:39:32 CDT
