Re: Standaridized variation sequences for the Desert alphabet? from Michael Everson on 2017-04-06 (Unicode Mail List Archive)

From: Michael Everson <everson_at_evertype.com>
Date: Thu, 6 Apr 2017 15:07:20 +0100

On 6 Apr 2017, at 08:01, Martin J. Dürst <duerst_at_it.aoyama.ac.jp> wrote:

> Hello Michael,

Hi Martin.

>> It’s as though you’d not participated in this work for many years, really.
>
> Well, looking back, my time commitment to Unicode has definitely varied over the years. But that might be true for everybody.

I just get frustrated when everyone including the veterans seems to forget every bit of precedent that we have for the useful encoding of characters.

> What's more important is that Unicode covers such a wide range of areas, and not everybody has the same experience or knowledge. If we did, we wouldn't need to work together; it would be okay to just have one of us. Indeed, what's really very valuable and interesting in this work is the many very varied backgrounds and experiences everybody has.

I do not disagree, particularly.

>>> - That suggests that IF this script is in current use,
>>
>> You don’t even know? You’re kidding, right?
>
> Everything is relative. And without being part of the user community, it's difficult to make any guesses.

Hm, but you did make a guess.

>> Yeah, it doesn’t “seem” anything but a whole lot of special pleading to bolster your rigid view that the glyphs in question can be interchangeable because of the sounds they may represent.
>
> I don't remember every claiming that the glyphs must be used interchangeably, only that we should carefully examine whether they are or not, and that because they represent the same sound (in a phonetic alphabet, as it is)

We don’t encode sounds, we encode writing systems, the marks on paper, and in Latinate scripts (I’ll ignore CJK) we have never unified characters which are formed of historical ligatures like these… I guess ſs and ſʒ might possibly be the exception, but I think nobody would find a use for distinguishing them.

> and are shown in the same position in alphabet tables, we shouldn't a priori exclude such a possibility.

As it happens, at least one writer used the 𐐅-with-stroke (encoded for /ju;/) for /ɔɪ/, but I wouldn’t substitute the 𐐉-with-stroke (𐐦) for it in a diplomatic transcription. Normalized spelling is something else, but the orthography of Deseret manuscripts themselves is what it is. Subtle things like the dialect of writers can be gleaned from them, and letterforms may help to date a text.

>>> - There may not be enough information to understand how the creators and early users of the script saw this issue,
>>
>> Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right?
>
> Well, there's well over an order of magnitude difference in the time scales involved. The language that Deseret is used to write is still in active use, including in this very discussion. Quite different from Phoenician or Luwian hieroglyphs.

The language is still in use, but we have no access to the minds of the dead users of Deseret unless they write about their orthographic practices explicitly. Accurate transcription can tell us if the speaker was from Boston or Britain, if for instance they regularly drop -r- in words like “start”.

> In addition, we have meta-information such as alphabet tables, which we may not have for the scripts you mention, as well as the fact that printing technology may have forced a better identification of what's a character and what not than inscriptions and other older technologies.

Well, we know there was a script reform in Deseret with regard to these and some other characters.

>> Nobody worried about the number of modern users of the Insular letters we encoded. Why put such a constraints on users of Deseret? Ꝺꝺ Ꝼꝼ Ᵹᵹ Ꝿ Ꞃꞃ Ꞅꞅ Ꞇꞇ.
>
> Because it's modern users, and future users, not users some hundred years or so ago, that will use the encoding. In the case of Insular letters, my guess is that nobody wants to translate/transcribe xkcd, for example, whereas there is such a transcription for Deseret:
> http://www.deseretalphabet.info/XKCD/

Modern users use the insular letterforms for accurate representation of some texts. John does the XKCD transcriptions, I believe, and he doesn’t use the diphthong letters anyway, and that’s his orthographic practice.

>> Most readers and writers of Deseret today use the shapes that are in their fonts, which are those in the Unicode charts, and most texts published today don’t use the EW and OI ligatures at all, because that’s John Jenkins’ editorial practice.
>
> So I was wrong to write "modern practitioners", and should have written "modern publishers" or "modern published texts". Or is the impression that I get from what you wrote above wrong that most texts published these days are edited by John, or by people following his practice?

John is active in the area of making and publishing modern editions in Deseret. Ken has worked in the area of manuscripts and their represntation.

> I don't remember denying the value of separate encodings for historic research. I only wanted to make sure that present-day use isn't inconvenienced to make historic research easier.

Adding new characters won’t affect people who don’t want to use those characters in particular, though.

> If the claims are correct that present-day usage is mostly a reconstruction based on the Unicode encoding and the Unicode sample glyphs, then I'm fine with helping historic research.

OK, good. Those modern users who want to use 𐐦 and 𐐧 will still be able to do so. Those who want to use the 𐐃-with-stroke and 𐐋-with-stroke characters will be able to do so if they are encoded. And there are some other letters not yet encoded.

>> This is exactly the same thing as the medievalist Latin abbreviation and other characters we encoded. There is neither sense nor logic nor utility in trying to argue for why editors of Deseret documents shouldn’t have the same kinds of tools that medievalists have. And as far as medievalist concerns go, many of the characters are used by relatively few researchers. Some of the characters we encoded are used all over Europe at many times. Some are used only by Nordicists, some by Celticists, and some by subsets within the Nordicist and Celticist communities.
>
> Maybe, maybe not. If e.g. somebody came and said that they wanted to disunify the ſs and ſz ligatures for (German) ß in order to better analyze some old manuscripts, and the modern users from hereon had to make sure they used the right one depending on the font they used, then I'm sure a lot of Germans would complain quite clearly, because it would make their current use more complicated.

That’s not true, though. We have both s and ſ encoded, and we gave both r and ꝛ encoded, and the long s and r rotunda do not bother any modern user of the Latin script or force them to alter their orthography.

>> Harm? What harm? Recently the UTC looked at a proposal for capital letters for ʂ and ʐ. Evidence for their existence was shown. One person on the call to the UTC said he didn’t think anyone needed them. Two of us do need them. I needed them last weekend and I had to use awkward workarounds. They weren’t accepted. There wasn’t any good rationale for the rejection. I mean, the letters exist. Case is a normal function of the script. But they weren’t accepted. For the guy who didn’t think he needed them, well, so what? If they’re encoded, he doesn’t have to use them.
>
> I have no idea what the reasons for this were, because I wasn't involved in the discussion.

As I recall, because one person ended up agreeing “We don’t need to encode characters for failed orthographies”. The entire Deseret script is a failed orthography of course, and that viewpoint ignores (in this case) the historical importance of Pinyin and its development. But from a functional point of view I needed capitals for those two letters (not related to early Pinyin) and had to use workarounds. That is not a satisfactory situation.

>> People who use Deseret use it to for historical purposes and for cultural reasons. Everybody in Utah reads English in standard Latin orthography.
>
> I haven't been in Utah except for a one-time flight change in Salt Lake City more than 10 years ago. So please don't assume that everybody on this list know the state of usage for all the scripts that get discussed.

OK< but https://en.wikipedia.org/wiki/Deseret_alphabet is a pretty good article.

>> I didn’t “come up” with separate historical derivations for the four characters in question.
>
> I didn't mean "come up" in the sense of "make up out of thin air", but in the sense of "discover". If it wasn't you but somebody else who discovered these derivations, please let us know.

All it took was a look at https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg to KNOW without question the derivation of these letters, namely 𐐅/𐐋/𐐉/𐐃 with the stroke of 𐐆. It’s blindingly obvious! :-)

>>>> What Deseret has is this:
>>>>
>>>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>>>> * officially named “ew” in the code chart
>>>> * used for ew in earlier texts
>>>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>>>> * officially named “oi” in the code chart
>>>> * used for oi in earlier texts
>>>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
>>>> * used for oi in later texts
>>>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
>>>> * used for ew in later texts
>>>
>>> Currently, it has this:
>>>
>>> 10426 𐐦 DESERET CAPITAL LETTER OI
>>>
>>> 10427 𐐧 DESERET CAPITAL LETTER EW
>>
>> You are being deliberately obtuse. Note that I stated clearly “officially named ‘ew/oi’ in the code chart”.
>
> Well, if you think I'm deliberately obtuse, then I'd have to say that I think you're (deliberately?) obscure.

I was making a point; sorry if you didn’t catch it. The names as given in that list above are the kinds of descriptions of the letters that we often give. We have LATIN LETTER THORN WITH STROKE. We might have named it LATIN LETTER THAT.

> You repeat hypothetical, non-existing names

They’re descriptive of the letter, not of the diphthong.

> such as "DESERET CAPITAL LETTER LONG OO WITH STROKE" over and over, using capitals to make then look like the actual names, and bury the actual names (such as "DESERET CAPITAL LETTER OI") by shortening and lowercasing them.

Well, I lowercased them because lowercase is used in informative notes. Anyway, sorry if my rhetoric failed to hit the mark. :-)

> But even if that weren't the case, we would still want to treat it as one and the same character, with a single code point. It would still be hopelessly impractical for Germans to use two different characters, when they only can decide which character to type once they have seen the actual character in the font they type, and have to potentially change the character if they change the font.

But even if we did encode an ſʒ letter (similar to the T-Z ligature-letter Ꜩ ꜩ we did encode) it would be encoded for a special purpose, and wouldn’t be intended to affect standard German. Look, we can write schön and we can write ſchoͤn and nobody’s affected by the latter.

> And while we currently have no evidence that Deseret had developed a typographic tradition where some type styles would use one set of ligatures, and other styles would use another set, it wouldn't be possible to reject this possibility without actually trying to find evidence one way or another.

There was type during the heyday of Deseret use, and evidence for several sorts but no typographic “tradition” really. That’s happened latterly.

>> Your argument seemed to be based solely on the use of the letters for the sounds, ignoring the historical derivation and the facts of the spelling reform in Deseret.
>
> The spelling reform is fine. What is important is what happened after the spelling reform. Were the 1855 variants replaced by the 1859 variants? Was it two different traditions, separated in some way or other? Or was it in effect more like a mixture of both?
> (or maybe we don't know, or it's a little of everything?)

Where they were replaced, it helps to identify the provenance of a text. There are also some texts where there’s a bit of a mix. In fact adding some letters to the standard for Deseret will improve users’ ability to represent the historical texts. For those relatively few people who are creating new texts now, they will be able to choose what letters they need. Some, like John, don’t use the diphthong letters at all. In fact most modern readers read John’s texts, so few would probably worry about the other letters.

> Examining these questions and bringing the available data to light and clarifying the limits of our data and our understanding is very important. Only in this way can we make decisions that will hopefully be valid for the rest of the existence of Unicode (which might be quite a few decades at least), or decisions that at a minimum might be evaluated as "well, they didn't know better then", rather than as "they definitely should have known better, even then”.

Really, my practice when approaching this is the same as it has been for additions to Latin or Greek or Cyrillic. I’m quite consistent. :-)

>> A proposal will be forthcoming. I want to thank several people who have written to me privately supporting my position with regard to this topic on this list. I can only say that supporting me in public is more useful than supporting me in private.
>
> I'm looking forward to your proposal. I hope it clearly indicates why (you think) there's no danger of inconveniencing modern practitioners.

To be honest, we didn’t have to say “r rotunda will not affect modern users of the Latin script”, now, did we? :-)

Today I received Ken’s book on the Deseret-script English-Hopi vocabulary. This will help us move forward with a proposal.

Best,
Michael Everson
Received on Thu Apr 06 2017 - 09:07:52 CDT

This archive was generated by hypermail 2.2.0 : Thu Apr 06 2017 - 09:07:52 CDT