Re: HIGH OGONEK in IR-158 (Skolt Sami)

From: becker.osbu_north@xerox.com
Date: Tue Sep 22 1998 - 14:37:31 EDT


> Ken Whistler at one time wrote a detailed, highly researched paper (via
> e-mail) on this "bogus" character HIGH OGONEK.

Great remembrance, Rick! Here it is to enjoy,

Joe

----------------------------------------------------------------
Date: 4 Apr 91 14:23:26 PST (Thursday)
Subject: High Ogonek
From: whistler@zarasun.metaphor.com (Ken Whistler)
To: unicode@sun
cc: "iso10646%jhuvm.bitnet"@decwrl.dec, whistler@zarasun.metaphor

Warning to readers: This contribution contains real research,
so if you haven't got time to care, you can delete it now!

The "High Ogonek" has stuck in my craw for so long that I feel
I must say something about it. The High Ogonek is symptomatic of
one of the things wrong about the character standardization business,
which encourages the blithe perpetuation of mistaken "characters"
from standard to standard, like code viruses. At least, in the past,
the epidemic was constrained by the fact that the encoding bodies
only had 256 cells which could get infected by such abominations
as half-integral signs. Now, however, with
Unicode and ISO 10646 and the AFII registry, and other 2 byte
corporate standards, the number of cells available for infection
is vast, and the temptation to encode everybody else's junk just
seems to have become irresistible.

WHENCE HIGH OGONEK?

"High Ogonek" can be found in ISO DIS 10646 (JTC1/SC2/WG2 N666) at
034/126. What is it? Well, that's a good question, and 10646 doesn't
provide a clue--but then it doesn't say anything about where any of
its content comes from. But for those in the know, the source of
"High Ogonek" in the DIS 10646 can be tracked to ECMA/TC1/90/15,
Latin Alphabet No. 6, and more specifically to Appendix A, which
reproduces 34 characters "registered according to ISO 2375 as
Registration No. 158", for "text in the Skolt Lappish dialect,
as well as texts using older Lappish orthography..." Position 03/00
in the code table of Registration No. 158 is our critter. So now
we know what it is, right? Wrong. The ill-defined squiggle in
position 03/00 does indeed look something like an ogonek (mistaken
ogonek forms are themselves another tale of woe I won't get into
here), and the "ogonek" in 03/00 is indeed high in its box--hence
the "High Ogonek" in DIS 10646, drawn in position 034/126 as a
nondescript rightward hook.

Well, reviewers of 10646 have complained about "High Ogonek", and
something has indeed been done. In JTC1/SC2/WG2 N680 "Updated
code table charts", dated 22 March 1991, the "High Ogonek" has
now been printed using a high reversed comma, quite sharply distinguished
from the "Ogonek" at 033/178. In fact, it looks remarkably like
an aspiration mark--hmmm. For those of you with long memories
or big filing cabinets, the 2nd DP of 10646 had just such a thing
at 171/072, labeled "IPA ASPIRATION MARK", but all the IPA later disappeared
in the DIS, just as the strange "High Ogonek" appeared.

N680 was "generated by AFII using their publishing system," so it
would behoove us to check whether the "High Ogonek" virus has spread
to AFII--and guess what! The draft AFII registry has a new glyph
id 043B/241B devoted especially to printing the 10646 "High Ogonek".
The AFII glyph looks like a high reversed comma, and is labeled:

        "High ogonek" (not a non-spacing character, but
        rather a separate character within words) (Lapp)

That's strange, because AFII has what appears to be the same glyph
encoded at 342B/110B, labeled:

        Aspirated, IPA

So AFII and 10646 seem to have decided these things are different.
Welcome to the "High ogonek".

What about Unicode? I don't think I would be telling any tales out
of school if I revealed that Unicode almost got a "High ogonek", too,
since Unicode was busy incorporating all the 10646 mistakes in Unicode
while 10646 was busy incorporating all the Unicode mistakes in 10646.
(Gives you an Excedrin headache, doesn't it?) But some degree of
reason has prevailed, and the Skolt Lappish "High Ogonek" is now
simply mapped to Unicode U+02BD MODIFIER LETTER REVERSED COMMA (which
is explicitly intended as the IPA aspiration mark).

Is that the right answer? Well, how about doing what should have
been done in the first place--some research--instead of just citing
other character standards like holy books.

TRANSCRIPTION OF ASPIRATION IN LAPPISH

Based on a fairly quick survey, I note three broad groups of
treatment of Lappish transcription:

1. Prewar (pre World War II) publications using systems based on
Finno-Ugrian practice (which itself is an offshoot of the transcription
used by Indo-Europeanists). Non-phonemic, non-systematic phonetic,
and inconsistently narrow transcription.

2. Early postwar publications. Systematic phonemic, but with a nod
to old-fashioned transcription and IPA usages.

3. "Modern" publications (70's and 80's). Phonemic, with systematic
phonetic realization rules, and with tuned practical orthographies.
(E.g. "sj" for esh, rather than s-acute or s-hacek, etc.)

Going from best to worst, i.e. recent to early, we have the following
facts.

In modern treatments, aspiration is not part of Lappish orthography.
Why? I'll let the best analyst explain it:

        Die Verschlusslaute werden in phonetischer Hinsicht
        entweder als mehr oder weniger stimmhafte Lenes [b d g]
        oder als stimmlose Fortes realisiert. Die letzteren ko"nnen
        entweder unaspiriert [p t k], pra"aspiriert [hp(p) ht(t) hk(k)]
        oder postaspiriert [ph th kh] ausgesprochen werden.

(Su"dlappisches Wo"rterbuch, Gustav Hasselbrink, Uppsala 1981,
Ab Lundequistska Bokhandeln, p. 42.) In other words (South) Lapp
has a lenis and a fortis series of stops, and the fortis series
may be either unaspirated, preaspirated (in geminate contexts) or
postaspirated, depending on the context. Since degree of aspiration
is predictable by context, it need not be represented in the
orthography. However, when Hasselbrink wants to explicitly transcribe
aspiration phonetically, he does so with an inline "h" or a raised
"h"--the distinction being primarily whether phonological pattern
or phonetic quality is in question.

G. M. Kert published a very similar analysis in Saamskii Yazyk,
Leningrad 1971, Soviet Academy of Sciences. See, for example, the
phonological chart on p. 63. (I won't quote anything--Cyrillic in
ASCII is too painful.)

The early postwar treatments of Lapp also use a standardized
orthography for Lapp, with two stop series, but are sometimes hazier
about the status of each series. They also tend to use the {raised
reversed comma} to indicate aspiration explicitly. Examples are:
Wo"rterbuch des Waldlappendialekts von Mala{ring} und Texte zur
Ethnographie, Wolfgang Schlachter, Helsinki 1958, Suomalais-
Ugrilainen Seura. Also: The Lappish Dialect of Jukkasja"rvi,
A Morphological Survey, Bjo"rn Collinder, Uppsala, 1949,
Almqvist & Wiksells Boktryckeri Ab:

        31. k, p, t are unaspirated (as c, p, t in French) if
                they are not followed by the sign [{raised
                reversed comma}] (see Section 59).
                                --p. 11

Then we get to the pre-phonemic transcriptions. These have no
systematic understanding of phonological derivation and phonetic
realization, and tend to have either broad or narrow "phonetic"
orthographies, with symbols derived from Finno-Ugrian practice.
Example 1: Lappischer Wortschatz, Eliel Lagercrantz, Helsinki, 1939,
Suomalais-Ugrilainen Seura (2 vols.). This lexicon systematically
transcribes aspiration, and does so with a {raised small cap h}
after stop consonants.

Example 2 is a massive work, and represents the extreme of
unsystematic narrow phonetic transcription: Lappisk Ordbok,
Konrad Nielsen, Oslo 1962, Universitetsforlaget (5 vols.). Don't
let the date of publication fool you--the words were collected
from 1906-1911, the compilation was begun in 1929, and the first
signature was printed in 1930. Nielsen uses a plethora of diacritics
for all kinds of things, since this is a cross-dialectal compilation.
For explicit aspiration, he uses a {raised left half ring} (cf.
Unicode U+02BF), which is a common Indo-European and/or Finno-Ugrian
typographical substitute for the {raised reversed comma}. Since
Nielsen also follows the Indo-European tradition of typesetting
cited forms in italics, the {raised left half ring} also gets
leaned over a bit and then is strongly kerned up over the "knee"
of the "k"'s or "h"'s (yes!, aspirated "h"'s), and nestles in
above the cross-bars of the "t"'s. So for the typesetter, these
aspirated forms were probably a single piece of type, but the analysis
clearly shows the {raised left half ring} to be, in principle,
a "spacing" diacritic following a stop (or "h").

My brief survey of these works did not turn up any specifically
dealing with the "Skolt Lapp" dialect, but the general picture is
clear. Aspirated phones do exist in Lappish dialects, and the
aspiration has been traditionally transcribed using either a {raised
reversed comma} or a typographical variant of that, the {raised left
half ring}. The Skolt Lapp texts referred to in ECMA/TC1/90/15
presumably follow this orthographic tradition, influenced by Nielsen
or other early analysts. Modern Lapp orthographies omit
transcription of aspiration altogether. (Incidentally, Nielsen
appears to be the source of the g-bar for transcribing a palatal
voiced fricative in Lapp; modern analysts like Hasselbrink
sensibly substitute a "j" for this sound. And as long as I am
picking nits, Nielsen's "g-bar" is actually a "g" with an underline
strike-thru at the baseline, not the "g" with a short bar sticking
out the side as shown in position 034/188 in 10646.)

WHITHER HIGH OGONEK

Into the nearest dumpster, I hope. We are dealing here with a
perfectly normal manifestation of European transcription of
aspiration--as manifested in thousands of transcriptions of
hundreds of languages. There is nothing specifically Lapp about
it, and it has absolutely nothing to do with the ogonek.

--Ken Whistler

----------------------------------------------------------------
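
Editorial aside, not part of the 1991 message: the mapping described
above can be checked against a current Unicode character database. The
sketch below is illustrative only. It assumes Python's standard
unicodedata module; the helper name describe() and the CANDIDATES table
are made up for this note, and U+02DB is added here only for contrast
with the real ogonek. The message itself names only U+02BD and U+02BF.

```python
import unicodedata

# Code points named in the message, plus U+02DB for contrast (added here).
CANDIDATES = {
    0x02BD: "aspiration mark that the Skolt 'High Ogonek' is mapped to",
    0x02BF: "left half ring, Nielsen's typographical variant",
    0x02DB: "the actual spacing ogonek, for contrast",
}


def describe(cp):
    """Return (U+XXXX label, character name, general category, combining class)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


for cp, note in CANDIDATES.items():
    print(*describe(cp), "-", note)
```

A combining class of 0 for all three is consistent with the point made
in the message: the aspiration mark is a spacing character that follows
the stop, not a non-spacing diacritic.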


