Re: various stroked characters

From: Jim Allan (jallan@smrtytrek.com)
Date: Sun Sep 08 2002 - 22:25:37 EDT


From my experience in some general historical linguistics and some work
in ancient Middle Eastern languages, both an underscore and a bar
through a letter usually have identical meanings, normally indicating a
softer pronunciation than would be indicated by the same glyph lacking the
bar. For example, /_d_/ would indicate the sound [ð] or possibly [dj]
or [z].

There is a weak tendency to prefer the bar through the letter when
transliterating from a character set where the sounds are also
distinguished by separate letters of the source orthography. When the
weaker sound is not so distinguished, or when distinguished by a
diacritic (or lack of diacritic in pointed Hebrew), then an underbar
diacritic in the Latin transliteration makes more sense than using a
special letter form.

But, generally, the forms in Unicode that appear with "LINE BELOW" in
their names and have a canonical composition ending with U+0331 are in
meaning variants of the forms with a stroke through them, these stroked
forms not being decomposed in Unicode.
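
The decomposition difference can be checked against the Unicode
Character Database. The following is only a minimal sketch, assuming
Python's standard unicodedata module (which tracks a later Unicode
version than 3.2, though these particular mappings are stable); the
helper name describe is made up for illustration:

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, canonical decomposition)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# d with line below decomposes to d + U+0331 COMBINING MACRON BELOW.
line_below = describe("\u1E0F")

# d with stroke has no decomposition: the stroke is part of the letter.
stroked = describe("\u0111")

print(line_below)
print(stroked)
```

The empty decomposition string for the stroked form is what "not being
decomposed" means in practice: normalization leaves it untouched.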

A problem occurs when the letter has a descender, for example /g/ and
/p/. Sometimes the line is placed beneath the descender and sometimes
through the descender (as with standard underlining). Sometimes it is
placed above the letter instead, as with /g/ + cedilla in which the
cedilla may appear depending on font either beneath the /g/ or above it
in inverted form or above it as a turned comma or above it as an acute
accent.

Essentially, most linguists on coming across U+1E21 LATIN SMALL LETTER G
WITH MACRON or U+01E5 LATIN SMALL LETTER G WITH STROKE would interpret
each as a variant of the same method of indicating a soft consonant,
probably indicating the voiced velar fricative, as the /g/ in German
/Tage/. Use of one of these over the other would be based on traditional
Latin transcription practice in the language being transcribed, the
particular taste of the author, and what characters were to be had in
the fonts available.

Yet it would be wrong to say that /g/ with macron above might never be
distinguished from /g/ with macron beneath, for a macron over a
consonant was sometimes used as a short writing for a double consonant
in eighteenth century orthographies of some European languages.

Some examples of the bar at various heights:

At http://www.linguistsoftware.com/lnavajo.htm is a Navaho character set
in which lowercase /g/ with macron below is paired with uppercase /G/
with macron below.

In the images for the Semitic translator font at
http://www.linguistsoftware.com/st.htm we also see the line falling
beneath the descender of /g/ (and also beneath the descender of /p/) and
of course beneath various other letters.

But in http://www.linguistsoftware.com/lkw.htm for a Kwakwala character
set a /g/ with a line through its descender rather than beneath it is
matched with an uppercase /G/ with line beneath.

The next step in moving the bars up is the Skolt Sami forms given in
Unicode which differ in placing the bar through the uppercase /G/
instead of beneath it.

The Evyoni Hebrew fonts available at
http://members.tripod.com/~ebionite/fonts.htm use a macron over the
lower case /g/ but still a macron underneath the uppercase /G/. These
fonts ignore soft /p/ and soft /P/ altogether, using /f/ and /F/ in
transliteration instead. See http://members.tripod.com/~ebionite/noah.htm.

Finally, the SIL Heb trans fonts place the bar above the letters /g/ in
both the upper and lower case forms though in other soft letters the bar
falls beneath. See the fonts available at
http://www.sil.org/computing/fonts/silhebrew.

All are equally correct.

The excellent Junicode fonts available at
http://www.engl.virginia.edu/OE/junicode/junicode.html include all of
Latin Extended-B and therefore the Skolt Sami barred /g/ characters and
the /g/ with macron characters. But they also include in the PU area
/p/ with a bar through the descender and /P/ with a bar through the stem.

I know I have also seen printed occurrences of a bar through the top bowl
of the /g/ and the bowl of the /p/.

But, if in a sense these are only typographical variants, not all are
equally acceptable in all traditions. Would a macron over the
capital /G/ and lower case /g/ be an acceptable variant to the
approximately 400 speakers of Skolt Sami? I suspect not. Perhaps
underlined capital _/G/_ and underlined _/g/_ would be more acceptable.
Perhaps not. Particular variants may be felt to be important in
particular linguistic communities.

The characters with the slanted overlays are yet other variants used in
certain traditions, but not acceptable ones in all traditions. I've
never seen them used in Semitic transcriptions, for example.

Would it be acceptable for software to render a macron beneath as a
macron above on letters with descenders, as is normally supposed to be
done with cedilla on /g/? One could validly claim that the result would
be seen as the same letter. In IPA, for example, there is a standard
rule that a diacritic normally placed beneath a letter may be placed
above it if the letter has a descender and the meaning is otherwise not
obscured.

Should fonts do this automatically?

Currently a user of Unicode 3.2 wanting to indicate a weak /g/ in
transliteration or phonetic transcription has the choice of the pairs:

U+0067 LATIN SMALL LETTER G + U+0331 COMBINING MACRON BELOW
U+0047 LATIN CAPITAL LETTER G + U+0331 COMBINING MACRON BELOW

U+0067 LATIN SMALL LETTER G + U+0304 COMBINING MACRON or the
corresponding composition U+1E21 LATIN SMALL LETTER G WITH MACRON
U+0047 LATIN CAPITAL LETTER G + U+0331 COMBINING MACRON BELOW

U+0067 LATIN SMALL LETTER G + U+0332 COMBINING LOW LINE
U+0047 LATIN CAPITAL LETTER G + U+0332 COMBINING LOW LINE
(This is what we would have to do with /p/ currently to hint that the
line should pass through the descender of the lower case letter as there
is no corresponding composed character.)

U+0067 LATIN SMALL LETTER G + U+0304 COMBINING MACRON or the
corresponding composition U+1E21 LATIN SMALL LETTER G WITH MACRON.
U+0047 LATIN CAPITAL LETTER G + U+0304 COMBINING MACRON or the
corresponding composition U+1E20 LATIN CAPITAL LETTER G WITH MACRON

U+01E5 LATIN SMALL LETTER G WITH STROKE
U+0047 LATIN CAPITAL LETTER G + U+0331 COMBINING MACRON BELOW

U+0067 LATIN SMALL LETTER G + U+0336 COMBINING LONG STROKE OVERLAY
U+0047 LATIN CAPITAL LETTER G + U+0336 COMBINING LONG STROKE OVERLAY
(This is what we would have to do with /p/ currently to hint that the
line should pass through the bowl as there is no corresponding composed
character. There is no way to indicate whether the line should pass
through the stem or loop of the upper case /P/. Nor is Unicode very clear
on the specifications of short and long.)

U+01E5 LATIN SMALL LETTER G WITH STROKE
U+0047 LATIN CAPITAL LETTER G + U+0336 COMBINING LONG STROKE OVERLAY

U+0067 LATIN SMALL LETTER G + U+0337 COMBINING SHORT SOLIDUS OVERLAY
U+0047 LATIN CAPITAL LETTER G + U+0338 COMBINING LONG SOLIDUS OVERLAY

I suppose search engines ought to match all such forms?
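
One way such matching might work is to fold every variant to a common
search key. This is only a rough sketch under stated assumptions: the
names BAR_MARKS, STROKED and fold_bars are hypothetical, the list of
marks is taken from the pairs above rather than from any standard
folding table, and Python's unicodedata module stands in for whatever
a real search engine would use:

```python
import unicodedata

# Combining marks appearing in the pairs above (an assumed, non-exhaustive set).
BAR_MARKS = {"\u0304", "\u0331", "\u0332", "\u0335", "\u0336", "\u0337", "\u0338"}

# Stroked letters have no canonical decomposition, so normalization alone
# cannot reach the base letter; they need an explicit mapping.
STROKED = {"\u01E5": "g", "\u01E4": "g"}

def fold_bars(text):
    """Reduce barred and stroked variants to the bare base letter, lowercased."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    for ch in decomposed:
        if ch in BAR_MARKS:
            continue  # drop the bar, macron, low line, or solidus overlay
        kept.append(STROKED.get(ch, ch))
    return "".join(kept).lower()

variants = ["g\u0331", "\u1E21", "g\u0332", "\u01E5", "g\u0336",
            "g\u0337", "G\u0331", "\u1E20", "\u01E4", "G\u0338"]
keys = {fold_bars(v) for v in variants}
print(keys)
```

Folding this aggressively also erases real distinctions (a macron over
a vowel marking length, for instance), so an actual implementation
would presumably need to be more selective about which base letters it
applies to.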

The same value can also be indicated by the pairs:

U+0263 LATIN SMALL LETTER GAMMA
U+0194 LATIN CAPITAL LETTER GAMMA

U+03B3 GREEK SMALL LETTER GAMMA
U+0393 GREEK CAPITAL LETTER GAMMA
(Greek gamma has often been used when a suitable IPA font was not
available.)

U+021D LATIN SMALL LETTER YOGH
U+021C LATIN CAPITAL LETTER YOGH

U+0292 LATIN SMALL LETTER EZH
U+01B7 LATIN CAPITAL LETTER EZH
(This pair has often been used as a typographical substitute for the
proper yogh pair.)
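
Whether two code points really form a lowercase/uppercase pair can be
checked from the case mappings. A small sketch, assuming Python's
built-in str.upper and str.lower (which follow the Unicode case
mappings of the Python build in use); the names PAIRS and is_case_pair
are invented for this example:

```python
# (lowercase, uppercase) code points for each alternative letter.
PAIRS = {
    "Latin gamma": ("\u0263", "\u0194"),
    "Greek gamma": ("\u03B3", "\u0393"),
    "yogh": ("\u021D", "\u021C"),
    "ezh": ("\u0292", "\u01B7"),
}

def is_case_pair(lower, upper):
    """True if the two characters map to each other under case conversion."""
    return lower.upper() == upper and upper.lower() == lower

results = {label: is_case_pair(lo, up) for label, (lo, up) in PAIRS.items()}
for label, ok in results.items():
    print(label, ok)
```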

Considering the mess we have now, there would be little further harm in
adding particular horizontal and oblique stroke characters (with no
decomposition) which could at least be depended on to print nicely.

On the other hand, the current situation may push users to avoid stroke
overlays (where not provided on composed glyphs or strongly imposed by
tradition) in favor of either macrons beneath or IPA characters with the
appropriate capitals when needed, which would do no harm either.

Jim Allan

> --- On Thu 09/05, Kenneth Whistler wrote:
> From: Kenneth Whistler [mailto:kenw@sybase.com]
> To: Peter_Constable@sil.org
> Cc: unicode@unicode.org, kenw@sybase.com
>
> Date: Thu, 5 Sep 2002 18:27:17 -0700 (PDT)
> Subject: Re: various stroked characters
>
> > Peter,
> >
> > Here's my take on your questions.
> >
> > > The less clear cases involve b, d and g.
> > >
> > > 1) Lower case "b" with a horizontal stroke through the bowl (hereafter
> > > "b-stroke-bowl") is used in some phonetic traditions for voiced
> > > bilabial fricative (beta, in IPA). The annotation for U+0180 ("b"
> > > with a horizontal stroke across the ascender) indicates that one of
> > > its intended purposes is for phonetic transcription of the same
> > > phone. Of course, U+03B2 (beta) also has this function and is not
> > > unified with 0180, but these are clearly distinct characters (e.g.
> > > 0180 and 03B2 have other unrelated functions). I can't imagine
> > > anyone using b-stroke-bowl contrastively with 0180. Thus, probably
> > > the best option is to treat b-stroke-bowl as a typographic variant
> > > of 0180.
> > >
> > > Any opinions confirming this view or to the contrary?
> >
> > I agree.
> >
> > This is what Pullum and Ladusaw called the "Barred B", as opposed to
> > the Indo-European "Crossed B" (i.e. U+0180):
> >
> > "By a general convention, barred stop symbols (with a superimposed
> > hyphen or short dash through the body of the letter) are often used
> > to represent those fricatives for which the IPA symbols are not used.
> > The resultant symbols have the advantage of being easy to type on an
> > unmodified typewriter."
> >
> > By the way, there is also the "Slashed B", which is another
> > alternative form for the Barred B, used for the same purpose, but
> > instantiated by typing b / instead of b -.
> >
> > For what it is worth, the founders of Unicode considered these three
> > forms to be allographs of an abstract barred-b character, so that is
> > what the current situation is. Trying to separately encode a "Barred
> > B" distinct from the "Crossed B" would, at this point, constitute an
> > explicit disunification, rather than simply a discovery of an
> > overlooked character to encode.
> >
> > > 2) Next, consider the g. The representative glyph in TUS3.0 for
> > > U+01E5 shows a double-bowl g with a horizontal stroke through both
> > > sides of the bottom bowl. The annotation indicates that it is used
> > > for Skolt Saami. Looking at a few fonts, I see some variations:
> > > Andale Mono and Code 2000 have a double-bowl g with a horizontal
> > > stroke through *the right side only* of the lower bowl; Lucida Sans
> > > Unicode and Arial Unicode MS have a single-bowl g with a horizontal
> > > stroke through the right side only of the bowl.
> >
> > Pullum and Ladusaw show two other glyphic alternatives:
> >
> > "Barred G" with an IPA style "g" and a horizontal stroke through the
> > bowl.
> >
> > "Crossed G" with an IPA style "g" and a horizontal stroke through the
> > descender.
> >
> > > Now, what I'm concerned with is a g (single-bowl in all instances
> > > I'm familiar with) that has a horizontal stroke through both sides
> > > of the (upper -- only) bowl, used in some phonetic traditions to
> > > represent a voiced velar fricative (IPA gamma). Any opinions on
> > > whether to treat this as a new character or as a typographic
> > > variant of U+01E5?
> >
> > All allographs of the same underlying character. The same concepts
> > and analogies apply here. The "Crossed G" was probably explicitly
> > formed by analogy from the more-attested Crossed B and Crossed D.
> > The ones with horizontal strokes through the bowl are all just
> > variants on what happens when you backspace and put a hyphen across
> > your "g".
> >
> > > 3) Finally, the d. Unicode has three upper-case stroked-d
> > > characters for which the representative glyphs are identical, but
> > > which have distinct lower-case counterparts (the basis for having
> > > three distinct upper-case characters). Of the three pairs, two
> > > really aren't relevant to this discussion. The one relevant pair is
> > > U+0110 LATIN CAPITAL LETTER D WITH STROKE, and U+0111 LATIN SMALL
> > > LETTER D WITH STROKE.
> > >
> > > Now, in some phonetic traditions, a "d" with a horizontal stroke
> > > through the bowl (both sides) is used for a voiced interdental
> > > fricative (IPA U+00F0). Some phonetic traditions represent this
> > > using U+0111.
> > >
> > > I've also learned of some African languages that are written with
> > > upper and lower stroked d; I've seen samples that show some glyph
> > > variation: some samples show a horizontal stroke that crosses both
> > > sides (both upper and lower case); other samples show the
> > > horizontal stroke on only one side -- through the stem of the upper
> > > case (just like U+00D0, U+0110 and U+0189), and through the right
> > > side of the bowl of the lower case (not through the ascender, as
> > > shown in the charts for U+0111).
> > >
> > > So, again: any opinions on whether d-stroke-bowl should be unified
> > > with U+0111 or considered a new character?
> >
> > Again, all allographs of the same underlying character. And once
> > again, as for "b", there are, in addition to the "Crossed D" and
> > "Barred D" allographs, also a "Slashed D" allograph.
> >
> > There is no need to proliferate distinct encodings for these, whether
> > the slashes of the "Barred D" forms go all the way across or just
> > partway across either the lowercase and/or the uppercase forms. Those
> > are just various typographic attempts to do decent design for the
> > letter forms based on the concept of having to apply a horizontal
> > stroke to the "d"/"D" forms.
> >
> > --Ken
> >
> --Reply--
> Hello again, Ken and all Unicoders!
> Concerning horizontally-barred/crossed consonants, I've observed the
> following:
> ·BARRED CONSONANTS ALREADY IN UNICODE: b-bar (L-C only), D-bar/d-bar,
> G-bar/g-bar, H-bar/h-bar, l-bar (L-C only), T-bar/t-bar
> ·BARRED CONSONANTS *NOT YET* IN UNICODE--THUS NEEDING PROPOSALS FOR
> INCLUSION: B-bar (H-C form), K-bar/k-bar, L-bar (H-C form),
> P-bar/p-bar (with crossbar through upper bowl), Q-bar/q-bar (for
> voiced *quf*), R-bar/r-bar, S-bar/s-bar, X-bar/x-bar . . . .
> Several of these barred consonants (b-bar, d-bar, g-bar, k-bar) are
> used in the Padre Recuero Transliteration of Ladino, while (h-bar) is
> used in Maltese--and in the UMRE system (there for voiced *h*). Please
> take heed to the "need proposals for inclusion" list very seriously.
> Thank You!
>
> Robert Lloyd Wheelock
> Augusta, ME USA



This archive was generated by hypermail 2.1.2 : Sun Sep 08 2002 - 23:17:00 EDT