Re: graphemes

From: Christoph Päper
Date: Tue, 27 Sep 2016 16:28:15 +0200
Janusz S. Bień:

On Sun, Sep 18 2016 at 12:26 CEST, writes:
Quote - Christoph Päper (Fri, 16 Sep 2016, 23:51:38):

Janusz S. Bień:

1. Graphemes, if I understand correctly, are language dependent, …

That’s true in linguistic terminology – … –, but not in technical (i.e.
Unicode) jargon.

And what is "grapheme" in "technical (i.e. Unicode) jargon"?

It depends on the script (hence Unicode block), but not the writing system or language. The line is not always drawn consistently.

From the Unicode glossary:

Grapheme. […] (2) What a user thinks of as a character.

User-Perceived Character. What everyone thinks of as a character in their script.

Does 'Grapheme' (2) make sense with "a (single?) user"?

No linguistic term makes sense with only a *single* user (“Privatsprache”). It’s a very vague definition, but not quite incorrect for “a typical user”.
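To make sense (2) concrete, here is a minimal Python sketch contrasting code points with a rough count of user-perceived characters. The helper name approx_clusters is hypothetical, and attaching combining marks to the preceding character is only a crude stand-in for the segmentation rules Unicode actually specifies; it is an assumption made for illustration.

```python
import unicodedata

def approx_clusters(text):
    """Group each base character with the combining marks that follow it.

    This is a rough approximation only; it is not the full Unicode
    grapheme cluster algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters

# "Päper" spelled with a decomposed ä (a + COMBINING DIAERESIS)
word = "Pa\u0308per"
print(len(word))                    # number of code points
print(len(approx_clusters(word)))   # approximate user-perceived characters
```

A typical reader would count five characters in this name, although the string holds six code points.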

BTW, it is rather well known that the term "phoneme" was first proposed by a Polish linguist, Jan Niecisław Ignacy Baudouin de Courtenay (…). It is much less known that he also proposed the term "grapheme".

Yes, he introduced both terms, but the definitions have changed quite a bit through history and among schools. Entire books have been published about that, e.g. (in German) Manfred Kohrt (1985): “Problemgeschichte des Graphembegriffs und des frühen Phonembegriffs” (ISBN 3-484-31061-8) – I wish I knew a more recent one.

Alexander Berg's "English Historical Linguistics vol. I" page 230 […]:

     […] the available definitions [of “grapheme”]
     can be divided into two groups, corresponding to two main senses,
     and reflecting "conflicting linguistics views of the status of
     writing" (Henderson 1985:142):

     1. a letter or cluster of letters referring to or corresponding with a
     single phoneme;

     2. the minimal distinctive unit of a writing system.

For me the first meaning (…) is the primary, i.e. more useful, meaning, as it has some practical applications, e.g. for describing Polish hyphenation rules.

Type 1 has also been called “phono-graphemes” (with or without the hyphen).

The conflicting views quoted from the 30-year-old work by Henderson still exist. Many scholars – yourself included, it seems – infer a structural primacy of spoken language over written language from its historic primacy. Others – myself included – acknowledge that speech comes first ontogenetically and came first phylogenetically, but assert that these two modes of language now exist and develop either independently or interdependently. That means, although all humans first acquire language (at least their mother tongues) in speech (which includes signing) and although all proper writing in the linguistic sense is somehow phonographic and thus originally derived directly from speech, the actual graphical/literal signs do not depend on aural/oral language (any more). Neither do the constituents of these linguistic signs, i.e. phonemes and graphemes. In all natural writing systems, they correspond to each other, hence also pronunciation and spelling, in more or less complex ways, usually involving a great deal of morphologic knowledge – general 1:1 or 1:n mappings are a myth, an unachievable ideal at most, as more complex contextual rules are always required, even in “shallow” orthographies.

Actually, ‘encyclop[ae|æ|e]dia’ exhibits an instance of a general alternation rule in English graphematics, consisting of simple substitutions:

 ⟨æ⟩ → ‹æ› / ‹ae› / ‹e›

That means the grapheme ⟨æ⟩ has (at least) three possible allographs, which are selected by higher-level constraints without positional restrictions (as far as I know). These letter sequences may be allographs of other graphemes, too.
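As a rough illustration of the substitution rule above, the following Python sketch expands a sequence of graphemes into every spelling its allographs allow. The table ALLOGRAPHS and the function spellings are hypothetical names, and the higher-level constraints that actually select one allograph are deliberately left out.

```python
# Allograph table taken from the rule above; every grapheme not listed
# here is assumed to have only itself as an allograph.
ALLOGRAPHS = {"æ": ["æ", "ae", "e"]}

def spellings(graphemes):
    """Return every spelling obtained by choosing one allograph per grapheme."""
    results = [""]
    for g in graphemes:
        options = ALLOGRAPHS.get(g, [g])
        results = [prefix + option for prefix in results for option in options]
    return results

# The word as a grapheme sequence, with ⟨æ⟩ as a single unit.
word = list("encyclop") + ["æ"] + list("dia")
print(spellings(word))
# → ['encyclopædia', 'encyclopaedia', 'encyclopedia']
```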

In ‘ni[ght|te]’, on the other hand, we can examine two alternate graphotactic rules at work. Without delving deeper into English orthographic theories, I’d provisionally phrase these as

 1. <simple nucleus> <⟨h⟩ digraph> <final> #
 2. <simple nucleus> <final> <echo nucleus> #
 3. <complex nucleus> <final> #
 4. …

    <echo nucleus> → ⟨e⟩ / ⟨∅⟩ [morphologic restrictions]
    <simple nucleus> → ⟨i⟩ / ⟨y⟩ / …
    <final> → ⟨t⟩ / …
    <⟨h⟩ digraph> → (⟨g⟩ / …) ⟨h⟩
                 or ⟨gh⟩ / …
    <complex nucleus> → ⟨ei⟩ / ⟨ay⟩ / …
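The provisional rules above can be sketched as a tiny generator in Python. This is only an illustration under stated assumptions: the names CLASSES, RULES and rimes are hypothetical, the onset ‹n› is taken from the ‘ni[ght|te]’ example, every alternative elided with “…” is simply left out, and the morphologic restrictions on the echo nucleus are ignored.

```python
from itertools import product

# Grapheme classes with only the alternatives spelled out above;
# the empty string stands for ⟨∅⟩.
CLASSES = {
    "simple nucleus": ["i", "y"],
    "complex nucleus": ["ei", "ay"],
    "h digraph": ["gh"],
    "final": ["t"],
    "echo nucleus": ["e", ""],
}

# Rules 1 to 3 as sequences of grapheme classes.
RULES = [
    ["simple nucleus", "h digraph", "final"],
    ["simple nucleus", "final", "echo nucleus"],
    ["complex nucleus", "final"],
]

def rimes(rule):
    """Expand one rule into all letter strings its classes permit."""
    choices = (CLASSES[name] for name in rule)
    return ["".join(parts) for parts in product(*choices)]

for number, rule in enumerate(RULES, start=1):
    print(number, ["n" + rime for rime in rimes(rule)])
```

Rule 1 yields ‘night’ and rule 2 yields ‘nite’, but the sketch also produces strings such as ‘nyt’ or ‘nayt’, which shows how much work the omitted restrictions would have to do.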

I’ve typographically separated grapheme classes <> from graphemes ⟨⟩ and allographs ‹›, which is more specific than usually necessary and thus more confusing than helpful.

Received on Tue Sep 27 2016 - 15:48:19 CDT
