Re: Measuring a writing system "economy"/"accuracy"

From: Doug Ewell
Date: Sat Jul 02 2005 - 22:15:35 CDT

    John D. Burger <john at mitre dot org> wrote:

    > > Apart from what measures might be used, the other question is surely
    > > 'What is being measured?' From your message, particularly the
    > > reference to IPA, I suspect that you are talking about phonetic
    > > economy and accuracy. This is one kind of economy/accuracy, but one
    > > could also measure at the semantic level, in which case 'ideographic'
    > > writing systems would presumably be more economical.
    > One measure of this semantic efficiency might be the self-entropy of
    > the writing system. An intuitive way of thinking about this is to
    > imagine compressing a large sample of the language with, say, gzip. A
    > "less economic" language/orthography presumably has more redundancy,
    > and thus would compress more. The most efficient writing system
    > imaginable wouldn't compress at all.

    This is too ambitious, because even after you strip away the redundancy
    of the writing system, the language itself will still have an uneven
    distribution of phonemes. In other words, virtually any language is
    going to have comparatively common and rare sounds, and a writing system
    that assigns one symbol to each sound cannot be 100% efficient.
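
    A minimal sketch of that point, in Python. It computes the Shannon
    entropy of a symbol sequence and compares it with the maximum possible
    for the same inventory, log2(N). The function names and the toy strings
    are assumptions made up for illustration; they are not real phoneme
    counts for any language.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the observed distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def efficiency(symbols):
    """Observed entropy divided by the maximum, log2(inventory size).

    1.0 means every symbol is equally frequent; anything lower means
    a one-symbol-per-sound script still carries statistical redundancy.
    """
    inventory = len(set(symbols))
    if inventory < 2:
        return 1.0  # a single symbol has nothing to be uneven about
    return entropy_bits(symbols) / math.log2(inventory)


# Toy data (hypothetical): each letter stands for one "sound".
uniform = "abcd" * 10          # all four sounds equally common
skewed = "aaaaaabc" * 10       # one sound far more common than the others

print(efficiency(uniform))
print(efficiency(skewed))
```

    With equally frequent symbols the ratio is 1.0; as soon as some
    symbols are more common than others it drops below 1.0, which is the
    sense in which such a script "cannot be 100% efficient."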

    For example, in English:

    * the consonant sound n is more common than ʒ
    * the vowel sound ɑ is more common than ʊ
    * the combination ts is more common than dg

    It would be safe to say that "the most efficient writing system
    imaginable would compress less than any other."
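
    The compression comparison suggested above could be tried along these
    lines. This is only a sketch: it uses Python's zlib module (the same
    DEFLATE method gzip uses), the helper name compression_ratio is my own,
    and the two samples are artificial stand-ins rather than real text in
    any orthography.

```python
import random
import string
import zlib


def compression_ratio(text):
    """Compressed size divided by original size, both in bytes.

    A lower ratio means the sample was more redundant.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("sample must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Artificial samples (assumptions for illustration only).
redundant = "abc" * 1000  # highly repetitive, so it compresses very well

rng = random.Random(0)    # fixed seed so the sample is reproducible
varied = "".join(rng.choices(string.ascii_lowercase, k=3000))

print(compression_ratio(redundant))
print(compression_ratio(varied))
```

    To compare real writing systems one would feed in large samples of the
    same content in each orthography. Note that the byte encoding itself
    affects the sizes, so the result measures the encoding as well as the
    script.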

    I know constructed scripts aren't the most popular topic on this list,
    but this type of "efficiency" was one of my goals in creating a
    constructed script 25 years ago -- one symbol per sound, with as few
    strokes as possible per symbol. The stroke-efficiency is significantly
    compromised by certain other rules, but it was an interesting learning
    experience.

    Doug Ewell
    Fullerton, California
