Re: Classification of Alphabetic characters (was: Hiragana/Katakana sound marks)

From: Philippe Verdy
Date: Thu Jun 05 2003 - 18:18:38 EDT

    From: "Mark Davis"
    > This is not an oversight. As I said, many characters are not
    > Alphabetic and are still part of words. Examples include that
    > character and many others. As a simple case, "can't" is a word in
    > English, although the apostrophe is not alphabetic. There are many,
    > many examples using combining marks, such as a virama (halant) in
    > Hindi, which is not Alphabetic:

    Another interesting case is the use of the apostrophe in (modern) Breton, where the official alphabet treats the sequence <c'h> as a single letter, even though it is written with three Unicode characters, one of which is not a letter.
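To make the point concrete, here is a minimal Python sketch that lists the Unicode general category of each character in <c'h>, in "can't", and in a Hindi sequence containing a virama. It rests on several assumptions: the apostrophe is typed as U+0027, the Hindi sample (KA + VIRAMA + SSA) is only an illustrative choice, and the helper names `describe` and `non_letters` are hypothetical. Note also that Python's standard library does not expose the derived Alphabetic property directly, so the "is a letter" test below uses the general category (L*) as an approximation; the real Alphabetic property also covers Nl and the Other_Alphabetic characters.

```python
import unicodedata


def describe(text):
    """Return (char, code point, general category, is_letter) for each character.

    is_letter is True when the general category starts with "L". This only
    approximates the Unicode Alphabetic property (an assumption of this sketch).
    """
    rows = []
    for ch in text:
        category = unicodedata.category(ch)
        rows.append((ch, f"U+{ord(ch):04X}", category, category.startswith("L")))
    return rows


def non_letters(text):
    """Return the characters of text whose general category is not a letter."""
    return [ch for ch in text if not unicodedata.category(ch).startswith("L")]


# Samples: Breton trigraph, English contraction, Hindi KA + VIRAMA + SSA.
samples = ["c'h", "can't", "\u0915\u094D\u0937"]

for word in samples:
    print(word)
    for row in describe(word):
        print("   ", row)
```

In each sample exactly one character falls outside the letter categories: the apostrophe (category Po) in the first two, and the virama U+094D (category Mn) in the third, yet all three sequences function as part of a word.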

    I think this sequence was adopted so that Breton could be printed with the same Latin characters as French (and so lower the publication cost), instead of a specific glyph or rare Latin letter for the guttural consonant it represents (similar to <ch> in German or <j> in Spanish), while also avoiding confusion with words imported from Romance, French or old Oïl that use the "ch" digraph.

    I don't have information on how Breton (which appeared long before French) was written in early Celtic times, notably when it was spoken by artists throughout the European kingdoms of the Middle Ages. Was it written in a Celtic alphabet? Or was it often transliterated into various local scripts with varying orthography? The books I have only show the modern orthographies (there are several flavors, depending on the local dialect or on the national origin of the scholar or publication that edits them, but they seem to agree on the use of the <c'h> trigraph, even if it is pronounced a bit differently in the four remaining dialects).
