RE: Why people still want to encode precomposed letters

From: Erkki I. Kolehmainen
Date: Wed Nov 19 2008 - 03:27:00 CST


    In the new Finnish multilingual keyboard layout standard (SFS 5966), almost all of the decomposable precomposed characters are entered as combinations (but encoded as precomposed). This way we can support the full repertoire of the Latin letters of all the European Union official languages (i.e., excluding Bulgarian and Greek) plus all indigenous languages spoken in Northern Europe (including the regional and minority languages). In addition, we enter the non-decomposable letters with stroke using the same method, i.e., we treat the stroke as a combining diacritical mark in the input phase. In this way, the keytops are not overly crowded, and the "foreign" characters (to us, that is), e.g. the German ß, the Danish/Norwegian æ and ø, the Icelandic ð and þ, or the Sámi ŋ, can be placed in intuitively recognizable positions.
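    The "enter as a combination, store as precomposed" approach can be sketched in Python with the standard `unicodedata` module. This is a minimal illustration under stated assumptions, not the SFS 5966 specification: the dead-key names and the `compose`, `COMBINING`, `STROKE` and `cps` identifiers are hypothetical, and the real key assignments are not reproduced. It also shows why letters with stroke need their own lookup: they have no canonical decomposition, so NFC normalization cannot build them from a base letter plus a combining overlay.

```python
import unicodedata

# Hypothetical dead-key names mapped to combining marks; the actual
# SFS 5966 key assignments are NOT reproduced here.
COMBINING = {
    "acute": "\u0301",
    "diaeresis": "\u0308",
    "ring": "\u030a",
    "caron": "\u030c",
}

# Hypothetical table for a "stroke" dead key. Letters with stroke have no
# canonical decomposition, so NFC cannot produce them from base + U+0338.
STROKE = {"o": "\u00f8", "O": "\u00d8", "d": "\u0111", "D": "\u0110",
          "l": "\u0142", "L": "\u0141"}


def cps(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def compose(dead_key: str, base: str) -> str:
    """Return the text stored for a dead key followed by a base letter."""
    if dead_key == "stroke":
        # Typed like a combining mark, but resolved by table lookup;
        # raises KeyError for a base letter with no stroked form.
        return STROKE[base]
    # Append the combining mark, then store the precomposed (NFC) form.
    return unicodedata.normalize("NFC", base + COMBINING[dead_key])


print(cps(compose("caron", "s")), cps(compose("stroke", "o")))  # prints "U+0161 U+00F8"
# The overlay sequence o + U+0338 does not compose to U+00F8:
print(cps(unicodedata.normalize("NFC", "o\u0338")))  # prints "U+006F U+0338"
```

    Treating the stroke as a table lookup rather than a real combining mark keeps the stored text identical to what a dedicated key would have produced.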
    Erkki I. Kolehmainen

    Tilkankatu 12 A 3, FI-00300 Helsinki, Finland

    Tel. +358 9 4368 2643, +358 400 825 943


    -----Original Message-----
    From: [] On Behalf Of Andrew Cunningham
    Sent: 19 November 2008 1:41
    To: Unicode Mailing List
    Subject: Re: Why people still want to encode precomposed letters

    Actually, insisting on precomposed characters may not make things easier for some languages. I'm just thinking of the practicalities involved. Take Vietnamese as an example: each combination of vowel and tone mark exists as a single precomposed character in Unicode.

    Then look at Microsoft's keyboard layout for Vietnamese. Due to the design parameters of keyboard layouts on Windows, Microsoft used combining diacritics for tone marks.
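    The Vietnamese case can be illustrated with NFC normalization: whatever a layout emits (a base vowel plus a combining tone mark), normalizing it yields a single precomposed code point, because every vowel/tone combination is encoded. A minimal Python sketch using the standard `unicodedata` module; `to_precomposed`, `VOWELS` and `TONES` are illustrative names, and the code says nothing about how Microsoft's layout itself is implemented.

```python
import unicodedata

# Vietnamese base vowels (some already carry a shape diacritic), normalized
# to NFC so that iterating yields one precomposed character per vowel.
VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")
# The five tone marks as combining characters:
# grave, acute, hook above, tilde, dot below.
TONES = "\u0300\u0301\u0309\u0303\u0323"


def to_precomposed(typed: str) -> str:
    """Normalize keyboard output (base letter + combining tone mark) to NFC."""
    return unicodedata.normalize("NFC", typed)


# Every vowel + tone pair composes to exactly one code point.
for v in VOWELS:
    for t in TONES:
        assert len(to_precomposed(v + t)) == 1, (v, t)

typed = "\u00ea\u0301"          # "ê" + combining acute, as a layout might send it
stored = to_precomposed(typed)  # the single precomposed U+1EBF
print(len(typed), len(stored), f"U+{ord(stored):04X}")  # prints "2 1 U+1EBF"
```

    Because normalization also reorders marks by combining class, "a" + circumflex + dot below and "a" + dot below + circumflex both end up as U+1EAD, so the order in which a user types the marks does not matter.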

    Currently Yoruba doesn't have all its letters available as precomposed characters. But if it did, and you created a keyboard layout for it using MSKLC on Windows, you would end up using some combining diacritics for tone marking as well.

    The need for combining diacritics will not go away, and for some languages, the existence of precomposed characters (if accepted into Unicode) will not make any practical difference in some environments. It will make a difference for some, but for diacritic-heavy languages that can use more than one diacritic on a base character at a time, there are other issues that may make it unlikely that precomposed forms will be used in every instance.
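    The point about stacked diacritics can be checked the same way. The sketch below (Python, standard `unicodedata`; `has_single_code_point` and `cps` are hypothetical helper names) normalizes a few Yoruba-style sequences: a tone mark on a plain vowel composes to one code point, but a tone mark on a vowel that already carries a dot below (ẹ, ọ) has no precomposed form in current Unicode, so a combining character remains even after NFC.

```python
import unicodedata


def cps(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def has_single_code_point(letter: str) -> bool:
    """True if the NFC form of `letter` is one precomposed code point."""
    return len(unicodedata.normalize("NFC", letter)) == 1


# Tone marks on plain vowels compose; on dotted vowels they do not.
samples = {
    "e + acute": "e\u0301",
    "e + dot below + acute": "e\u0323\u0301",
    "o + dot below + grave": "o\u0323\u0300",
}
for name, s in samples.items():
    print(f"{name}: {cps(unicodedata.normalize('NFC', s))}")
# Prints:
# e + acute: U+00E9
# e + dot below + acute: U+1EB9 U+0301
# o + dot below + grave: U+1ECD U+0300
```

    A keyboard for such a language therefore has to be able to emit combining tone marks regardless of how many precomposed letters exist.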

    Andrew Cunningham
    Vicnet Research and Development Coordinator
    State Library of Victoria

    This archive was generated by hypermail 2.1.5 : Wed Nov 19 2008 - 03:31:19 CST