Re: what is Latn?

From: Philippe VERDY (verdy_p@wanadoo.fr)
Date: Sat May 14 2005 - 13:58:55 CDT

    "JFC (Jefsey) Morfin" <jefsey@jefsey.com> wrote to unicode@unicode.org:

    > Could someone tell me where to find the list of the characters
    > belonging to ISO 15924 "Latn" script? I wish to know if they support all
    > the French characters and variants?
    > Thank you.

    If you have read the ISO 15924 standard carefully, you will have noted that its registry is hosted by the Unicode Consortium and is maintained by Michael Everson, who acts under a specific delegation from the ISO 15924 committee.
    In addition, the ISO 15924 standard lists the equivalences between its codes and the legacy script names used in the Unicode standard.

    So the ISO 15924 "Latn" script code corresponds to the "Latin" script name in Unicode.

    Also, the Unicode standard has long published the list you want, in the script property data files of the UCD (Scripts.txt).

    So for the Latin script, you have to include all characters assigned to the following script categories: Latin, Common and Inherited.
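
    As a rough illustration of that rule, the following Python sketch parses data in the Scripts.txt line format and reports which characters of a text fall outside Latin, Common and Inherited. The SAMPLE excerpt is hand-written and abbreviated (an assumption for illustration; the real Scripts.txt from the UCD should be loaded instead), and the function names are hypothetical.

```python
import re

# Hand-written, abbreviated excerpt in the Scripts.txt line format.
# Assumption: for illustration only; use the real UCD file in practice.
SAMPLE = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00C0..00D6    ; Latin # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6    ; Latin # L&  [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS
00F8..01BA    ; Latin # L& [195] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER EZH WITH TAIL
0300..036F    ; Inherited # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X
"""

# "XXXX" or "XXXX..YYYY", then ";", then the script name.
LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

# Script values usable when writing in the Latin script.
USABLE = {"Latin", "Common", "Inherited"}


def parse_scripts(text):
    """Return a list of (first, last, script) code point ranges."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # comment and blank lines do not match and are skipped
            first = int(m.group(1), 16)
            last = int(m.group(2) or m.group(1), 16)
            ranges.append((first, last, m.group(3)))
    return ranges


def script_of(ch, ranges):
    """Look up the script of one character; unlisted ones are 'Unknown'."""
    cp = ord(ch)
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return "Unknown"


def unsupported(text, ranges):
    """List the characters of text that are outside Latin/Common/Inherited."""
    return [c for c in text if script_of(c, ranges) not in USABLE]


ranges = parse_scripts(SAMPLE)
print(unsupported("garçon cœur", ranges))
```

    With the complete data file, running unsupported() over a list of the French letters and their variants would answer the original question directly.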

    If you are concerned about differences in presentation between, for example, Latin and Cyrillic in some languages, you'll have to tune the glyphs used for the Common and Inherited characters.

    But even in that case, you'll need distinctions other than just the script, for example when the same set of characters with a strong script type (neither Common nor Inherited) must be rendered differently according to language. For example, which form should be used for the combining cedilla? A standard cedilla mark below, or a hook at the top right?

    Then you'll have to choose which form to make the default according to the languages you want to support: it may happen that none of the languages you support uses the classic cedilla-below form, and so the top-right hook will be the default for some characters (but you should provide a way to support the classic cedilla for other "rare" languages that need it, and this requires research into the actual alphabets used by various languages and their preferred forms).

    Generally the default choice should be the one shown in the Unicode standard with representative glyphs (because it works with most languages using it), but it is not mandatory, as long as you respect the identity of the character (for example here the cedilla, whose presentation and position also depend on letter case...).

    You may also know that accents and diacritics are not always drawn the way described in various sources. For example the French cedilla (used in French only under 'c' or 'C') looks like a small italic digit '5' without its top bar, attached below the modified base letter (i.e. it goes down vertically or obliquely to the right, then it turns to the *right* to form a half loop that terminates lower down on the left), and not like a simple hook that just starts down and turns to the left.

    As French does not use the cedilla for any other letter, the form of the cedilla for other base letters is not important, and there's no reason why you should adopt this French style for all other base letters.

    So a good renderer that supports diacritics should not simply position the same glyph relative to the base letter or cluster it modifies; it should also be able to select distinct glyphs for the composite character that is represented in Unicode by the combining sequence of abstract characters or by its canonical equivalents.
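
    To make the last point concrete, here is a minimal Python sketch of choosing a cedilla glyph per composite character rather than per mark. It relies on unicodedata.normalize() so that a combining sequence and its precomposed canonical equivalent select the same entry. The glyph variant names and the table itself are hypothetical; a real renderer would take them from the font and would also consider the language.

```python
import unicodedata

# Hypothetical glyph-variant names, keyed by the NFC form of the cluster.
GLYPH_VARIANTS = {
    "\u00e7": "cedilla.french",  # c with cedilla, lowercase
    "\u00c7": "cedilla.french",  # C with cedilla, uppercase
    "\u0123": "cedilla.above",   # g with cedilla: mark commonly drawn above
}
DEFAULT_VARIANT = "cedilla.default"


def cedilla_variant(cluster):
    """Pick a glyph variant for a base letter carrying a cedilla.

    The cluster is normalized to NFC first, so "c" + U+0327 and the
    precomposed U+00E7 are treated as the same composite character.
    """
    composed = unicodedata.normalize("NFC", cluster)
    return GLYPH_VARIANTS.get(composed, DEFAULT_VARIANT)


# Both spellings of the same composite character give the same variant.
print(cedilla_variant("c\u0327"), cedilla_variant("\u00e7"))
```

    Keying the table on the normalized composite keeps the French style confined to 'c' and 'C', while every other base letter falls back to the default form.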


