Re: Language Tagging And Unicode

From: Peter Constable (peter_constable@sil.org)
Date: Thu Jan 20 2000 - 13:17:18 EST


>But I really don't see any difference from a programmer's (or
       even a typesetter's) perspective -- Cyrillic and
       Latin "shapes" (glyphs) are the main visual difference but the
       principle of writing, printing etc. is completely the same in
       Cyrillic and Latin (compared to Arabic, Hebrew or anything). If
       anybody knows any other difference than what can be now called
       "the shape of glyphs" he is welcome to add to this discussion!

       Now it sounds like you're arguing against your original
       proposal to add additional characters for Serbian.

>I'd more precisely say: Serbian is written in TWO scripts:
       Latin AND Cyrillic. Interchangeably! That's one small point
       that people forget. And it's not the only language with
       that property, as far as I know.

       Of course. I was assuming a context of Serbian written with
       Cyrillic.

>The problem of calling Unicode "plain text" standard and
       refusing to care about Serbian Cyrillic based on that is that
       some languages represented by Unicode are completely printed AS
       THEY ARE WRITTEN (which means that printing IS considerably
       HARDER than printing Latin and Cyrillic).

       This is no different than several other scripts covered by
       Unicode without requiring the encoding of presentation forms.

>Because of that, Serbian Cyrillic seems to be in danger of
       being the only European language which would have to be
       rendered only with these much more complex engines.

       If this statement is at all true, then "European" must be
       emphasised. I don't see at all why that's particularly
       significant. Furthermore, there are already issues that need to
       be addressed for other European languages: different languages
       have different requirements for ligature formation (e.g.
       Turkish dotless i), case mappings (e.g. Turkish dotless I),
       tokenisation into text elements (e.g. <ch> digraph in Spanish
       and (if I recall from 5 months ago) Czech). Serbian-Cyrillic is
       by no means the only writing system for which people have
       requested an addition "because otherwise it's the only European
       language that's not adequately supported" but for which no
       additions are in fact needed to provide adequate support. The
       point is that the missing support needs to come from somewhere
       else.
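
       To illustrate the case-mapping point, here is a minimal Python
       sketch of language-sensitive uppercasing. The helper name
       upper_for_language and the TURKIC set are assumptions made for
       this example, not part of Unicode or of any library. The same
       code points are used for every language; only the mapping
       applied by the software changes.

```python
# Hypothetical helper: uppercase text according to a language tag.
# Assumption: only the Turkish/Azerbaijani dotted/dotless i rule is handled.
TURKIC = {"tr", "az"}


def upper_for_language(text: str, lang: str) -> str:
    """Uppercase `text`, applying the dotted/dotless i rule for Turkic tags."""
    primary = lang.split("-")[0].lower()
    if primary in TURKIC:
        # i (U+0069) -> capital I with dot above (U+0130)
        # dotless i (U+0131) -> I (U+0049)
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    # Default, language-neutral case mapping for everything else.
    return text.upper()


print(upper_for_language("istanbul", "tr"))
print(upper_for_language("istanbul", "en"))
```

       The character repertoire is identical in both calls; the
       language-specific behaviour lives in the software layer rather
       than in additional encoded characters.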

>That's why I considered adding a few characters in Unicode --
       only to match current practice. Everybody claims that is as
       such only because of the compatibility -- that's exactly why I
       analyzed the idea of new characters -- to make Serbian Cyrillic
       compatible enough with all these programs/systems/engines which
       know how to print Latin and Cyrillic but do not know about
       differences in Chinese scripts.

       When discussing such compatibility, you need to look at
       existing encodings. If you can point us to an existing standard
       encoding that includes as distinct characters both Russian t
       and Serbian t, then you may find more people willing to
       consider the proposal.

>Even Unicode as standard does not expect that every
       Unicode-compatible program MUST be able to represent all
       scripts. Exactly because of that maybe Cyrillic letters (even
       if they are Serbian!) should be considered as something which
       SHOULD work on systems incapable to display different Chinese
       characters.

       Now why should I expect developers to consider support for
       Serbian mandatory but for Chinese or Amharic or Khmer or Lahu
       not necessary? I can just imagine Khmer or Thaana or Ethiopic
       advocates getting on this list saying for their writing systems
       just what you're demanding for Serbian. Developers still make
       their own decisions as to what parts of Unicode they'll
       support. I wholeheartedly agree with your desire to see the
       industry provide support for your writing system, just as I
       hope to see the industry support thousands of others. But I
       don't think you're going to find the answer you're looking for
       coming from Unicode.

       Hopefully, your bringing it up on this list has been
       productive in making others aware of the need so that they know
       that they need to provide support for the other technologies
       that work together with Unicode to provide what you need. We've
       already heard from Chris Pratley that MS Office apps are moving
       in the direction of supporting language tag-based glyph
       substitution, though of course that will take time, so I see that as
       an encouraging indicator that things are moving in the
       direction they need to go.
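
       As a rough illustration of what language tag-based glyph
       substitution means, here is a minimal Python sketch. The glyph
       names, the two tables, and the function glyphs_for are all
       hypothetical; real rendering engines and fonts have their own
       mechanisms. The point is only that one encoded character
       (CYRILLIC SMALL LETTER TE, U+0442) can be drawn with different
       glyphs depending on the language tag attached to the text.

```python
# Hypothetical glyph tables; names are invented for this sketch.
DEFAULT_GLYPHS = {"\u0442": "te.default"}
LOCALIZED_GLYPHS = {
    # Assumption: Serbian-tagged text prefers a different glyph for U+0442.
    "sr": {"\u0442": "te.serbian"},
}


def glyphs_for(text: str, lang: str) -> list[str]:
    """Map each character to a glyph name, honouring per-language overrides."""
    primary = lang.split("-")[0].lower()
    overrides = LOCALIZED_GLYPHS.get(primary, {})
    result = []
    for ch in text:
        # Language override first, then the default glyph, then a
        # generic name derived from the code point.
        fallback = DEFAULT_GLYPHS.get(ch, f"u{ord(ch):04X}")
        result.append(overrides.get(ch, fallback))
    return result


print(glyphs_for("\u0442", "ru"))
print(glyphs_for("\u0442", "sr-Cyrl"))
```

       The encoded text is the same in both calls; only the language
       tag supplied alongside it changes which glyph is selected.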

       Regards,
       Peter


