RE: Linguistics and Unicode

From: Don Osborn
Date: Wed Dec 20 2006 - 09:05:11 CST

    Luke, IMO there are a few issues involved here. Mike Maxwell's comments are
    also relevant (I append those for reference since this seems to begin a new
    thread).

    First, was the main reason your friend stuck with the old product
    convenience or a perceived problem with Unicode? If the former, I have
    encountered similar attitudes. And I don't see that as a problem so long as
    they are not encouraging other colleagues and students who don't know better
    to use the same solution. On the whole Mike is probably correct that this is
    ever less a problem in academia. WRT the field and locations outside of
    relatively technology-privileged (and Anglophone) environments, see below.

    Second, if they perceive a problem with Unicode and don't air it, that
    penalizes everyone: if it is a misunderstanding, it is not cleared up; if
    it is a real problem, it is not addressed. Non-conflict is not assent. I'm
    aware, for instance, of a continued current of concern re dynamic
    composition among some Francophone researchers who have a lot of field
    experience, but I haven't had time to dialogue with them to understand
    their POV better.

    Third, the issue of "not knowing" about Unicode (which we may extend to
    include having heard about it but not much more) is of concern on another
    level. I have just learned of an effort in Togo to create a font for Ewé and
    other languages of the country. Still seeking info on that, but if indeed
    this is a new 8-bit font, it is part of a larger problem relating to missing
    entirely the existence and point of Unicode (to create such a font, one
    would probably be working on a computer system with one or another "Unicode
    font" already installed that includes the characters one seeks to use).

    In the longer run, Unicode, by force of logic and decisions of industry,
    will continue to expand in use. But is that sufficient? Are the cases
    indicated above just isolated anomalies or do they indicate a need for more
    proactive international PR and extension (which involves listening & taking
    account of local issues as well as giving info & training) by Unicode? If
    the latter, then what approach - a "diva"? or someone to strategize and
    organize "roadshows"? or? (Money, of course, is needed, but let's talk of
    the ideal approaches - beginning with whether they are indeed needed.)

    Don Osborn
    PanAfrican Localisation project

    From: [] On
    Behalf Of Luke Onslow
    Sent: Wednesday, December 20, 2006 5:53 AM
    Subject: Linguistics and Unicode

    Dear all,

    I am sure there are some linguistics scholars on this mailing list.
    Do you currently see any limitations in the current version of Unicode,
    apart from the fact that some writing systems still haven't been
    ported to Unicode? To be PC, I mean non-official writing systems and
    obsolete writing systems.

    I talked with a linguist friend from Germany once, and he was absolutely
    unaware of Unicode and was sticking to the good old product he was using.
    He didn't give the name of the product, though. Does anyone know?



    > -----Original Message-----
    > From: [] On
    > Behalf Of Michael Maxwell
    > Sent: Tuesday, December 19, 2006 4:37 PM
    > To:
    > Cc: Michael Maxwell
    > Subject: RE: Unicode or specific language charset
    > > 1) Some people working with diverse languages (thinking here of some
    > > academic linguists) who have found comfortable solutions in the past
    > > involving non-Unicode fonts may be reluctant to change. These are
    > > probably fewer by the day, and I imagine that anyone who has been
    > > exchanging text widely in languages with extended Latin or non-Latin
    > > characters will have seen the advantage of working in Unicode.
    > I used to be one of those persons, when I worked on minority languages
    > in Colombia. I would say the situation was (and maybe still is) more
    > common with field linguists (working in minority languages) than it is
    > with academics in general.
    > Still, there is considerable impetus towards using Unicode in field
    > linguistics--an increasing number of tools for field linguists are
    > available in Unicode versions, the IPA has virtually all the characters
    > one would need when recording data phonetically or phonemically,
    > sufficient characters are available in Unicode for practical
    > orthographies(!), major organizations that deal with minority/
    > previously unwritten languages are encouraging or even mandating the
    > use of Unicode, etc.
    > I think the only real issue for field linguists is that in some areas
    > with complex orthographies, the fonts to implement those Unicode
    > characters might be too language-specific. I can imagine that someone
    > working with a minority language in India might find that standard
    > Devanagari (etc.) fonts might not behave the way they need.
    > I don't have any real examples of that, but I can say that the font/
    > rendering support of Unicode for Yoruba (which of course has been
    > written for over a century) was lacking. Specifically, the combination
    > of a dot under a vowel ('e' or 'o') plus a tone mark (grave or acute
    > accent) does not look "pretty". You can see examples at
    > When I look at this page
    > on a Windows XP machine, the tone marks over the plain vowels are
    > "correctly" placed (presumably built-in glyphs in the font), whereas
    > the tone marks over the dotted lower-case vowels are much too high;
    > while either the tone marks are too far to the right over the dotted
    > upper-case vowels, or else the dot is too far to the right under the
    > accented upper-case vowels (depending on which is composed first and
    > therefore uses a built-in glyph, I suspect). (Mid-tone marks are not
    > usually written, but in the wikipedia page you can see a few of these,
    > and they have the same problems as the acute or grave accents on the
    > dotted vowels, and also over the
    > 'n' or engma.)
    > While the font issues I'm describing are not the fault of Unicode, this
    > is not obvious to the casual user--and the distinction may not matter
    > to the user in any case. Such a user might very well turn to a
    > proprietary font/ encoding for displaying Yoruba or some other language
    > with similar issues. And as you may know, those proprietary fonts/
    > encodings are all too common among the Indic languages...
    > Mike Maxwell
    > CASL/ U Md
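
    As a rough illustration of the composition point Mike raises for Yoruba,
    the sketch below uses Python's standard unicodedata module to show what
    NFC normalization does with a plain vowel plus tone mark versus a dotted
    vowel plus tone mark. The helper name nfc_codepoints and the variable
    names are made up for this example; it says nothing about how any
    particular font or renderer behaves, only about which sequences have a
    single precomposed code point.

```python
import unicodedata


def nfc_codepoints(text):
    """Return the code points of `text` after NFC normalization (hypothetical helper)."""
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed]


# Plain 'e' with a high tone (acute): a precomposed character exists.
plain_high = "e\u0301"

# Dotted 'e' with a high tone, typed three different ways.
dot_then_acute = "e\u0323\u0301"   # e + dot below + acute
acute_then_dot = "e\u0301\u0323"   # e + acute + dot below
precomposed_acute = "\u00e9\u0323"  # e-acute + dot below

print(nfc_codepoints(plain_high))
for variant in (dot_then_acute, acute_then_dot, precomposed_acute):
    # All three are canonically equivalent, so NFC gives the same result:
    # a precomposed dotted 'e' followed by a combining acute accent.
    print(nfc_codepoints(variant))
```

    In other words, the plain accented vowel normalizes to one code point,
    while the dotted vowel with a tone mark always keeps a separate combining
    accent that the font and rendering engine must position themselves.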

    This archive was generated by hypermail 2.1.5 : Wed Dec 20 2006 - 09:12:09 CST