Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Apr 27 2003 - 20:54:43 EDT

    Wait just a second. The IJ digraph was added for compatibility with other standards, not necessarily
    because it is really needed for Dutch. Unicode does not, in general, encode graphemes, except for
    compatibility purposes; "ch" for Spanish and Slovak, for example, is not encoded.

    Given the mass of data in Dutch that already use "i" + "j" to encode that grapheme, adding the "ij"
    character will just confuse matters. When editing a mixture of such text, search/replace will not
    identify the two; users will sometimes have to hit one backspace to delete what appears to be two
    characters, sometimes hit two backspaces, etc. Bad idea.
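
    A minimal Python sketch of that mismatch, assuming the single character in question is
    U+0133 LATIN SMALL LIGATURE IJ. The sample word "ijs" and the helper fold_ij are
    illustrative choices, not taken from any particular product:

```python
LIGATURE = "\u0133s"   # U+0133 LATIN SMALL LIGATURE IJ followed by "s"
DIGRAPH = "ijs"        # plain "i" + "j" + "s"


def fold_ij(text: str) -> str:
    """Hypothetical helper: rewrite the IJ ligature characters as two letters."""
    return text.replace("\u0133", "ij").replace("\u0132", "IJ")


# Same appearance, different code points: comparison and search both miss.
print(LIGATURE == DIGRAPH)            # False
print("ij" in LIGATURE)               # False
# One visible "ij" either way, but one code point versus two to delete.
print(len(LIGATURE), len(DIGRAPH))    # 2 3
# After folding, the two spellings compare equal.
print(fold_ij(LIGATURE) == DIGRAPH)   # True
```

    Compatibility normalization (NFKC or NFKD) performs the same folding for these two
    characters, but it also rewrites many unrelated compatibility characters, so a targeted
    replacement is shown here.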

    The only concrete thing I have heard is that when titlecasing Dutch, "i" + "j" at the start of a
    word should be titlecased as "I" + "J", not as "I" + "j". For that, one would request a change to
    SpecialCasing.txt in the Unicode Character Database for the next version of Unicode. Kent Karlsson
    proposed this some time back; it may be time to revisit it, but we would need a proposal for the
    next UTC.
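
    The titlecasing difference can be sketched in Python. str.title() applies the default,
    language-neutral mappings; dutch_titlecase is a hypothetical helper written for this
    illustration and is not how SpecialCasing.txt or any library expresses the rule:

```python
def dutch_titlecase(word: str) -> str:
    """Hypothetical helper: titlecase one word, keeping a leading "ij" together."""
    if word[:2].lower() == "ij":
        # Dutch convention: both letters of a word-initial "ij" are capitalized.
        return "IJ" + word[2:].lower()
    return word[:1].title() + word[1:].lower()


# Default titlecasing only touches the first letter.
print("ijstijd".title())            # Ijstijd
# The Dutch-aware sketch capitalizes the whole digraph.
print(dutch_titlecase("ijstijd"))   # IJstijd
# Words that do not start with "ij" are handled as usual.
print(dutch_titlecase("huis"))      # Huis
```

    A real implementation would apply this only to text known to be Dutch, since a leading
    "ij" in another language should not be treated this way.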

    Märk Davis
    ________
    mark.davis@jtcsv.com
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    ----- Original Message -----
    From: "Thomas Milo" <t.milo@chello.nl>
    To: "John Hudson" <tiro@tiro.com>
    Cc: "Chris Pratley" <chrispr@exchange.microsoft.com>; <Bob_Hallissy@sil.org>; <unicore@unicode.org>;
    <unicode@unicode.org>; "Gerard Unger" <ungerard@wxs.nl>
    Sent: Sunday, April 27, 2003 12:18
    Subject: Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

    > Hi John,
    >
    > At 02:49 AM 4/27/2003, Thomas Milo wrote:
    >
    > > >Would it be possible to make the IJ/ij available at last as a single
    > > >character IJ/ij for Dutch users? MS Office seems to be unaware of this
    > > >character (apart from correct shifting between upper and lower case). A
    > > >spell check of Ĳstijd (correct Unicode) vs. IJstijd (improvised ASCII)
    > > >approves of the - erroneous! - ASCII form and does not even recognize the
    > > >horrendous misspelling Ijstijd.
    > > >
    > > >A web search of the Dutch word IJstijd (Ice Age) indicates that the use
    > > >of this essential character is still practically zero.
    > >
    > > Whenever I've asked Dutch colleagues (type designers and typographers)
    > > about the IJ/ij characters they've always expressed amazement that these
    > > characters exist and most reject the need for them. 'Just use I and J'
    > > seems to be the usual response. Tom is the only Dutch colleague I've ever
    > > heard express support for the use of these characters. It is true that
    > > there are special rules for how the letters I and J in combination should
    > > be typeset in Dutch, but the same is true of lots of digraphs in German and
    > > other languages that are not encoded as distinct characters and will not
    > > be. I'm far from convinced that the IJ/ij characters are necessary or that
    > > their use should be encouraged.
    >
    > No Dutchman - whether he is involved in type or not - can be amazed by the
    > existence of IJ. If his name happened to begin with IJ, he would not be able
    > to look up his own name in a telephone directory. With no exception IJ is
    > taught in all schools as part of our handwriting as a ligature - just
    > checked with my daughter. I called Gerard Unger about it and he pointed out
    > that IJ is surrounded by a certain ambivalence: dictionaries list it either
    > with I or with Y. The latter is enough to grant it graphemic status. And -
    > like anybody would - he agrees that it capitalizes as one letter. As for
    > your typographer friends, they mean: just compose it out of I and J (still
    > in the streets of the Netherlands one frequently observes Ü with the left
    > leg broken: the ligature of I and J). But this is all talk about glyphs.
    > Unicode deals with graphemes, and there IJ is already recognized as such.
    >
    > IJ as a character is part and parcel of Dutch orthography and included in
    > the Unicode Standard at the request of the Netherlands Standardisation
    > Committee. There is no need to ask approval to use IJ/ij - the only point I
    > am making is, that we still don't have a convenient way of entering it.
    >
    > Graphemically the use of IJ involves no complex rules at all. In Dutch ALL
    > combinations of letters I and J - with extremely rare exceptions in foreign
    > words like "bijoux", consequently corrupted into byoux by weak spellers -
    > are instances of the single grapheme IJ. As a result, the common hack to
    > type capital IJ with two upper case characters causes problems with spellers
    > and grammar checkers. Moreover it leads to spelling and sorting errors (in
    > some dictionaries and all telephone directories IJ mix with Y, but the hack
    > moves it to I); automatic capitalization produces a revolting Ij, in rotated
    > text IJ come apart, etc. etc.
    >
    > There is no need to put up with this hack: the Unicode Standard provides the
    > correct solution and the industry obliged itself to implement it.
    >
    > t
    >
    >
    >


