Short Unicode names

From: 10646er@sesame.demon.co.uk
Date: Wed Jan 07 1998 - 04:57:12 EST


In message <9801062325.AA00487@unicode.org> Sairus P. Patel writes
Re: short Unicode names? via unicode@unicode.org:

> John Clews' algorithm seems to be the sort of thing you're looking for,
> though it might break down with some of Unicode's longer names. Just for
> fun, I tried it on U+FBF9, "ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA
> ABOVE WITH ALEF MAKSURA ISOLATED FORM", and it produced:
>
> Ar_liga_uigh_kirg_yeh_hamz-a_alef_isol_form
>
> which is 43 characters long, 11 over your specified limit! (I do believe
> John implied that this algorithm has been used only with Latin and Cyrillic
> characters.) If the underlines were removed, as suggested in the algorithm,
> when space is at a premium, a 35-character word is produced:
>
> Arligauighkirgyehhamz-aalefisolform
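The quoted result suggests a few word-level rules. The following Python sketch is a hypothetical reconstruction inferred only from that output; it is not John Clews' published algorithm. It assumes that the script name becomes a two-letter prefix, that WITH is dropped, that ABOVE becomes a "-a" suffix on the previous word, and that every other word is cut to four lowercase letters.

```python
# Hypothetical reconstruction of the rules visible in the quoted output.
# None of these tables come from the original algorithm; they are assumptions.

SCRIPT_CODES = {"ARABIC": "Ar"}   # assumed: leading script name -> 2-letter prefix
DROPPED = {"WITH"}                # assumed: connective words are omitted
SUFFIXES = {"ABOVE": "-a"}        # assumed: position words become a suffix
MAX_WORD = 4                      # assumed: other words truncated to 4 letters


def short_name(unicode_name: str, separator: str = "_") -> str:
    """Abbreviate a Unicode character name word by word."""
    parts: list[str] = []
    for i, word in enumerate(unicode_name.split()):
        if i == 0 and word in SCRIPT_CODES:
            parts.append(SCRIPT_CODES[word])
        elif word in DROPPED:
            continue
        elif word in SUFFIXES and parts:
            # Attach the suffix to the previously abbreviated word.
            parts[-1] += SUFFIXES[word]
        else:
            parts.append(word[:MAX_WORD].lower())
    return separator.join(parts)


name = ("ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE "
        "WITH ALEF MAKSURA ISOLATED FORM")
print(short_name(name), len(short_name(name)))
print(short_name(name, ""), len(short_name(name, "")))
```

Unlike the quoted result, this keeps the MAKSURA element, so it gives "Ar_liga_uigh_kirg_yeh_hamz-a_alef_maks_isol_form" (48 characters) or, without underscores, 39 characters; both are still over the 32-character limit implied above.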

In ISO/TC46/SC2 we probably won't need presentation forms, but for
especially long names, or for UCS collections that do include them,
an additional rule or a lookup table of exceptions may be useful.
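One way to combine a general rule with a lookup table is to consult the table first and fall back to the rule, dropping underscores only when the result is too long. This Python sketch assumes a 32-character limit (the quoted 43-character name was "11 over"), a simplified general rule, and a hypothetical table entry; none of these come from ISO/TC46/SC2.

```python
from typing import Optional

MAX_LEN = 32  # assumed limit, inferred from "43 characters long, 11 over"

# Hypothetical hand-chosen short names for names the general rule cannot fit.
EXCEPTIONS: dict[int, str] = {
    0xFBF9: "Arligugkyyehhamz-aalefmaksisol",
}


def abbreviate(name: str, sep: str) -> str:
    # Simplified general rule (assumed): drop "WITH", cut words to 4 letters.
    return sep.join(w[:4].lower() for w in name.split() if w != "WITH")


def short_name(code_point: int, name: str) -> Optional[str]:
    if code_point in EXCEPTIONS:
        return EXCEPTIONS[code_point]
    for sep in ("_", ""):  # remove underscores only when space is short
        candidate = abbreviate(name, sep)
        if len(candidate) <= MAX_LEN:
            return candidate
    return None  # still too long: a candidate for a new table entry
```

Returning None rather than truncating further keeps the output predictable: any name the rule cannot fit is flagged for a human to add to the exceptions table.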

Where language names like Uighur and Kirghiz are involved, I have
also been using 2-letter language codes from ISO 639.
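Substituting language codes can be done as a pass over the words of a name before abbreviating. This Python sketch shows only two ISO 639-1 entries (ug for Uighur, ky for Kirghiz); a real table would cover the full code list, and the four-letter truncation of the remaining words is an assumption carried over from the quoted output.

```python
# Only two ISO 639-1 entries shown; a full implementation would need the
# complete ISO 639 table.
ISO_639_1 = {
    "UIGHUR": "ug",
    "KIRGHIZ": "ky",
}


def abbreviate_words(name: str) -> list[str]:
    out: list[str] = []
    for word in name.split():
        if word in ISO_639_1:
            out.append(ISO_639_1[word])    # language name -> ISO 639-1 code
        elif word != "WITH":
            out.append(word[:4].lower())   # assumed 4-letter truncation
    return out


words = abbreviate_words("ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE "
                         "WITH ALEF MAKSURA ISOLATED FORM")
print("".join(words), len("".join(words)))
```

Replacing "uigh" and "kirg" with "ug" and "ky" saves four characters, bringing the underscore-free form to 39 characters even with MAKSURA and ABOVE kept as full four-letter words.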

In your example you probably intended to include the MAKSURA element
too, though the name is rather long already!

Best wishes

John Clews

--
John Clews (Chair of ISO/TC46/SC2: Conversion of Written Languages)

SESAME Computer Projects, 8 Avenue Road, Harrogate, HG2 7PG, England
Email: 10646er@sesame.demon.co.uk; tel: +44 (0) 1423 888 432


