Comparison algorithms in UNICODE

From: Patrik Faltstrom (paf@bunyip.com)
Date: Sat Aug 12 1995 - 15:22:22 EDT


As part of the development of DIGGER, a distributed directory
service that uses Whois++ technology, we will now start
developing public domain software libraries, written in C,
that take care of fundamental string functions such as:

- Optimization of strings

  Some characters in the UNICODE table can also be written as
  a base character followed by one or more combining
  characters. An example is the character 00C5 LATIN CAPITAL
  LETTER A WITH RING ABOVE, which can be written as 0041+030A.
  The idea behind this function is to minimize the number of
  bytes in the UNICODE string by converting all occurrences of
  sequences such as 0041+030A into the single character 00C5.

- Uppercase/Lowercase conversions

- Comparison routines

- Conversion between FSS-UTF and UNICODE
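
To make the first item concrete, here is a minimal sketch in C of how
the optimization of strings might look. It is only an illustration under
stated assumptions: strings are arrays of 16-bit UNICODE values, the
mapping table is a tiny hand-written subset, and the names compose_pair,
compose_table, lookup_composed and compose_pairs are hypothetical, not
part of any existing library. It also ignores questions such as the
ordering of several combining characters after one base character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: base + combining mark -> precomposed character. */
struct compose_pair {
    uint16_t base;
    uint16_t mark;
    uint16_t composed;
};

/* Illustrative subset only; a real table would be built from the
 * UNICODE character data, not written by hand. */
static const struct compose_pair compose_table[] = {
    { 0x0041, 0x030A, 0x00C5 }, /* A + COMBINING RING ABOVE -> A WITH RING ABOVE */
    { 0x0061, 0x030A, 0x00E5 }, /* a + COMBINING RING ABOVE -> a WITH RING ABOVE */
    { 0x0041, 0x0308, 0x00C4 }, /* A + COMBINING DIAERESIS  -> A WITH DIAERESIS  */
    { 0x004F, 0x0308, 0x00D6 }, /* O + COMBINING DIAERESIS  -> O WITH DIAERESIS  */
};

/* Linear search of the table. Returns 1 and stores the precomposed
 * character in *out if the pair is known, otherwise returns 0. */
static int lookup_composed(uint16_t base, uint16_t mark, uint16_t *out)
{
    size_t i;
    for (i = 0; i < sizeof compose_table / sizeof compose_table[0]; i++) {
        if (compose_table[i].base == base && compose_table[i].mark == mark) {
            *out = compose_table[i].composed;
            return 1;
        }
    }
    return 0;
}

/* Compose the string s of n 16-bit values in place and return the new
 * length. Each incoming value is first tried as a combining mark for
 * the last value already written; if the table has no such pair, the
 * value is copied unchanged. */
size_t compose_pairs(uint16_t *s, size_t n)
{
    size_t in, out = 0;

    assert(s != NULL || n == 0);

    for (in = 0; in < n; in++) {
        uint16_t composed;
        if (out > 0 && lookup_composed(s[out - 1], s[in], &composed)) {
            s[out - 1] = composed;  /* merge with the previous character */
        } else {
            s[out++] = s[in];       /* keep the character as it is */
        }
    }
    return out;
}
```

Because the result is never longer than the input, the conversion can
be done in place, and the caller only needs the returned length to know
where the shortened string ends.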

We are starting this work because I have not yet seen any public
domain libraries that do this.

If I am wrong, and such software libraries exist, please let me
know.

Also, I suppose that this is the right forum to discuss any
implementation issues of UNICODE software that come up? Or is there
another mailing list for that?

   Regards, Patrik Fältström
   Bunyip Information Systems Inc
   Montreal, CANADA


