Unicode Case Mappings UTR #21

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Wed Nov 29 2000 - 01:53:37 EST

I have run into some problems while trying to implement case mapping. Below
are the assumptions I am making and the questions I still have.

#1 It is unclear which languages other than Turkish use the dotless I. I
assume the full list is:

Turkish, Azeri, Tatar, and Bashkir.
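
To make assumption #1 concrete, here is a minimal Python sketch of
language-sensitive case mapping for the dotted and dotless I. The language
tags (tr, az, tt, ba) and the helper names lower_for and upper_for are
illustrative assumptions, and the language set is only the list assumed
above, not an authoritative one.

```python
# Languages assumed (see the list above) to pair I with dotless i (U+0131)
# and dotted capital I (U+0130) with i. This set is an assumption.
DOTLESS_I_LANGS = {"tr", "az", "tt", "ba"}

def lower_for(text, lang):
    """Lowercase text, applying the dotless-i mapping for the assumed languages."""
    if lang in DOTLESS_I_LANGS:
        # U+0130 -> i first, then I -> U+0131, before the default mapping runs.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()

def upper_for(text, lang):
    """Uppercase text, applying the dotted-I mapping for the assumed languages."""
    if lang in DOTLESS_I_LANGS:
        # i -> U+0130 first, then U+0131 -> I, before the default mapping runs.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()
```

For any other language tag the functions fall through to the default,
language-independent mapping.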

#2 What are the rules for title case and spacing? I assume that a no-break
space (U+00A0) acts as a joiner, so the alphabetic character that follows it
is not title cased. I also assume that the zero width no-break space
(U+FEFF, also used as the BOM) is neutral.
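
The assumptions in #2 could be sketched as follows. This is only an
illustration of the behavior I am assuming, not a rule taken from UTR #21:
ordinary white space starts a new word, U+00A0 joins, and U+FEFF is skipped
without changing state. The function name title_case and the set of breaking
spaces are hypothetical.

```python
BREAKING_SPACES = {" ", "\t", "\n"}  # assumed word separators (simplified)
NBSP = "\u00A0"    # assumed to be a joiner: the next letter is NOT title cased
ZWNBSP = "\uFEFF"  # assumed to be neutral: ignored when finding word starts

def title_case(text):
    out = []
    at_word_start = True
    for ch in text:
        if ch in BREAKING_SPACES:
            at_word_start = True
            out.append(ch)
        elif ch == NBSP:
            at_word_start = False
            out.append(ch)
        elif ch.isalpha():
            # str.title() on one character gives its titlecase form.
            out.append(ch.title() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # U+FEFF, digits and punctuation are copied through and leave
            # the word-start state unchanged (a simplification).
            out.append(ch)
    return "".join(out)
```

Under these assumptions "new", U+00A0, "york city" comes out as "New",
U+00A0, "york City".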

#3 French also has other elided words such as d'. Are there prescribed rules
for capitalizing them? Are there other languages to consider?
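
To show what #3 is asking about, here is a sketch of one possible
convention: keep the elided prefix lowercase and capitalize the letter after
the apostrophe. I do not know that this is a prescribed rule; the prefix
list and the name title_word_fr are hypothetical, and only the ASCII
apostrophe is handled.

```python
# Hypothetical list of elided French prefixes; not taken from UTR #21.
ELIDED_PREFIXES = ("d'", "l'")

def title_word_fr(word):
    """Title case one word, leaving an elided prefix in lowercase."""
    lower = word.lower()
    for prefix in ELIDED_PREFIXES:
        if lower.startswith(prefix) and len(lower) > len(prefix):
            rest = lower[len(prefix):]
            # Capitalize the first letter after the apostrophe instead.
            return prefix + rest[0].upper() + rest[1:]
    # No elided prefix: capitalize the first letter of the word.
    return lower[:1].upper() + lower[1:]
```

With this convention "d'artagnan" becomes "d'Artagnan", whereas a naive
first-letter rule would give "D'artagnan".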

#4 There is no mention of stop words.
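
By stop words I mean short words that conventionally stay lowercase in a
title unless they come first. A minimal sketch, assuming a made-up English
stop list and a hypothetical function name:

```python
# Hypothetical English stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}

def title_with_stop_words(text):
    """Title case space-separated words, leaving non-initial stop words lowercase."""
    words = text.lower().split(" ")
    out = []
    for i, word in enumerate(words):
        if i > 0 and word in STOP_WORDS:
            out.append(word)  # stop word inside the title stays lowercase
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)
```

For example, "the lord of the rings" becomes "The Lord of the Rings".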

