RE: Diacritical marks: Single character or combined character?

From: Naz Gassiep <mrnaz_at_hotmail.com>
Date: Fri, 6 Dec 2013 09:10:12 +1100

Hi, does anyone have any answers to this question?

From: mrnaz_at_hotmail.com
To: unicode_at_unicode.org
Subject: Diacritical marks: Single character or combined character?
Date: Fri, 8 Nov 2013 18:37:29 +1100

Hi all,
I would like to know if there is a best practice or recommendation as to which method to use when representing letters with diacritical marks. For example, take the following two characters:
ā ā

They may look the same; however, the first is a single character, U+0101, while the second is a combination of two: a regular a (U+0061) followed by the combining macron (U+0304).
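In Unicode terms, the two forms are "canonically equivalent": they are different code point sequences, but normalization maps one onto the other (NFC composes, NFD decomposes). A minimal sketch using only Python's standard-library `unicodedata` module shows this. The variable names and the `describe` helper are illustrative.

```python
import unicodedata

precomposed = "\u0101"   # ā as a single code point
decomposed = "a\u0304"   # a followed by COMBINING MACRON

def describe(s):
    """Return 'U+XXXX NAME' for each code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

print(describe(precomposed))  # ['U+0101 LATIN SMALL LETTER A WITH MACRON']
print(describe(decomposed))   # ['U+0061 LATIN SMALL LETTER A', 'U+0304 COMBINING MACRON']

# A plain string comparison sees two different sequences...
print(precomposed == decomposed)                                  # False
# ...but normalization converts between the two canonically equivalent forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)    # True
```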

When producing content, which is better to use? When writing in languages such as Turkish, there is a finite set of letters with diacritical marks, all of which exist as single precomposed characters in Unicode.

However, when writing statistical formulae, any symbol, Latin or Greek, can have a circumflex or an overline added to denote a particular meaning. In that case I have been combining the relevant character with U+0302 (combining circumflex accent) or U+0305 (combining overline) as needed.
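For most mathematical symbols there is no precomposed character at all, so a combining sequence is the only way to encode them. The sketch below (Python standard library only; the `decorate` helper and the symbol names are hypothetical) builds a few such symbols and applies NFC normalization to see which, if any, collapse into one code point.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT
OVERLINE = "\u0305"    # COMBINING OVERLINE

def decorate(base, mark):
    """Hypothetical helper: append a combining mark to a base character."""
    return base + mark

symbols = {
    "beta_hat": decorate("\u03b2", CIRCUMFLEX),  # Greek small beta + circumflex
    "x_hat": decorate("x", CIRCUMFLEX),
    "x_bar": decorate("x", OVERLINE),
    "a_hat": decorate("a", CIRCUMFLEX),
}

for name, s in symbols.items():
    nfc = unicodedata.normalize("NFC", s)
    # Print original length, NFC length, and the code points after NFC.
    print(name, len(s), len(nfc), [f"U+{ord(c):04X}" for c in nfc])
```

Only `a_hat` changes under NFC: a + U+0302 has a precomposed equivalent, â (U+00E2), so NFC turns it into one code point. β̂, x̂ and x̄ have no precomposed forms and remain two code points.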

Now that I switch between the two activities (writing statistics material and publishing transliterated content), I am unsure which method is best, or whether one is actually better than the other.

I favour using a single method for everything, so I am attracted to the idea of using combining characters throughout. However, language-processing tools for the languages that use these letters may be confused when presented with U+0061 followed by U+0304 instead of the usual U+0101.
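One common mitigation for the parsing concern, whichever form is used for authoring, is to normalize text (typically to NFC) at the point where it enters a tool, so that both encodings compare equal. The sketch below assumes a tool that looks words up in a set. The word list and the `nfc` helper are illustrative and are not taken from any real parser.

```python
import unicodedata

def nfc(text):
    """Normalize text to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical word list, normalized once when it is loaded.
lexicon = {nfc(w) for w in ["wh\u0101nau", "k\u0101inga"]}

# The same word typed with a + COMBINING MACRON instead of U+0101.
user_input = "wha\u0304nau"

print(user_input in lexicon)       # False: naive lookup misses it
print(nfc(user_input) in lexicon)  # True: lookup after normalizing
```

Because NFC composes only where a precomposed character exists, it turns a + U+0304 into ā but leaves sequences such as x̂ or x̄ untouched. Normalizing to NFC therefore does not interfere with combining marks used in formulae.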

Any advice or guidance on this issue would be greatly appreciated.
                                                                                              
Received on Thu Dec 05 2013 - 16:13:24 CST