Re: A basic question on encoding Latin characters

From: schererm@us.ibm.com
Date: Thu Sep 23 1999 - 13:40:45 EDT


well, let me try this.

1. T
2. T
3. T
4. i don't know what all is in mes-3. no answer.
5. F :-)

if i understand this right, then this is quite easy:

as long as you have a base character and all necessary accent marks in
unicode, modern software can both deal with it for sorting, searching, etc., and
display it the way the user expects. that is, precomposed characters
are entirely unnecessary.
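
a minimal sketch of that point, in python. the standard-library unicodedata module and the helper name canonically_equal are choices made here for illustration and are not from the original message. it compares a precomposed letter with its base-plus-accent sequence by normalizing both first, and shows a combination that has no precomposed code point at all:

```python
import unicodedata

# The same user-visible letter, encoded two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# A raw code-point comparison treats the two encodings as different strings...
raw_equal = precomposed == decomposed

# ...but after normalization they compare equal.
normalized_equal = canonically_equal(precomposed, decomposed)

# NFC recomposes the sequence into the precomposed code point where one exists.
recomposed = unicodedata.normalize("NFC", decomposed)

# q with acute has no precomposed code point, so NFC leaves it as base + mark.
q_acute = unicodedata.normalize("NFC", "q\u0301")

print(raw_equal, normalized_equal, recomposed == precomposed, len(q_acute))
```

software that normalizes before comparing (or that uses a collation library built on the same equivalences) does not need to care which of the two encodings it was handed. this only covers comparison; display of combining sequences depends on the font and rendering engine.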

unicode has some of them because they were included in older codepages, and
because older software that was hardcoded for those codepages and their
assumptions could not handle composing sequences. the original plan was for
unicode not to have them at all, which would have saved many code points and
avoided ambiguities, but political pressures (acceptance/adoption) forced them in.

but this is already more than what an intelligent layperson would want to
know...

markus

Markus Scherer IBM Cupertino, CA +1 408 777 5860 Fax ..5891 schererm@us.ibm.com

Marion Gunn <mgunn@egt.ie> on 99-09-23 09:49:33

To: Unicode List <unicode@unicode.org>
cc:
Subject: A basic question on encoding Latin characters

If anyone can actually understand the question I plan to set out in
stages below, would they please do me the great favour of pointing me to
a URL/scholarly paper containing its answer in the fewest, simplest
number of words, and employing the expression "industrial implementation"
at least once.

The question is about Unicode 3.0/10646 (delete as appropriate) and I'd
be grateful if experts, if they do not know of the existence of such a
useful URL as I have outlined, would simply respond True/False to parts
1-4 of the question, if they consider that enough to satisfy.

1. I have heard it argued that there is no reason to encode, in the
UCS or in 10646, _any_ new precomposed Latin combinations as
single-entity characters. T/F.

2. I have heard that that is because UCS/10646 is a coded character set,
rather than a checklist of actual end-user letters (to use the
layperson's normal understanding of the word "letter"). T/F.

3. I have heard that most end-users, present company excepted,:-) have
neither need nor desire to know how such things are coded in the
UCS/10646, once it can represent the (layperson's) letters needed. T/F.

4. I have heard that CEN's MES-3, as distinct from related inferior
subdivisions, contains all the combining characters needed to satisfy
all of the Latin needs of the layperson to whom statements 2 and 3 above
apply. T/F.

5. I have heard that not one of these three mailing lists (Alpha,
Unicode, 10646) has experts capable of creating such a paper/URL as
would explain 1-3 and at the same time dovetail neatly into 4 in such a
way as to satisfy the intelligent layperson who does not want to drown
in too much of the technical detail, but only learn enough to be able,
on the basis of answers 1-4, to judge for himself/herself how
comprehensively UCS/10646/MES-3 meets Latin requirements. T/F.:-)

With best wishes,
Marion Gunn


