Re: (TC304WG4.50) Charset vs. codeset

From: Tex Texin (texin@bedford.progress.COM)
Date: Mon Oct 06 1997 - 03:00:33 EDT

OK, I got paid this week so I can afford to throw in my $.02:

With respect to uniqueness of Unicode, my first thought was of the
compatibility characters, since these are redundant with the characters
that have more specific semantics. For example, the hyphen exists for
compatibility with ASCII, but then also has other (more specific)

However, my second thought is that uniqueness, nice as it is when you are
doing mathematics, is not so significant for us, since uniqueness
of the characters depends on your application. When I am doing a
case-insensitive search, even ASCII is non-unique.
When I work with Asian characters I fold half-width and full-width
together. In other applications I treat both of these subsets as unique.
I suspect therefore that categorizing a character repertoire as
consisting of unique sets will depend on the eye of the beholder.
(Or the "i" as Alain beholds in his examples!)
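
To make the point concrete, here is a minimal Python sketch of two such
application-defined foldings. The helper names fold_case and fold_width are
mine, purely for illustration, and using NFKC normalization as the width
fold is an assumption: it also folds other compatibility characters, so a
real application might well want a narrower mapping.

```python
import unicodedata

def fold_case(s: str) -> str:
    """Hypothetical helper: treat upper and lower case as the same character."""
    return s.casefold()

def fold_width(s: str) -> str:
    """Hypothetical helper: fold half-width and full-width forms together.

    Assumption: NFKC is used as the fold; it also maps other
    compatibility characters, not only width variants.
    """
    return unicodedata.normalize("NFKC", s)

# Under a case-insensitive comparison, even ASCII is "non-unique".
print(fold_case("A") == fold_case("a"))

# Full-width 'A' (U+FF21) and half-width katakana KA (U+FF76)
# collapse onto 'A' (U+0041) and katakana KA (U+30AB).
print(fold_width("\uff21") == "A")
print(fold_width("\uff76") == "\u30ab")

# Without any folding, the same pair is two distinct characters.
print("\uff21" == "A")
```

Whether the folded or the unfolded view is the "right" one is exactly the
application's choice; the repertoire itself does not decide it.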

Since I can't imagine someone creating a character set with more than one
of a character, without some differentiating characteristic between them,
I don't see that it helps us to debate which sets are unique or to
include uniqueness in the definition.

I thought Ken's definitions were adequate.


On Oct 5, 8:35am, Alain LaBonté - SCT wrote:
> Subject: Re: (TC304WG4.50) Charset vs. codeset
> A 06:42 97-10-05 -0700, Martin J. Dürst a écrit :
> >On Sat, 4 Oct 1997, Keld Jørn Simonsen wrote:
> >
> >> I had a few comments to Kenneth Whistler's recent writing:
> [Martin] :
> >> > Unicode is an encoded character set.
> [Keld] :
> >> I am not so sure about that. It violates the general principle
> >> that an encoded character set only encodes one (abstract) character
> >> in one way.
> >>
> >> > ISO/IEC 10646 is an encoded character set.
> >>
> >> True.
> [Martin] :
> >You are probably referring to cases like A + combining ring above
> >vs. A with ring above (sorry I don't remember the official names).
> >
> >In that sense, both Unicode and ISO/IEC 10646 are very much the
> >same. Both include the possibilities to use combining marks.
> >Unicode is a little bit more explicit about them. But it doesn't
> >allow more things than ISO/IEC 10646. ISO/IEC 10646 doesn't explicitly
> >define equivalences, and therefore in theory, it's possible to
> >say that these are different abstract characters (or combinations
> >of them). But Unicode can say the same, namely that they are
> >different abstract characters/combinations. That the difference
> >shouldn't be visible to the user is patently obvious in both cases.
> [Alain] :
> My 2 cents:
> On one hand some combinations where you would not see a difference even
> with bad implementations are not recognized as equivalent in UNICODE
> point which typically affect French; with the I other languages are
> affected as well).
> On the other hand, if the implementation is done on the fly by
> or overdisplaying, the difference will be visible with the COMBINING
> DIACRITICS used with a SMALL DOTTED I (a traditional i!) while according to
> UNICODE there is no difference of interpretation between the two
> This is of course only anecdotal. However that should imho be
> in UNICODE. But nobody cares except me, it seems.
> I would like the two following rules to be true (wish list) :
> 1. Within a given script, combinations which make no difference with a
> precomposed character should be considered equivalent in UNICODE.
> 2. It should be disallowed to show differences for UNICODE
> when only one font is used.
> Personally, I also have a problem buying applications that do double
> encoding, as this (as we all know with QP and SGML entities) multiplies
> possibilities of bugs, but also of inconsistencies (in particular in
> engines). I like that all passes through the same coding/decoding
> at the lowest possible level (complete application environment or even
> operating system level).
> Alain LaBonté
> Québec
>-- End of excerpt from Alain LaBonté - SCT
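
For reference, the A-with-ring case Martin mentions in the excerpt above can
be checked with Python's standard unicodedata module. This is only a sketch:
the function name canonically_equivalent is mine, and comparing NFD forms is
one common way to test canonical equivalence under the Unicode rules, not
something ISO/IEC 10646 itself defines.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
combining = "A\u030a"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# As raw code point sequences the two spellings differ...
print(precomposed == combining)

# ...but they compare equal once both are canonically decomposed.
print(canonically_equivalent(precomposed, combining))

# The same holds for i with acute versus i + COMBINING ACUTE ACCENT.
print(canonically_equivalent("\u00ed", "i\u0301"))
```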

Tex Texin                    
Manager International Development and Product Management
Progress Software Corp.        Voice:   +1-781-280-4271
14 Oak Park                      Fax:   +1-781-280-4949
Bedford, MA 01730  USA 
