Re: "Universal Character Set"

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Feb 17 2007 - 16:55:28 CST

Next message: Frank Ellermann: "Re: BOCU-1 spec"

Previous message: Jon Hanna: "Re: "Universal Character Set""
In reply to: Asmus Freytag: "Re: "Universal Character Set""
Next in thread: John Hudson: "Re: "Universal Character Set""
Reply: John Hudson: "Re: "Universal Character Set""
Reply: Asmus Freytag: "Re: "Universal Character Set""

At least for English speakers, I've found a strong anecdotal correlation
between those who say UCS or ISO 10646 and those who say "octet" instead of
"byte."

*73,600,000* for *byte*
*7,650,000* for *octet*

As with your case, the problem is separating out the non-computer usage.

*29,300,000* for *byte computer*
*1,030,000* for *octet computer*

Mark

On 2/17/07, Asmus Freytag <asmusf@ix.netcom.com> wrote:
>
> On 2/17/2007 9:58 AM, Don Osborn wrote:
> >
> > Does anyone currently use the term "Universal Character Set" (UCS) to
> > refer to Unicode/ISO-10646? I guess it is technically correct, but I
> > rarely see it. It seems that folks generally use "Unicode" as the
> > catch-all term, or maybe I'm missing a wider use of UCS?
> >
> I believe your observation about "Unicode" being the common label is to
> the point. A bit of research is illuminating and might explain some of
> the reasons why the term has caught on.
>
> There are about 33 million pages indexed on Google that can be retrieved
> by a search for "Unicode" and about 111,000 by a search for "Universal
> character set". If you subtract all pages that mention 10646 or Unicode
> or UCS that number drops to 1/10th for the latter. If you similarly
> subtract the other terms from the search for Unicode, there's hardly a
> reduction in number.
>
> What that means is that "universal character set" is probably most often
> used as a descriptor, as in "Unicode is a universal character set", and
> not as a label. The common label is clearly "Unicode". That's not
> surprising, because Unicode as a label has the advantage of being
> shorter and clearly referring to a specific character set.
>
> In the case of UCS as a label, you run into the problem that the letters
> UCS are not unique. Google will pull up the Union of Concerned
> Scientists, UCS Inc., University College School and a number of others
> on the first screen (and also helpfully suggest that you really meant
> USC). Trading non-distinctiveness for brevity is apparently not a clear
> win - and the use of UCS (in all meanings) is barely 1/6th of the one
> for Unicode. If you search for UCS together with 10646 or Unicode to
> sift out when UCS might have been used in the context of character sets,
> you find only about 800K links, which only emphasizes the issue with the
> multiple meanings of UCS.
>
> 10646 by itself gives about 4.5 million hits, of which fully 1/3 don't
> mention ISO, but are in reference to part numbers or are otherwise false
> positives--based on that you can conclude that 10646 is used as a
> designator of the character set about 1/10th as often as Unicode.
>
> There are instances where referring to Unicode is the only correct
> choice: for example, when referring to Unicode Normalization Forms,
> Unicode Bidi Algorithm, Unicode Line Breaking, and the myriad other
> specifications that have been developed or are being developed around
> the character set and collection of character properties by the Unicode
> Consortium.
>
> A./
>
>
>

-- 
Mark


This archive was generated by hypermail 2.1.5 : Sat Feb 17 2007 - 16:57:21 CST