Re: "Universal Character Set"

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Feb 17 2007 - 16:55:28 CST

    At least for English speakers, I've found a strong anecdotal correlation
    between those who say UCS or ISO 10646 and those who say "octet" instead of
    byte.

    73,600,000 for byte
    7,650,000 for octet

    As with your case, the problem is separating out the non-computer usage.

    29,300,000 for byte computer
    1,030,000 for octet computer

    Mark

    On 2/17/07, Asmus Freytag <asmusf@ix.netcom.com> wrote:
    >
    > On 2/17/2007 9:58 AM, Don Osborn wrote:
    > >
    > > Does anyone currently use the term "Universal Character Set" (UCS) to
    > > refer to Unicode/ISO-10646? I guess it is technically correct, but I
    > > rarely see it. It seems that folks generally use "Unicode" as the
    > > catch-all term, or maybe I'm missing a wider use of UCS?
    > >
    > I believe your observation about "Unicode" being the common label is to
    > the point. A bit of research is illuminating and might explain some of
    > the reasons why the term has caught on.
    >
    > There are about 33 million pages indexed on Google that can be retrieved
    > by a search for "Unicode" and about 111,000 by a search for "Universal
    > character set". If you subtract all pages that mention 10646 or Unicode
    > or UCS, that number drops to 1/10th for the latter. If you similarly
    > subtract the other terms from the search for Unicode, there's hardly a
    > reduction in number.
    >
    > What that means is that "universal character set" is probably most often
    > used as a descriptor, as in "Unicode is a universal character set", and
    > not as a label. The common label is clearly "Unicode". That's not
    > surprising, because Unicode as a label has the advantage of being
    > shorter and clearly referring to a specific character set.
    >
    > In the case of UCS as a label, you run into the problem that the letters
    > UCS are not unique. Google will pull up the Union of Concerned
    > Scientists, UCS Inc., University College School and a number of others
    > on the first screen (and also helpfully suggest that you really meant
    > USC). Trading non-distinctiveness for brevity is apparently not a clear
    > win - and the use of UCS (in all meanings) is barely 1/6th of the one
    > for Unicode. If you search for UCS together with 10646 or Unicode to
    > sift out when UCS might have been used in the context of character sets,
    > you find only about 800K links, which only emphasizes the issue with the
    > multiple meanings of UCS.
    >
    > 10646 by itself gives about 4.5 million hits, of which fully 1/3 don't
    > mention ISO, but are in reference to part numbers or are otherwise false
    > positives--based on that you can conclude that 10646 is used as a
    > designator of the character set about 1/10th as often as Unicode.
    >
    > There are instances where referring to Unicode is the only correct
    > choice, for example when referring to Unicode Normalization Forms,
    > Unicode Bidi Algorithm, Unicode Line Breaking, and the myriad other
    > specifications that have been developed or are being developed around
    > the character set and collection of character properties by the Unicode
    > Consortium.
    >
    > A./
    >
    >
    >

    -- 
    Mark
    


    This archive was generated by hypermail 2.1.5 : Sat Feb 17 2007 - 16:57:21 CST