Re: Characters

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Feb 11 2011 - 13:53:23 CST


    On 2/11/2011 8:11 AM, anbu@peoplestring.com wrote:
    > No, this is not a joke. Whenever I post something, you are making fun of
    > it. What's the problem? I seriously want to know the characters present in
    > Unicode 6 and each of their frequencies of usage.

    Nobody has a problem with your desire to know these characters.

    However, this mailing list is not a place to leave requests, demand
    results, or place an order.

    So when you write "please provide me...", which comes across as placing
    an order, instead of making a polite request for help with your own
    research, as in "Does anybody know where I can find out...", then people
    find that a bit odd. And when your demand includes something that can,
    at best, be answered by a full-blown research project, then "please
    provide me..." becomes funny.

    On the substance of your question, you've been pointed to a place where
    you can find out which characters are included in Unicode 6.0. And, I
    believe, people have explained to you why there isn't an equally
    definite, publicly accessible list of character frequencies. On that
    score, your question is simply not well-defined, because character
    frequency depends on context. In general, the frequency is somewhat
    different for each document.

    Some characters tend to be rare, no matter what the document is. Other
    characters tend to be frequent inside a document, assuming that they are
    used at all. A rare Chinese character would be an example of the first
    case. Even a document that contains that character would contain many
    other Chinese characters, many with higher frequencies. A character from
    a rare alphabet would be an example of the second case. Any document
    written in that alphabet would contain many instances of that character,
    so within that document, its frequency would be high.
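
    To make the point concrete, here is a minimal sketch, assuming Python
    and two made-up sample strings (they are illustrations, not real
    corpus data). It counts relative character frequencies per document
    and shows that a character common in one document can be entirely
    absent from another.

```python
from collections import Counter

def relative_frequencies(text):
    """Return a dict mapping each character to its share of the text."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Hypothetical sample documents, chosen only for illustration.
english = "the quick brown fox jumps over the lazy dog"
greek = "αλφα βητα γαμμα δελτα"

freq_en = relative_frequencies(english)
freq_el = relative_frequencies(greek)

# The same character has a very different frequency in each document.
print("alpha in Greek sample:  ", freq_el.get("α", 0.0))
print("alpha in English sample:", freq_en.get("α", 0.0))
```

    Any "global" figure for a character would depend entirely on how many
    documents of each kind went into the count.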

    Global character frequencies are useful only for compression schemes
    that are used to compress an "average" set of documents. For example,
    they might be useful to compress an index of the entire web.
    Per-document schemes, or adaptive compression in general, would do
    better on selected documents.
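
    A rough sketch of that comparison, assuming Python, three toy
    documents invented for the purpose, and an idealized coder that
    spends -log2(p) bits per character. It measures the average bits per
    character when each document is coded with its own frequency table
    versus one table shared across all of them.

```python
import math
from collections import Counter

def model(text):
    """Estimate character probabilities from the text itself."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def bits_per_char(text, probs):
    """Average ideal code length of text under the given probabilities."""
    return -sum(math.log2(probs[ch]) for ch in text) / len(text)

# Toy documents, assumed purely for illustration.
docs = ["aaaaaaab", "bbbbbbba", "ccccccdd"]

# One shared ("global") table built from all documents together.
global_model = model("".join(docs))

results = []
for doc in docs:
    own = bits_per_char(doc, model(doc))        # per-document table
    shared = bits_per_char(doc, global_model)   # global table
    results.append((own, shared))
    print(f"{doc}: own={own:.3f} shared={shared:.3f} bits/char")
```

    The sketch ignores the cost of storing or transmitting each
    per-document table, which a real scheme would have to pay for
    (adaptive coders avoid it by learning the table as they go).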

    A./
    > On Fri, 11 Feb 2011 09:03:20 -0700, "Doug Ewell"<doug@ewellic.org> wrote:
    >> I assume this is a joke.
    >>
    >> --
    >> Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    >> RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    >



    This archive was generated by hypermail 2.1.5 : Fri Feb 11 2011 - 13:54:27 CST