Re: Characters

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Feb 11 2011 - 13:53:23 CST

Next message: anbu@peoplestring.com: "Re: Characters"

Previous message: William_J_G Overington: "RE: Characters"
In reply to: anbu@peoplestring.com: "RE: Characters"
Next in thread: William_J_G Overington: "Re: Characters"
Reply: William_J_G Overington: "Re: Characters"

On 2/11/2011 8:11 AM, anbu@peoplestring.com wrote:
> No, this is not a joke. Whenever I post something, you are making fun of
> it. What's the problem? I seriously want to know the characters present in
> Unicode 6 and each of their frequencies of usage.

Nobody has a problem with your desire to know these characters.

However, this mailing list is not a place to leave requests, demand
results, or place an order.

So when you write: "please provide me..." which comes across as placing
an order, instead of making a polite request for help with your own
research, as in "Does anybody know where I can find out..." then people
find that a bit odd. And when your demand includes something that can,
at best, be answered by a full-blown research project, then "please
provide me..." becomes funny.

On the substance of your question, you've been pointed to a place where
you can find out which characters are included in Unicode 6.0. And, I
believe, people have explained to you why there isn't an equally
definite, publicly accessible list of character frequencies. On that
score, your question is simply not well-defined, because character
frequency depends on context. In general, the frequency is somewhat
different for each document.
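
To illustrate how frequency depends on the document, here is a minimal
Python sketch. The function name char_frequencies and the two toy strings
are made up for this example; they stand in for real documents.

```python
from collections import Counter

def char_frequencies(text):
    """Return the relative frequency of each character in text."""
    counts = Counter(text)
    total = sum(counts.values())
    # An empty text yields an empty dict, so no division by zero occurs.
    return {ch: n / total for ch, n in counts.items()}

# Two toy "documents" (hypothetical data, not a real corpus).
doc_a = "banana"
doc_b = "cabbage"

freq_a = char_frequencies(doc_a)
freq_b = char_frequencies(doc_b)

# The same character has a different frequency in each document.
print(freq_a["a"], freq_b["a"])
```

The letter "a" makes up half of the first string but only two sevenths of
the second, so there is no single frequency for "a" without saying which
text is being measured.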

Some characters tend to be rare, no matter what the document is. Some
characters tend to be frequent inside a document, assuming that they are
used at all. A rare Chinese character would be an example of the first
case. Even a document that contains that character would contain many
other Chinese characters, many with higher frequencies. A character from
a rare alphabet would be an example of the other case. Any document
written in that alphabet would contain many instances of that character,
so within that document, its frequency would be high.
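
The two cases can be told apart by measuring two things separately: in how
many documents a character occurs at all, and how frequent it is inside
the documents that do use it. The sketch below assumes a tiny invented
corpus; the name spread_and_density and the sample strings are
illustrative only.

```python
def spread_and_density(ch, docs):
    """Return (fraction of docs containing ch,
               mean relative frequency of ch within those docs)."""
    containing = [d for d in docs if ch in d]
    if not containing:
        return 0.0, 0.0
    spread = len(containing) / len(docs)
    density = sum(d.count(ch) / len(d) for d in containing) / len(containing)
    return spread, density

# Hypothetical corpus: three texts in a common script, one in a rare one.
docs = ["abcabcabq", "abcabcabc", "abcabcabc", "ωωωψψψ"]

# 'q' is rare even in the one document that uses it.
print(spread_and_density("q", docs))
# 'ω' occurs in just as few documents, but is frequent where it occurs.
print(spread_and_density("ω", docs))
```

Both characters appear in one document out of four, yet "q" accounts for
one ninth of its document while "ω" accounts for half of its own. A single
global frequency would hide that difference.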

Global character frequencies are useful only for compression schemes
that are used to compress an "average" set of documents. For example,
they might be useful to compress an index of the entire web. Per-document
schemes, or adaptive compression in general, would do better on
any particular document.
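
As a rough check of that claim, the following sketch compares the ideal
code length of a text under a shared ("global") frequency table with its
code length under a table built from that text alone. It is an
illustration under stated assumptions, not a real compressor: the names
model and ideal_bits and the two sample strings are invented, and the
cost of transmitting a per-document table is ignored.

```python
import math
from collections import Counter

def model(texts):
    """Build a static character-probability table from a list of texts."""
    counts = Counter()
    for t in texts:
        counts.update(t)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def ideal_bits(text, probs):
    """Ideal code length in bits under a static model: sum of -log2 p(ch)."""
    return sum(-math.log2(probs[ch]) for ch in text)

# Hypothetical documents with very different character distributions.
docs = ["aaaaaaab", "ccccccccd"]
global_model = model(docs)

for d in docs:
    own = ideal_bits(d, model([d]))      # table fitted to this document
    shared = ideal_bits(d, global_model)  # one table for the whole set
    print(f"{d!r}: own={own:.2f} bits, shared={shared:.2f} bits")
```

The per-document figure can never exceed the shared one, since a text's
own frequencies minimize this sum. In practice a per-document scheme must
also send its table or adapt as it goes, which narrows the gap for short
texts.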

A./
> On Fri, 11 Feb 2011 09:03:20 -0700, "Doug Ewell"<doug@ewellic.org> wrote:
>> I assume this is a joke.
>>
>> --
>> Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
>> RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s 
>


This archive was generated by hypermail 2.1.5 : Fri Feb 11 2011 - 13:54:27 CST