
Zipf's Law




What if I told you that, using one simple formula, I could estimate how many times any word appears in this article, in a book, or even across the entire internet?
Zipf’s Law lets you do exactly that, with math that even a second grader can understand.

The law states that “Given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.”
What this essentially means is that the nth most common word will occur about x times, where

Formula

$$x = \frac{\text{number of times the most common word is used}}{n}$$

This is extremely powerful, mainly because every language, whether English, German, French, or even ones we have not yet been able to decipher, seems to follow the same peculiar pattern.

The most used word appears about twice as often as the second most used word, three times as often as the third, four times as often as the fourth, and so on.

In the case of English, “the” happens to be the most common word, “of” comes in second place, and “and” in third.


Now “the” accounts for about 7% of the total words.
So going by the above formula, “of” would account for about 7%/2 = 3.5% of the total words, and this actually holds true!

“And” should account for about 7%/3 ≈ 2.3% of the total words, and the actual figure, a little under 3%, is in the same ballpark.

Numerically, “the” appears 69,971 times, “of” appears 36,411 times, and “and” appears 28,852 times, out of a total of about a million words.


Go ahead, do the math and verify it for yourself!
(All the above readings are from the Brown Corpus of American English text)
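
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch. It uses only the three Brown Corpus counts quoted above; the function name `zipf_prediction` is made up for this illustration and is not part of any library.

```python
# Counts quoted in this article for the Brown Corpus, listed in rank order.
brown_counts = [("the", 69971), ("of", 36411), ("and", 28852)]


def zipf_prediction(top_count, rank):
    """Zipf's estimate for the word at `rank`: the top word's count divided by the rank."""
    return top_count / rank


top_count = brown_counts[0][1]
for rank, (word, actual) in enumerate(brown_counts, start=1):
    predicted = zipf_prediction(top_count, rank)
    # Ratio of the real count to the prediction; 1.0 would be a perfect match.
    ratio = actual / predicted
    print(f"{rank}. {word!r}: predicted {predicted:.0f}, actual {actual}, ratio {ratio:.2f}")
```

The ratio column is the thing to watch: the closer it stays to 1, the better the simple "divide by the rank" rule is doing. Expect a close match for “of” and a rougher one for “and”.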


Now comes the really fun part,

 GRAPHS!


The graph below shows the number of occurrences of the 7 most popular words in the Brown Corpus.

Notice the curve!



Also, for anybody familiar with logarithms, it gets even more interesting.

The graph below is a log-log chart.

It shows practically the same data, just with logarithmic scales on both axes.
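
Why bother with logs? If x = C/n, then log x = log C − log n, so a perfectly Zipfian list becomes a straight line with slope −1 on a log-log chart. The sketch below is one rough way to check that, assuming a plain least-squares fit; `loglog_slope` and `ideal_counts` are names invented for this example, and the only real data used are the three Brown Corpus counts from this article.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), with ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# A made-up, perfectly Zipfian list: the count at rank n is exactly 1000 / n.
ideal_counts = [1000 / rank for rank in range(1, 8)]
print(loglog_slope(ideal_counts))  # very close to -1

# The three Brown Corpus counts quoted in this article ("the", "of", "and").
brown_top3 = [69971, 36411, 28852]
print(loglog_slope(brown_top3))
```

Three points are far too few for a serious fit, so treat the second number as a quick sanity check rather than a measurement.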




Moreover, this law is not limited to languages; it also shows up in
  • Page hits versus the popularity rank of web pages
  • City sizes versus the population rank of cities
  • Corporation sizes versus corporation rank
  • Net income versus income rank
  • TRP of a TV channel versus its popularity rank


It is as if this law is everywhere, almost as though it were built into the human brain and psychology.

The law is among the most mysterious, and there is no single accepted explanation for why it happens. There are a bunch of theories all over the internet, but none does justice to the law, and hence they have not been included in this article.

This law does not stand alone; like any other principle or law in math, it is connected to many others, including

1.       Benford’s Law
2.       Pareto principle

All of those will be covered in the posts that follow. Zipf’s law does not end here…
PS: For the content above:
 “The” appears 48 times out of 527 words (approx. 9%)
 “Of” appears 24 times (exactly half as often, approx. 4.5%)
 “And” appears 15 times (roughly one third as often, approx. 3%)

While the sample size is very small, the law still holds true.
Need I justify this law any further?
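
If you want to run the same count on a text of your own, here is a small Python sketch. The helper `top_words` and the `sample` sentence are made up for this example, and treating every run of letters as a word is a simplifying assumption, not how this article's counts were necessarily produced.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in `text` as (word, count) pairs, ignoring case."""
    # Treat runs of letters (plus straight apostrophes) as words; this is a simplification.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Stand-in sample text; replace it with the article or book you want to test.
sample = "the cat and the dog and the bird sat on the roof of the house"
for rank, (word, count) in enumerate(top_words(sample), start=1):
    print(rank, word, count)
```

Paste in a longer text, compare each count with the top count divided by the rank, and see how close you get.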


Enjoy your high school with - High School Pedia : www.highschoolpedia.com


