Good Turing Algorithm

Example 1

It is about Predicting the unseen things from the seen things.

Say, you are fishing and in the net, you get 10 carps, 3 perchs, 2 whitefish, 1 trout, 1 salmon and 1 eel. You can compute the probability of the carp in the next try as 10/18. But you also know that river contains catfish and bass as well. How can we compute the probability of the catfish?

C Nc

0 2 {catfish, bass}

1 3 {trout, salmon, eel}

2 1 {whitefish}

3 1 {perch}

10 1 {carp}

Total seen fish in the net = 18

Unseen Fish = 2

P(trout) using MLE(Maximum Likelihood Estimation) = 1/ 18

P(catfish) using MLE(Maximum Likelihood Estimation) = 0/ 18 = 0

Now let us predict the unseen things from the seen things

The postulate is, the unseen group has probability almost equal to the group which appeared least number of times, in this case, three fish appeared least no of times, ie. just 1 time. so the probability of these fish would be equal to the probability of the those fish which did not appear in the net.

So,

P*(Unseen) = P(group( catfish, bass) ) = P(group(trout, salmon, eel)) = 3/18

P*(catfish) using Good Turing = (3/18) * 1/2

P*(trout) = Less than MLE = ?

As, we have added some mass to the probability of unseen group of things to make it non-zero. The total probability will increase to some extent. That’s why we need to normalize all the previous probabilities.

Example 2

Say we take a small sample from a little corpus and the sample is

“Sam I am I am Sam I do not eat”

We know the corpus also contains the words, {is, are, shall, will}

C Nc

0 4 {is, are, shall, will}

1 3 {do, not, eat}

2 2 {Sam, am}

3 1 {I}

Probabilities using MLE

P(is) = 0/10 = 0

P(do) = 1/10

P(Sam) =2/10

Probabilities using Good Turing Algorithm

P*(is) = P*(Group (is, are, shall, will))/4 = P (Group (do, not, eat))/4= (3/10) / 4

P*(do) = Less than MLE =?

P*(Sam) = Less than MLE =?

Again As, we have added some mass to the probability of unseen group of things to make it non-zero. The total probability will increase to some extent. That’s why we need to normalize all the previous probabilities.