Inversely Proportional Sampling

On this page you will find an interesting variation on the basic notion of sampling at random from a historically based distribution function: sampling with probabilities inversely proportional to frequencies. The previous page illustrated a sampling process that emphasizes the "hot" numbers, those that have occurred most frequently in the past. This page shows an appropriate way to emphasize the "cold" numbers, those that have occurred less frequently in the past.

The underlying arithmetic of this approach is mildly intricate, although not really difficult to understand. I won't dwell at great length on why it works; if you are comfortable with algebra you can figure that out for yourself. I intend only to show you how to do it. From the "how" it is just a small step to see the "why."

We will use the same set of PLUS Lotto frequency data you have seen before, but the other columns in the table are obtained differently. I set up the spreadsheet that did the calculations to report results to 5 decimal places. The entries in the column headed 1/F are just the reciprocals of the frequencies: 1 over 15, 1 over 10, 1 over 11, and so on. That column is a "working column" whose only function is to produce the total, called SUM, at the bottom: the number 4.79864. The entries in the next column, headed IRF, are the respective 1/F entries, each divided by 4.79864. That number is a "normalizing" factor that converts the 1/F numbers into numbers that behave like relative frequencies and add up to 1.00000. IRF stands for "inverse relative frequency": these numbers are like relative frequencies, except that they give a larger weight (probability) to a less frequently occurring number. Finally, the entries in the column headed CIRF, for "cumulative inverse relative frequency," are obtained by running addition, in the same way we obtained the CRF column in the table on the previous page.
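
For readers who like to see the recipe in symbols, here is a short sketch. The notation is mine and does not appear in the spreadsheet: F_i is the frequency of number i, S is the total called SUM, and p_i is the IRF entry for number i.

```latex
S = \sum_{j=1}^{40} \frac{1}{F_j}, \qquad
p_i = \frac{1/F_i}{S}, \qquad
\sum_{i=1}^{40} p_i = \frac{1}{S} \sum_{i=1}^{40} \frac{1}{F_i} = 1, \qquad
\frac{p_i}{p_j} = \frac{F_j}{F_i}
```

The last ratio is the point of the whole exercise: a number that has come up one quarter as often gets four times the weight. Number 9 (F = 4) and number 14 (F = 16) are an example, with IRF values of 0.05210 and 0.01302.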

     _N_     _F_     _1/F_     _IRF_     _CIRF_
      1       15     0.06667    0.01389    0.01389
      2       10     0.10000    0.02084    0.03473
      3       11     0.09091    0.01894    0.05368
      4       11     0.09091    0.01894    0.07262
      5       11     0.09091    0.01894    0.09157
      6        5     0.20000    0.04168    0.13324
      7        8     0.12500    0.02605    0.15929
      8        6     0.16667    0.03473    0.19403
      9        4     0.25000    0.05210    0.24612
     10        8     0.12500    0.02605    0.27217
     11       10     0.10000    0.02084    0.29301
     12       12     0.08333    0.01737    0.31038
     13       11     0.09091    0.01894    0.32932
     14       16     0.06250    0.01302    0.34235
     15       11     0.09091    0.01894    0.36129
     16        7     0.14286    0.02977    0.39106
     17       12     0.08333    0.01737    0.40843
     18        9     0.11111    0.02315    0.43158
     19        5     0.20000    0.04168    0.47326
     20        9     0.11111    0.02315    0.49642
     21       10     0.10000    0.02084    0.51726
     22        9     0.11111    0.02315    0.54041
     23        4     0.25000    0.05210    0.59251
     24        8     0.12500    0.02605    0.61856
     25       12     0.08333    0.01737    0.63592
     26       12     0.08333    0.01737    0.65329
     27        8     0.12500    0.02605    0.67934
     28       11     0.09091    0.01894    0.69828
     29        7     0.14286    0.02977    0.72805
     30       11     0.09091    0.01894    0.74700
     31        9     0.11111    0.02315    0.77015
     32        6     0.16667    0.03473    0.80489
     33        7     0.14286    0.02977    0.83466
     34        6     0.16667    0.03473    0.86939
     35       11     0.09091    0.01894    0.88833
     36        8     0.12500    0.02605    0.91438
     37       16     0.06250    0.01302    0.92741
     38       13     0.07692    0.01603    0.94344
     39        5     0.20000    0.04168    0.98511
     40       14     0.07143    0.01489    1.00000
       Total: 378  SUM=4.79864
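
If you would rather check the table outside a spreadsheet, the following is a minimal Python sketch of the same arithmetic. The names FREQ and inverse_table are made up for this illustration, the frequencies are copied from the F column above, and the sketch assumes every frequency is at least 1 (a zero would make 1/F undefined).

```python
# Appearance counts F for Lotto numbers 1..40, copied from the table above.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]


def inverse_table(freq):
    """Return (SUM, rows); each row is (N, F, 1/F, IRF, CIRF)."""
    recip = [1.0 / f for f in freq]      # the 1/F working column
    total = sum(recip)                   # the normalizing factor SUM
    rows, running = [], 0.0
    for n, (f, r) in enumerate(zip(freq, recip), start=1):
        irf = r / total                  # inverse relative frequency
        running += irf                   # cumulative inverse relative frequency
        rows.append((n, f, r, irf, running))
    return total, rows


if __name__ == "__main__":
    total, rows = inverse_table(FREQ)
    for n, f, r, irf, cirf in rows:
        print(f"{n:3d} {f:3d} {r:.5f} {irf:.5f} {cirf:.5f}")
    print(f"Total: {sum(FREQ)}  SUM={total:.5f}")
```

Because the sketch accumulates unrounded values, an entry may occasionally differ in the fifth decimal from what you get by adding the rounded IRF entries by hand.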

The leftmost column in the table is of course the set of numbers from 1 through 40, while the rightmost column, CIRF, now plays the role of the "cumulative relative frequency" column that we compare against numbers from the RAND() function. Together those two columns form an "effective" distribution function that allows us to sample at random with probabilities inversely proportional to frequency. It is enlightening to see the effect of selecting Lotto numbers from this new distribution using the same random uniform numbers we used on the previous page. They were:

0.2141     0.2963     0.9197     0.6417     0.5223     0.4543

In tabular form, here are the random uniform numbers and the numbers we would select from our new distribution function, which emphasizes the "cold" numbers.

_Uniform Number_    _Lotto Number_
     0.2141               9
     0.2963              12
     0.9197              37
     0.6417              26
     0.5223              22
     0.4543              19
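
The lookup itself can be sketched in a few lines of Python. This is an illustration, not the spreadsheet I used: the names cold_cdf and pick are hypothetical, and the rule "take the first number whose CIRF value is at least as large as the uniform number" is my assumption about how ties are broken.

```python
from bisect import bisect_left
from itertools import accumulate

# Appearance counts F for Lotto numbers 1..40, copied from the table above.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]

# The six random uniform numbers used on this page.
UNIFORMS = [0.2141, 0.2963, 0.9197, 0.6417, 0.5223, 0.4543]


def cold_cdf(freq):
    """Cumulative inverse relative frequencies (the CIRF column)."""
    recip = [1.0 / f for f in freq]
    total = sum(recip)
    return [c / total for c in accumulate(recip)]


def pick(u, cdf):
    """Return the first Lotto number whose cumulative value is >= u."""
    # min() guards against round-off leaving the last entry just under 1.
    return min(bisect_left(cdf, u), len(cdf) - 1) + 1


if __name__ == "__main__":
    cdf = cold_cdf(FREQ)
    for u in UNIFORMS:
        print(f"{u:.4f} -> {pick(u, cdf)}")
```

Run as a script, it should reproduce the six selections in the table: 9, 12, 37, 26, 22, 19.
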

Here is a comparison of the two different Lotto tickets that we selected: first (from the previous page) the "hot-number" ticket, then the "cold-number" ticket.

Directly Proportional Sampling: 9, 13, 18, 21, 26, 38

Inversely Proportional Sampling: 9, 12, 19, 22, 26, 37

As you can see, the two tickets are similar, yet as Lotto tickets they are very different, because what you want to do is match all six numbers that are drawn. We should not be surprised that the six numbers on the two tickets are so similar; remember that we used the same set of random uniform numbers to select them. We used two different distribution functions, yes, but the primary controlling factor in the selection process is the set of numbers given to us by the RAND() function. Each distribution is a cumulative distribution over the same 40 numbers, so a single number from RAND() can be expected to produce the same number, or numbers close to each other, from the two distributions.
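
To see both tickets come out of one set of uniform numbers, here is a small Python sketch. The function names are invented for this example, random.random() stands in for the spreadsheet's RAND(), and the tie-breaking rule (first number whose cumulative value is at least the uniform number) is an assumption. It also makes no attempt to handle two uniform numbers landing on the same Lotto number; drawing again would be one option, but that is my guess and not something covered here.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# Appearance counts F for Lotto numbers 1..40, copied from the table above.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]


def cdf_from_weights(weights):
    """Normalize the weights and return their running totals."""
    total = sum(weights)
    return [c / total for c in accumulate(weights)]


def ticket(uniforms, cdf):
    """Map each uniform number to a Lotto number and sort the result."""
    last = len(cdf) - 1
    return sorted(min(bisect_left(cdf, u), last) + 1 for u in uniforms)


def paired_tickets(uniforms, freq=FREQ):
    """Return (hot ticket, cold ticket) built from the same uniform numbers."""
    hot = cdf_from_weights(freq)                       # weights proportional to F
    cold = cdf_from_weights([1.0 / f for f in freq])   # weights proportional to 1/F
    return ticket(uniforms, hot), ticket(uniforms, cold)


if __name__ == "__main__":
    page_uniforms = [0.2141, 0.2963, 0.9197, 0.6417, 0.5223, 0.4543]
    print(paired_tickets(page_uniforms))
    fresh = [random.random() for _ in range(6)]        # a new pair of tickets
    print(paired_tickets(fresh))
```

With the six uniform numbers from this page, the first line printed should be the two tickets compared above: 9, 13, 18, 21, 26, 38 and 9, 12, 19, 22, 26, 37.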

How do I select the two Lotto tickets I would actually purchase for a "small" $4 million jackpot? The way I showed you above. I select one set of 6 random uniform numbers using RAND(), then let them select one ticket in a directly proportional mode and another ticket in the inversely proportional mode.

Hey, why not? It's fun and it gives me something to do.

Think Big but Bet Small

Cybergeezer (geezer@cybergeezer.com)