On this page you will find an interesting variation on the basic notion of sampling at random from a historically based distribution function: sampling with probabilities inversely proportional to frequencies. The previous page illustrated a sampling process that emphasizes the "hot" numbers, those that have occurred most frequently in the past. This page shows a way to emphasize the "cold" numbers, those that have occurred least frequently in the past.
The underlying arithmetic of this approach is mildly intricate although not really difficult to understand. So I won't dwell at great length on why it works; if you are comfortable with algebra you can figure that out for yourself. I intend only to show you how to do it. Then from the "how" it would be just a small step for you to see the "why."
We will use the same set of PLUS Lotto frequency data you have seen before, but the other columns in the table will be obtained differently. I set up the spreadsheet that did the calculations to report 5-decimal accuracy. The entries in the column headed 1/F are just the reciprocals of the frequencies: 1 over 15, 1 over 10, 1 over 11, and so on. That column is only a "working column" whose sole function is to produce the total, called SUM, at the bottom: the number 4.79864. The entries in the next column, under IRF, are the respective 1/F entries, each divided by 4.79864. That number is a "normalizing" factor that converts the 1/F numbers into numbers that behave like relative frequencies, which add up to 1.00000. IRF stands for "inverse relative frequency," meaning that these numbers are like relative frequencies except that they give a larger weight (probability) to a less frequently occurring number. Finally, the entries in the column headed CIRF, standing for "cumulative inverse relative frequency," are obtained by running addition, in the same way we obtained the CRF column in the table on the previous page.
| _N_ | _F_ | _1/F_ | _IRF_ | _CIRF_ |
|---|---|---|---|---|
| 1 | 15 | 0.06667 | 0.01389 | 0.01389 |
| 2 | 10 | 0.10000 | 0.02084 | 0.03473 |
| 3 | 11 | 0.09091 | 0.01894 | 0.05368 |
| 4 | 11 | 0.09091 | 0.01894 | 0.07262 |
| 5 | 11 | 0.09091 | 0.01894 | 0.09157 |
| 6 | 5 | 0.20000 | 0.04168 | 0.13324 |
| 7 | 8 | 0.12500 | 0.02605 | 0.15929 |
| 8 | 6 | 0.16667 | 0.03473 | 0.19403 |
| 9 | 4 | 0.25000 | 0.05210 | 0.24612 |
| 10 | 8 | 0.12500 | 0.02605 | 0.27217 |
| 11 | 10 | 0.10000 | 0.02084 | 0.29301 |
| 12 | 12 | 0.08333 | 0.01737 | 0.31038 |
| 13 | 11 | 0.09091 | 0.01894 | 0.32932 |
| 14 | 16 | 0.06250 | 0.01302 | 0.34235 |
| 15 | 11 | 0.09091 | 0.01894 | 0.36129 |
| 16 | 7 | 0.14286 | 0.02977 | 0.39106 |
| 17 | 12 | 0.08333 | 0.01737 | 0.40843 |
| 18 | 9 | 0.11111 | 0.02315 | 0.43158 |
| 19 | 5 | 0.20000 | 0.04168 | 0.47326 |
| 20 | 9 | 0.11111 | 0.02315 | 0.49642 |
| 21 | 10 | 0.10000 | 0.02084 | 0.51726 |
| 22 | 9 | 0.11111 | 0.02315 | 0.54041 |
| 23 | 4 | 0.25000 | 0.05210 | 0.59251 |
| 24 | 8 | 0.12500 | 0.02605 | 0.61856 |
| 25 | 12 | 0.08333 | 0.01737 | 0.63592 |
| 26 | 12 | 0.08333 | 0.01737 | 0.65329 |
| 27 | 8 | 0.12500 | 0.02605 | 0.67934 |
| 28 | 11 | 0.09091 | 0.01894 | 0.69828 |
| 29 | 7 | 0.14286 | 0.02977 | 0.72805 |
| 30 | 11 | 0.09091 | 0.01894 | 0.74700 |
| 31 | 9 | 0.11111 | 0.02315 | 0.77015 |
| 32 | 6 | 0.16667 | 0.03473 | 0.80489 |
| 33 | 7 | 0.14286 | 0.02977 | 0.83466 |
| 34 | 6 | 0.16667 | 0.03473 | 0.86939 |
| 35 | 11 | 0.09091 | 0.01894 | 0.88833 |
| 36 | 8 | 0.12500 | 0.02605 | 0.91438 |
| 37 | 16 | 0.06250 | 0.01302 | 0.92741 |
| 38 | 13 | 0.07692 | 0.01603 | 0.94344 |
| 39 | 5 | 0.20000 | 0.04168 | 0.98511 |
| 40 | 14 | 0.07143 | 0.01489 | 1.00000 |
| Total: | 378 | SUM=4.79864 | | |
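If you would rather check the spreadsheet arithmetic in code, here is a minimal Python sketch of the three calculated columns. The frequencies are copied from the F column above; the names `FREQ` and `inverse_columns` are my own inventions for illustration, not anything from the spreadsheet.

```python
from itertools import accumulate

# PLUS Lotto frequencies for numbers 1..40, copied from the F column of the table.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]


def inverse_columns(freq):
    """Return (recip, total, irf, cirf) for a list of nonzero frequencies."""
    recip = [1.0 / f for f in freq]      # the 1/F working column
    total = sum(recip)                   # SUM, the normalizing factor
    irf = [r / total for r in recip]     # inverse relative frequencies, sum to 1
    cirf = list(accumulate(irf))         # running total of the IRF column
    return recip, total, irf, cirf


recip, total, irf, cirf = inverse_columns(FREQ)
print(f"SUM = {total:.5f}")
for n, (f, r, p, c) in enumerate(zip(FREQ, recip, irf, cirf), start=1):
    print(f"{n:2d} {f:3d} {r:.5f} {p:.5f} {c:.5f}")
```

Note that a frequency of zero would break the 1/F step, so this sketch assumes every number has been drawn at least once, as is the case in this data.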
The leftmost column in the table is of course the set of numbers from 1 through 40, while the rightmost column, CIRF, now plays the role of the "cumulative relative frequency" column that we compare against numbers from the RAND() function. Together those two columns form an "effective" distribution function that allows us to sample at random with probabilities inversely proportional to frequency. It is enlightening to see the effect of selecting Lotto numbers from this new distribution using the same random uniform numbers we used on the previous page. They were:
0.2141     0.2963     0.9197     0.6417     0.5223     0.4543

In tabular form, here are the random uniform numbers and the numbers we would select from our new distribution function, which emphasizes the "cold" numbers.
| _Uniform Number_ | _Lotto Number_ |
|---|---|
| 0.2141 | 9 |
| 0.2963 | 12 |
| 0.9197 | 37 |
| 0.6417 | 26 |
| 0.5223 | 22 |
| 0.4543 | 19 |

Here is a comparison of the two different Lotto tickets that we selected, first (from the previous page) the "hot-number" ticket, then the "cold-number" ticket.
Directly Proportional Sampling: 9, 13, 18, 21, 26, 38

Inversely Proportional Sampling: 9, 12, 19, 22, 26, 37

As you can see, the two tickets are similar yet very different when viewed as Lotto tickets, where what you want to do is match all six numbers that are drawn. We should not be surprised that the six numbers selected on the two tickets are so similar, for remember that we are using the same set of random uniform numbers to select them. We used two different distribution functions, yes, but the primary controlling factor in the selection process is the set of numbers given to us by the RAND() function. Each of the distributions is a cumulative distribution, so a single number from RAND() should produce numbers from the two distributions that are the same or close to each other.
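To see the two lookups side by side, here is a small Python sketch that feeds the same six uniform numbers through both cumulative columns. I am assuming the lookup rule is "take the first number whose cumulative value is at least the uniform number," which reproduces both tickets shown on this page; the helper names `cumulative` and `pick` are hypothetical, and `bisect_left` stands in for the comparison done by eye in the spreadsheet.

```python
from bisect import bisect_left
from itertools import accumulate

# PLUS Lotto frequencies for numbers 1..40, copied from the F column of the table.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]

# The six random uniform numbers used on this page and the previous one.
UNIFORMS = [0.2141, 0.2963, 0.9197, 0.6417, 0.5223, 0.4543]


def cumulative(weights):
    """Normalize the weights to sum to 1 and return their running totals."""
    total = sum(weights)
    return list(accumulate(w / total for w in weights))


def pick(u, cum):
    """Return the first number (1-based) whose cumulative value is >= u."""
    # min() guards against rounding leaving the last cumulative value just under 1.
    return min(bisect_left(cum, u) + 1, len(cum))


crf = cumulative(FREQ)                       # directly proportional ("hot")
cirf = cumulative([1.0 / f for f in FREQ])   # inversely proportional ("cold")

for u in UNIFORMS:
    print(f"{u:.4f}  hot: {pick(u, crf):2d}  cold: {pick(u, cirf):2d}")
```

For these six uniform numbers the hot and cold picks never differ by more than one, which is the similarity described above.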
How do I select the two Lotto tickets I would actually purchase for a "small" $4 million jackpot? The way I showed you above. I select one set of 6 random uniform numbers using RAND(), then let them select one ticket in the directly proportional mode and another ticket in the inversely proportional mode.
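That routine can be sketched in Python roughly as follows, with `random.random()` standing in for the spreadsheet's RAND(). The function names are hypothetical. One thing here is my own assumption rather than part of the method described above: a real ticket needs six different numbers, so this sketch simply draws a fresh set of uniform numbers whenever either ticket comes up with a repeat.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# PLUS Lotto frequencies for numbers 1..40, copied from the F column of the table.
FREQ = [15, 10, 11, 11, 11, 5, 8, 6, 4, 8,
        10, 12, 11, 16, 11, 7, 12, 9, 5, 9,
        10, 9, 4, 8, 12, 12, 8, 11, 7, 11,
        9, 6, 7, 6, 11, 8, 16, 13, 5, 14]


def cumulative(weights):
    """Normalize the weights to sum to 1 and return their running totals."""
    total = sum(weights)
    return list(accumulate(w / total for w in weights))


def pick(u, cum):
    """Return the first number (1-based) whose cumulative value is >= u."""
    return min(bisect_left(cum, u) + 1, len(cum))


def two_tickets(uniforms, freq=FREQ):
    """Map one set of uniform numbers to a (hot, cold) pair of sorted tickets."""
    crf = cumulative(freq)                       # directly proportional
    cirf = cumulative([1.0 / f for f in freq])   # inversely proportional
    hot = sorted(pick(u, crf) for u in uniforms)
    cold = sorted(pick(u, cirf) for u in uniforms)
    return hot, cold


def draw_tickets(rng=random, k=6):
    """Draw k uniforms and build both tickets; redraw on repeats (my assumption)."""
    while True:
        uniforms = [rng.random() for _ in range(k)]
        hot, cold = two_tickets(uniforms)
        if len(set(hot)) == k and len(set(cold)) == k:
            return hot, cold


# The six uniform numbers from this page reproduce the two tickets shown above.
print(two_tickets([0.2141, 0.2963, 0.9197, 0.6417, 0.5223, 0.4543]))
# → ([9, 13, 18, 21, 26, 38], [9, 12, 19, 22, 26, 37])

# A fresh pair of tickets; the result differs from run to run.
print(draw_tickets())
```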
Hey, why not? It's fun and it gives me something to do.
Cybergeezer (geezer@cybergeezer.com)