Originally Posted by seven-deuce
The variance quantifies the amount of spread in a sample of data.
Quantifies - gives a specific number for / measurement of
The square root of the variance is the standard deviation, which I find a more intuitive stat.
In most texts, the default symbol used to denote the variance is σ^2, where σ is the stdev.
The standard deviation is nice because it tells us something important about the results.
Specifically, for normally distributed data:
~68% of the data in the set is within +/-1 stdev of the mean.*
~95% of the data in the set is within +/-2 stdev of the mean.
~99.7% of the data in the set is within +/-3 stdev of the mean.
~99.994% of the data in the set is within +/-4 stdev of the mean.
~99.99994% of the data in the set is within +/-5 stdev of the mean.
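If you want to check those percentages yourself, here's a minimal sketch. It assumes a normal distribution (same as the list above), and the function name fraction_within is just something I made up for illustration.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k stdev of the mean.

    Assumes normally distributed data; uses the standard identity
    P(|Z| <= k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

# Print the coverage for 1 through 5 standard deviations.
for k in range(1, 6):
    print(f"+/-{k} stdev: {100 * fraction_within(k):.5f}%")
```

This only uses the standard library, so it should run on any recent Python.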
For most of us, 2 sigma (2 standard deviations from the mean) is a very practical guide: a compromise between the poor results we get from small samples and getting any meaningful result at all. For scientific discoveries (like the announcement of the discovery of the Higgs boson), 5 sigma is used.
This picture is a normal distribution, and while it is not appropriate for poker statistics, it illustrates the point with a well-recognized picture.
*I believe I have mis-stated this as ~76% in recent posts... maybe in another thread, but 68.3% is the correct number.
Originally Posted by seven-deuce
What does variance = 21,704 actually mean in this example?
That link explains what I was going to say, and what I started above.
The variance is not really too helpful by itself, but the square root of the variance gives the standard deviation, which is the more intuitive number. First of all, the units on the stdev are the same as the units on the mean, which means that the value of the stdev can be compared to the mean apples-to-apples.
In that example, (and I can't stress this enough) the variance is NOT 21,704. The variance is 21,704 mm^2.
Without the units, the statement is false. (This is 99% a personal gripe against mathematicians from an engineer's perspective, but no mathematician will argue that I'm wrong, just that it's obvious and why don't I STFU.)
*ahem*
So 21,704 mm^2 is hard to make sense of. We're comparing lengths, but the variance is an area. Not helpful.
When we take the square root of the variance, we get 147 mm.
So now we have a variable which describes the spread of the data in the same units as the data. Very helpful.
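As a quick check of that arithmetic, here's a tiny sketch. The variable names are mine; the only number taken from the example is the 21,704 mm^2 variance.

```python
import math

variance_mm2 = 21_704                # variance from the example, in mm^2
stdev_mm = math.sqrt(variance_mm2)   # sqrt(mm^2) = mm, same units as the data

print(f"variance = {variance_mm2} mm^2")
print(f"stdev    = {stdev_mm:.0f} mm")
```

The units are only tracked in the names and comments here; Python itself has no idea a millimeter is involved.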
Mathematically, the variance and the standard deviation are each fully determined by the other, meaning that both contain the exact same information. Since neither tells us anything the other doesn't, we are free to choose whichever we find more intuitive to use.
While we use the word "variance" in common communication to describe the spread of a set of data, it's mostly 'cause "standard deviation" doesn't roll off the tongue as well. (AFAIK)
Originally Posted by seven-deuce
I know what it is, the average of the squared differences from the mean, but it doesn't really mean anything to me.
(I wrote this before I clicked your link, and it's nice to know that link covers it.)
It's not necessarily an average. As pertains to poker stats, the denominator under the summation is (n - 1), and not n.
If the denominator is n, then it is the variance for a closed population (most texts call this the population variance).
Since we're always talking about open populations with poker stats, the denominator needs to be (n - 1) (most texts call this the sample variance).
Closed population = all data that can or ever will be acquired has been acquired.
Open population = more data will eventually be added to the current data set.
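To see the n versus (n - 1) difference on actual numbers, here's a rough sketch using Python's statistics module. The five session results are made-up values for illustration only, not real poker data.

```python
import statistics

# Hypothetical results from five sessions (made-up numbers).
results = [12.0, -5.0, 30.0, -18.0, 6.0]

n = len(results)
mean = sum(results) / n
sum_sq_dev = sum((x - mean) ** 2 for x in results)

variance_closed = sum_sq_dev / n        # denominator n: closed population
variance_open = sum_sq_dev / (n - 1)    # denominator n - 1: open population

# The standard library has both built in:
#   statistics.pvariance divides by n, statistics.variance divides by n - 1.
print(variance_closed, statistics.pvariance(results))
print(variance_open, statistics.variance(results))
```

The (n - 1) version always comes out a little larger, and the gap shrinks as n grows.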
Originally Posted by seven-deuce
Also, what can an understanding of variance teach us about poker?
Understanding variance, and higher concepts like Confidence Intervals, allows us to determine in advance how often we expect to be wrong when assigning Villains' ranges based on stats. This is the most important thing. We can put a verifiable number on how wrong we allow our guesses to be. We might be wrong for other reasons (especially in poker), but this much of our wrongness is accounted for.
Why choose to be wrong at all? Why not choose to be 100% certain?
Because the only way we can be 100% certain is if we say it's a number between 0% and 100%. That's not useful.
OK, so why not just choose to say the stat is what the stat is, and that's enough?
Because mathematically, that single number by itself corresponds to a very low confidence level, even though it is the mode (most frequent outcome) of the distribution.
By choosing a 95% CI, I choose to be wrong 1 time out of 20. If I want to use thinner error bars, I may choose to be wrong more often. If I choose a 75% CI, then I choose to be wrong 1 time out of 4, but my error bars are much thinner on my guess. If I try to use a 99% CI, then for almost all poker applications, the error bars are too wide to be useful.
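Here's a rough sketch of that trade-off between confidence level and error-bar width. Big assumptions, stated loudly: I'm treating the stat as a simple proportion (Villain did something in 25 of 100 opportunities, numbers made up), and I'm using the Wilson score interval as one common textbook choice for proportions. That's a technique I've swapped in for illustration, not necessarily the right model for any particular poker stat. The function name wilson_interval is my own.

```python
import math
from statistics import NormalDist

def wilson_interval(successes: int, trials: int, confidence: float):
    """Wilson score interval for a proportion (illustrative assumption)."""
    # Two-sided critical value, e.g. roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return centre - half_width, centre + half_width

# Hypothetical stat: 25 out of 100 opportunities (made-up numbers).
intervals = {}
for level in (0.75, 0.95, 0.99):
    low, high = wilson_interval(25, 100, level)
    intervals[level] = (low, high)
    print(f"{level:.0%} CI: {100 * low:.1f}% to {100 * high:.1f}%")
```

The higher the confidence level, the wider the interval gets for the same sample, which is the whole compromise.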