Probability question (math nerds welcome)

Results 1 to 37 of 37
  1. #1
    Renton's Avatar
    Join Date
    Jan 2006
    Posts
    8,863
    Location
    a little town called none of your goddamn business

    Default Probability question (math nerds welcome)

    So I've been trying to study variance as it applies to poker, and I'm a little confused on something. I'll give an example.

Suppose it's the river and you're calling a bet of one unit into a pot of one unit, with exactly a 1/3 chance of winning. You call and the EV is 0 units, which is the arithmetic mean. There are two outcomes: one in which you lose one unit, 2/3 of the time, and one in which you gain two units, 1/3 of the time.

    My cursory knowledge of probability theory says that the variance is as follows:

Variance = sum over all outcomes of (probability of outcome) x (distance from mean)^2

    V = (1/3)(2^2) + (2/3)(1^2)
V = 2 units^2 per trial

    Standard Deviation = sqrt(V)

    σ = 1.414 units / trial
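As a sanity check, here is the same per-trial calculation in Python (just the two river outcomes described above; the variable names are mine):

```python
from math import sqrt

# One river call: win +2 units with prob 1/3, lose -1 unit with prob 2/3
outcomes = [(2, 1/3), (-1, 2/3)]

mean = sum(v * p for v, p in outcomes)                # EV, 0 units
var = sum(p * (v - mean) ** 2 for v, p in outcomes)   # variance, 2 units^2
sd = sqrt(var)                                        # ~1.414 units

print(mean, var, sd)
```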

OK, so now I want to know the range of values around the mean that will contain 95% of results over 100 trials. I'm pretty sure this is what a confidence interval is, but it's possible that I may be misunderstanding the term. Anyway, to calculate the 95% confidence interval after 100 trials:

[image: the confidence interval formula, mean ± Z × σ/√n]

    According to a statistics book, Z for 95% is 1.92. Since V is additive for trials, V for 100 trials = 200, so σ for 100 trials is sqrt(200) = 14.14, thus the standard error is as follows:

    Standard Error = 1.92 x 14.14/sqrt(100) = 2.715, and my 95% confidence range is (-2.715 units, +2.715 units).

    To me, this seems unbelievably small, so I'm not sure this range is describing what I want to know. Suppose I take a brute force approach.

    If I call this bet 100 times, there are exactly 101 different outcomes. I could win 100 times, or win 99 times, or win 98 times, and so on until winning zero times. The probabilities of these outcomes are very easy to establish, for example:

    P(win 100) = (1/3)^100, EV(win 100) = 200 units
    P(win 99) = 100*(2/3)*(1/3)^99, EV(win 99) = 197 units
    etc
    etc
    P(win 1) = 100*(1/3)*(2/3)^99, EV(win 1) = -97 units
    P(win 0) = (2/3)^100, EV(win 0) = -100 units

    Intuitively it seems clear that the highest probability outcome will be winning 33 times, and that probability of the outcomes will descend smoothly away from that. So if I did all this arithmetic, and added the highest probability outcomes until I reached 95%, would all of those outcomes fall within the (-2.715 units, +2.715 units) range?
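For what it's worth, this brute-force approach is easy to automate. A sketch in Python (my own variable names), which builds the exact binomial distribution and keeps adding the highest-probability outcomes until 95% of the mass is covered:

```python
from math import comb

n, p = 100, 1/3
# P(exactly k wins) and the net result in units: +2 per win, -1 per loss
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
net = {k: 3 * k - n for k in range(n + 1)}   # 2k - (n - k)

# add outcomes in order of decreasing probability until 95% of mass is covered
covered, total = [], 0.0
for k in sorted(pmf, key=pmf.get, reverse=True):
    covered.append(k)
    total += pmf[k]
    if total >= 0.95:
        break

lo, hi = min(covered), max(covered)
print(lo, hi, net[lo], net[hi])
```

Since the binomial pmf is unimodal, the greedy set comes out contiguous around the most likely outcome (33 wins).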

    Apologies if I wasn't very clear in my explanation, let me know if you need clarification on something.

    Thanks
    Last edited by Renton; 12-30-2014 at 01:46 PM.
  2. #2
    In London atm so can post yay.

    Your math is not correct and the actual confidence interval is even more narrow.

    st dev = 1.4
    standard error = 1.4/sqrt(100) = 0.14
    margin of error = 1.96*0.14=0.2744
    confidence interval 95% = [-0.2744, +0.2744]
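The same arithmetic in Python, for anyone following along (this is just the numbers above with the unrounded standard deviation):

```python
from math import sqrt

sd = sqrt(2)          # per-trial standard deviation, ~1.414 (1.4 rounded)
n = 100
se = sd / sqrt(n)     # standard error of the sample mean
margin = 1.96 * se    # 95% margin of error, ~0.277 unrounded
print(se, margin)
```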

    What this means is that if we keep constructing samples from this distribution, 95% of the time the mean will be between these two numbers. In the case of 100 wins the mean is 2 so it should be clear that your bounds were meaninglessly large.

Also note that typically when confidence intervals are used, we have a data set but we don't know the real underlying probability distribution, because it's an experimentally derived sample. Here it's quite straightforward because we know the exact distribution.

In short, confidence intervals are really tricky. I think you only really get a feel for what they represent when you use them in practice. Not doing that atm so can't help you there. It's a known head breaker.
    Last edited by jackvance; 01-01-2015 at 11:09 AM.
  3. #3
Renton
Why are you using the standard deviation for one trial with a sample of 100 trials? Don't you need to use the standard deviation for 100 trials?


    In the case of 100 wins the mean is 2 so it should be clear that your bounds were meaninglessly large.
    The mean for 100 trials is the expected value for 100 trials. Since the EV for one trial is zero, so should the EV for 100 be. And the EV outcome of winning all 100 would be 200, not 2. I think you're considering the wrong sort of mean, otherwise I am.

    Can you address what the results of my long form calculation at the end of the post would be? Would it be the same as the 95% confidence interval or something different entirely. If the latter, then confidence interval is definitely not the value that interests me.
    Last edited by Renton; 01-01-2015 at 11:19 AM.
  4. #4
    You did not provide a sample.

    I took your definition of V = (1/3)(2^2) + (2/3)(1^2)

if we look at this, it's the average squared deviation if the sample followed the odds exactly. In this case that isn't possible, because one third of a hundred is not a whole number. You can rewrite it to follow the definition of going over 100 samples as per below:

    V = ((100*1/3)(2^2) + (100*2/3)(1^2))/100
  5. #5
The arithmetic mean (or simply "mean") of a sample, usually denoted by x̄, is the sum of the sampled values divided by the number of items in the sample: x̄ = (x_1 + x_2 + ... + x_n) / n


    Quote Originally Posted by Renton View Post
    If the latter, then confidence interval is definitely not the value that interests me.
    This is probably the conclusion to be drawn here. You seem to be getting a few concepts confused.
  6. #6
Renton
    All I want to know is if there is a short-cut method to my long form calculation:

Taking the product of every outcome and its probability, and counting outcomes from the mean of zero units outward, how far from the mean would I need to go before I've accumulated 95% of all of the probability? I thought that this is what a 95% confidence interval was, but it seems like I am wrong.

    I still think you are incorrect for using the st.dev for one trial instead of calculating one for 100 trials to use in the standard error formula, but I have no authority to push this claim.

    Thanks for biting, I was a little dismayed that it took so long to get a reply, so don't think I'm being argumentative here, I'm genuinely trying to understand this.
  7. #7
    MadMojoMonkey's Avatar
    Join Date
    Apr 2012
    Posts
    10,322
    Location
    St Louis, MO
    None of this sits well with me. That tiny range after 100 trials is too low. This says that you expect to have exactly $0 after 100 trials about 19 times out of every 20 experiments. Gut check says no.

    ***
    Isn't StDev = sqrt(2) units per trial? So StDev after 100 trials is 141.4?

    141.4 /sqrt(100) = 14.14
    14.14*1.92 = 27.15

    Using 1.96 is for "infinite trials", and generally is a fine approx for more than 20 trials, but since 1.92 is technically correct here, I'm using it.

    (I start work in 15 minutes, so I can't really get into this right now. I'll think on it this afternoon.)
  8. #8
MadMojoMonkey
    Quote Originally Posted by Renton View Post
    Can you address what the results of my long form calculation at the end of the post would be? Would it be the same as the 95% confidence interval or something different entirely. If the latter, then confidence interval is definitely not the value that interests me.
    I think the result from this would be exactly 0, because it's a long-form EV calc.

Excel could answer this one in a couple of minutes.

    EDIT: It would not be exactly 0 because it is not weighted by the binomial distribution. If the combinatorial weights of each outcome are taken into account (which is your intent, I believe), then it is a long-form EV calc.

    Without applying the binomial distribution, it is NOT related to the variance or any other meaningful statistic, AFAIK.
    Last edited by MadMojoMonkey; 01-01-2015 at 11:51 AM.
  9. #9
    Quote Originally Posted by MadMojoMonkey View Post
    None of this sits well with me. That tiny range after 100 trials is too low. This says that you expect to have exactly $0 after 100 trials about 19 times out of every 20 experiments. Gut check says no.
    Confidence interval is about the mean of the sample. So if you have +19 after 100 samples, the mean of 0.19 is still in the confidence interval.

    Isn't StDev = sqrt(2) units per trial? So StDev after 100 trials is 141.4?
    lol no
  10. #10
If I look at it like this, Renton, what you are looking for is the confidence interval times the sample size.

So 95% falls within [-27.7, +27.7]

    Seems to make a lot more intuitive sense.
  11. #11
MadMojoMonkey
    Gut check says no.

    We're talking about the expected outcome after 100 trials, not the expected outcome of each of the trials.
  12. #12
    Quote Originally Posted by MadMojoMonkey View Post
    Gut check says no.

    We're talking about the expected outcome after 100 trials, not the expected outcome of each of the trials.
    We actually seem to come to the same conclusion, with only the 1.92 and 1.96 difference.

    14.14*1.92 = 27.15
    So your interval [-27.15, +27.15]

This is not the confidence interval though, because it is strictly about the mean. Probably this has a name in statistics though.

    But please look it up.
  13. #13
Renton
    Looks like i need to spend the hour or so it would take to come up with all of those outcome products then eh? Was hoping to avoid that because it just invites more room for me to miss a zero somewhere and fuck it all up. I'd rather have a simple formula that leads to my answer.
  14. #14
Renton
    Quote Originally Posted by MadMojoMonkey View Post
    Isn't StDev = sqrt(2) units per trial? So StDev after 100 trials is 141.4?

Variance is additive, standard deviation is not. St.Dev for 100 trials is 14.14 units per 100 trials. I am fairly certain of that much.
  15. #15
Renton
    I want to make it clear that I'm trying to understand the range of means that could come from 100 trials with the known population distribution. If I'm using incorrect terminology somewhere, I'm sorry for causing confusion.
  16. #16
    Quote Originally Posted by Renton View Post
Thanks for biting, I was a little dismayed that it took so long to get a reply, so don't think I'm being argumentative here, I'm genuinely trying to understand this.
    np, it's dumb luck i replied though because I'm in London one day only (FTR is blocked in UAE where I live) doing a fly-along on a new aircraft (the luxurious A380).

And I have some renewed interest in statistics as well. This ties in with my plans to get into big data analysis, where a good foundation of statistics really helps. Aircraft spit out fucktons of data and sifting through it is a big part of our job, so this is a lot more than just hobbyism, which adds motivation to actually do it.

    Of course this is all part of my larger plan to get some ML/AI system running, but that's a really longterm one.
  17. #17
Renton
    Quote Originally Posted by jackvance View Post
So 95% falls within [-27.7, +27.7]
    The math that went into this result seems weird but this is the kind of range I would expect tbh.
  18. #18
MadMojoMonkey
    Quote Originally Posted by jackvance View Post
    Quote Originally Posted by MadMojoMonkey View Post
    Gut check says no.

    We're talking about the expected outcome after 100 trials, not the expected outcome of each of the trials.
    We actually seem to come to the same conclusion, with only the 1.92 and 1.96 difference.


    So your interval [-27.15, +27.15]

This is not the confidence interval though, because it is strictly about the mean. Probably this has a name in statistics though.

    But please look it up.
    Sorry. The posts were coming fast and I was behind the thread, and I tried to delete this response before I started my shift, but I guess it missed.

    I agree with this range.

    EDIT: The CI should be symmetric about the mean. We're using a normal distribution, which is symmetrical about the mean, to model the outcome.
    Last edited by MadMojoMonkey; 01-01-2015 at 04:35 PM.
  19. #19
MadMojoMonkey
    Here's something I did a while ago that may be interesting. I was investigating how long is the long term in poker, given some modest input assumptions.



The graphs show a 95% CI on the projected BR for a winrate of 5 BB/100 and a standard deviation of 80 BB/100.

    The dotted green line is the EV, or mean value. The red and cyan lines are the upper and lower bounds of the CI.

    You can see in the lower plot that the short term is dominated by variance, with the boundaries expanding proportionally to the square root of the number of hands played.

    In the long term plot above, you can see that the winrate (no matter how slight) will eventually dominate. The EV increases linearly, while the boundaries increase proportional to the square root. The linear function will always dominate in the long run.

    Note that the upper graph is a logarithmic scale. The green line is the same straight line in the lower plot, but the "squashing" of the x-axis makes it look curved. The red and cyan lines are always an equal distance above and below the green line. The oddness in the logarithmic graph is an artifact of the 'squishing.'

    The logarithmic scale is chosen to illustrate just how long the long term is in poker.
  20. #20
    Quote Originally Posted by Renton View Post
    Looks like i need to spend the hour or so it would take to come up with all of those outcome products then eh? Was hoping to avoid that because it just invites more room for me to miss a zero somewhere and fuck it all up. I'd rather have a simple formula that leads to my answer.
    There is a shortcut way to calculate this now. The upper bound is +27.7.

The closest you can get is between (win 42), EV = 26, and (win 43), EV = 29.

A similar calculation can be done for the negative bound, which is around (win 24), EV = -28

    So 95% of all occurrences fall within 24 wins and 43 wins.

If I understand you correctly, this is what you were looking for?
  21. #21
Renton
    Yeah that's certainly what I'm looking for, assuming it is correct. Now I want to figure out how to quickly do this for a variety of % ranges i.e. 99 90 75 etc, and manipulate the population a little and see what results. For example if we change our equity on the river from 33% to 37%, how much different do those numbers look? I'm gonna set up a google sheet for this.

    My end goal is to test a hypothesis that I have which is the following: Very thin plays for big bets aren't just marginally +EV, they're -EV. Practically, at the sample sizes humans are capable of experiencing, particularly during reasonable spans of time (say around 6 to 24 months of play), if you can't be reasonably confident that the play will result in equity > the mean, and your bankroll and sanity aren't nearly infinite, you should just pass on these spots. So I want to know "how thin is okay" for a variety of common poker situations, and eventually develop an intuitive sense of this.
    Last edited by Renton; 01-02-2015 at 04:41 AM.
  22. #22
MadMojoMonkey
    Quote Originally Posted by Renton View Post
    Yeah that's certainly what I'm looking for, assuming it is correct.
    I agree with JV's numbers.

    Quote Originally Posted by Renton View Post
    Now I want to figure out how to quickly do this for a variety of % ranges i.e. 99 90 75 etc, and manipulate the population a little and see what results.
    This is gonna be an easy task.
    The multiplicative factor of 1.92 is the only thing that changes when you're looking at different CI %-ages. For simplicity, I'm going to use the "infinite trials" value, which should only have a slight error in the 3rd sig.fig.
    75% CI -> 1.15
    90% CI -> 1.64
    95% CI -> 1.96
    99% CI -> 2.58

    On Excel, you can get this number by putting the %-age in cell A1. Into another cell, enter
    =NORMSINV(1-(1-A1)/2)
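If you're not in Excel, the Python standard library has the same inverse CDF (a sketch; the function name is mine):

```python
from statistics import NormalDist

def z_for_ci(ci):
    # two-sided z multiplier, equivalent to =NORMSINV(1-(1-A1)/2)
    return NormalDist().inv_cdf(1 - (1 - ci) / 2)

for ci in (0.75, 0.90, 0.95, 0.99):
    print(ci, round(z_for_ci(ci), 2))
```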

    Quote Originally Posted by Renton View Post
    For example if we change our equity on the river from 33% to 37%, how much different do those numbers look? I'm gonna set up a google sheet for this.
    Well, first off, if your equity goes from 33% to 37% and the bet sizes remain the same, then the EV will be positive. We know that 33% equity is exactly break-even with those bet amounts. So greater equity means greater EV, and EV was 0, so def. +EV.

    Quote Originally Posted by Renton View Post
    My end goal is to test a hypothesis that I have which is the following: Very thin plays for big bets aren't just marginally +EV, they're -EV.
    No matter how wide the variance, if a bet is +EV, then the expectation is to win more than is lost after many bets.

    Quote Originally Posted by Renton View Post
    Practically, at the sample sizes humans are capable of experiencing, particularly during reasonable spans of time (say around 6 to 24 months of play), if you can't be reasonably confident that the play will result in equity > the mean, and your bankroll and sanity aren't nearly infinite, you should just pass on these spots. So I want to know "how thin is okay" for a variety of common poker situations, and eventually develop an intuitive sense of this.
    We know that all +EV bets are good in the long term, but sanity has time constraints. I feel like the obv answer is if you want to take poker seriously, you have to embrace the variance, not avoid it. The assumed goal is to seek out the max EV in any situation, even if a lower EV play has lower variance.

    I'm interested in your conclusions.
    Last edited by MadMojoMonkey; 01-02-2015 at 09:21 AM.
  23. #23
MadMojoMonkey
    Quote Originally Posted by Renton View Post
    Can you address what the results of my long form calculation at the end of the post would be? Would it be the same as the 95% confidence interval or something different entirely. If the latter, then confidence interval is definitely not the value that interests me.
    In cell B1 input 0.33
    In cell C1 input =(1-B1)
    {I format these as percents, but whatever}

    In cell A2 input 0
    In cell A3 input 1
    In cell A4 input 2
    {select cells A2 - A4, drag the bottom-right corner of the selected group down so that column A has numbers 0 - 100}

    In cell B2 input =COMBIN(100,A2) * B$1^A2 * C$1^(100-A2)
    {double-click the bottom right corner of B2 after entering the above formula to fill column B in just the right way.}

    In cell C2 input =B2
    In cell C3 input =SUM(B$2:B3)
    {double-click the bottom right corner of C3 after entering the above formula to fill column C in just the right way.}

    Make graphs of Columns B and C, with Column A as the x-axis.



    The top is the probability distribution for X wins after 100 trials.
    The bottom is the cumulative distribution for X wins after 100 trials. (It is the integral of the top function.)
    Bottom may be the more informative graph.

    A cursory glance pulls out the 90% CI as [25, 40].
    I excluded the lower and upper 5%, leaving the middle 90%.
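The same spreadsheet logic as a Python sketch (exact cutoffs may land one off from an eyeballed read of the graph):

```python
from math import comb

n, p = 100, 1/3
# probability of exactly k wins, as in column B of the spreadsheet
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# cumulative distribution, as in column C
cdf, running = [], 0.0
for q in pmf:
    running += q
    cdf.append(running)

# middle 90%: exclude the lower 5% and the upper 5%
lower = next(k for k in range(n + 1) if cdf[k] >= 0.05)
upper = next(k for k in range(n + 1) if cdf[k] >= 0.95)
print(lower, upper)
```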
  24. #24
Renton
    Sorry it took so long to reply to you guys. I greatly appreciate your efforts, but I didn't want to reply until I'd done some work on this myself.

I just did the rigorous long-form probability distribution for 100 trials and everything confirms jackvance's numbers. However, in doing so I discovered that what I really want is the likelihood of having a positive outcome after a stated number of trials, not necessarily an outcome above the mean. For a zero-EV play these are clearly the same thing, but for a positive expected value play they are not.

Also, I think it really isn't jackvance's value that I want; it is the cumulative probability of all positive outcomes, not a cumulative probability of mean-proximity-based outcomes.

    MMM, I think you're misunderstanding my meaning a little. Of course I believe that if our equity is > 1/3, that calling is +EV no matter what. My point is that making calls for big bets that are very thin is practically -EV. These calls are a poor use of the money it takes to make them. My hypothesis is that if you state an event frequency (say n trials in a six month period, for example) and a bankroll amount, there will be a hard number for how much equity you need in excess of breakeven in order to justify calling. As EV increases from zero, for a significant number of trials, the expectation that we will at least breakeven should approach 100%. For example, in this pot size river bet, I'm curious to know the EV required to have a 95%, 90%, or 75% chance of a positive result after 100 trials. As it turns out:

    >95% chance of +$ ----- 41.6% equity ----- 0.248 units +EV/trial
    >90% chance of +$ ----- 39.8% equity ----- 0.193 units +EV/trial
    >75% chance of +$ ----- 36.8% equity ----- 0.104 units +EV/trial
    >50% chance of +$ ----- 33.6% equity ----- 0.008 units +EV/trial

    I for one find it fairly surprising that you need over 40% equity to have a 95% chance of being up after 100 calls. Clearly, an attempt to reduce one's variance by so much would come with a pretty huge sacrifice of EV. To even get a 75% chance, you have to sacrifice 100*0.104 = over 10 pot size bets worth of EV. That said, I think under minor bankroll constraints there's still a reasonable case for folding hands in the 33-35% range to save variance.
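Those thresholds can be checked with the exact binomial tail. A sketch (assuming the same +2/-1 payouts over 100 calls; a positive cumulative result needs at least 34 wins, since 3×34 − 100 = 2):

```python
from math import comb

def p_positive(p, n=100, wins_needed=34):
    # chance the cumulative result 3w - n is positive after n calls
    return sum(comb(n, w) * p**w * (1 - p)**(n - w)
               for w in range(wins_needed, n + 1))

for equity in (0.416, 0.398, 0.368, 0.336):
    print(equity, round(p_positive(equity), 3))
```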

    The google sheet i made for this is pretty sweet, if either of you want a link to it for your own curiosity, PM me.
    Last edited by Renton; 01-03-2015 at 04:04 PM.
  25. #25
Renton
    I've got a new related project. My friend is grinding the spin n go's on pokerstars in a variance sharing team with others. If you aren't familiar with the spins, they work like this. The buyin is 30 dollars, and you play a 3 handed winner-take-all turbo sng with a random prize pool, as follows:

    $90K 1:100K
    $6K 5:100K
    $3K 1:10K
    $750 1:1000
    $300 5:1000
    $180 7.5%
    $120 21.4%
    $60 70.5%

All in all, you pay $30 to play for a one-third share of about $85.20 in average prize pool, i.e. $28.40 in equity, making the rake $1.60.

So I did some work on the variance of this game and it seems to me like since the outcomes are so wildly asymmetrical about the mean (the only distant outcomes are positive ones; the only negative outcome is losing 30 bucks), normal figures like standard deviation and variance seem relatively useless. Indeed, I used jackvance's method of providing a 95% range of outcomes for 100 trials and it comes out that the lower bound is -3500 dollars, which is impossible since even if you lost all 100 games you'd only be out 3000.

    Is there a special formula for calculating confidence intervals in cases of long-shot positive outcomes such as this?
  26. #26
    As far as I'm aware what you're trying to do, or at least similar, has been posted on 2p2.

    If I remember correctly it's hilariously bad.
  27. #27
Renton
    Well I'm doing it to learn the method, not to learn the result. I have no intention of playing spin n gos, but there are parallels to the type of poker I play. For example if you flat a BU open from the BB with 100bb and A9o, that's a situation where the outcomes will be asymmetrical about the mean, and where you will be winning far more small pots and losing more big pots.
  28. #28
MadMojoMonkey
    Quote Originally Posted by Renton
    Indeed, I used jackvances method of providing a 95% range of outcomes for 100 trials and it comes out that the lower bounds is -3500 dollars, which is impossible since even if you lost all 100 games you'd only be out 3000.
    There is def something wrong with your calculation if you got a result that is beyond the boundaries of the test.

    ***
    What's rough about this thread is that you (Renton) are just getting the hang of how important it is to know what kind of questions can be answered by probability theory. Once you know what kind of questions can be answered, then you will be intuitively better at formulating questions which imply a method for finding the answer. This is just a phase in the learning process that everyone goes through. You'll be over the hump in a couple of weeks (or thereabouts).

    On this current problem:
    What are you given? (e.g. value and probability of outcomes)
What exactly is your question? (E.g. what is Hero's EV after 100 trials @ 95% CI?)
    What are you assuming? (e.g. players of equal skill; Hero has 67% chance of losing $30, 33% chance of winning $X.)
  29. #29
Renton
    Quote Originally Posted by MadMojoMonkey View Post
    On this current problem:
    What are you given? (e.g. value and probability of outcomes)
What exactly is your question? (E.g. what is Hero's EV after 100 trials @ 95% CI?)
    What are you assuming? (e.g. players of equal skill; Hero has 67% chance of losing $30, 33% chance of winning $X.)
    1. I have the exact probabilities of the prize pools:

    Code:
    $90000-----0.00001
    $6000------0.00005
    $3000------0.0001
    $750-------0.001
    $300-------0.005
    $180-------0.075
    $120-------0.21366
    $60--------0.70518
    2. I want to know the 75%/90%/95% range of outcomes that will occur after 100/1000/2000/5000/10000 trials. By outcomes I mean cumulative results. I do not mean a range of EV/game over those samples. I expect these ranges will be large, even for the 10000 trials. Preferably I would like this in the form of cumulative probability of all outcomes from most positive to negative, up to 75%/90%/95% as indicated.

    3. I want this to be a spreadsheet where I can plug in an ROI% and the calculations derive from that. 1% ROI is +$0.3/game, 5% is +$1.5/game. When I plug in this ROI number (probably a number between 1% and 10%) the spreadsheet will fill in the probability distribution based on a corresponding win% value according to the following formula:

    (ROI%*0.3 + 30)/85.2 = win%

    Where 30 is the buyin, ROI% is the called out input cell, and 85.2 is the prize pool expected equity.

    So the possible outcomes for each trial are as follows:

    1) win 89970
    2) win 5970
    3) win 2970
    4) win 720
    5) win 270
    6) win 150
    7) win 90
    8) win 30
    9) lose 30

It seems to me like conventional variance and standard deviation formulas won't work for this because the distant outcomes from the mean are always positive. Also, I think the 90K and 6K outcomes are so statistically remote that they will barely influence any of the results I am interested in (i.e. for samples of 10k or less), so it may make the problem more tractable if those outcomes are omitted. That said, the 90K is a substantial part of our equity when playing; that's one of the reasons why the variance in these is so sick.
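As a quick check of the prize pool numbers (a sketch; the dict keys are the prize amounts, the values the stated probabilities):

```python
prizes = {90000: 0.00001, 6000: 0.00005, 3000: 0.0001, 750: 0.001,
          300: 0.005, 180: 0.075, 120: 0.21366, 60: 0.70518}

pool_ev = sum(v * p for v, p in prizes.items())   # average prize pool, ~$85.20
buyin = 30
equity = pool_ev / 3                              # 3-handed, equal skill
rake = buyin - equity                             # ~$1.60
print(pool_ev, equity, rake)
```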
    Last edited by Renton; 01-04-2015 at 03:36 PM.
  30. #30
MadMojoMonkey
    If you did the CI calc and the result was beyond a practical value, then that indicates a poor model.

We are likely exploiting the central limit theorem, which states something like,
"If we calculate a statistic about many samples from the same population, the statistic we calculate will be distributed in a normal distribution, regardless of the distribution of the thing we are measuring."

    Meaning, I can present you with a device which has a hidden probability distribution of outputs. If I ask you the variance, you can take some trials and calculate the variance over those trials. You repeat this.

The central limit theorem says that YOUR calculations of the variance will follow a normal distribution. It does not state anything about the underlying distribution that motivates your measurements. I.e. your measurements of the variance will have a mean, and those measurements will have a spread which follows a normal distribution. The variance you are measuring does not necessarily follow a normal distribution, but your measurements of that variance do.

    So if you got a result that is outside practical limits of the input, it could indicate that you have a small sample.
    Or that your model is poor.

*note that the distribution of your measurement of the sample variance is characterized by the chi-squared distribution, which also approaches a normal distribution for large numbers of experiments.
  31. #31
MadMojoMonkey
    I'm still not 100% clear on what you want. I think I'm confused by the ROI. If I'm guessing correctly, then you want to solve for the ROI, and the ROI is equivalent to a statement about Hero's %-age chance to win the table.

    ***
    We need to know the probability distribution of results. (We have that)
    We need to calculate the mean value and StDev per trial. (We can do that with the prob dist)
    We need to know the number of trials. (We can stipulate this as an independent variable)
    We need to choose the CI. (We can stipulate this as an independent variable)

    In order to solve for the desired mean value which we equate to a win-equity, which we equate to ROI.

    The lower boundary of the interval for N trials will be
    N * ( EV - Z_value * StDev / SQRT(N) )
    or
    N * EV - SQRT(N) * Z_value * StDev
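A sketch of this boundary formula in Python, plugging in the original river-call numbers as a check (the function name is mine):

```python
from math import sqrt

def lower_bound(n, ev, sd, z):
    # lower edge of the interval for the cumulative result after n trials
    return n * ev - sqrt(n) * z * sd

# original river-call example: EV 0, per-trial sd sqrt(2), two-sided 95% z
print(lower_bound(100, 0.0, sqrt(2), 1.96))   # ~ -27.7
```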

    ***
    The EV is the dot product of the equity of results and the value of results.
    The EV is the mean value, so you have the mean value, and you need the StDev.
    You know how to calculate StDev for a given prob dist, so you have that, too.

    Next, you state a CI, and find the associated Z_value. E.g. for a C.I. of 95%, the Z_value is 1.96
    Now you have to consider that this gives the middle 95%, and we don't want that. We want the top 95%.
    We're not worried about the times when Hero's final value is well above the mean, only when it is well below the mean.
    If we use 1.96 and only look at the lower tail, then we're really calculating the 97.5% CI.
    So we actually want the Z_value associated with a CI of 90% (Z ≈ 1.645). This middle 90% chops off 5% from the top and 5% from the bottom.
    We only care about the bottom %.

    Once you have the appropriate Z_value, you can plug and chug with your ROI values to get the output to be 0.
    Or you can write out the math and just solve it right out, so that you put in the CI and it tells you the minimum ROI. This could be messy, since changing the ROI will change the variance, which will change the StDev, so you could end up with an ugly equation to solve.
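    If the StDev is treated as fixed (a simplification, since as the post notes the variance shifts with the ROI), setting the lower boundary N*EV - sqrt(N)*Z*StDev to zero solves cleanly for the break-even EV per trial:

    ```python
    import math

    def breakeven_ev(n_trials, stdev_per_trial, z_value):
        """EV per trial needed so the lower CI edge after n_trials is zero:
        solve N * EV - sqrt(N) * Z * StDev = 0 for EV."""
        return z_value * stdev_per_trial / math.sqrt(n_trials)

    # One-sided 95% lower bound -> Z = 1.645 (the middle-90% Z-value).
    # StDev of sqrt(2) per trial is carried over from the opening example.
    for n in (100, 1000, 10000):
        print(n, round(breakeven_ev(n, math.sqrt(2), 1.645), 4))
    ```

    The required edge shrinks like 1/sqrt(N): the more trials you play, the smaller the per-trial EV needed to be confident you're ahead of break-even.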
  32. #32
    Renton's Avatar
    Join Date
    Jan 2006
    Posts
    8,863
    Location
    a little town called none of your goddamn business
    If you play X number of spin tournaments, where X can be 100, 500, 1000, whatever, you will have a cumulative result of those plays, and it will fall in a wide range of outcomes, from the extremely remote possibility of hitting the 90k jackpot every time to the far far less remote possibility of losing every single one. I want to know what this range of outcomes will be at certain levels of confidence.

    The problem I can't get my head around is that it appears (to me) that the standard deviation implies a symmetry to outcomes that I don't believe exists. Take the following two hypothetical scenarios:

    1) 100bb stacks NLH cash game, it folds to the button who minraises, you are in the BB deciding whether to call or not. When you call, the range of possible outcomes that can occur are anywhere from you playing a big pot and winning his entire stack, to you winning a small pot, to you losing a small pot, to you losing your entire stack. Every result from +100bb to -100bb is possible. However, being at a positional disadvantage hurts your chances of winning big pots so you'll likely have more -100bb deviations than +100bb. In order to achieve a positive expectation, in practice you have to balance this with winning more medium pots.

    2) You're signing up for a $1 mtt that has 1000 players. The range of outcomes in this case are mostly positive and the only negative outcome is to lose $1, though it is by far the most likely.

    Why would either of these games have outcomes that fall along a normal distribution? The spin n go is an extreme version of 2).
    Last edited by Renton; 01-06-2015 at 02:27 PM.
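    The asymmetry in scenario 2) is easy to quantify. A minimal sketch with a hypothetical payout table for the $1 MTT (the payout numbers are invented for illustration and chosen to make the EV exactly zero):

    ```python
    # Hypothetical $1 MTT payout model: lose the buy-in 85% of the time,
    # min-cash for $2 profit 14% of the time, big score of $57 profit 1%.
    values = [-1.0, 2.0, 57.0]
    probs = [0.85, 0.14, 0.01]

    ev = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - ev) ** 2 for v, p in zip(values, probs))
    skew = sum(p * (v - ev) ** 3 for v, p in zip(values, probs)) / var ** 1.5

    print(round(ev, 4))   # break-even by construction
    print(round(var, 2))  # large spread relative to the $1 buy-in
    print(round(skew, 1)) # strongly positive: a long right tail, as described
    ```

    A single trial of this game is nowhere near normal, which is exactly the point of the question; the later posts address why cumulative totals over many trials still end up looking normal.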
  33. #33
    MadMojoMonkey's Avatar
    Join Date
    Apr 2012
    Posts
    10,322
    Location
    St Louis, MO
    You just need to solve a few more prob stats questions and this will all start clicking with you. You are not formulating your questions in a way that implies the method, which means that you don't fully understand the methods.


    The StDev is blind to the symmetry, it is just a number which describes the spread of the data, not the symmetry of the spread. You can calculate the StDev for distributions which are nowhere near symmetrical, and it will still tell you what it tells you. Think about what it is and isn't telling you.

    The symmetry is rooted in the Z_value, not the StDev. The Z_value is symmetrical in +/- about the mean because the Z_value is related to the normal distribution, which is symmetrical. We could use the student's T-distribution, for which we would be using the T_value, which is also symmetrical.

    If we were trying to put error bars on our estimate of a sample's variance, we would use the chi-squared distribution, which is not symmetrical, so the critical value for the + side is not the same as for the - side.

    The *_value is related to the distribution and generally requires calculus to find the exact values for a unique probability distribution.
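    Those exact values don't have to come from a table; Python's standard library exposes the inverse CDF of the normal distribution directly, which reproduces the Z_values used above:

    ```python
    from statistics import NormalDist

    std_normal = NormalDist()  # mean 0, stdev 1

    # Two-sided 95% CI: chop 2.5% off each tail.
    z_95_two_sided = std_normal.inv_cdf(0.975)

    # One-sided 95% lower bound: the middle-90% Z-value.
    z_95_one_sided = std_normal.inv_cdf(0.95)

    print(round(z_95_two_sided, 3))  # 1.96
    print(round(z_95_one_sided, 3))  # 1.645
    ```

    For other distributions (Student's T, chi-squared) the stdlib has no inverse CDF, which is where the "requires calculus" (or scipy) part comes in.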
  34. #34
    Renton's Avatar
    Join Date
    Jan 2006
    Posts
    8,863
    Location
    a little town called none of your goddamn business
    So would the results of playing the spin n gos or MTTs fall along the normal distribution as well?
  35. #35
    MadMojoMonkey's Avatar
    Join Date
    Apr 2012
    Posts
    10,322
    Location
    St Louis, MO
    Quote Originally Posted by Renton View Post
    So would the results of playing the spin n gos or MTTs fall along the normal distribution as well?
    It all depends on what you mean by results.

    As far as the number of wins in N trials, it will be a Binomial distribution.

    As far as the value of wins, I'd really like to work out a couple of hand examples before I commit.
    My gut says that as long as the sample size is large enough to allow for the least likely results to have happened a few times, then the outcomes will be very nearly a normal distribution. In the short term cases, there may be extreme outliers from a normal distribution, but it is expected that the infrequent events will happen equally infrequently in all trials.

    I mean, if you have something with a 1% chance of occurring per trial and you take samples of 10 trials at a time, most samples will contain no rare event at all; roughly one sample in ten will contain one. With 20 such samples you might see 18 that look the same and 2 outliers, and be tempted to read a lack of symmetry into the distribution. By the time you have 2000 samples, with 1800 that look the same and 200 that clearly aren't outliers, you'll see that runs of 10 were just too small for the distribution to play out.

    It's a case of small samples.
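    The small-sample effect is easy to demonstrate by simulation. A sketch with a 1% event, comparing many samples of 10 trials against one large sample:

    ```python
    import random

    random.seed(42)  # fixed seed so the run is reproducible

    def rare_hits(p, n):
        """Number of times a probability-p event occurs in n trials."""
        return sum(1 for _ in range(n) if random.random() < p)

    # Repeat a sample of 10 trials 1000 times: most samples of 10 never
    # see the 1% event at all, which is what hides the distribution's shape.
    small_samples = [rare_hits(0.01, 10) for _ in range(1000)]
    frac_zero = sum(1 for h in small_samples if h == 0) / 1000

    # One big sample of 10,000 trials: the frequency settles near 1%.
    big_freq = rare_hits(0.01, 10_000) / 10_000

    print(round(frac_zero, 2))  # roughly 0.9: the rare event usually never shows
    print(round(big_freq, 3))   # roughly 0.01
    ```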
  36. #36
    MadMojoMonkey's Avatar
    Join Date
    Apr 2012
    Posts
    10,322
    Location
    St Louis, MO
    The central limit theorem applies here (it's often lumped in with the law of large numbers, which is the related statement that the running average converges to the EV).

    We're sampling the payout table's probability distribution many times, keeping a cumulative running total.

    The set of the cumulative running totals' values at the end of the trials, no matter how long the trials, will create a distribution of its own. Do you see?

    So the central limit theorem says that if we run trials of any length and measure a statistic like that, those measurements will approach a normal distribution as we take more and more of them.
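    This can be checked by simulation with the opening post's river spot (lose 1 unit 2/3 of the time, win 2 units 1/3 of the time), a heavily non-normal single-trial distribution; the running totals still come out near-normal:

    ```python
    import random
    from statistics import mean, stdev

    random.seed(7)  # fixed seed so the run is reproducible

    # The OP's river spot as a payout table: lose 1 unit 2/3, win 2 units 1/3.
    payouts = [-1.0, -1.0, 2.0]

    def total_after(n):
        """Cumulative result of n independent trials."""
        return sum(random.choice(payouts) for _ in range(n))

    # Distribution of 500-trial running totals, sampled 2000 times.
    totals = [total_after(500) for _ in range(2000)]

    # If the totals are near-normal, ~95% of them land within 1.96 StDevs.
    m, s = mean(totals), stdev(totals)
    inside = sum(1 for t in totals if abs(t - m) <= 1.96 * s) / len(totals)
    print(round(inside, 2))  # close to 0.95
    ```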
  37. #37
    Renton's Avatar
    Join Date
    Jan 2006
    Posts
    8,863
    Location
    a little town called none of your goddamn business
    MMM thanks a lot for your help. I'm pretty sure I still don't fully understand it all, but I feel like I've certainly learned a lot from this thread.
