**** Elections thread *****

  1. #1
    Quote Originally Posted by Poopadoop View Post
    Your argument is that, in PA, something besides Trump being president changed between 2016 and 2017 that resulted in the D swing. That's got nothing to do with heteroscedasticity. Even if it were possible for one predictor and one outcome variable to result in heteroscedasticity (it isn't),
    Do there need to be two+ input variables for heteroskedasticity to arise? For example, if the regression is income = age, the variation in income over the range of age doesn't result in heteroskedasticity, yet if the regression is income = age + gender, then the variation in income over the range of age does result in heteroskedasticity?

    My response to your argument is that explanation only works in PA, it can't explain the other five data points.
    That's fine. I'm not trying to opine on the other ones. I shouldn't have quoted them.
  2. #2
    Quote Originally Posted by wufwugy View Post
    Do there need to be two+ input variables for heteroskedasticity to arise? For example, if the regression is income = age, the variation in income over the range of age doesn't result in heteroskedasticity, yet if the regression is income = age + gender, then the variation in income over the range of age does result in heteroskedasticity?
    It's a bit complicated to explain (and a bit early in the morning) and not relevant to the binomial test I did because that's a non-parametric test that doesn't make any assumptions regarding how the variance is distributed. I will try to get back to this later.
  3. #3
    Quote Originally Posted by wufwugy View Post
    Do there need to be two+ input variables for heteroskedasticity to arise?
    Sorry, I was thinking in terms of the analysis I had just done when I said one predictor and one outcome variable can't result in heteroscedasticity. What I should have done, to be completely correct, is preface that with 'assuming you treat both variables as dichotomous', as in the case of an 'election being held either in 2016 or 2017', where that predictor variable is dichotomous because it has only one of two values (2016 or 2017), and the outcome variable is dichotomous as in either swing (D) or swing (R).

    However, if you wanted to do a t-test, you would be treating the outcome variable as continuous (% swing in either direction). So you would have to jump through certain hoops to fulfill the assumptions of normality, including possibly transforming the data, and you would as a matter of course use a pooled estimate of the variance which generally speaking should adequately address any issues around heteroscedasticity.

    If you can't be arsed to do that and just want to do a quick back-of-envelope calculation, you would just choose a non-parametric test which has less power but also fewer assumptions to worry about, which is why I did the binomial one. My guess is that the more powerful t-test would have returned a likelihood ratio closer to 1000:1 in favour of the model assuming an increase in D support from 2016 to 2017 in those six election districts. This is because of the generally large effect (mean = 17.7%) closely clustered around the mean. The binomial test ignores the size of the individual values and only considers whether they are positive or negative, so the evidence it gives in this case, while still strong, is not overwhelming.
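For anyone who wants to check the back-of-envelope number, here is a minimal sketch of the binomial (sign) test described above. It assumes all six districts swung D and that, under the null, each district is equally likely to swing either way; the function name `sign_test_p_value` is made up for illustration and is not from any library.

```python
from math import comb

def sign_test_p_value(n_positive, n_total, p_null=0.5):
    """One-sided binomial p-value: P(X >= n_positive) under the null."""
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_positive, n_total + 1)
    )

# Assumption: 6 of 6 districts swung D; only the sign of each swing is used.
p = sign_test_p_value(6, 6)
print(p)  # 0.015625, i.e. 1 in 64
```

The test never looks at the size of the swings, only their direction, which is why six-for-six cannot do better than 1 in 64 however large the individual swings are.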

    If you have a lot of experience with numbers, you can also use the interocular trauma test, whereby if the data hits you between the eyes you can glean the existence of an effect without carrying out a formal test. A layperson's version of the interocular trauma test would be to believe that the house next to theirs is closer to them than the moon 365 days a year, without carrying out any formal measurements or doing a statistical test.




    Quote Originally Posted by wufwugy View Post
    For example, if the regression is income = age, the variation in income over the range of age doesn't result in heteroskedasticity, yet if the regression is income = age + gender, then the variation in income over the range of age does result in heteroskedasticity?
    Both income and age in the example you give are variables that can have continuous values. If, e.g., you draw a scatterplot of income and age, you'd expect there to be a tight cluster around people 0-1 yrs old having an income of 0, and as age increases, the spread of incomes around age would increase, so that the overall impression would be of a cone-shaped distribution. That would be an example of heteroscedasticity, because age would strongly predict income at age 0-1 but not so well at age 50-51 (or whatever). The Pearson correlation coefficient of that analysis based on a normal distribution would be problematic, and a sensible statistician would not do a Pearson correlation but would do something where normality is not one of the assumptions, known as a non-parametric test, such as a Spearman's rho correlation. Moreover, things would get more complicated if your data included people over 65 who are retired and generally not having a lot of income.
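The cone shape described above can be mimicked with made-up numbers. This is only a sketch on simulated data: the slope of 1000 per year of age and the noise that scales with age are assumptions chosen to produce the cone, not real income figures, and `residual_spread` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: income rises with age, and the noise grows with age too.
ages = [random.uniform(0, 60) for _ in range(2000)]
incomes = [1000 * a + random.gauss(0, 300 * a) for a in ages]

def residual_spread(lo, hi):
    """Standard deviation of (income - 1000 * age) for ages in [lo, hi)."""
    residuals = [y - 1000 * a for a, y in zip(ages, incomes) if lo <= a < hi]
    return statistics.stdev(residuals)

spread_young = residual_spread(0, 10)
spread_old = residual_spread(50, 60)
print(spread_young, spread_old)
```

The spread around the trend line comes out much larger in the older band than in the younger one, which is what heteroscedasticity means: the scatter around the prediction is not constant across the range of the predictor.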

    If you added in gender as a coefficient, its interaction with the other variables would be another potential source of violation of normality.
