|
|
 Originally Posted by wufwugy
Do there need to be two+ input variables for heteroskedasticity to arise?
Sorry, I was thinking in terms of the analysis I had just done when I said one predictor and one outcome variable can't result in heteroscedasticity. To be completely correct, I should have prefaced that with 'assuming you treat both variables as dichotomous', as in the case of an election being held in either 2016 or 2017: the predictor is dichotomous because it takes only one of two values (2016 or 2017), and the outcome is dichotomous because it is either swing (D) or swing (R).
However, if you wanted to do a t-test, you would be treating the outcome variable as continuous (% swing in either direction). You would then have to jump through certain hoops to satisfy the normality assumption, possibly including transforming the data, and you would as a matter of course use Welch's version of the test, which drops the equal-variance assumption, rather than a pooled variance estimate (pooling actually assumes the variances are equal); that should adequately address any issues around heteroscedasticity.
If you can't be arsed to do that and just want a quick back-of-the-envelope calculation, you would choose a non-parametric test, which has less power but also fewer assumptions to worry about, which is why I did the binomial one. My guess is that the more powerful t-test would have returned a likelihood ratio closer to 1000:1 in favour of the model assuming an increase in D support from 2016 to 2017 in those six election districts. This is because the effect is generally large (mean = 17.7%) and the individual values are closely clustered around that mean. The binomial test ignores the size of the individual values and only considers whether they are positive or negative, so the evidence it gives in this case, while still strong, is not overwhelming.
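To make the binomial-vs-t-test contrast concrete, here's a minimal sketch in Python. The actual district-level swing figures aren't given in the post, only that all six were positive with a mean of 17.7%, so the `swings` values below are hypothetical placeholders chosen to match that mean:

```python
from scipy import stats

# Hypothetical swing values (% toward D) for the six districts;
# only the sign pattern (6/6 positive) and the mean (17.7%) come
# from the post, the individual numbers are made up.
swings = [15.0, 21.0, 12.0, 19.0, 24.0, 15.2]  # mean = 17.7

# Sign/binomial test: 6 positive out of 6 under H0 of p = 0.5.
# Uses only the signs, so the evidence is strong but not overwhelming.
binom = stats.binomtest(6, n=6, p=0.5)
print(binom.pvalue)  # two-sided: 2 * 0.5**6 = 0.03125

# One-sample t-test of the swings against 0 also uses the
# magnitudes, so a large, tightly clustered effect yields
# much stronger evidence.
t = stats.ttest_1samp(swings, 0.0)
print(t.pvalue)
```

With any set of values this tightly clustered around 17.7, the t-test p-value comes out far smaller than the binomial one, which is the point about power made above.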
If you have a lot of experience with numbers, you can also use the interocular trauma test, whereby if the data hits you between the eyes you can glean the existence of an effect without carrying out a formal test. A layperson's version of the interocular trauma test would be to believe that the house next to theirs is closer to them than the moon 365 days a year, without carrying out any formal measurements or doing a statistical test.
 Originally Posted by wufwugy
For example, if the regression is income = age, the variation in income over the range of age doesn't result in heteroskedasticity, yet if the regression is income = age + gender, then the variation in income over the range of age does result in heteroskedasticity?
Both income and age in your example are variables that can take continuous values. If, e.g., you draw a scatterplot of income against age, you'd expect a tight cluster around people 0-1 yrs old having an income of 0, and as age increases, the spread of incomes around each age would increase, so that the overall impression would be of a cone-shaped distribution. That is an example of heteroscedasticity, because age would predict income very precisely at age 0-1 but much less precisely at age 50-51 (or whatever). Inference based on the Pearson correlation coefficient, which assumes normality, would be problematic there, and a sensible statistician would instead use a test where normality is not one of the assumptions, known as a non-parametric test, such as a Spearman's rho correlation. Moreover, things would get more complicated if your data included people over 65 who are retired and generally not earning much income.
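The cone-shaped scatter is easy to simulate. This is just an illustrative sketch with made-up numbers (a linear age effect plus noise whose spread grows with age), not a model of real incomes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated income vs age where the noise scales with age,
# producing the cone-shaped scatter described above.
age = rng.uniform(0, 60, 500)
income = 1_000 * age + rng.normal(0, 1, 500) * (200 * age)

# The spread of income is much wider among older people:
print(income[age < 10].std(), income[age > 50].std())

# Spearman's rho works on ranks, so it doesn't rely on the
# constant-variance/normality machinery that Pearson-based
# inference leans on.
r_p, _ = stats.pearsonr(age, income)
r_s, _ = stats.spearmanr(age, income)
print(r_p, r_s)
```

Both coefficients will come out strongly positive here; the difference is that the rank-based Spearman inference isn't undermined by the widening cone.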
If you added in gender as a predictor, its interaction with the other variables would be another potential source of heteroscedasticity and of other violations of the model's assumptions.
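As a sketch of what that group-wise heteroscedasticity looks like, suppose (hypothetically, purely for illustration) that the residual spread of income differed between the two gender groups:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: income depends on age, but the residual
# spread is assumed (for illustration) to differ by group.
n = 1000
gender = rng.integers(0, 2, n)                    # dichotomous predictor
age = rng.uniform(20, 60, n)
noise_sd = np.where(gender == 0, 5_000, 15_000)   # assumed unequal spreads
income = 1_000 * age + rng.normal(0, 1, n) * noise_sd

# Residuals from the true age effect: their spread differs by
# group, i.e. the error variance is not constant across the
# predictor -- group-wise heteroscedasticity.
ratio = income[gender == 1].std() / 1            # placeholder, see below
resid = income - 1_000 * age
ratio = resid[gender == 1].std() / resid[gender == 0].std()
print(ratio)  # roughly 3, reflecting the 15,000 vs 5,000 noise SDs
```

A formal check in practice would be something like a Breusch-Pagan test on the regression residuals, but eyeballing the group-wise spreads already shows the problem.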
|