Originally Posted by wufwugy
I see a pattern.
Me too.
Originally Posted by wufwugy
One that seems to be univariate in a multivariate world,
The numbers are a representation of the outcome of the multivariate world. The numbers themselves are 'univariate' because that's the easiest way to summarize the net result of multiple effects. E.g., an average has a simple interpretation, whereas the various numbers that go into the average have no simple interpretation in and of themselves, except as part of a summary figure such as a mean, mode, or standard deviation (among other things one might compute). But you are right to be skeptical of summary statistics, because they often don't tell the whole story.
Originally Posted by wufwugy
one that probably has some real heteroskedasticity problems
Heteroscedasticity refers to the variance of an outcome variable changing across the levels of a predictor variable. This is not relevant here since there is no variance in this particular predictor variable.
Originally Posted by wufwugy
and other statistical jargon I don't know about..
Any analysis has assumptions that, if not met, can negatively impact the reliability of the analysis. In the case of percentage data, one could test these six 'swing' values in a number of ways, and compute the relative likelihood that they came from a world in which Ds had gained popularity (as indexed by these six special elections relative to the same elections held in 2016) versus a world in which no change in popularity had occurred.
The easiest test is a simple binomial test, the only assumption of which is that the data are dichotomous (i.e., either x happens or y happens, not both). In this case, all the data show a change from 2016 to 2017, so that assumption is met. The data don't have to be normally distributed and even Taleb couldn't bitch at this test because the tails can be as fat as you like.
The binomial test can be applied to compare two models of the data. First, if no change has occurred, the swing in each election should be equally likely to favour either side. This is termed the 'null' hypothesis (Ho), and we can express this as p(R) = 0.5 and p(D) = 0.5, where p(R) and p(D) represent the probability of a swing favouring the Rs or Ds, respectively. Alternatively, if the swings were influenced by a general increase in D popularity that had happened between 2016 and 2017 in these districts, the predicted results will differ from the null, and we can call this the 'change' hypothesis (Hc).
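If you want to see what the null hypothesis predicts, here is a rough sketch in Python (standard library only). It lists how likely each possible number of D swings would be out of six elections if every swing were a coin flip. The function name `binom_pmf` is just something I made up for illustration, and it assumes the six elections are independent of each other.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_elections = 6   # the six special elections discussed in this thread
p_null = 0.5      # Ho: a swing is equally likely to favour either side

# Distribution of the number of D swings expected under Ho
null_dist = [binom_pmf(k, n_elections, p_null) for k in range(n_elections + 1)]
for k, prob in enumerate(null_dist):
    print(f"{k}/6 D swings: {prob:.6f}")
```

Under this sketch the most likely outcome under Ho is a 3/3 split, and a clean sweep for either side sits at the far end of the distribution.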
The actual outcome was p(D) = 1.0 and p(R) = 0, since D support went up in 6/6 districts. One approach then is to base our change hypothesis on the maximum likelihood estimate for the data, which is essentially a model that assumes the real world is most likely to match the outcome (6/6 D swings or p(D) = 1 and p(R) = 0) rather than any other hypothetical situation which we did not observe (such as p(D) = .5 and p(R) = .5, as assumed by the null hypothesis).
You can compute the relative likelihood of 6/6 D swings by computing the probability of that event under each model and dividing one by the other, then applying a penalty for the free parameter in the change hypothesis.
p(6/6|Ho) = .015625
p(6/6|Hc) = 1
LR Hc:Ho = 1/.015625 * 1/exp(1)
= 23.5
So the data are 23.5 times as likely to occur if Dems were performing better in all six districts in 2017 relative to 2016 than if there was no difference.
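For anyone who wants to check the arithmetic, the calculation above can be sketched in Python like this. The variable names are my own, and the 1/exp(1) penalty is simply the one used above for a single free parameter; treat it as a sketch of this particular calculation rather than a general-purpose routine.

```python
from math import comb, exp

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, k = 6, 6                          # 6 D swings out of 6 elections

p_given_null = binom_pmf(k, n, 0.5)  # Ho: p(D) = 0.5
p_mle = k / n                        # Hc: maximum likelihood estimate, p(D) = 1.0
p_given_change = binom_pmf(k, n, p_mle)

penalty = 1 / exp(1)                 # penalty for the one free parameter in Hc
lr = (p_given_change / p_given_null) * penalty

print(p_given_null)                  # 0.015625
print(round(lr, 1))                  # 23.5
```

Without the penalty the ratio would be 64:1, so the penalty is doing a fair bit of work in keeping the estimate conservative.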
Originally Posted by wufwugy
Let's analyze the ongoing PA election.
It has a 20% swing from R to D from 2016 when voting Trump to 2018 and Trump is not on the ballot. The district has a 50k net of D voters normally. It's in a district that won't exist in a few months. The previous R congressman in the district was humiliated and shamed over an affair and (reportedly) attempted abortion. The R in the current race has little personal appeal and poor fundraising. The D in the race ran as an R, specifically as a very Trumpian R.
The 20% swing data by itself tells the wrong story about what actually happened here.
Not quite. You are ascribing variables to the outcome of the election that, while you may find them plausible, have effects that have not been and cannot be measured. My analysis is simply that a change occurred (or is very likely to have occurred, 23.5:1). I am analyzing the data objectively; you are explaining them subjectively. In other words, I am saying what happened, and you are trying to explain why it happened.
There is nothing wrong with subjective analyses, as long as you acknowledge they are subjective.