
Pedantic Pet Peeves, Methodological Edition

On February 17, 2011

At 12:17 pm


Nate Silver writes an interesting piece about a minor debate surrounding the utility (or lack thereof) of favorable ratings for (largely potential in the case of 2012) presidential candidates this far in advance of the election. The article linked is technically a response to a rejoinder by Michigan political scientist Brendan Nyhan to an earlier article of Silver’s. Much like events in the Arab world and the Middle East, this debate is ongoing; Nyhan posted a follow-up yesterday.

Reviewing both sets of articles, I find Nyhan’s arguments more convincing, which is perhaps not surprising given his training. In his first piece, Silver suggested that the Republican field, at this point in time, could be considered weak. Nyhan’s initial reply can be distilled down to “so what?” While Silver might have misattributed to Nyhan the claim that early polls in primaries are “useless” and “don’t matter” (my quick read of the articles in question is that it’s ambiguous, but the headline for Nyhan’s HuffPost article on the topic would perhaps have been better stated by dropping the word “Primary”), that’s a secondary topic. The main point Nyhan makes, citing both Hibbs and Bartels, is that the ephemeral concept known as candidate quality, image, etc., however measured, only has an effect at the margins of an election. While, as Silver contends, the observations that we have from polling data this far in advance of a presidential election are neither useless nor lacking in salience, they simply don’t matter nearly as much as other factors. We know what the important determinants will be in advance of the 2012 election: the state of the economy, employment figures, sociotropic assessments of the state of the country, and the interminable wars (or to the right, who lost Egypt), not the relative strength or weakness of the Republican field in February 2011.

Indeed, to underscore this point, Nyhan reproduces a figure from the Hibbs model explaining the performance of the incumbent party in presidential elections with a parsimonious index of growth in GDP and military fatalities. The model fit (such as it is in a bivariate model) is .90. Silver, obviously examining a different question (the predictive capability of early net favorable ratings on later net favorable ratings), has a much weaker correlation: .63. And this brings us to my pet peeve:

That is not a terribly strong correlation by any means, and the number might change some if the study covered more years and included candidates like Mr. Dole and Mr. Reagan. Nevertheless, the relationship is highly statistically significant (italics added). Even at this early stage, polls tell us something — not everything, not a lot, but something — about how the candidates are liable to be perceived next year following the primaries.

It doesn’t matter how statistically significant an estimate is. Period. An estimate is either significant or not, based on the degree to which one wants to minimize the risk of a false positive. The industry standard, of course, is .05, but some are comfortable at .10 (especially when theory justifies the use of a one-tailed test), and others prefer .01, but again, this is largely irrelevant. Responsible analyses report the estimate itself, the standard error, and the p-value, so those of us playing along at home can reach our own judgment. What matters more is the strength of the observed relationship. In very large samples (large N studies), even the weakest observed relationships are significant. I have a couple of papers out with Ns in excess of 40,000: virtually everything in the model was significant, however irrelevant in substance. Likewise, even strong relationships can be rendered insignificant with a small N sample (because, perhaps, they don’t really exist in the population).
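For those playing along at home, here is a rough sketch of that last point in Python. It is an illustration only: the correlations and sample sizes are made-up numbers, not figures from Silver, Nyhan, or my own papers, and the p-value uses the Fisher z approximation (a normal approximation to the exact t-test), which is an assumption I am swapping in to keep the example to the standard library. The function name `approx_p_value` is likewise my own invention.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: no correlation in the population.

    Uses the Fisher z transformation, a normal approximation; an exact
    test would use the t distribution with n - 2 degrees of freedom.
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical cases, chosen only to illustrate the argument.
cases = [
    ("trivial relationship, huge sample", 0.02, 40_000),
    ("strong relationship, tiny sample", 0.50, 10),
]

for label, r, n in cases:
    p = approx_p_value(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    # r squared is the share of variance the two measures have in common.
    print(f"{label}: r = {r:.2f}, r^2 = {r * r:.4f}, N = {n}, "
          f"p = {p:.4g} ({verdict} at .05)")
```

Under these assumptions the r = .02 relationship clears the .05 bar easily with 40,000 cases while accounting for a negligible share of the variance, and the r = .50 relationship does not clear it with ten cases. The p-value is telling you about N at least as much as it is telling you about the relationship.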

Furthermore, statistical significance is only really applicable to drawing inferences from your sample to the target population. In other words, is what we observe in the sample likewise probably going on in the target population? I recognize that this is pedantry of the highest order, and Silver doesn’t have the universe of data for his target population, but he comes damned close. Significance has achieved a currency of its own in both the scientific community and broader society, to the point where it is overinterpreted. I once made this point with an article submission, arguing that, as I had the universe of data, significance tests were technically irrelevant. Both the reviewers and the editor still wanted to see them . . . that argument never flies. Everything important was “significant”, and the article was published.

In reality, Silver reports a relationship that doesn’t tell us a whole lot about the bigger picture. Yes, there is a relationship between net approvals early and late in a campaign, and by this metric the Republican field is atypically “weak” at this point in a campaign, but this affords us limited insight into the chances of either party come 2012.

I also suspect that if you divide the net favorable figures into “well known” and “not well known”, there’s considerably more variance amongst the latter candidates, as attitudes regarding the former have had more time to take hold. This interests me, but right now I lack the time to do even this simple analysis.

methodology, polls, the feeble 2012 GOP field, useless pedantry

Born in San Jose, grew up in Seattle, received a Ph.D. in poli sci from University of Washington in 2000. I worked for three years at Universiteit Twente in Enschede, Netherlands, and have worked at the University of Plymouth for 16 academic years now in Plymouth, United Kingdom. I also currently serve as joint campaign coordinator for the Plymouth Sutton & Devonport Constituency Labour Party.

By Dave Brockington