Robert S. Erikson
With about three weeks to go before the election, Obama leads McCain by about eight points. While an upset remains possible, Obama is clearly poised as the likely presidential winner. What can we expect from the polls over the next three weeks, and how well will they predict the actual outcome?
For poll-watchers, the temptation is to treat every new poll as a decisive piece of new evidence, as if any departure from the current trend might indicate a change that will carry forward to Election Day. But the next outlier we see will probably be an artifact of routine sampling error rather than a harbinger of true change. True change in preferences occurs slowly, especially during the final weeks of a campaign. Observe the following graph of the Bush-Kerry vote in the polls during the final 28 days of the 2004 campaign.
The Polls During the Final 30 Days of the 2004 Presidential Election. Each dot represents the poll-of-polls or average of all polls whose coverage centered on that date. The curved blue line represents a lowess trend line. The dashed orange bar represents the Election Day outcome.
This graph shows the poll-of-polls for the final 28 days in 2004. (Polls are assigned to the date that is the midpoint of their coverage.) Although a trend line can be forced through the data as the wobbly curved line, the distribution of the observations shows no real pattern and is within the range that, according to sampling theory, would occur by chance if there were no actual change during the final 28 days. In other words, technically, we cannot reject the null hypothesis that all observed variation in the final month of the 2004 campaign was sampling error, with voter preferences constant throughout. (The standard deviation of the observations is a mere 1.01.) Probably Bush was slightly ahead through this period, and the occasional poll that showed Kerry in the lead was a statistical illusion.
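To make the sampling-error argument concrete, here is a minimal Python sketch (an illustration, not the analysis behind the graph). It computes the standard deviation that pure sampling error would produce in a candidate's share, then simulates 28 daily readings for a race in which preferences never move. The 51.5 percent share and the 2,500 pooled respondents per day are hypothetical values chosen only for illustration; the pooled sample sizes are not reported here.

```python
import math
import random
import statistics

def sampling_sd(share, n):
    """Standard deviation, in percentage points, of a sampled share from n respondents."""
    return 100 * math.sqrt(share * (1 - share) / n)

def simulate_constant_race(true_share, n_per_day, days, seed=0):
    """Simulate daily poll-of-polls readings when true preferences never change.

    Uses a normal approximation to sampling error.
    """
    rng = random.Random(seed)
    sd = sampling_sd(true_share, n_per_day)
    return [rng.gauss(100 * true_share, sd) for _ in range(days)]

# Hypothetical inputs: a constant 51.5% share and 2,500 pooled respondents per day.
readings = simulate_constant_race(0.515, 2500, 28)

print(f"sampling SD per reading: {sampling_sd(0.515, 2500):.2f} points")
print(f"SD of 28 simulated readings: {statistics.stdev(readings):.2f} points")
```

Under these assumed inputs the sampling standard deviation is about one point per reading, so if the 1.01 figure refers to a candidate's share, a month of perfectly flat preferences could plausibly produce a scatter of that size.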
We can extract information about the final few weeks in the polls from elections going far back—as far back as 1944. The next graph shows the week-to-week movement in the polls leading up to Election Day and then the shift from the final week’s polls to the Election Day verdict. Poll verdicts represent the poll-of-polls for the week, with polls assigned to a particular week according to the middle date of their polling period. Observations are based on polls from 1944 through 2004, although not all election years are represented by polls for the given week.
Weekly poll margins by lagged weekly poll margins in the latter weeks of the campaign, 1944-2004. Observations are based on weekly polls-of-polls. For some weeks of some election years, there were no polls. The diagonal lines represent lines of equality between the poll margin and the lagged poll margin, not regression lines.
This graph's obvious feature is the incredible stability of the polls from one week to the next. The final polls then predict the vote quite well, although the size of the lead in the final polls typically shrinks by about 30 percent on Election Day. If we regress the poll verdict on the lagged verdict for weeks T-1, T-2, or T-3, the adjusted R-squared is .96 or higher. If we regress the Election Day verdict on the poll verdict for the final week (ignoring pre-1952 quota-sample polls), the adjusted R-squared is .95. Thus, in past campaigns, weekly change in the run-up to the election has come in small increments.
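The regressions described above can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers, not the 1952-2004 data; the function name `ols_fit` and both data lists are hypothetical. A slope below 1 is what a shrinking lead looks like in such a regression: a slope near 0.7 would correspond to the lead shrinking by about 30 percent.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, adjusted R-squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjust for one predictor plus an intercept (n - 2 residual degrees of freedom).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2

# Hypothetical final-week poll margins and Election Day vote margins, in points.
# These are invented for illustration and are NOT the historical data.
final_poll_margin = [-12.0, -6.0, -2.0, 3.0, 8.0, 15.0]
vote_margin = [-8.0, -5.0, -1.0, 2.5, 5.0, 11.0]

slope, intercept, adj_r2 = ols_fit(final_poll_margin, vote_margin)
print(f"slope={slope:.2f} intercept={intercept:.2f} adjusted R-squared={adj_r2:.2f}")
```

The same function applies to the week-on-week case by passing the lagged weekly margins as `x` and the current week's margins as `y`.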
Still, the slim movements from one week to another carried some meaning. As the next figure shows, from week to week the polls became increasingly accurate as a predictor of the final outcome. We also see that the poll margins exaggerate the size of the vote margins. (Cases cluster in the “West by Southwest” and the “East by Northeast” octants.) The most remarkable fact is the near absence of cases in the off-diagonal—where the leader in the poll-of-polls ends up losing. The two exceptions of late-campaign comebacks are Truman’s famous surge in 1948 (partially an artifact of bad polling) and 2000 (Gore’s futile popular vote comeback).
Election Day vote by poll margins in the four weeks leading up to the election, 1944-2004. Observations are based on weekly polls-of-polls. For some weeks of some election years, there were no polls. The diagonal lines represent lines of equality between the poll margin and the vote, not regression lines.
So where does this leave us regarding 2008? As of this writing, about three weeks before the election, Obama leads McCain by about 8 points. My back-of-the-envelope calculations, based on the regression of the vote on the polls at this date in earlier years, suggest about an 86 percent chance of an Obama victory, giving McCain one chance in seven of pulling it out (slightly more optimistic for Obama than the current betting markets have it). This forecast is based on polls alone, without considering how the economic crisis aids Obama. Then there may be other unknowns this year that bear on the final outcome. So let's continue to keep an eye on the polls.
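For readers who want to see how a regression forecast turns into a win probability, here is a rough Python sketch of one common approach: treat the forecast error as normally distributed and ask how likely the vote margin is to stay above zero. The slope and forecast standard deviation below are placeholders chosen for illustration, not the estimates behind the 86 percent figure, and the function names are hypothetical.

```python
import math

def win_probability(predicted_margin, forecast_sd):
    """P(vote margin > 0), assuming a normally distributed forecast error."""
    z = predicted_margin / forecast_sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_in_n(probability_of_win):
    """Express the trailing candidate's chance as 'one in n'."""
    return 1 / (1 - probability_of_win)

poll_margin = 8.0   # Obama's lead in points, as stated in the text
# Hypothetical placeholders, NOT the regression estimates used for the forecast:
slope = 0.7         # assumed shrinkage of the poll lead by Election Day
forecast_sd = 5.0   # assumed standard error of the vote forecast, in points

p = win_probability(slope * poll_margin, forecast_sd)
print(f"P(leader wins) under these assumptions = {p:.2f}")
print(f"An 86 percent chance leaves the trailer about one in {one_in_n(0.86):.0f}")
```

The second line of output simply restates the arithmetic in the text: a 14 percent chance is roughly one in seven.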
The author is a professor of Political Science at Columbia University. The data and analysis presented here are from joint work with Christopher Wlezien of Temple University.