By Jay Boice
The Huffington Post publishes live results on election nights with data from The Associated Press, loaded using our open-source AP Elections Data Loader. Early in the night, often only counties that lean toward one candidate have reported, so the displayed results can differ sharply from the final results. For example, early returns in Virginia in the 2012 presidential election showed Obama at 38.8 percent, although he finished with 51.2 percent of the vote.
Television networks use sophisticated models, built with exit polls and other Election Day data, to project election winners. The Huffington Post has no dedicated decision desk, but we wanted to use the data available to us to help readers see that partial vote counts may not represent the final results well.
During November’s election for Virginia governor, we estimated the outcome in real time and, later in the evening, published our estimate beneath the actual results.
The Huffington Post’s 2013 Virginia election results before all votes were counted.
These estimates were based on the county-level results reported so far and on the results of the 2009 Virginia governor’s race. Throughout election night, our estimates were consistently closer to the final results than the partial results being reported at the time.
These estimates are driven by two main principles:
- The swing (change in vote percent since the previous election) for each party will remain roughly consistent across counties.
- Turnout, relative to 2009 turnout, will remain roughly consistent across counties.
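The second principle can be illustrated with a short Python sketch: estimate the ratio of current to previous turnout from the counties that have finished counting, then scale each outstanding county's previous turnout by that ratio. This is not the model's actual code; `turnout_ratio` and `project_turnout` are hypothetical names, and the sketch assumes each county is either fully reported or not reported at all.

```python
from __future__ import annotations


def turnout_ratio(reported_totals: dict[str, int], previous_totals: dict[str, int]) -> float:
    """Current total votes divided by previous total votes, over fully reported counties."""
    current = sum(reported_totals.values())
    previous = sum(previous_totals[county] for county in reported_totals)
    return current / previous


def project_turnout(previous_totals: dict[str, int], reported_totals: dict[str, int]) -> dict[str, float]:
    """Projected total votes for each county that has not reported yet.

    Assumes (principle 2) that turnout relative to the previous election
    is roughly the same in every county.
    """
    ratio = turnout_ratio(reported_totals, previous_totals)
    return {
        county: prev * ratio
        for county, prev in previous_totals.items()
        if county not in reported_totals
    }
```

With previous totals of 1,000, 2,000 and 4,000 votes and the first two counties reporting 900 and 1,800, the ratio is 0.9 and the third county is projected at 3,600 votes.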
To visualize the first principle, we can look at the county-level swings in each state between the 2008 and 2012 presidential elections. In each state, the county swings form a fairly tight, well-behaved distribution:
Swing distributions for most states between the 2008 and 2012 presidential elections. New England and states with only one county are omitted.
If swings are distributed like this in other elections, then as soon as a few counties have reported, we can estimate the average swing among the reporting counties and project it onto the remaining counties.
For example, early in the evening of the Virginia governor’s election, Cuccinelli led McAuliffe in the vote count, but in the few counties reporting results he was running 13.8 points behind the previous Republican candidate, on average. Projecting that swing onto the rest of the counties, we estimated that McAuliffe would win by 3.5 points.
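The statewide swing projection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model's actual code: the `County` class and function names are hypothetical, shares are two-party Democratic shares, the swing is an unweighted mean over reporting counties, and any county with votes is treated as fully reported.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class County:
    prev_dem_share: float  # two-party Democratic share in the previous election (0 to 1)
    prev_total: int        # two-party votes cast in the previous election
    dem_votes: int = 0     # Democratic votes counted so far
    total: int = 0         # two-party votes counted so far

    @property
    def reported(self) -> bool:
        # Simplification: a county with any votes counts as fully reported.
        return self.total > 0


def average_swing(counties: list[County]) -> float:
    """Unweighted mean change in Democratic share across reporting counties."""
    swings = [c.dem_votes / c.total - c.prev_dem_share for c in counties if c.reported]
    return sum(swings) / len(swings)


def projected_dem_margin(counties: list[County]) -> float:
    """Projected statewide Democratic margin, in percentage points."""
    swing = average_swing(counties)
    done = [c for c in counties if c.reported]
    turnout_ratio = sum(c.total for c in done) / sum(c.prev_total for c in done)
    dem = total = 0.0
    for c in counties:
        if c.reported:
            dem += c.dem_votes
            total += c.total
        else:
            # Principle 1: shift the previous share by the average swing.
            share = min(max(c.prev_dem_share + swing, 0.0), 1.0)
            # Principle 2: scale the previous turnout by the observed ratio.
            votes = c.prev_total * turnout_ratio
            dem += share * votes
            total += votes
    # Margin = (Dem - Rep) / total = 2 * Dem share - 1.
    return 100 * (2 * dem / total - 1)


counties = [
    County(prev_dem_share=0.30, prev_total=1000, dem_votes=350, total=1000),
    County(prev_dem_share=0.35, prev_total=1000, dem_votes=400, total=1000),
    County(prev_dem_share=0.60, prev_total=4000),  # not yet reported
]
print(round(projected_dem_margin(counties), 1))  # 11.7
```

In this made-up example the counted votes show the Democrat at only 37.5 percent, but projecting the 5-point swing onto the large outstanding county yields a Democratic win by about 11.7 points, the same pattern as the early Virginia returns.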
In addition to this statewide average, we calculate a more specific swing prediction for each outstanding county. To do this, we first group Virginia’s counties into clusters by population, then, within the outstanding county’s cluster, choose two or three counties that voted similarly to it in the previous election; their swings give a better estimate than the overall state swing. Consider the swings within each cluster between the 2005 and 2009 Virginia governor races:
Counties in Virginia grouped by population, each showing its 2005 Democratic vote share and its 2005-2009 swing.
Within each cluster, the 2005-2009 Democratic swings were fairly correlated with the 2005 vote: counties that favored the Republican more in 2005 tended to swing more strongly toward the Republican in 2009. Because of this correlation, counties that voted similarly in the previous election provide better estimates for each other than the overall cluster average does. More importantly, clustering by population groups the higher-variance (smaller) counties together, so their noisy swings have less influence on the estimates for the bigger counties.
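The per-county estimate can be sketched the same way. This Python is hypothetical: the article does not say how many clusters there are or where the population cut-offs fall, so `CLUSTER_THRESHOLDS` is invented for illustration, and "voted similarly" is measured simply as the distance between previous Democratic shares.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical population cut-offs; the real clusters are not specified.
CLUSTER_THRESHOLDS = (50_000, 200_000)


def population_cluster(population: int) -> int:
    """0 = small, 1 = medium, 2 = large (illustrative thresholds only)."""
    return sum(population >= t for t in CLUSTER_THRESHOLDS)


def neighbor_swing(target: dict, reporting: list[dict], k: int = 3) -> float | None:
    """Mean swing of the k reporting counties in the target's cluster that
    voted most like the target in the previous election.

    Each county is a dict with 'population' and 'prev_dem_share'; reporting
    counties also carry 'swing'. Returns None when no county in the cluster
    has reported, so the caller can fall back to the statewide swing.
    """
    cluster = population_cluster(target["population"])
    peers = [c for c in reporting if population_cluster(c["population"]) == cluster]
    if not peers:
        return None
    # Closest previous Democratic share first; keep at most k counties.
    nearest = sorted(
        peers, key=lambda c: abs(c["prev_dem_share"] - target["prev_dem_share"])
    )[:k]
    return mean(c["swing"] for c in nearest)
```

Restricting the neighbors to the same population cluster keeps a noisy small county from setting the estimate for a large one, which mirrors the reasoning in the paragraph above.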
One final issue in Virginia is that heavily Democratic precincts in larger counties tend to report their votes last, so until late in the evening most of the bigger counties understate their actual support for the Democrat. The partial results alone cannot reveal whether this understatement is happening again; the model has to assume that these precincts will once more report last. If the pattern is typical, it can be handled by giving the Democrat a 5-point bonus in large and medium counties at the start of the night and reducing that bonus linearly to 0 points as votes are reported. This same correction works well for both the 2012 and 2013 elections.
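The late-reporting correction reduces to a simple linear formula. The Python sketch below is illustrative: it assumes "as votes are reported" means the county's fraction of precincts reporting, and that each county has already been labeled small, medium or large; the function name and inputs are hypothetical.

```python
def democratic_bonus(fraction_reported: float, size: str, max_bonus: float = 5.0) -> float:
    """Points added to the Democratic share in a large or medium county.

    Starts at max_bonus when nothing has reported and falls linearly
    to 0 when the county is fully reported. Small counties get no bonus.
    """
    if size not in ("large", "medium"):
        return 0.0
    # Guard against reporting fractions slightly outside [0, 1].
    fraction_reported = min(max(fraction_reported, 0.0), 1.0)
    return max_bonus * (1.0 - fraction_reported)
```

For a large county with half of its precincts in, this adds 2.5 points to the Democrat; once the county finishes counting, the bonus disappears and the actual results stand on their own.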
This model has also been validated against every state in the 2012 presidential election and against the New Jersey Senate election held on October 16, 2013. The early Democratic understatement found in Virginia did not hold true in every state in 2012: the model slightly overstates Democratic support in some states and understates it in others. Differences in voting patterns, including the order in which votes are reported, are the key challenge for mid-election projections. It is an open question whether the correction factor that succeeded in the last two Virginia elections will work again, and whether this correction mechanism can be generalized to other states.