How did the 2016 election predictions go so wrong? – A data science view.


[Image: FiveThirtyEight's 2016 election forecast] Five Thirty Eight, the website Nate Silver created after he gained popularity by accurately forecasting several elections, had this image on its site before the election. Despite data science that had worked well in the past, they got it wrong. As a person who builds data products (predictive models, performance dashboards, and reports) to inform decision makers about complex problems, I wondered why. Here is what I've found:

People are much more dynamic than products.

As we have learned in building data products for health and human services, we don't create a computer system to make decisions for the customer, because that would mean a computer making decisions about people's lives. A person in authority needs quality data, but the decision itself belongs to that person, informed by the data. The reason is that people change faster, and in ways we can't predict. As Harry Enten of Five Thirty Eight writes about what the election taught him,

“Political parties, in other words, are dynamic — their coalitions change.”

The models you run, which are made by people, determine the result you receive.

Beginning around 2000 and lasting until about 2010, weather forecasts showed a lot of variance in their accuracy as the effects of climate change degraded the models. The climate changed, but the models for forecasting the weather hadn't. The forecaster would predict rain five days out, and you would get sun instead. This is called model bias, and it slowly eroded the accuracy of national weather forecasts from 2000 until the models were updated, starting in 2009. In 2016, the electoral climate changed, but the models, and the data driving those models, didn't.
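The failure mode can be sketched in a few lines. In this hypothetical Python example (the data and numbers are invented for illustration, not real weather records), a linear trend is fitted on an old, stable regime and then applied to a shifted regime; its errors become systematically one-sided, which is exactly what model bias looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Old climate": a gentle, stable trend with some noise (invented data).
old_years = np.arange(1970, 2000)
old_temps = 0.01 * (old_years - 1970) + rng.normal(0, 0.1, old_years.size)

# Fit a simple linear model on the old regime only.
slope, intercept = np.polyfit(old_years, old_temps, 1)

# "New climate": the underlying trend has steepened (regime change).
new_years = np.arange(2000, 2010)
new_temps = 0.3 + 0.05 * (new_years - 2000) + rng.normal(0, 0.1, new_years.size)

# The stale model's predictions drift away from reality in one direction.
pred = slope * new_years + intercept
bias = np.mean(new_temps - pred)
print(f"mean error on new regime: {bias:.2f}")
```

The errors aren't random noise around zero; they point the same way cycle after cycle, because the world moved and the model didn't.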

Your model is only as good as the data you receive.

As the Five Thirty Eight folks learned, their data going back to 1980 covered only nine (9) election cycles. That introduces uncertainty into the model. Again, from Harry Enten's excellent article,

“Small sample sizes can also be super misleading. That’s exactly what made my predictions (and others’) in the primaries so bad. The idea that a candidate like Trump, who had so little support from the party establishment (as measured by endorsements), couldn’t win a primary in the modern era was based off of data dating back only to 1980.”
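Enten's point can be made concrete with a quick back-of-the-envelope calculation. The 20% rate below is an invented illustration, not an estimate, but it shows how little nine cycles can prove: even if outsider candidates won primaries fairly often, a nine-cycle record could easily show none at all.

```python
# If an "outsider wins the primary" 20% of the time (invented rate),
# how likely is it that nine straight cycles show zero outsider wins?
p_outsider = 0.20
n_cycles = 9
p_no_outsider_streak = (1 - p_outsider) ** n_cycles
print(f"{p_no_outsider_streak:.1%}")
```

That comes out to roughly 13%, so "it has never happened since 1980" was weak evidence that it couldn't happen.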

Also, if you only collect data from certain people (in this case, the typical voter), you will miss new populations. In the 2016 election, new voters who historically hadn't participated in the process turned out and voted. So the data you receive determines how well your model will work. To put it another way: garbage in, garbage out.
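A small hypothetical Python sketch (all proportions invented) shows how polling only the "typical voter" biases an estimate when a new population turns out:

```python
import random

random.seed(42)

# Invented electorate: 80% "habitual" voters split 50/50 for candidate A,
# plus 20% "new" voters who break 70/30 for candidate A.
electorate = (
    [("habitual", random.random() < 0.50) for _ in range(8000)]
    + [("new", random.random() < 0.70) for _ in range(2000)]
)

true_share = sum(vote for _, vote in electorate) / len(electorate)

# A poll that samples only habitual voters misses the new population entirely.
polled = [vote for group, vote in electorate if group == "habitual"]
polled_share = sum(polled) / len(polled)

print(f"true support:   {true_share:.1%}")
print(f"polled support: {polled_share:.1%}")
```

No amount of clever modeling on top of the polled sample fixes this; the bias is baked in before the model ever runs.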