In this special guest feature, Michele Chambers, EVP Anaconda Business Unit & CMO, Continuum Analytics, examines the recent U.S. presidential election results and asks how the polls could have been so wrong. Even Nate Silver blew it. What can Nate and other expert statisticians learn from this disaster? Michele is an entrepreneurial executive with over 25 years of industry experience. She has authored two books: Big Data Big Analytics, published by Wiley, and Modern Analytic Methodologies, published by Pearson FT Press. Prior to Continuum Analytics, Michele held executive leadership roles at database and analytics companies including IBM, Netezza, Revolution Analytics, MemSQL and RapidMiner. In her career, Michele has been responsible for strategy, sales, marketing, product management, channels and business development. She holds a B.S. in Computer Engineering from Nova Southeastern University and an M.B.A. from Duke University.
Americans across the country are still shocked at how the presidential polls could have been so wrong. Nearly every election forecast and expert statistician (even Nate Silver) predicted a Clinton victory. So what does this massive error mean? There’s only one answer.
Our “voter model” is broken. Current voter predictions depend heavily on that model, and if it is inaccurate, every prediction built on it suffers. The voter model used in the 2016 U.S. presidential election assumed voter turnout comparable to 2012 for both parties. Republican turnout was indeed comparable, but Democratic turnout was down significantly, which skewed the analysis. What’s more, the voter model assumes that voters tell pollsters the truth about how they will vote and then follow through. Take Brexit, for example: the model did not account for people who would not admit to pollsters how they intended to vote once they got behind the curtain. Even a perfectly representative voter model still requires that people actually turn out and vote the way they said they would.
While we can drive voter turnout and draw samples that accurately represent the demographics (race, gender, income, etc.), we do not take voters’ behavior into account. When races were not as tight or emotionally charged as they are becoming, demographic-based models were good enough. Now, with tighter races and emotionally charged events, we should be using psychographics to sharpen our predictions. The data science to do this exists, and so does the computational power to process the additional complexity and produce more accurate predictions.
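To make the demographic side of this concrete, here is a minimal sketch of post-stratification weighting, the standard way polls are adjusted so each demographic cell matches its known population share. The cells and all the share figures below are made up for illustration, not real census or poll numbers.

```python
# Minimal post-stratification sketch: give each demographic cell a weight
# equal to (population share / sample share), so the weighted sample
# matches the population's composition. All shares are illustrative.

def poststratify(sample_shares, population_shares):
    """Return a weight per demographic cell."""
    return {cell: population_shares[cell] / sample_shares[cell]
            for cell in sample_shares}

# Hypothetical geographic cells: this poll over-represents urban voters.
sample_shares     = {"urban": 0.50, "suburban": 0.30, "rural": 0.20}
population_shares = {"urban": 0.40, "suburban": 0.35, "rural": 0.25}

weights = poststratify(sample_shares, population_shares)
# Urban respondents are down-weighted (0.40/0.50 = 0.8) and rural
# respondents up-weighted (0.25/0.20 = 1.25).
```

Note what this does and does not fix: it corrects the sample’s composition, but it cannot correct for respondents who misstate their intentions or fail to turn out, which is exactly the gap in the 2016 voter model.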
Psychographic information is harder to collect, but one possibility is to use facial recognition and/or speech analysis. A subsample of the overall demographic population could be polled in person, and analytics applied to facial expressions and/or audio could compare observed behavior to the stated responses. That behavior-verified subsample could then seed the larger sample, with machine learning used to infer the likely behavioral results of the broader population. What is particularly intriguing about this approach is that it would also capture decline-to-answer behavior (people unwilling to express their opinion to a pollster), better representing the overall population and increasing the accuracy of the prediction.
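The “seed the larger sample” step above can be sketched with a very simple model. In this hypothetical setup, the in-person subsample yields labels saying whether each respondent’s stated answer matched their observed cues, and a nearest-neighbour classifier projects those labels onto the rest of the sample. The two features and all the data points are invented for illustration; a real system would use far richer signals.

```python
# Sketch of seeding a larger poll sample from a small behavior-verified
# subsample. Seed labels ("truthful" / "misstated") come from comparing
# stated answers to observed facial/audio cues; a k-nearest-neighbour
# vote then labels respondents in the larger sample. Data is hypothetical.
import math

def knn_predict(seed, query, k=3):
    """Predict a label for `query` by majority vote of the k nearest
    seed points. `seed` is a list of (feature_vector, label) pairs."""
    nearest = sorted(seed, key=lambda p: math.dist(p[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical features per respondent: (stated support 0-1, hesitancy 0-1).
seed = [
    ((0.9, 0.1), "truthful"),
    ((0.8, 0.2), "truthful"),
    ((0.7, 0.8), "misstated"),
    ((0.6, 0.9), "misstated"),
    ((0.9, 0.3), "truthful"),
]

# Project onto a respondent from the larger sample who hesitated heavily.
label = knn_predict(seed, (0.8, 0.85))  # nearest seeds are "misstated"
```

In practice this would be a semi-supervised learning problem over many behavioral features, but the shape is the same: a small, expensive, behavior-verified sample informs predictions about the large, cheap one.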
Here’s the bottom line: for more accurate predictions, we need to represent holistic human behavior, not just stated preferences. The broken voter model can begin to be fixed if data science is used to its full potential.