By Steve Lohr and Natasha Singer
Nov. 10, 2016

It was a rough night for number crunchers. And for the faith that people in every field — business, politics, sports and academia — have increasingly placed in the power of data.

Donald Trump’s victory ran counter to almost every major forecast — undercutting the belief that analyzing reams of data can accurately predict events. Voters demonstrated how much predictive analytics, and election forecasting in particular, remains a young science: Some people may have been misled into thinking Hillary Clinton’s win was assured because some of the forecasts lacked context explaining potentially wide margins of error.

“It’s the overselling of precision,” said Dr. Pradeep Mutalik, a research scientist at the Yale Center for Medical Informatics, who had calculated that some of the vote models could be off by 15 percent to 20 percent.

Virtually all the major vote forecasters, including Nate Silver’s FiveThirtyEight site, The New York Times’ Upshot and the Princeton Election Consortium, put Clinton’s chances of winning in the 70 percent to 99 percent range.

The election prediction business is one small aspect of a far-reaching change across industries that have become increasingly obsessed with data, its value and the potential to mine it for cost-saving and profit-making insights. It is a behind-the-scenes technology that quietly drives everything from the ads that people see online to billion-dollar acquisition deals.

Examples stretch from Silicon Valley to the industrial heartland. Microsoft, for example, is paying $26 billion for LinkedIn largely for its database of personal profiles and business connections on more than 400 million people. General Electric, the nation’s largest manufacturer, is betting big that data-generating sensors and software can increase the efficiency and profitability of its jet engines and other machinery.

But data science is a technology advance with trade-offs. It can see things as never before, but also can be a blunt instrument, missing context and nuance. All kinds of companies and institutions use data quietly and behind the scenes to make predictions about human behavior. But only occasionally — as with Tuesday’s election results — do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.

This week’s failed election predictions suggest that the rush to exploit data may have outstripped the ability to recognize its limits.

“State polls were off in a way that has not been seen in previous presidential election years,” said Sam Wang, a neuroscience professor at Princeton University who is a co-founder of the Princeton Election Consortium. He speculated that polls may have failed to capture Republican loyalists who initially vowed not to vote for Trump, but changed their minds in the voting booth.

Beyond election night, there are broader lessons that raise questions about the rush to embrace data-driven decision-making across the economy and society.

The danger, data experts say, lies in trusting the data analysis too much without grasping its limitations and the potentially flawed assumptions of the people who build predictive models.

The technology can be, and is, enormously useful. “But the key thing to understand is that data science is a tool that is not necessarily going to give you answers, but probabilities,” said Erik Brynjolfsson, a professor at the Sloan School of Management at the Massachusetts Institute of Technology.

Brynjolfsson said that people often do not understand that if the chance that something will happen is 70 percent, that means there is a 30 percent chance it will not occur. The election performance, he said, is “not really a shock to data science and statistics. It’s how it works.”
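
Brynjolfsson's point can be checked with a short simulation. The sketch below is only an illustration, not any forecaster's method: the function name `simulate_upsets`, the number of trials and the random seed are all assumptions made for the example. It plays out many hypothetical contests in which the favorite has a 70 percent chance of winning and counts how often the favorite loses.

```python
import random


def simulate_upsets(win_probability: float, trials: int, seed: int = 0) -> float:
    """Return the fraction of simulated contests that the favorite loses.

    Hypothetical helper for illustration; each contest is an independent
    draw in which the favorite wins with probability `win_probability`.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    losses = sum(1 for _ in range(trials) if rng.random() >= win_probability)
    return losses / trials


upset_rate = simulate_upsets(0.70, 100_000)
print(f"Favorite lost in {upset_rate:.1%} of simulated contests")
```

Across many runs the favorite loses close to 30 percent of the time, which is the sense in which a 70 percent forecast does not rule out the other outcome.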

So, what happened with the election data and algorithms? The answer, it seems, is a combination of shortcomings in polling, analysis and interpretation, perhaps both in how the numbers were presented and in how the public understood them.

Silver, the founder of FiveThirtyEight, did not immediately respond to an email seeking comment. Amanda Cox, the editor of The Upshot, and Wang of the Princeton Election Consortium said state polling errors were largely to blame for the underestimates of Trump’s chances of winning.

In addition to the polling errors, data scientists said the inherent weakness of election models might have caused some forecasting errors. Before an election, forecasters use a combination of historical polls and recent polling data to predict a candidate’s chance of winning. Some may also factor in other variables, such as giving higher weight to a candidate who is an incumbent.
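
How much a forecast depends on the assumed size of polling error can be shown with a toy calculation. The sketch below is not the model used by any of the forecasters named here. It assumes, purely for illustration, that the true margin follows a normal distribution centered on the polling lead, and the 3-point lead and the error sizes are made-up numbers.

```python
import math


def win_probability(lead_points: float, error_sd: float) -> float:
    """Chance the true margin is above zero under a normal error model.

    Hypothetical helper: assumes true margin ~ Normal(lead_points, error_sd).
    """
    z = lead_points / error_sd
    # Standard normal cumulative distribution, written with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Same illustrative 3-point polling lead, three assumed error sizes.
for sd in (1.5, 3.0, 6.0):
    print(f"lead 3.0 points, error sd {sd}: {win_probability(3.0, sd):.0%}")
```

Under these assumptions the same lead translates into a win probability of roughly 98, 84 or 69 percent, depending only on how much error the modeler allows for.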

But even with decades of polls to analyze, it is difficult for forecasters to predict accurately a candidate’s chance of winning the presidency months or even weeks ahead of time. Mutalik of Yale compared election modeling to weather forecasting.

“Even with the best models, it is difficult to predict the weather more than 10 days out because there are so many small changes that can cause big changes,” Mutalik said. “In mathematics, this is known as chaos.”
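
The sensitivity Mutalik describes can be seen in the logistic map, a standard textbook example of chaos. It is used here as a stand-in and is not a weather or election model; the starting value, the size of the nudge and the function name `logistic_trajectory` are assumptions chosen for the illustration.

```python
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in a billion.
a = logistic_trajectory(0.2, 50)
b = logistic_trajectory(0.2 + 1e-9, 50)
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"gap after 5 steps: {gaps[5]:.2e}")
print(f"largest gap over 50 steps: {max(gaps):.2f}")
```

The two runs stay nearly identical for the first few steps and then drift apart until they bear no resemblance to each other, even though the rule generating them is simple and fully known.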

In this presidential election, analysts said, the other big problem was that some state polls were wrong. Recent polls from Wisconsin, for instance, put Clinton well ahead of Trump. And election forecasts relied on that information for their predictions. Britain encountered similar lapses when polls mistakenly predicted that the nation would vote in June to stay in the European Union.

“If we could go back to the world of reporting being about the candidates and the parties and the issues at stake instead of the incessant coverage of every little blip in the polls, we would all be better off,” said Thomas E. Mann, an election expert at the Brookings Institution. “They are addictive, and it takes the eye off the ball.”
