Not too long ago, I expressed some disbelief about the techno-utopianism that seems to surround discussions of big data.
Stowe Boyd, Re: The Future Impact Of Big Data
Unconstrained and dynamic complex systems — like our society, the economic system of Europe, or the Earth’s weather — are fundamentally unknowable: their progression from one state to another cannot be predicted consistently, even if you have a relatively good understanding of both the starting state and the present state, because the behavior of the system as a whole is an emergent property of the interconnections between the parts. And the parts are themselves made up of interconnected parts, and so on.
Yes, weather forecasting and other scientific domains have benefited from better models and more data, and more data and larger-scale analysis will make weather forecasts more consistent, but only to a point. Rounding errors grow out of imprecise measurements and oversimplifications in our models, so even something as seemingly transparent as the weather — where no one is intentionally hiding or degrading data — cannot be predicted completely. In everyday life, this is why the weather forecast for the next few hours is several orders of magnitude better than the forecast for 10 days ahead. Big data — as currently conceived — may allow us to improve weather prediction for the next 10 days dramatically, but the inverse square law of predictability means that predictions about the weather 10 months ahead are unlikely to improve much.
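To see how rounding errors swamp long-range prediction, here is a toy sketch in Python using the logistic map — a standard textbook example of a chaotic system, not a weather model, and my illustration rather than anything from the post:

```python
# Toy illustration of sensitivity to rounding errors, using the logistic
# map (a standard textbook chaotic system, not a weather model).
def iterate(x, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x)."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = 0.3            # the "true" starting state
b = 0.3 + 1e-10    # the same state, off by a rounding-error-sized amount

short = abs(iterate(a, 5) - iterate(b, 5))
long_run = abs(iterate(a, 50) - iterate(b, 50))

print(short)     # still tiny: the near-term "forecasts" agree
print(long_run)  # the tiny initial error has swamped the long-range "forecast"
```

The two trajectories agree closely for the first few steps, then diverge completely: more decimal places of measurement buy you a few more steps of agreement, never an unbounded horizon.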
So, consider it this way: Big data is unlikely to increase our certainty about what is going to happen in anything but the nearest of near futures — in weather, politics, or buying behavior — because uncertainty and volatility grow along with the interconnectedness of human activities and institutions across the world. And big data is itself a factor in that interconnectedness: as companies, governments, and individuals act on insights gleaned from big data, they make the world more tightly coupled and, perhaps unintuitively, less predictable.
I saw a post today from Nassim Taleb that echoes my points on a more mathematical basis:
Nassim Taleb, Beware the Big Errors of ‘Big Data’
We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level.
Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information.
In other words: Big data may mean more information, but it also means more false information.
Just like bankers who own a free option — where they make the profits and transfer losses to others — researchers have the ability to pick whatever statistics confirm their beliefs (or show good results) … and then ditch the rest.
Big-data researchers have the option to stop doing their research once they have the right result. In options language: The researcher gets the “upside” and truth gets the “downside.” It makes him antifragile, that is, capable of benefiting from complexity and uncertainty — and at the expense of others.
But beyond that, big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.
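Taleb's cherry-picking point is easy to demonstrate with simulated data. A toy sketch in Python (my construction, not Taleb's): every column below is pure noise, so every correlation that clears the threshold is bogus by construction — yet the count of "publishable" hits grows with the number of variables searched:

```python
# Every variable here is pure noise; any correlation found is spurious
# by construction. More candidate variables -> more fake "findings".
import random

random.seed(42)  # fixed seed so the run is reproducible

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def spurious_hits(n_vars, n_obs=20, threshold=0.5):
    """Count noise variables whose correlation with a noise target
    exceeds |threshold| -- every hit is bogus by construction."""
    target = [random.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_vars):
        x = [random.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(x, target)) > threshold:
            hits += 1
    return hits

few, many = spurious_hits(10), spurious_hits(10_000)
print(few, many)  # the bigger search space yields far more fake "findings"
```

A researcher who reports only the hits — and ditches the rest — will always have something impressive to show, and it will mean nothing.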
As data sets grow — especially data about complex, adaptive systems, like world economics, or human interactions — the proportion of noise grows as a function of the complexity.
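Taleb's observation that "large deviations are likely to be bogus" can be sketched numerically as well (again my toy example, under the simplifying assumption of Gaussian noise): in data with no signal at all, the most extreme observation grows with sample size, so the biggest "deviation" in a huge data set needs no explanation beyond the size of the data set itself:

```python
# In pure Gaussian noise -- no signal whatsoever -- the most extreme
# observation grows with the sample size alone.
import random

random.seed(0)  # fixed seed so the run is reproducible

def most_extreme(n):
    """Largest absolute value among n independent standard normals."""
    return max(abs(random.gauss(0, 1)) for _ in range(n))

small_sample = most_extreme(100)
big_sample = most_extreme(1_000_000)
print(small_sample, big_sample)  # the outlier scales with the data, not with any signal
```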
Big data is not going to be some Hubble telescope peering into the heart of the universe. The world, alas, is not becoming more knowable.