In a talk given at the Royal Society on the origin of life, John Maynard Smith noted that while the 19th century had been the century of energy, in which science and engineering were concerned with transforming energy from one form to another (chemical to mechanical as in a steam engine, or mechanical to electrical as in a dynamo), the 20th century was about information, and in particular the transformation of information. In biology we have learned that our genes are written in a genetic code that is transmitted, transcribed and translated, and high-energy physics is to a large extent focused on interpreting the massive amounts of information produced in particle accelerators. Today information technology is a major industry, and it is possible to make a fortune by simply transforming one form of information into another, more useful shape.
The latter activity is the topic of the book Big Data: a revolution that will transform how we live, work and think by Viktor Mayer-Schönberger and Kenneth Cukier. In it they explore the consequences of our ever-increasing ability to gather, store and process data, and focus in particular on the implications it has for business. They show how IT giants such as Google and Amazon have gotten ahead in the game by relying on the power of data, and have developed clever ways of acquiring and utilising it. For example, Google has built a spell-checking algorithm from the billions of misspelled search queries that it has amassed. In a similar way, Google has also constructed a translation algorithm based on millions of webpages that happen to exist in multiple languages.
The data is messy, but the sheer volume overcomes the problems. This represents, the authors claim, the opposite of the traditional approach to acquiring knowledge, i.e. careful data acquisition and analysis, studying only a subset of the totality of data. Big data takes all available information into account, or N = all, in the words of the authors.
A more creative and surprising (at least to me) example of big data, which highlights the potential of data transformation, is the ability to predict local economic growth and unemployment figures by analysing geo-location data from GPS devices. The book is full of such examples, which, although interesting, become slightly tedious after a while. More importantly, perhaps, they made me conscious of all the different ways in which we give away our personal information in exchange for "free services" offered by big data companies. The company performing the above-mentioned predictions gathers its data from a "free" GPS app.
For the individual user, Twitter represents a way to communicate and connect rapidly and for free, but to the company the tweets represent the datafied moods and feelings of millions of people, updated every instant. Twitter thus has direct and quantifiable access to millions of people's minds in real time. It is therefore no surprise that data from Twitter can be used to predict everything from box office sales to election results.
The book is mainly aimed at business people, but still touches on the implications for science and society at large. One recurring topic is that in the future we will move away from causation and rely more on correlation in our attempts to understand the world. This might be so, but, as the authors rightly point out, we still need theory to place the data and the conclusions drawn from it in a framework of understanding.
Despite its brevity (just under 200 pages excluding references and notes) the book is a bit repetitive, and a few factual errors also detract from its appeal (no, Steve Jobs did not survive longer because he had his genome sequenced; experiments are not often complicated and unethical; and yes, it was possible to determine one's position prior to GPS technology, using for example a chronometer and sextant).
In any case, I would recommend the book to those who are curious about how information gathering and analysis are changing our society and how business is done, but I would not recommend it to those who are hoping for a more scientific or philosophical view on Big Data.