Hunting black swans with big data

Dec 6, 2012

In times of worldwide economic crisis, natural disasters, and political turmoil, one often wonders whether such events could have been mitigated or avoided altogether with the help of the latest technological developments. The United Nations Development Programme offers a potential answer in the form of software that enables better data visualization and the detection of irregular patterns.

The web-data intelligence analysis company Recorded Future offers tools that scan more than 100,000 online sources according to set parameters, then extract, measure, and organize the data in an attempt to single out early warning signs for these kinds of events.

The UNDP and Recorded Future tested the robustness of this intelligence software on the four-month period leading up to the 2008 Georgia-Russia conflict and discussed the process in a recent blog post.

As predicted, Georgia was mentioned with increasing frequency in online news as the critical moment approached. The qualitative analysis of the data, however, revealed two flaws: only English-language sources were taken into account (leaving out the local media and Georgian-language microbloggers), and the system picked up more stories the closer they were to the present day.
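The second flaw can be made concrete with a small sketch. The Python below is not Recorded Future's method; it is a minimal illustration, using made-up headlines and a hypothetical helper named `monthly_share`, of one assumed way to offset a recency bias: counting keyword mentions as a share of all stories collected in a month rather than as a raw total.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of (publication date, headline) pairs.
# These headlines are invented for illustration, not real collected data.
stories = [
    (date(2008, 4, 3), "Tourism season opens in Batumi"),
    (date(2008, 4, 20), "Georgia protests drone shootdown"),
    (date(2008, 7, 2), "Georgia and Russia trade accusations"),
    (date(2008, 7, 15), "Military exercises near Georgia border"),
    (date(2008, 7, 28), "Oil prices reach record high"),
    (date(2008, 7, 30), "Georgia reports shelling in South Ossetia"),
]


def monthly_share(stories, keyword):
    """Return {(year, month): (mentions, share of that month's stories)}."""
    totals = Counter()  # all stories collected per month
    hits = Counter()    # stories mentioning the keyword per month
    for published, text in stories:
        key = (published.year, published.month)
        totals[key] += 1
        if keyword.lower() in text.lower():
            hits[key] += 1
    return {key: (hits[key], hits[key] / totals[key]) for key in sorted(totals)}


result = monthly_share(stories, "Georgia")
for (year, month), (mentions, share) in result.items():
    print(f"{year}-{month:02d}: {mentions} mentions, {share:.0%} of stories")
```

In this toy sample the raw count triples between April and July, while the share rises only from one half to three quarters, because more stories of every kind were collected in the later month. Whether such a normalization would have changed the UNDP's reading of the real data is not something the source addresses.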

As a side note, the idea behind the software is reminiscent of, and in fact a challenge to, Nassim Nicholas Taleb’s Black Swan theory, which describes an event that is, in its author’s words, an

“outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable”.

Taleb argues that such events are by their nature impossible to predict, so the only approach is to build robust models that mitigate the effects of the negative ones and exploit the positive ones to their fullest potential. As such, the predictive success of Recorded Future would imply a refutation of the Black Swan theory and could offer a magic tool for turning black swans, such as the rise of the Internet and Black Friday, into their easier-to-manage white counterparts.

Going deeper, the analysts looking at the data on the Georgia-Russia conflict constructed a model based on past knowledge: namely, that an increase in mentions of violence in a discourse foretells upcoming violent events. The data disproved the assumption, an unsurprising fact in light of another bird-themed theory. To illustrate the fallacies of inductive reasoning, Bertrand Russell gave the example of a chicken that is fed every morning and reasonably assumes that nothing will disrupt the pattern, until the day the farmer kills it for his dinner table.

The researchers then refined their model by including the qualitative aspect of the data and testing whether a “message may provide insights about the speaker’s attitude to the status quo”. Specifically, they assumed that a message that “contains references to military and security issues (as opposed to tourism development), focuses on response and reaction to what others are saying, it is indicative of a speaker’s discontent with the status quo”. Although the data proved them right, their model suffered from several flaws. First, and once again, the forward-looking statements appeared only in early 2009. Second, they increased in 2012, a natural pattern in the run-up to the parliamentary elections. And third, as the researchers pointed out, a donor conference held at the end of 2008 skewed the data in favor of such statements.

Overall, Recorded Future proves to be an interesting and useful tool for a researcher dealing with a large amount of data, provided that the researcher also has a keen “ornithological” eye that knows how to properly “see” black swans.

