The right to be let alone is indeed the beginning of all freedom.

Justice William O. Douglas

One of the many curiosities about privacy is that no old laws about it survive in the written record, a fact that could be wrongly interpreted to mean that privacy is some improper outgrowth of modern times and not one of the fundamental human rights. The confusion is easily resolved once you note that it was nothing but the proliferation of modern mass media, in the late 19th century, that actually propelled the claim for its legal recognition, a quest with largely unsuccessful results. That is, the need for privacy is a response to one of many technological wrongs, and it keeps growing with every new media-related technology.

As evidence, it was the visionary paper “The Right to Privacy” by Samuel D. Warren and Louis D. Brandeis (later a Supreme Court Justice), published in the Harvard Law Review in 1890, that started the doctrine of the invasion of privacy and largely settled its current definition. Unsurprisingly, it was written as a reaction to a new technology, the photographic camera.

In modern times, the ever-falling costs of computer storage and sensors allow the affordable recording of the full life of a human being, as the MyLifeBits experiment at Microsoft Research showed. In that case, though, every piece of information is recorded under informed consent, and it's not as correlated with other people's information as the data found in social-network databases. Its implications are of a more socio-psychological significance, as it strives to redefine human memory, so frail and self-deceiving.

The other side of the coin emerges whenever the tons of unknowingly collected data are used with malicious intent, as shown in the documentary Erasing David: escaping from the past has become as difficult as escaping from the piles of data accumulated by both governments and private companies.

Anonymity, as a good, is getting scarcer by the moment and, as such, much pricier to achieve. Disappearing, vanishing without a trace, is the luxury item of our times.


Apache Hadoop, an open-source rework of the Google File System and MapReduce developed largely at Yahoo, has become the lingua franca of the Big Data movement. Following in the trail of Google's success, the MapReduce paradigm strikes a successful balance between a reasonable developer learning curve, scalability and fault tolerance for storing and querying very large datasets.
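
To give a feel for why the learning curve is reasonable, here is a minimal, single-process sketch of the MapReduce idea, using the usual word-count example. It is only an illustration under my own assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_word_count`) are made up for this post and are not Hadoop's API, and nothing here is distributed or fault-tolerant.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the user-supplied mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the user-supplied reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# The two functions below are all a developer writes for a word count
# (hypothetical names, chosen for this illustration only).
def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


def run_word_count(lines):
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)


if __name__ == "__main__":
    print(run_word_count(["big data", "big clusters"]))
```

The point of the paradigm is that the developer only supplies the mapper and the reducer, while the framework owns everything in between; in a real cluster that middle part is where the partitioning, scheduling and recovery from failed nodes happen.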

But the true history and community efforts behind Hadoop are much more complex: Google left behind the constraints of GFS and MapReduce long ago, in favor of more efficient and capable technologies. And so did other open-source projects: Hive and Pig to carry out BI/analytics queries in a low-latency fashion, matching Google's BigQuery; HBase and Storm for real-time search and incremental indexing, standing in for Google's BigTable store and Percolator engine, respectively; and Giraph, for carrying out large-scale graph-processing computations with significant speedups, like the Pregel framework at Google.

So, when talking about Hadoop, you have to distinguish between the core Hadoop stack (HDFS, ZooKeeper) and the growing number of projects surrounding this core. Many companies are creating Hadoop distributions (Cloudera, Hortonworks, MapR), capitalizing on the growing need for commercial support by integrating a subset of those projects in an easy-to-install package. But they will never include anything innovative or experimental, because the main reason for their existence is support, not cutting-edge research and development. Groundbreaking developments will always happen outside them, since the cost of supporting new lines of code always ends up higher than the cost of initially developing them.

Here, the parallel with the Linux ecosystem and its distributions is clear, since fragmentation between distributions will severely break interoperability in the very same way. So far, no one has proposed a Hadoop equivalent of the Linux Foundation, the entity that maintains the Linux Standard Base and the Filesystem Hierarchy Standard. But then again, the incentive for the Hadoop distributions is to create lock-in, which should increase fragmentation, not reduce it.

Hadoop on Azure is just another distribution, and the comparative advantage of using this or any other distribution is nil if it doesn't include the full list of revolutionary projects that extend the core Hadoop stack to enable the most innovative ventures. It's not about Azure's price for the underlying storage or computation. The key thing is to develop and openly provide all the wrapper and glue code that every user of these projects now writes individually: that will be a real challenge for Microsoft Open Technologies.

No matter how hard times get, in hindsight everything is forgotten and hope replaces every glimpse of prudent rationality. But reading some carefully selected books is the perfect antidote to get back some good old common sense:

  • Debt: The First 5,000 Years. A history of debt through the different cultures and civilizations. Although dichotomous and highly controversial in its moral judgments, it does an outstanding job of debunking myths like the primacy of money over debt or the dual nature of debt as an instrument of commerce and finance, and it perfectly portrays the cult of personal honor as the root of the economy through the ages. You'd do better to skip most of the narrative and go directly to the more academic sources cited in the references.
  • Manias, Panics, and Crashes: A History of Financial Crises. A reference work, revered for its insights and the lasting impact of its anecdotes. Entirely literary and qualitative, it was the first to illustrate, through carefully picked descriptions of past debacles, that crises do follow predefined patterns, though it lacks a general theory of their formation and development.
  • This Time Is Different. A wonderful masterpiece of the cliometric school, born of the personal computer's power to carry out hundreds of regressions: unlike the previous book, it offers a quantitative study of financial crises over centuries and continents, a view far away from the traditional equilibrium models of the economy. Frequentist and predictive by nature, it fails by ignoring that crises may have roots other than a failure in the saving-to-investment mechanism that it insists on ascribing them to, even if the first 200 pages are dedicated to a fully detailed taxonomy of financial crises.
