Computer science is changing: the amount of data available for processing is growing exponentially, and so must the emphasis on handling it. Like the 19th-century shift in physics from mechanics to statistical mechanics, the new algorithms sacrifice the precision of a unique answer for the fast estimation of statistical properties. The following draft of a book by Hopcroft and Kannan blazes the trail for what future algorithms textbooks may look like:

Download (PDF, 1.83MB)

Heavy on proofs, the book selects many topics for their mathematical elegance rather than their pragmatism. In the final version of this much-anticipated book, I would love to see more content on hash algorithms, parallel algorithms and graph spanners, or a more extensive discussion of Support Vector Machines.


2013 is coming to an end, and many papers have been published this year. My recommendations for the next holidays follow: deep, imaginative, mixing the practical and the theoretical, and perfect for inspiring hope for a better future.

  1. A Simple Algorithm for the Graph Minor Decomposition. A quadratic-time algorithm for computing graph minor decompositions is presented, improving on the previous cubic bound. It’s very intuitive and easy to understand, so I bet it will appear in future books on algorithms, turning it into a classic.

  2. Dynamic graph connectivity in polylogarithmic worst case time. The classic problem of whether there is a path between two nodes in a dynamically changing graph now has a polylogarithmic worst-case algorithm: a theoretical advance with multiple nice applications in practice (routing, transportation, robotics…).

  3. Ambient backscatter: wireless communication out of thin air. Imagine devices communicating wirelessly using ambient radio-frequency signals as their only source of power, by backscattering those same signals: the Internet of Things (IoT) must surely run over this in the future.

  4. Automatic patch generation learned from human-written patches: although the rate of successfully generated patches is still quite low, it’s interesting to learn that progress is being made in this frankly quite boring area of software maintenance.

  5. F10, A Fault-Tolerant Engineered Network: a novel way to successfully attack the commonplace problem of network reliability and performance in datacenters by re-examining all the involved parts: network topology, routing algorithm and failure detection.

  6. Pinocchio, Nearly Practical Verifiable Computation: amazing improvements in the generation of proofs of general computations and their verification, making practical this basic necessity of the coming cloud computing landscape. Combine this with Pepper and you will surely be acquainted with the most important advances of this year on the future of cloud computing.

  1. The library of George B. Dantzig is for sale: peering into the libraries of great men is as enlightening as their work.
  2. Benford’s Law and the Art of Succeeding in Multiple Choice Tests: funny application of Benford’s Law.
  3. Polymath8: I’ve been following this project on collaborative mathematics and the article on Wired is a good description of its course.
  4. Rota’s conjecture proved: well, almost proved: there are still some problems with inequivalent representations of matroids, but it’s nearly here.
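As a refresher on that second item: Benford’s Law predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d), so a 1 leads about 30% of the time. A minimal sketch (the helper name is mine, not from the article):

```python
import math

def benford_probability(d):
    """Probability that d (1-9) is the leading digit under Benford's Law."""
    return math.log10(1 + 1 / d)

# The distribution is heavily skewed toward small leading digits
for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The nine probabilities sum to 1, and the skew toward small digits is what the article exploits for multiple-choice answers.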

After Edward Snowden’s revelations had been publicly discussed for months, I can only ask: did nobody see this coming? Because there is a clear precedent: NSA’s Project SHAMROCK collected all telegraphic data entering or exiting the United States from 1945 to 1975. And more recently, the gargantuan scale of the Great Firewall of China was an omen of what more advanced countries could build to spy on their citizens.

Privacy will be gradually eroded: it is a weak private right and a strong public wrong whenever it faces the presumption of illegality, and prosecution laws are already being modified to be more accepting of evidence gathered through warrantless surveillance programs, especially in common-law countries where police forces have ample power to interpret the law. And if in the future all surveillance is perfectly automated with almost no human intervention, who will really care about the invasion of privacy if it is done by amoral and unsentient machines?

What I really find fascinating is that Snowden’s revelations haven’t brought us any advanced technology: it’s almost as if the NSA didn’t have any real technological edge over current commercial technologies, which I don’t really buy. Meanwhile, the private sector develops and markets technologies like PredPol for real-time crime prediction: an excellent predictor of what’s to come.

  1. Statistical Basis for Predicting Technological Progress: excellent paper on how easy it is to forecast technological progress for some selected technologies (the main problem now being the identification of those technologies).
  2. Amdahl’s Law vs. Gustafson-Barsis’ Law: it contains an interesting insight on the impact of parallelization in relation to power consumption.
  3. Moore’s Law in Photonics: the mass diffusion of photonics is nearer than anyone might think.
  4. Predicting the Path of Technological Innovation: this paper introduces a simple model that works quite well across markets.
  5. The Economics of Long-Term Digital Storage: comparison and forecast of the different storage technologies.
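For reference on that second item, the two laws differ only in what they hold fixed: Amdahl assumes a fixed problem size, while Gustafson-Barsis assumes the workload scales with the machine. A minimal sketch of the standard formulas (function names are mine), with p the parallel fraction and n the number of processors:

```python
def amdahl_speedup(p, n):
    # Fixed problem size: the serial fraction (1 - p) caps the speedup
    return 1 / ((1 - p) + p / n)

def gustafson_speedup(p, n):
    # Scaled problem size: work grows with n, so speedup grows almost linearly
    return (1 - p) + p * n

# With 95% parallel work on 64 processors, Amdahl stays bounded well
# below 64 while Gustafson approaches it
print(amdahl_speedup(0.95, 64))
print(gustafson_speedup(0.95, 64))
```

The gap between the two numbers is the core of the power-consumption discussion in the paper: whether you shrink a fixed job or grow the job with the machine changes the economics of adding cores.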

Cloud Deployment Manager is the first and only tool to automagically deploy Microsoft clouds from a diagram, with just the click of a button: it represents the cutting edge of DevOps research and innovation on the Windows platform.

It’s amazing what can be achieved to improve how systems are currently architected with the right tools and some code.


I’ve just finished these three books published this year on the intersection of finance and programming:

  • C# for Financial Markets. This recently published book is essentially a translation to C# of the author’s previous books, especially those on the intersection of finance and C++. As such, one-third of the book delves into the implementation of basic mathematical finance (bonds, futures, options, swaptions, curve and surface modelling, the finite difference method, …) and two-thirds into teaching the C# language and its interoperability with Excel/LINQ/C++: note that if you’re already a pro in C#, you’d better skip these parts, since they are far from the most authoritative source, although the sections on interoperability are really instructive. The best point of this book is that it is really full of examples to illuminate every concept (850 pages long!), although they never compose a full application of any financial worth (that’s left as an exercise for the reader!): thus, and only from this technology angle, it’s the best book for beginners.
  • Financial Modelling – Theory, Implementation and Practice (MATLAB). Do you need a book to quickly acquaint yourself with the state of the art in financial mathematics for asset allocation, hedging and derivatives pricing, skipping the study of dozens of papers? Then this book is your definitive shortcut: it covers everything from the derivation of the models to their implementation in Matlab, demonstrating that this language can also be used for prototyping in the financial industry, efficiency and interoperability aside (if you don’t know Matlab, a third of the book delves into that). My favourite part of the book is the first one, on models (stochastic, jump models, multi-dimensional models, copulas), which reads lightly and fast in just an afternoon; however, the book is also over-focused on the numerical implementation of the models (a third of the book), when most of these details are just left to some library in the real world. Even so, just running through all its examples is worth the full price.
  • Financial Risk Modelling and Portfolio Optimization with R. R is the lingua franca of statistical research, and its under-utilization in the financial industry is a real puzzle: the sheer number of packages dealing with every imaginable statistical function should be enough to justify a much deeper penetration into daily use. This book is best suited for quantitative risk managers: it surveys the latest techniques for modelling and measuring financial risk, as well as portfolio optimisation techniques. Every topic follows a precisely defined structure: after a brief overview, an explanation is offered, and a very interesting synopsis of R packages and their empirical applications ends the discussion. My favourite parts are at the end, on constructing optimal portfolios subject to risk constraints and on tactical asset allocation based on variations of the Black-Litterman approach.
Disclaimer: I don’t hold stock in JW-A (John-Wiley & Sons Inc.), their selection is superb!

Charles Handy divided management cultures into four types in his now-classic book on business administration, Gods of Management:

  • Zeus culture, a club culture with centralized power in which speed of decisions and accountability are prioritized over any other virtue
  • Apollo culture, a role-based culture with many rules and procedures, in which stability and predictability are prioritized, creating resistance to change
  • Athena culture, a task-based culture focused on providing solutions to concrete problems, ideal in times of expansion when new products and technologies get introduced
  • Dionysus culture, an existential culture in which the organization exists for the sole purpose of helping the professionals to achieve their personal goals

In its last reorganization, Microsoft shifted from an Athena-like culture to an Apollo-like culture, a typical change in business re-organizations also predicted by Handy in his book. However, the general consensus is that this kind of management culture is inferior, as shown in multiple studies:

  • Managerial overload and organization design: a divisional organization is more efficient than a functional organization when the number of projects is large, and Microsoft has 13 products and services with revenues of more than one billion dollars per year.
  • Organization Design: different management structures are compared based on the internal co-ordination costs imposed on the firm, very relevant to the Microsoft reorganization.
  • A double moral hazard model of organization design: a model based on moral hazard is introduced to explain why sub-optimal business structures are chosen, starting from the general fact that the divisional structure is more efficient than the functional structure.

Only the malleable nature of software and the mandate to deploy a common software platform on any device and any type of screen may outweigh the previously presented disadvantages: be that as it may, MSFT is pursuing a much-needed goal for the whole computer industry.

  1. Turing award goes to Goldwasser&Micali: zero-knowledge, semantic security and universal forgery under chosen message attack.
  2. The DDoS That Almost Broke the Internet: attacking whole Internet eXchanges just to bring down SpamHaus.
  3. Cyber-security: The digital arms trade: price quotes for exploits, The Economist’s style.
  4. NSA Cryptologs: was this really classified? Many opinions and little-to-none noteworthy facts.
  5. RC4 as used in TLS/SSL is broken: see page 306. More here.

Elasticity and scalability via dynamic provisioning of servers and other resources are among the tenets of cloud computing: in practice, automatically growing the resources available to an application based on the level of demand is eased by patterns like WASABi, the Autoscaling Application Block for Azure.

But even if an application can automatically grow, a question remains unanswered: what is the minimum number of servers that should be reserved? That is a complex question that requires trading off the losses from being unable to properly service requests (because booting up instances takes some minutes) against the actual costs of instances that may sit idle with no load.

Let’s consider a service with a low margin (10%) and a really intensive computation (8 requests per second per Azure XL server), where the probability that a customer does not enter the service due to congestion is 1%, with a much higher probability of 2.5% of leaving the service while waiting for a request: to model the whole system, an M/M/s queue is considered, that is, s servers facing a Poisson input process with exponential service times and an unlimited waiting queue. The following R code solves the proposed queue model and plots the results:

# Minimum number of roles to calculate
minRoles <- 4
# Maximum number of roles to calculate
maxRoles <- 12
# Probability of not entering service due to high-load
probNotJoin <- 0.01
lossNotJoin <- 0
# Probability of leaving the service due to high-load
probLeaving <- 0.025
lossLeaving <- 0
# Revenue per request
customerRevenue <- 1
# Profit per request
customerProfit <- 0.1
# Azure XL web/worker role ($0.96/hour)
costServer <- 0.96
# New requests/hour
arrivalRate <- 90000
# Requests/hour serviced per role
serviceRate <- 30000
serviceCost <- 0
totalCost <- 0

library(pdq)
for (r in minRoles:maxRoles) {
  # Define and solve an open queueing model with r servers in PDQ
  Init("elasticity")
  CreateOpen("requests", arrivalRate)
  CreateMultiNode(r, "server", CEN, FCFS)
  SetDemand("server", "requests", 1/serviceRate)
  Solve(CANON)

  # Requests waiting in the queue: total at the node minus those in service
  queueLength <- GetQueueLength("server", "requests", TRANS) - arrivalRate/serviceRate
  # Mean waiting time, in minutes
  waitingTime <- 60 * queueLength/arrivalRate

  numNotJoin <- queueLength * probNotJoin * arrivalRate
  numLeaving <- waitingTime * probLeaving * arrivalRate
  serviceCost[r] <- r * costServer
  lossNotJoin[r] <- numNotJoin * customerRevenue * customerProfit
  lossLeaving[r] <- numLeaving * customerRevenue * customerProfit
  totalCost[r] <- serviceCost[r] + lossNotJoin[r] + lossLeaving[r]
}

plot(totalCost, xlim=c(minRoles,maxRoles), type="b", col="black", lwd=2,
     main="Optimal Elasticity of Demand", xlab="Servers (s)", ylab="Cost ($)")
lines(serviceCost, xlim=c(minRoles,maxRoles), type="b", col="brown", lty="dashed")
lines(lossNotJoin, xlim=c(minRoles,maxRoles), type="b", col="green", lty="dashed")
lines(lossLeaving, xlim=c(minRoles,maxRoles), type="b", col="orange", lty="dashed")

The black curve represents the total cost: the sum of the server costs (brown curve) and the losses due to customers not entering the service (green curve) and leaving the service (orange curve). The optimal reservation of servers is found at s=8, the minimum of the total cost curve.
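As a sanity check on the queueing side of the model (this cross-check is mine, not part of the original analysis), the M/M/s probability that a request has to wait can be computed directly with the standard Erlang C formula, using the same parameters (λ = 90000 requests/hour, μ = 30000 requests/hour per server):

```python
import math

def erlang_c(s, a):
    """Probability that an arriving request must wait in an M/M/s queue,
    with s servers and offered load a = lambda/mu (requires a < s)."""
    num = a**s / math.factorial(s) * s / (s - a)
    den = sum(a**k / math.factorial(k) for k in range(s)) + num
    return num / den

arrival_rate = 90000.0            # requests/hour
service_rate = 30000.0            # requests/hour per server
a = arrival_rate / service_rate   # offered load: 3 server-equivalents

for s in range(4, 13):
    p_wait = erlang_c(s, a)
    # Mean waiting time in minutes: Wq = C(s, a) / (s*mu - lambda), times 60
    wq_minutes = 60 * p_wait / (s * service_rate - arrival_rate)
    print(s, round(p_wait, 4), round(wq_minutes, 6))
```

The probability of waiting drops quickly as s grows past the offered load of 3, which is why the green and orange loss curves flatten out and the server cost dominates beyond the optimum.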
