It’s all over the news: a vulnerability has been found in OpenSSL that leaks memory contents on servers and clients. Named Heartbleed, it has a very simple patch and some informative posts have already been written about it (Troy Hunt, Matthew Green).

What nobody is saying is that the real root cause is the lack of modern memory management in the C language: OpenSSL added a wrapper around malloc() to manage memory in a supposedly more secure and efficient way, effectively bypassing the improvements made in this area over the past decade; specifically, it tries to improve the reuse of allocated memory by avoiding calls to free(). Now enter Heartbleed: through a very simple bug (intentional or not), a missing bounds check on an attacker-supplied length, the attacker is able to read up to 64 KB of memory adjacent to the request buffer on every request. What was the real use of that layer?
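The bug class is easy to sketch. What follows is a toy simulation in Python, not OpenSSL’s code: a bytearray stands in for the process heap, and `memory`, `SECRET`, `store_request`, `heartbeat_vulnerable` and `heartbeat_fixed` are hypothetical names chosen for illustration. The fixed variant only mirrors the spirit of the patch, which discards a heartbeat whose claimed length exceeds what was actually received.

```python
# Toy model of a Heartbleed-style over-read; NOT OpenSSL code.
# Assumption: one bytearray stands in for the process heap.
memory = bytearray(64)
SECRET = b"private-key-material"  # hypothetical sensitive data


def store_request(payload: bytes) -> None:
    """Place the payload in 'heap' memory, with a secret right after it."""
    memory[:] = bytes(len(memory))           # clear the simulated heap
    memory[0:len(payload)] = payload         # request buffer
    start = len(payload)
    memory[start:start + len(SECRET)] = SECRET  # adjacent sensitive data


def heartbeat_vulnerable(payload: bytes, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes, trusting the attacker-supplied length."""
    store_request(payload)
    return bytes(memory[0:claimed_len])      # reads past the real payload


def heartbeat_fixed(payload: bytes, claimed_len: int) -> bytes:
    """Discard the request when the claimed length exceeds the payload."""
    store_request(payload)
    if claimed_len > len(payload):
        return b""                           # silently drop the heartbeat
    return bytes(memory[0:claimed_len])


leaked = heartbeat_vulnerable(b"hi", 2 + len(SECRET))
print(leaked)                                # payload followed by the secret
print(heartbeat_fixed(b"hi", 2 + len(SECRET)))
```

Python itself would never allow the over-read on a real buffer, which is precisely the point of the argument: here the leak has to be simulated by hand.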

Face it: it’s a no-win situation. No matter how many ways these layers are going to be written, there will always be a chance for error. You can’t have secure code in C.

But rewriting or throwing away thousands of security-related programs written in C is not an option: the only way to run these programs securely is with the help of memory-debugging techniques, like those used by Insure++ or Rational Purify. For example, the following technical report contains a detailed analysis of some of the techniques that prevent these kinds of vulnerabilities:

Download (PDF, 1.99MB)

 

A graphical summary of Capers Jones’ latest book, “The Technical and Social History of Software Engineering”, aggregating the data of thousands of projects:

caspers1

  • Note how application size, measured in lines of code, is decreasing, in direct correlation with the linear increase in the expressive power of programming languages. This observation fits well with the growing number of web/mobile applications that perform only a very limited number of functions.

caspers2

  • The maximum percentage of code reuse is growing very fast, due to the greater availability of libraries and open-source code, but projects with 85% reuse are still a rarity.
  • Defect removal efficiency has steadily improved, but I expected a steeper line due to static analysis and better compiler warnings.
  • The percentage of personnel dedicated to maintenance has surpassed that of initial development, but there’s little research on the success factors of this stage.

caspers3

As languages improved and grew in number (so that more languages are available for specific tasks), so did programmer productivity, while the defect potential fell at the same time: this document about software engineering laws also provides another interesting outlook on the same datasets.

 

I wonder how this fits with another partially related publication, The Experience of Mathematical Beauty and its Neural Correlates: could well-structured, clean code with idioms and patterns produce the same neural experience of beauty?

 

Cloud Deployment Manager is the first and only tool to automagically deploy Microsoft clouds from a diagram, with just the click of a button: it represents the cutting edge of DevOps research and innovation on the Windows platform.

It’s amazing how much the way systems are currently architected can be improved with the right tools and some code.

 

I’ve just finished these three books, published this year, on the intersection of finance and programming:

  • C# for Financial Markets. This recently published book is essentially a translation to C# of the author’s previous books, especially those on the intersection between finance and C++. As such, one-third of the book covers the implementation of basic mathematical finance (bonds, futures, options, swaptions, curve and surface modelling, finite difference method, …) and two-thirds teaches the C# language and its interoperability with Excel/LINQ/C++: note that if you’re already a pro at C#, you’d better skip these parts, since they are far from being the most authoritative source, although the sections on interoperability are really instructive. The best point of this book is that it is full of examples that illuminate every concept (850 pages long!), although they never add up to a full application of any financial worth (that’s left as an exercise to the reader!): thus, and only from this technology angle, it’s the best book for beginners.
  • Financial Modelling – Theory, Implementation and Practice (MATLAB). Do you need a book to quickly acquaint yourself with the state of the art in financial mathematics for asset allocation, hedging and derivatives pricing, skipping the study of dozens of papers? Then this book is your definitive shortcut: it covers everything from the derivation of the models to their implementation in Matlab, demonstrating that this language can also be used for prototyping purposes in the financial industry, efficiency and interoperability aside (if you don’t know Matlab, a third of the book is devoted to it). My favourite part of the book is the first one, on models (stochastic, jump models, multi-dimensional, copulas), which reads lightly and fast in just an afternoon, but the book is also overly focused on the numerical implementation of the models (a third of the book), when most of these details are simply left to some library in the real world. Even so, just running through all its examples is worth its full price.
  • Financial Risk Modelling and Portfolio Optimization with R. R is the lingua franca of statistical research, and its under-utilization in the financial industry is a real puzzle: the sheer number of packages dealing with every imaginable statistical function should be enough to justify a much deeper penetration into daily use. This book is best suited for quantitative risk managers, and it surveys the latest techniques for modelling and measuring financial risk, as well as portfolio optimisation techniques. Every topic follows a precisely defined structure: after a brief overview, an explanation is offered, and a very interesting synopsis of R packages and their empirical applications ends the discussion. My favourite parts are at the end, on constructing optimal portfolios subject to risk constraints and on tactical asset allocation based on variations of the Black-Litterman approach.
Disclaimer: I don’t hold stock in JW-A (John Wiley & Sons, Inc.); their selection is simply superb!
 

The smallest of all seeds,

becomes the largest of garden plants:

it grows into a tree,

and birds come and make nests in its branches.

The Net is full of open-source code, and it seems that nobody cares about its preservation. If some old programs were to be preserved for the future, what would you choose? My legal choices are as follows:

 

The classic era of the analysis of algorithms exhibited limitations: worst-case running-time analysis and upper bounds were not enough to compare algorithms in practice, nor to predict their real performance. Although it enabled a fruitful research era in the design of algorithms, it had to be ignored by most practitioners. Neither lower bounds (Omega notation) nor Theta notation (order of growth) were of much help: the field was in need of a profound paradigm shift.

Analytic combinatorics was the catalyst: a calculus built on the power of generating functions that enables the creation of predictive models for algorithm performance, based on precise quantitative predictions of the properties of large combinatorial structures.
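A classic textbook instance of this approach is counting binary trees. The symbolic specification “a tree is either empty or a root with two subtrees” translates directly into the generating function equation B(z) = 1 + z·B(z)², whose coefficients are the Catalan numbers, and singularity analysis then gives the estimate 4ⁿ / √(π n³). The sketch below is only an illustration under those assumptions; `binary_tree_counts` and `catalan_asymptotic` are hypothetical helper names.

```python
import math


def binary_tree_counts(n_max: int) -> list[int]:
    """Coefficients b[0..n_max] of B(z) = 1 + z * B(z)^2."""
    b = [1]  # b[0] = 1: the empty tree
    for n in range(1, n_max + 1):
        # [z^n] z * B(z)^2 is the convolution sum of b[k] * b[n-1-k]
        b.append(sum(b[k] * b[n - 1 - k] for k in range(n)))
    return b


def catalan_asymptotic(n: int) -> float:
    """Leading-term estimate 4^n / sqrt(pi * n^3) for the n-th coefficient."""
    return 4.0 ** n / math.sqrt(math.pi * n ** 3)


counts = binary_tree_counts(200)
for n in (10, 50, 200):
    # The ratio of exact count to estimate approaches 1 as n grows.
    print(n, counts[n] / catalan_asymptotic(n))
```

The exact counts require quadratic work, while the estimate is a closed formula that gets relatively more accurate as the structures grow, which is the kind of quantitative prediction that worst-case bounds cannot offer.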

The book by Flajolet and Sedgewick, two key researchers who shaped the field with their fundamental theorem of symbolic combinatorics, is the best self-contained source for learning it (freely available on the web):

Download (PDF, 11.58MB)

You may also want to attend some lectures by Sedgewick himself, a privilege only available if you go to Princeton (or by using the Coursera online learning platform).

 

There are more than 6000 languages in the world, though most distributions about languages are power laws: for example, word occurrence, language family size and language usage. In effect, fewer than 100 living languages are widely used in writing, and many languages do not even have a written form: many wrongly claim that languages are endangered, ignoring that their number is a function of population, and with a growing human population, their number will only grow.
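To make the power-law claim concrete for word occurrence: under Zipf’s law, the frequency of the r-th most common word is roughly proportional to 1/r, which shows up as a straight line of slope close to -1 on a log-log rank/frequency plot. The sketch below uses synthetic data only (no real corpus is involved); `zipf_sample` and `loglog_slope` are hypothetical helper names, and the vocabulary size, token count and seed are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def zipf_sample(vocab_size: int, n_tokens: int, seed: int = 0) -> list[int]:
    """Draw synthetic 'words' (integer ids) with probability proportional to 1/rank."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ranks = list(range(1, vocab_size + 1))
    weights = [1.0 / r for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)


def loglog_slope(tokens: list[int], top: int = 50) -> float:
    """Least-squares slope of log(frequency) against log(rank) for the top ranks."""
    freqs = sorted(Counter(tokens).values(), reverse=True)[:top]
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


slope = loglog_slope(zipf_sample(vocab_size=1000, n_tokens=200_000))
print(f"estimated log-log slope: {slope:.2f}")  # close to -1 for Zipf-like data
```

The same slope estimate could be applied to word counts from a real text, or to usage rankings of programming languages, to check how closely they follow a power law.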

The parallels between natural and computer languages are striking, even though their origins and purposes are so different.

In computer science, there are more than 4000 computer languages, and growing (note that the population of people who know how to program is orders of magnitude smaller than that of natural-language speakers): the ease with which parsers and DSLs can be created can only contribute to this growing trend. And the distribution of their use reveals a similar power law: only a small subset of languages is used in production systems, the rest being academic exercises. Note that their ranking is very volatile (TIOBE index) compared to that of natural languages, with largely isolated and fragmented communities mirroring the effect that territories have on natural languages.

Some subtle differences between natural and computer languages may explain the large number of the latter in proportion to their smaller supporting population: computer languages may remain useful long after the hardware that supported them has stopped working, a common occurrence in the world of COBOL.

How many natural languages do we need? Six, if you were to ask Victor Ginsburgh and Shlomo Weber. And that is also a pretty reasonable number for computer languages: after examining their calculation and analysis, I can only conclude that learning a number larger than this is a clear sign of being over-educated (I’m guilty as charged).

 
  1. What will programming look like in 2020?
  2. RE2: Non-exponential regular expression evaluation
  3. Practical auto-threading compilers in C#
  4. Blaze: Numerical Big Data in Python
  5. Python source code of Statistics, Data Mining and Machine Learning in Astronomy
 
  1. Latency Numbers Every Programmer Should Know
  2. Run LLVM Assembly in Your Browser
  3. Gource: Software Version Control Visualization
  4. Language <=> Language Matrix
  5. Machine Learning Course in R
 


These slides review the state of the art of Empirical, Evidence-based Software Engineering, a recent field of research whose goal is to get rid of the fashionable practices, methodologies and clichés that are so sadly common in software development: more than a hundred papers are examined on a range of diverse topics such as complexity metrics, estimation, debugging, refactoring, agile methodologies, team organization, technical debt and software architecture.

 