1. Ad-valorem Platform Fees and Efficient Price Discrimination:  fees that increase in proportion to the sale price of the trades, enhance social welfare.
  2. USA DoD software development data over 2000 projects: a detailed review
  3. The Use of Natural Experiments in Merger Analysis: predicting the outcome of almost 2/3 of FTC merger challenge decisions
  4. Financial intermediaries and the Cross-Section of Asset Returns: broker-dealers as the wheels of financial markets
  5. Frédéric Bastiat’s What is Seen and What is Not Seen

The black arts of reverse engineering network protocols have been lost. These days, every network protocol seems to be run over HTTP and handling lots of XML: every network engineer of the past decades would just cringe at the thought of it.

Complete specifications of network protocols like those offered in RFCs have always been luxuries: the product of idealistic minds of the past like Jon Postel, they only exist for the better known protocols of the Internet. For the rest, their details could only be known by reverse engineering: and the truth is that it requires a deep understanding of traditional software debugging, using tools like IDA and/or OllyDbg, specially for protocols of the binary kind.

Thus, the case of Skype: a recent decompilation of its binaries using Hex-Rays was publicly sold as a reverse engineering of the whole protocol suite. Nothing could be further from the truth.

Providing yourself with a kit of the best tools is the best path to success:

  • Sniffers are boring, read-only tools to see through the network layers. More fun can be had by crafting network packets, as recently simplified by tools like Ostinato and scapy
  • Another set of tools focus on decoding text-like protocols: reverx  (paper), and the impressive netzob
  • And the more interesting ones, tools that cross-overs between debuggers and sniffers: oSpy, an utility to sniff network application calls, and windbgshark, an extension to integrate wireshark within windbg to manipulate virtual machine network traffic

It’s said that in computer science, there’s only a sure way to find a research topic to write papers about: just add automatic to any problem statement, and a whole area of research is born! (aka. the meta-folk theorem of CS research). Most of the time the topic is obviously undecidable and a huge effort will be needed to produce tools of real practical value, but this doesn’t seem to stop researchers to produce interesting Proof-Of-Concepts. Reverse engineering being such a painstaking manual process, it’s a perfect target for this way of producing research, and very different methods and approaches have been tested: Smith-Waterman and Needleman-Wunsch algorithms from bioinformatics, with a recent open-source implementation combined with statistical techniques; automata algorithms to infer transitions between statesstatic binary analysis and runtime analysis of binaries because access to the runtime call stack is very convenient whenever using distributed computing contexts. Finally, a very interesting project was Discoverer @Discover@MSR: they announced very high success rates for very complex protocols (RPC – CIFS/SMB), but the tools were never released,

Download (PDF, 18KB)

This post would not be complete without the mention of the best inspiration for every reverse engineer in the network field: SAMBA, the magnum opus of Andrew Tridgell, an open-source interoperability suite to let Linux and Windows computers talk together. A book about the protocol and the project, Implementing CIFS, is as good as any divulgation book can get: he makes it look so easy, even a child could do it.

  1. What will programming look like in 2020?
  2. RE2: Non-exponential regular expression evaluation
  3. Practical auto-threading compilers in C#
  4. Blaze: Numerical Big Data in Python
  5. Python source code of Statistics, Data Mining and Machine Learning in Astronomy

One of the most important protocol switchovers was carried off 30 years ago: the ARPANET stopped using NCP (Network Control Protocol) to only use TCP/IP, as the righteous Jon Postel devised in The General Plan. NCP was a fully connection-oriented protocol more like the X.25 suite, designed to ensure reliability on a hop by hop basis. The switches in the middle of the network did have to keep track of packets, unlike the connectionless TCP/IP were error correction and flow control is handled at the edges of the network. That is, intelligence turned to the border of the network and packets of the same connection could be passed between separated networks with different configurations. Arguably, the release of an open-source protocol stack implementation under a permissive license (4.2BSD) was a key component of its success: code is always a better description than any protocol specification.

Yet TCP/IP was still incomplete: after the 1983 switchover, many computers started connecting to ARPANET, and bottlenecks due to congestion were common. Van Jacobson devised the Tahoe and Reno congestion-avoidance algorithm to lower data transfers and stop flooding the network with packets: it was quickly implemented on the TCP/IP stacks of the day, saving the Net to this day.

These changes were necessary, as they allowed the Internet to grow, on a global scale. Another set of changes as profound as those were, are now being discussed in the Secure Interdomain Routing mailing list: this time the culprit is the insecurity of BGP, as route announcements are not authenticated, and  the penance is enforcing a PKI into the currently distributed, decentralized and autonomous Internet routing system. Technical architectures force a predetermined model of control and governance, and this departure from the previously agreed customs and conventions of the Internet may simply be a bridge too far away, as always, in the name of security. And the current proposals may even impact Internet’s scalability, since the size of the required Resource Public Key Infrastructure may be too large for routers to handle, as the following paper from Verisign shows:

Download (PDF, 189KB)

On the other hand, this recent analysis shows that the design of the security of SBGP is of very high quality, a rare thing in the networking field, indeed:


Download (PDF, 909KB)

  1. Latency Numbers Every Programmer Should Know
  2. Run LLVM Assembly in Your Browser
  3. Gource: Software Version Control Visualization
  4. Language <=> Language Matrix
  5. Machine Learning Course in R

These slides review the state of the art of Empirical, Evidence-based Software Engineering, a recent field of research whose goal is to get rid of fashionable practices, methodologies and clichés that are so sadly common in software development: more than a hundred papers are examined on a range of diverse topics such as complexity metrics, estimation, debugging, refactoring, agile methodologies, team organization, technical debt and software architecturing.

Set your Twitter account name in your settings to use the TwitterBar Section.