Assorted Links (Economics)

    1. Ad-valorem Platform Fees and Efficient Price Discrimination: fees that increase in proportion to the sale price of trades enhance social welfare.
    2. USA DoD software development data over 2000 projects: a detailed review
    3. The Use of Natural Experiments in Merger Analysis: predicting the outcome of almost two-thirds of FTC merger challenge decisions
    4. Financial Intermediaries and the Cross-Section of Asset Returns: broker-dealers as the wheels of financial markets
    5. Frédéric Bastiat’s What is Seen and What is Not Seen

Reverse Engineering Network Protocols

The black arts of reverse engineering network protocols have been lost. These days every network protocol seems to run over HTTP and shuttle piles of XML: any network engineer of past decades would cringe at the thought of it.

Complete specifications of network protocols like those offered in RFCs have always been a luxury: the product of idealistic minds of the past like Jon Postel, they exist only for the better-known protocols of the Internet. For the rest, the details can only be recovered by reverse engineering, and the truth is that this requires a deep command of traditional software debugging with tools like IDA and/or OllyDbg, especially for binary protocols.

Witness the case of Skype: a recent decompilation of its binaries using Hex-Rays was publicly sold as a reverse engineering of the whole protocol suite. Nothing could be further from the truth.

Equipping yourself with a kit of the best tools is the surest path to success:

  • Sniffers are boring, read-only tools for peering through the network layers. More fun can be had by crafting network packets, a task recently simplified by tools like Ostinato and scapy (see the sketch after this list)
  • Another set of tools focuses on decoding text-like protocols: reverx (paper), and the impressive netzob
  • And the most interesting ones are cross-overs between debuggers and sniffers: oSpy, a utility to sniff the network-related calls of an application, and windbgshark, an extension that integrates Wireshark with WinDbg to manipulate the network traffic of a virtual machine
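
As a taste of how little code packet crafting takes nowadays, here is a minimal scapy sketch that builds a TCP SYN by stacking protocol layers, sends it, and dissects whatever comes back. It assumes scapy is installed and raw-socket privileges are available; the destination address and port are placeholders from the documentation range.

```python
# Minimal packet-crafting sketch with scapy: stack layers, send, dissect the reply.
# Assumes scapy is installed and the script runs with raw-socket privileges;
# the destination address and port are placeholders (TEST-NET-1).
from scapy.all import IP, TCP, sr1

probe = IP(dst="192.0.2.10") / TCP(dport=80, flags="S")   # a bare TCP SYN
reply = sr1(probe, timeout=2, verbose=False)              # send, wait for one answer

if reply is None:
    print("no response (host filtered or packet dropped)")
else:
    reply.show()                                          # dump the decoded layers
    if reply.haslayer(TCP) and (int(reply[TCP].flags) & 0x12) == 0x12:  # SYN+ACK
        print("port appears open")
```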

It’s said that in computer science there’s only one sure way to find a research topic to write papers about: just add “automatic” to any problem statement and a whole area of research is born (aka the meta-folk theorem of CS research). Most of the time the topic is obviously undecidable and a huge effort is needed to produce tools of real practical value, but this doesn’t seem to stop researchers from producing interesting proofs of concept. Reverse engineering being such a painstaking manual process, it’s a perfect target for this way of producing research, and very different methods and approaches have been tested: the Smith-Waterman and Needleman-Wunsch alignment algorithms from bioinformatics, with a recent open-source implementation combined with statistical techniques; automata algorithms to infer the transitions between protocol states; and static binary analysis combined with runtime analysis of binaries, since access to the runtime call stack is very convenient in distributed computing contexts. Finally, a very interesting project was Discoverer from Microsoft Research: they announced very high success rates for very complex protocols (RPC, CIFS/SMB), but the tools were never released.
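
To give an idea of how the bioinformatics approach maps onto protocols, here is a sketch of Needleman-Wunsch global alignment applied to two captured messages: constant fields line up while variable fields show up as mismatches or gaps. The messages and scoring weights below are made up; real tools align many samples and then cluster the results.

```python
# Minimal Needleman-Wunsch global alignment over two byte strings, the technique
# protocol-format inference tools borrow from bioinformatics: constant fields
# line up, variable fields appear as mismatches or gaps.
# The sample messages and the scores are made-up illustrations.

def needleman_wunsch(a: bytes, b: bytes, match=2, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback: rebuild the two aligned sequences, "--" marking gaps.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(f"{a[i - 1]:02x}")
            out_b.append(f"{b[j - 1]:02x}")
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(f"{a[i - 1]:02x}")
            out_b.append("--")
            i -= 1
        else:
            out_a.append("--")
            out_b.append(f"{b[j - 1]:02x}")
            j -= 1
    return " ".join(reversed(out_a)), " ".join(reversed(out_b))

# Two hypothetical captures of the same message type: same header bytes,
# a length field that differs, and payloads of different sizes.
m1 = bytes.fromhex("01000441414141")
m2 = bytes.fromhex("0100054242424242")
top, bottom = needleman_wunsch(m1, m2)
print(top)
print(bottom)
```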

Download (PDF, 29KB)

This post would not be complete without a mention of the best inspiration for every reverse engineer in the networking field: SAMBA, the magnum opus of Andrew Tridgell, an open-source interoperability suite that lets Linux and Windows computers talk to each other. A book about the protocol and the project, Implementing CIFS, is as good as popular technical writing gets: he makes it look so easy, even a child could do it.

The Politics of Network Protocols

One of the most important protocol switchovers was carried out 30 years ago: the ARPANET stopped using NCP (Network Control Protocol) to use only TCP/IP, as the righteous Jon Postel devised in The General Plan. NCP was a fully connection-oriented protocol, much like the X.25 suite, designed to ensure reliability on a hop-by-hop basis: the switches in the middle of the network had to keep track of packets, unlike the connectionless TCP/IP, where error correction and flow control are handled at the edges of the network. That is, intelligence moved to the border of the network, and packets of the same connection could be passed between separate networks with different configurations. Arguably, the release of an open-source protocol stack implementation under a permissive license (4.2BSD) was a key component of its success: code is always a better description than any protocol specification.

Yet TCP/IP was still incomplete: after the 1983 switchover, many more computers started connecting to the ARPANET and bottlenecks due to congestion were common. Van Jacobson devised the Tahoe and Reno congestion-avoidance algorithms to throttle data transfers and stop flooding the network with packets: they were quickly implemented in the TCP/IP stacks of the day, saving the Net to this day.
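
The core of Jacobson’s fix is small enough to sketch: exponential slow start up to a threshold, linear growth afterwards, and a sharp cut when a loss is detected. This is a toy model of the Tahoe-style behaviour, not the real state machine; the loss pattern and window sizes below are invented.

```python
# Toy model of TCP Tahoe-style congestion control: exponential slow start up to
# a threshold, additive growth afterwards, and a reset on packet loss.
# The loss rounds and window sizes are invented for illustration.

def tahoe_window(rounds, losses, init_ssthresh=64):
    cwnd, ssthresh = 1.0, float(init_ssthresh)   # congestion window, in segments
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in losses:                        # loss detected this round trip
            ssthresh = max(cwnd / 2, 2)          # remember half the window...
            cwnd = 1.0                           # ...and restart from one segment
        elif cwnd < ssthresh:
            cwnd *= 2                            # slow start: double per RTT
        else:
            cwnd += 1                            # congestion avoidance: +1 per RTT
    return history

print(tahoe_window(rounds=20, losses={8, 15}))
```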

These changes were necessary, as they allowed the Internet to grow to a global scale. Another set of changes just as profound is now being discussed on the Secure Interdomain Routing mailing list: this time the culprit is the insecurity of BGP, since route announcements are not authenticated, and the penance is imposing a PKI onto the currently distributed, decentralized and autonomous Internet routing system. Technical architectures force a predetermined model of control and governance, and this departure from the previously agreed customs and conventions of the Internet may simply be a bridge too far, as always, in the name of security. The current proposals may even impact the Internet’s scalability, since the size of the required Resource Public Key Infrastructure may be too large for routers to handle, as the following paper from Verisign shows:

Download (PDF, Unknown)
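
To make the routing-security discussion more concrete, here is a minimal sketch of RPKI-based origin validation in the spirit of RFC 6811: an announcement is checked against ROAs (Route Origin Authorizations), each binding a prefix, a maximum length and an authorized origin AS. The ROA table and the announcements below are invented.

```python
# Minimal sketch of RPKI origin validation (in the spirit of RFC 6811):
# an announcement is "valid" if some covering ROA authorizes its origin AS and
# its prefix length does not exceed the ROA's maxLength, "invalid" if covering
# ROAs exist but none match, and "not-found" if no ROA covers it at all.
# The ROA table and the announcements are invented for illustration.
from ipaddress import ip_network

ROAS = [
    # (authorized prefix, max length, authorized origin AS)
    (ip_network("192.0.2.0/24"), 24, 64500),
    (ip_network("198.51.100.0/24"), 25, 64501),
]

def origin_validation(prefix: str, origin_as: int) -> str:
    announced = ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in ROAS:
        if announced.subnet_of(roa_prefix):          # a ROA covers this announcement
            covered = True
            if announced.prefixlen <= max_len and origin_as == roa_as:
                return "valid"
    return "invalid" if covered else "not-found"

print(origin_validation("192.0.2.0/24", 64500))    # valid
print(origin_validation("192.0.2.0/24", 64511))    # invalid: wrong origin AS
print(origin_validation("203.0.113.0/24", 64500))  # not-found: no covering ROA
```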

On the other hand, this recent analysis shows that the security design of S-BGP is of very high quality, a rare thing in the networking field indeed:


Download (PDF, 909KB)

Empirical, Evidence-Based Software Engineering


These slides review the state of the art of empirical, evidence-based software engineering, a recent field of research whose goal is to get rid of the fashionable practices, methodologies and clichés that are so sadly common in software development: more than a hundred papers are examined on a range of topics as diverse as complexity metrics, estimation, debugging, refactoring, agile methodologies, team organization, technical debt and software architecture.

How the Future Dwells in the Past

Techno-utopians got it wrong: the tireless search for new technologies must start in the past. Most new technologies are just a rehash of past ones, and most resources are devoted to maintaining existing technological infrastructure or to incremental advances in old technologies: the genuinely new and innovative is extraordinarily rare. A fact as ignored as it is intuitive, since most human needs have always been the same.

Mandelbrot’s “[amazon_link id=“0716711869” target=“_blank” ]The fractal geometry of nature[/amazon_link]” summarizes this line of thinking as the Lindy effect: the future survival of any Broadway show is best predicted by how long it has been running already. It is itself based on a much older assertion that “the future career expectation of a television comedian is proportional to his past exposure” (The New Republic, June 13th 1964). Thus, a statistical distribution that extends beyond the arts to other phenomena like the survival of technologies: the longer a technology has been in use, the longer we should expect it to last. Or, more empirically, every additional year a technology survives may even double its remaining life expectancy, contrary to what happens to any living being. An insight that warns us against expecting miracles when introducing new technologies without any precedent.
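
A quick way to build intuition is to simulate it: under a power-law (Pareto) model of lifetimes, the expected remaining life grows in proportion to the age already reached, the opposite of a human mortality table. The tail exponent and sample size below are arbitrary.

```python
# Toy simulation of the Lindy effect: with Pareto-distributed lifetimes, the
# expected *remaining* life grows roughly linearly with the age already reached.
# The tail exponent and sample size are arbitrary choices for illustration.
import random

random.seed(42)
ALPHA = 3.0                                            # Pareto tail exponent (> 1)
lifetimes = [random.paretovariate(ALPHA) for _ in range(500_000)]

for age in (1, 2, 4, 8):
    survivors = [t for t in lifetimes if t > age]
    remaining = sum(t - age for t in survivors) / len(survivors)
    print(f"already survived {age} -> expected extra life ~ {remaining:.2f}")
# Each doubling of the age roughly doubles the expected remaining life,
# whereas for a living being the remaining expectancy shrinks with age.
```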

This next paper is the only one I could find that combines this effect with other power laws to try to ascertain the economic returns of basic research, and so the optimal level of investment:

Download (PDF, Unknown)

Assorted Links (Finance)

    1. Exploratory trading: explaining the use of small trades to test and probe the market
    2. Damodaran’s series on Acquisitions: Winners & Losers, Big Deal or Good Deal?, Over-confident CEOs and Compliant Boards, Accretive (Dilutive) Deals can be Bad (Good) Deals
    3. Predictable Growth Decay in SaaS Companies: next year’s growth rate is likely to be 85% of this year’s growth rate.
    4. Making CrunchBase Computable with Wolfram|Alpha
    5. Does Academic Research Destroy Stock Return Predictability? Not as much as you think, publishing results only makes returns fall by a third.

On The Never-Ending Quest of Functional Programming

Functional programming has been at the top of the hype cycle for decades: no side effects, functions written in a purely mathematical style, strong type systems… What’s not to like about functional programming? It’s even a loaded term! Just ask someone: “But isn’t your code functional yet?”

Although the Turing machine (imperative) model of computation was shown to be equivalent to the λ‑calculus (functional), there is actually a real gap in their asymptotics.

Start with the simplest of data structures, the array: in a purely functional style of programming, updating an array takes O(log N) time, while in an imperative language it’s just O(1). And this is not a problem specific to arrays; it’s the standard behaviour of pure functional programming languages, due to their overrated immutability of data: in “Pure versus Impure LISP”, it is shown that O(N) programs relying on mutation may require O(N log N) steps when translated to a purely functional style. That is, at worst an O(log N) slowdown per operation may be incurred when simulating mutable memory access: immutability carries a very high cost, indeed.
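
To make the O(1)-versus-O(log N) gap concrete, here is a toy sketch contrasting an in-place update with a persistent update over a complete-binary-tree representation of an array: the persistent version copies only the path from root to leaf, so every old version remains readable. This is an illustration of the idea, not how any production language implements it.

```python
# Toy contrast between mutable and persistent array updates.
# The in-place update touches one cell: O(1). The persistent version stores the
# array as a complete binary tree and copies only the root-to-leaf path:
# O(log N) time and space per update, but every old version stays intact.

def build(values):
    """Build a binary-tree array; leaves hold values, length must be a power of two."""
    nodes = [(v,) for v in values]                 # a leaf is a 1-tuple
    while len(nodes) > 1:
        nodes = [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def get(tree, index, size):
    while size > 1:
        size //= 2
        tree = tree[0] if index < size else tree[1]
        index %= size
    return tree[0]

def set_persistent(tree, index, value, size):
    """Return a new tree sharing all untouched subtrees with the old one."""
    if size == 1:
        return (value,)
    half = size // 2
    if index < half:
        return (set_persistent(tree[0], index, value, half), tree[1])
    return (tree[0], set_persistent(tree[1], index - half, value, half))

# Mutable version: one write, the previous contents are gone.
mutable = [0, 1, 2, 3, 4, 5, 6, 7]
mutable[3] = 99                      # O(1)

# Persistent version: O(log N) nodes copied, the old version is still usable.
v0 = build([0, 1, 2, 3, 4, 5, 6, 7])
v1 = set_persistent(v0, 3, 99, size=8)
print(get(v0, 3, 8), get(v1, 3, 8))  # 3 99
```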

In practice, extensions to the pure functional paradigm have been devised to address the need to directly reference and change the state of variables (monads in Haskell), and most algorithms can be implemented efficiently in a purely functional style, but doing so demands extensive and difficult knowledge (“[amazon_link id=“0521663504” target=“_blank” ]Purely Functional Data Structures[/amazon_link]”) for even the most basic of data structures, and the recognition that they still pay a significant slowdown in constant factors and CPU stalls due to the many cache misses.

And, as if that were not enough, debugging any functional program is a quest in itself: I would say it’s an order of magnitude more difficult than in an imperative language, vastly increasing software maintenance costs.