Static vs. Formal Methods for Code Auditing

Sure, it’s always feasible to write secure C code: for example, using the ternary operator (cond ? x : y) it’s possible to write statements so that the compiler reports an error whenever the two branches have mismatched types, but it would also be very easy to accidentally bypass these constructs. For serious undertakings, however, it’s better to rely on existing tools that automate the many mind-numbing tasks that comprise a software audit.
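
As a minimal sketch of that idea (the macro name is my own, not an established idiom), the conditional operator only accepts operands of compatible types, so it can be abused as a poor man's compile-time type check:

#include <stdio.h>

/* Hypothetical helper: the expression 1 ? (a) : (b) only satisfies the
 * conditional operator's constraints when (a) and (b) have compatible types,
 * so e.g. mixing an int with a pointer forces a compiler diagnostic
 * before the program ever runs. */
#define SAME_TYPE_OR_DIAGNOSE(a, b) ((void)(1 ? (a) : (b)))

int main(void)
{
    int apples = 3;
    int oranges = 4;
    int *crate = &oranges;

    SAME_TYPE_OR_DIAGNOSE(apples, oranges);    /* fine: both operands are int */
    /* SAME_TYPE_OR_DIAGNOSE(apples, crate); */ /* int vs. int*: constraint violation, diagnostic guaranteed */

    printf("%d apples, %d oranges\n", apples, *crate);
    return 0;
}

And, as the post says, it is trivially bypassed: nothing stops a colleague from deleting the macro invocation or casting the offending operand away.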

Static code analysis has a long history (the first lint tool dates back to 1977), but only the profusion of programmers trained over the last decade in strongly-typed languages created the expectation of getting the same level of pre-execution bug detection for weakly-typed languages. So strong is the impetus for static analysis in the C arena that the whole Clang compiler suite is replacing gcc largely because it is almost impossible to develop static analysis tools for the latter. Nowadays, in the hall of fame of static analysis tools for code auditing, Deputy takes the top prize among the open source/free programs and Coverity among the commercial ones, with Veracode being a strong contender.

Coverity is a commercial spin-off of the academic work of Dawson Engler, whose papers are a must-read, not only to understand how their tools work but also to follow other cutting-edge research subjects like symbolic execution. Coverity Static Analysis generates a higher number of false positives than Veracode’s, which stings given how much they charge, though neither of the two comes close to the huge number produced by the venerable PC-Lint. Still, PC-Lint and others like PVS-Studio and Parasoft C/C++ remain a must if you prefer on-premise software to handing your precious source code over to the cloud. And if no external tools are available, at least consider the code analysis tool built into Visual Studio.

In a different category, Deputy is an innovative software project that tries to introduce rigorous type safety through annotations embedded within the code. It’s a great idea to start introducing annotations while the codebase is still in a nascent state, but it becomes much more cumbersome once a project is in an advanced state, since it is akin to the tedious maintenance of code written by others. Deputy also adds runtime checks to catch those faults that couldn’t be detected at compile time. If time is limited, you may consider just using splint instead of annotating everything, as sketched below.
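
For a flavor of what such annotations look like, here is a minimal sketch using splint’s stylized-comment annotations (the lookup function is made up for illustration; only the null qualifier in comment syntax is splint’s own):

#include <stdio.h>
#include <string.h>

/* lookup() is annotated as possibly returning NULL; splint then warns about
 * any caller that dereferences the result without checking it first. */
static /*@null@*/ const char *lookup(const char *key)
{
    return (strcmp(key, "dmr") == 0) ? "Dennis Ritchie" : NULL;
}

int main(void)
{
    const char *name = lookup("dmr");
    if (name != NULL)    /* drop this check and splint flags a possible NULL dereference */
        printf("%s\n", name);
    return 0;
}

A handful of such qualifiers on the riskiest interfaces buys a good share of the benefit without annotating the whole codebase.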

In the end, I like to think that the most critical parts of the code should be minimized and subjected to formal verification efforts (see L4.verified), while always noting the implicit circular paradox of having to write a specification for the formal verification process, which is itself just another program. Because, who verifies the verifier?

Assorted Links

    1. Crypto-coding like a Charm, in Python!
    2. Encyclopaedia of Windows Privilege Escalation
    3. Forward Secrecy for HTTPS
    4. Things you can learn while data-mining Facebook: the world is closer than we thought, at 4.74 degrees of separation, and FB followers correlate with stock prices, but its userbase valuation fundamentals are still in question.
    5. Clayton Christensen on Disruption
    6. The Private and Social Costs of Patent Trolls

The Genesis of E‑mail

Every time an email is sent, it’s expected to be delivered to its recipient, no matter who their service provider is. But in the early days it wasn’t so simple: early commercial providers (CompuServe, Prodigy, Delphi, …) ran proprietary email services with no concept of a universal e‑mail address, and the resulting technical barriers gave rise to the commercial practice of settling charges for email delivery between service providers. In other words, there were interconnection agreements detailing the delivery charges between providers, binding the two parties to periodically settle their accounting differences, much like in other telecommunication networks (telex, fax, teletex, SMS and phone termination fees).

But the number of required agreements grew quadratically as the number of service providers expanded (one for every pair of providers), and so did the technical difficulty of integrating their different email services. X.400 was born to solve these issues, implicitly providing support for keeping settlement scores between carriers and for their multi-interface delivery technology (e.g. the preferredDeliveryMethod attribute). In the end, X.400 didn’t really take off and was displaced during the 1990s by the much simpler SMTP: not because of X.400’s tremendous complexity, but rather because of the decisive move of service providers to stop settling accounts between themselves, so that they could just use X.500 to interconnect their directory services.
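
Just to put a number on that growth (my own back-of-the-envelope sketch, not from the original sources), each new provider has to strike a deal with every existing one:

#include <stdio.h>

int main(void)
{
    /* With n providers, one bilateral settlement agreement per pair: n*(n-1)/2. */
    for (int n = 2; n <= 64; n *= 2)
        printf("%2d providers -> %4d agreements to negotiate, sign and settle\n",
               n, n * (n - 1) / 2);
    return 0;
}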

As usual, it’s almost never about the technology, which is better thought of as the child of necessity and will. The hassle of reaching agreements grew so much with the number of service providers that their diminishing returns stopped justifying the associated bargaining costs, which in turn precluded the emergence of the essential network effects from the growing number of email users (as per Metcalfe’s, Beckstrom’s and Reed’s laws): that is, those bargaining costs were the real limiting factor blocking the growth of the early Internet. Nowadays, the only trace of these agreements survives in transit traffic agreements, in turn sidestepped by peering agreements.

To sum up, notice the circular paradox that the history of email established, a curious tale of unintended consequences: free email begot spam, and spam begot the obvious proposal to start charging for email to put an end to it. Whether the trade-off was correctly resolved depends on whom you talk to.

OpenFlow to Free the Closed Internet

The biggest paradox of the Internet is that, being the epitome of freedom and openness, its actual implementation is even more closed than the old mainframes. And even though the whole thing has always been properly documented in RFC memorandums, the oddities and peculiarities of the concrete implementations have always lain hidden within router images, even for the most important inter-domain routing protocols, the biggest concern during interoperability tests.

So it turns out that the real definition of openness is a very nuanced one: in the software world, licensing and governance are paramount, while in the networking world it is standards and interoperability that are crucial and strategic.

Fortunately, OpenFlow is opening a new window of openness in this closed world: its approach to Software-Defined Networking enables reprogrammable traffic management at Layer 2 much like MPLS does at Layer 3, but in far more heterogeneous environments. Its first version is short on features, missing IPv6, QoS, traffic shaping and high availability; and lacking a killer app, its general adoption will take time, if it happens at all. Even so, its ability to complement recent virtualization technologies in the data center, and its standing as the only practical way for researchers to try out new experimental protocols, make it a key technology to watch in the coming years.
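
To make the match/action idea concrete, here is a toy sketch in C of what a flow-table entry conceptually does; the struct and field names are my own simplification and bear no relation to the actual OpenFlow wire format:

#include <stdio.h>
#include <stdint.h>

/* Toy flow-table entry: if a packet's header fields match, apply the action.
 * A real OpenFlow switch matches on many more fields and supports richer actions. */
struct flow_entry {
    uint16_t in_port;     /* ingress switch port to match */
    uint16_t eth_type;    /* Ethernet type to match, e.g. 0x0800 for IPv4 */
    uint16_t out_port;    /* action: forward matching packets to this port */
};

struct packet {
    uint16_t in_port;
    uint16_t eth_type;
};

/* Return the output port chosen by the first matching entry, or -1 for no match. */
static int forward(const struct flow_entry *table, size_t n, const struct packet *pkt)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].in_port == pkt->in_port && table[i].eth_type == pkt->eth_type)
            return table[i].out_port;
    return -1;   /* a real switch would punt the unmatched packet to the controller */
}

int main(void)
{
    struct flow_entry table[] = { { 1, 0x0800, 2 }, { 2, 0x0800, 1 } };
    struct packet pkt = { 1, 0x0800 };
    printf("forward to port %d\n",
           forward(table, sizeof table / sizeof table[0], &pkt));
    return 0;
}

The whole point of SDN is that a controller rewrites that table at runtime, instead of the logic being baked into the vendor's firmware.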

Trust (hypervisors), but (formally) verify them

Quis custodiet ipsos custodes?

Juvenal (Satire VI, lines 347–348)


It’s been a while since I lectured about the formal construction of Java compilers and virtual machines. But nowadays all the action has moved to an even more basic layer of the software infrastructure: the hypervisor, the core of the virtual machine manager. It’s déjà vu all over again, another instance of the eternal recurrence in the software world, since the first virtual machines originated at the operating-system level (IBM System/360 Model 67). Even so, verification remains a key step in cutting many Gordian knots of the kind raised by Thompson’s Reflections on Trusting Trust: from concurrency verification in Hyper‑V to the more advanced Typed Assembly Language and Proof-Carrying Code techniques.

dmr (1941 — 2011)

/* In memoriam Dennis M. Ritchie (http://cm.bell-labs.com/who/dmr/).
 * setjmp() bookmarks main(); function() prints "umquam" and longjmp()s
 * straight back, skipping "meminisse", so the output reads
 * "Noli umquam oblivisci": never forget. */

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static jmp_buf dmr;

static void
function(void)
{
   printf("umquam\n");
   longjmp(dmr, 1);
   printf("meminisse\n");   /* never reached */
}

int
main(void) {
   if (setjmp(dmr) == 0) {
      printf("Noli\n");
      function();
   } else {
      printf("oblivisci\n");
   }
   return EXIT_SUCCESS;
}

Charts: Koomey’s Law

As a follow-up to my previous post about basic computer engineering laws, here is a recent chart depicting Koomey’s Law, which teaches us that for a fixed computational load, the battery needed will fall by half every 1.6 years; or, the other way around, the energy efficiency of computers doubles roughly every 18 months. A real breath of fresh air for the current push to conserve energy in computing systems.
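
As a back-of-the-envelope illustration of what that exponential means (my own toy calculation, using the 1.6-year doubling period quoted above), here is how the energy per computation shrinks over a decade:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double doubling_period_years = 1.6;   /* efficiency doubles every ~1.6 years */
    for (int years = 0; years <= 10; years += 2) {
        /* efficiency gain after 'years': 2^(years / 1.6) */
        double gain = pow(2.0, years / doubling_period_years);
        printf("after %2d years: %6.1fx more computations per joule "
               "(%.1f%% of the original energy per computation)\n",
               years, gain, 100.0 / gain);
    }
    return 0;
}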