Unix Architecture Showing Its Age

by Ostatic Staff - May. 14, 2013

High Scalability has a fascinating article up that summarizes a talk by Robert Graham of Errata Security on the design choices needed to support 10 million concurrent connections on a single server. From a small data center perspective, the numbers he is talking about seem astronomical, but not unbelievable. With a new era of Internet-connected devices dawning, the time may have come to question the core architecture of Unix, and therefore of Linux and BSD as well.

The core of the talk is that the kernel handles threads and packets too inefficiently to meet the speed and scalability requirements of web-scale computing. Graham recommends moving as much of the data processing as possible out of the kernel and into the application, which means writing device drivers, handling threading and multiple cores, and allocating memory yourself. He uses the example of scaling Apache to illustrate how depending on the operating system can actually slow an application down once it is handling several thousand connections per second.
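
For a sense of what that Apache-era model looks like in code, here is a minimal thread-per-connection echo server in C. It is a sketch, not anything taken from Graham's talk; the port number is arbitrary and error handling is trimmed, but it shows the pattern where every client costs the kernel one more thread to track and schedule.

    /* Sketch of the classic one-thread-per-connection server: every accepted
     * socket gets its own kernel thread, so 10,000 clients means 10,000
     * threads. Port 8080 is arbitrary; error handling is trimmed. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *handle_client(void *arg)
    {
        int fd = (int)(long)arg;
        char buf[4096];
        ssize_t n;

        /* Echo until the client hangs up; between packets the thread just
         * sits blocked in the kernel, waiting to be scheduled. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(fd, buf, (size_t)n);

        close(fd);
        return NULL;
    }

    int main(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };

        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, SOMAXCONN);

        for (;;) {
            int client = accept(listener, NULL, NULL);
            if (client < 0)
                continue;

            /* One thread per connection: easy to write, but the per-thread
             * stacks and scheduler bookkeeping give out long before anything
             * like 10 million connections. */
            pthread_t tid;
            pthread_create(&tid, NULL, handle_client, (void *)(long)client);
            pthread_detach(tid);
        }
    }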

Why? Servers could not handle 10K concurrent connections because of O(n^2) algorithms used in the kernel.

Two basic problems in the kernel:

Connection = thread/process. As a packet came in, the kernel would walk all 10K processes to figure out which thread should handle it.

Connections = select/poll (single thread). Same scalability problem. Each packet had to walk a list of sockets.

Solution: fix the kernel to do lookups in constant time.

Threads now get constant-time context switches, regardless of the number of threads.

This came with new, scalable epoll()/IOCompletionPort interfaces for constant-time socket lookup.
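
That last point is the heart of the in-kernel fix. Below is a rough sketch of the epoll() side on Linux, again not code from the talk, with an arbitrary port and error handling trimmed: the kernel keeps the interest list, and epoll_wait() hands back only the descriptors that are actually ready, so the work per wakeup no longer grows with the number of open connections the way a select()/poll() scan does.

    /* Sketch of an epoll(7) event loop: epoll_wait() returns only the ready
     * descriptors, so nothing walks the full socket list on each packet. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    int main(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };

        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, SOMAXCONN);

        /* One epoll instance owns the interest list for every socket. */
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

        for (;;) {
            struct epoll_event events[MAX_EVENTS];
            int ready = epoll_wait(epfd, events, MAX_EVENTS, -1);

            /* Only ready descriptors come back from the kernel. */
            for (int i = 0; i < ready; i++) {
                int fd = events[i].data.fd;

                if (fd == listener) {
                    /* New connection: register it with the kernel once. */
                    int client = accept(listener, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
                } else {
                    /* Data or hangup on an existing connection: echo it back. */
                    char buf[4096];
                    ssize_t got = read(fd, buf, sizeof buf);
                    if (got <= 0)
                        close(fd);   /* closing removes it from the epoll set */
                    else
                        write(fd, buf, (size_t)got);
                }
            }
        }
    }

Graham's argument is that even this is not enough at 10 million connections, which is why he pushes the packet handling out of the kernel and into the application entirely.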

The talk touches on a concept I’ve been mulling over for months: the inherent complexity of modern data centers. If you are virtualizing, and you probably are, there are most likely several layers of abstraction sitting between your application and the hardware, all of which have to be unpacked before its code actually executes on the CPU or its data is written to disk. Does virtualization actually solve the problem we have, or is it an approach born of spending far too long in the box? That Graham’s solution for building systems that scale for the next decade is to bypass the OS entirely and talk directly to the network and hardware tells me that we might be seeing the first slivers of dusk for the kernel’s useful life serving up web applications.

So what would come after Linux? Researchers in the UK may have come up with an answer in Mirage. In a paper quoted on the High Scalability site, the researchers describe it:

Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen.

Mirage is, as stated, very academic, and currently very alpha quality, but the idea is compelling: writing applications that compile directly to a complete machine, something that runs on its own without an operating system underneath. Of course, the first objection that comes to mind is that this would mean writing for specialized hardware, going back in time thirty years. However, combining a next-generation language with a project like Open Compute would provide open specifications and community-driven development at a low level, ideal for eking out as much performance as possible from the hardware.

No matter which way the industry turns to solve the upcoming challenges of an exploding Internet, the next ten years are sure to be a wild ride.