[FoRK] [info] (highscalability.com) The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution

Stephen Williams sdw at lig.net
Mon May 20 19:57:16 PDT 2013

Turnkey implementation on Linux by Intel, with partners already 
providing packet processing: Wind River, 6WIND, Tieto.  It's unclear 
whether any of those solutions provides the full TCP/IP stack needed 
for scalable application-level handling of traffic.

Intel DPDK (Data Plane Development Kit)


On 5/20/13 4:33 PM, Stephen Williams wrote:
> On 5/20/13 2:48 PM, J. Andrew Rogers wrote:
>> On May 20, 2013, at 1:38 AM, Stephen Williams <sdw at lig.net> wrote:
>>> As someone on page comments pointed out, this has been thought of 
>>> and done before:
>>> http://pdos.csail.mit.edu/exo.html
>>> I thought of doing it and talked about it here at some point. 
>>> Somewhat similar to this is implementing NFS in a user-space 
>>> library, which has been done several times apparently.  Intel was 
>>> part of a group that was going to do something supporting this with 
>>> a number of device drivers, virtualizing them to processes, ACE or 
>>> similar.
>>> Let me know when they have something useful actually running.
>> Linux already supports exokernel-like application deployments. Most 
>> well-engineered database engines, and some other server engines, are 
>> built on top of what is essentially an exokernel model. However, 
>> there are no databases in open source built like this that I can 
>> think of; it requires a level of engineering sophistication usually 
>> not found in open source software of this type. It also requires a 
>> lot more upfront investment in code.
>> In these models, Linux acts as slightly more than a device driver and 
>> resource accountant. The performance and robustness benefits are 
>> quite large. The main drawback is that mixing exokernel applications 
>> with normal applications on the same Linux environment can generate 
>> unwanted side-effects since the models are not all that compatible.
>> This is relatively new. Five years ago Linux did not support 
>> exokernel application models. Designing applications this way 
>> requires a pretty high-end level of software engineering know-how, 
>> hence the rarity of code designed for this model.
> Sure, direct-drive scatter/gather async request management has been 
> around forever on Unix from SGI, Sun, etc., used by Sybase and Oracle 
> at least.  The SGI server that ran the database for Buddylist did that 
> (1995), with the somewhat broken side effect that on SGI Unix it had 
> to poll, so the CPU was always at 100% for the database threads.
> Ideally, context switches are avoided and block traffic is exchanged 
> between app threads and device drivers on dedicated threads, with 
> spinlock-protected logical queues on heaps of blocks.
> If you cared, and weren't using SSD, it wouldn't be that difficult to 
> do the same for storage.  You'd just need a layout and logic that 
> allow effective use of whatever block the drive decided to send you 
> next.
> Networking is also theoretically not difficult; I just don't want to 
> work on that kind of plumbing when I have much shinier, more novel 
> items on my queue.
> sdw

More information about the FoRK mailing list