[FoRK] [info] (highscalability.com) The Secret to 10 Million Concurrent Connections - The Kernel is the Problem, Not the Solution
sdw at lig.net
Mon May 20 19:57:16 PDT 2013
Turnkey implementation on Linux by Intel, with partners already
providing packet processing: Wind River, 6WIND, Tieto. Unclear whether
any of those solutions provides the full TCP/IP stack needed for
scalable application-level handling of traffic.
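
For flavor, the receive path in that model looks roughly like the
sketch below (against Intel's DPDK ethdev API as I understand it; one
port, one queue, error handling and config details trimmed, and the
exact calls vary by release):

#include <stdint.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define TX_RING_SIZE 1024
#define NUM_MBUFS    8191
#define MBUF_CACHE   250
#define BURST_SIZE   32

int main(int argc, char **argv)
{
    /* Map hugepages, probe NICs, etc.; after this the app owns them. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Packet buffers live in an app-owned, hugepage-backed pool,
       not in kernel sk_buffs. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("MBUF_POOL",
        NUM_MBUFS, MBUF_CACHE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    uint16_t port = 0;
    struct rte_eth_conf port_conf = {0};   /* defaults are fine here */
    rte_eth_dev_configure(port, 1, 1, &port_conf);
    rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE,
                           rte_eth_dev_socket_id(port), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, TX_RING_SIZE,
                           rte_eth_dev_socket_id(port), NULL);
    rte_eth_dev_start(port);
    rte_eth_promiscuous_enable(port);

    /* Busy-poll the NIC from user space: no interrupts and no
       kernel crossing per packet. */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++) {
            /* hand bufs[i] to the app's own protocol/session code */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}

The point being that packets land in app-owned buffers and get pulled
by polling, so the kernel never touches the per-packet fast path; the
TCP/IP state machine still has to come from somewhere, hence the
question above.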
On 5/20/13 4:33 PM, Stephen Williams wrote:
> On 5/20/13 2:48 PM, J. Andrew Rogers wrote:
>> On May 20, 2013, at 1:38 AM, Stephen Williams <sdw at lig.net> wrote:
>>> As someone in the page comments pointed out, this has been thought
>>> of and done before:
>>> I thought of doing it and talked about it here at some point.
>>> Somewhat similar to this is implementing NFS in a user-space
>>> library, which apparently has been done several times. Intel was
>>> part of a group that was going to do something supporting this with
>>> a number of device drivers, virtualizing them to processes, ACE or
>>> Let me know when they have something useful actually running.
>> Linux already supports exokernel-like application deployments. Most
>> well-engineered database engines, and some other server engines, are
>> built on top of what is essentially an exokernel model. However,
>> there are no databases in open source built like this that I can
>> think of; it requires a level of engineering sophistication usually
>> not found in open source software of this type. It also requires a
>> lot more upfront investment in code.
>> In these models, Linux acts as slightly more than a device driver and
>> resource accountant. The performance and robustness benefits are
>> quite large. The main drawback is that mixing exokernel applications
>> with normal applications on the same Linux environment can generate
>> unwanted side-effects since the models are not all that compatible.
>> This is relatively new. Five years ago Linux did not support
>> exokernel application models. Designing applications this way
>> requires a pretty high-end level of software engineering know-how,
>> hence the rarity of code designed for this model.
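
To make that model concrete, here is a rough sketch of the kind of
Linux facilities such an engine typically takes over for itself; the
calls are real, but the file name, sizes, and core choice are just my
illustration, not any particular engine:

/* Sketch: an "exokernel-style" engine on Linux grabs raw resources up
 * front and manages them itself, leaving the kernel as little more
 * than a device driver and resource accountant. Error handling and
 * the engine proper are omitted. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>

int main(void)
{
    /* Pin this thread to one core: the app, not the kernel scheduler,
       decides where work runs. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);

    /* Take a big slab of memory, lock it, and manage it with the
       app's own allocator and buffer cache instead of the page cache. */
    size_t arena_size = 1UL << 30;   /* 1 GiB, illustrative */
    void *arena = mmap(NULL, arena_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    mlock(arena, arena_size);

    /* Open data files with O_DIRECT so I/O bypasses the kernel's
       buffer cache; the engine schedules and caches I/O itself. */
    int fd = open("datafile", O_RDWR | O_DIRECT);
    (void)fd;

    /* ... the engine's own scheduler, I/O queues, and buffer manager
       would run here ... */
    return 0;
}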
> Sure, direct-to-drive scatter/gather async request management has
> been around for Unix on SGI, Sun, etc. forever, used by Sybase &
> Oracle at least. The SGI server that ran the database for Buddylist
> did that (1995), with the somewhat broken side effect on SGI Unix
> that it had to poll to do it, so the CPU was always at 100% for the
> database threads.
> Ideally, context switches are avoided and block traffic is exchanged
> between app threads and device drivers on dedicated threads with
> spinlock-protected logical queues on heaps of blocks.
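
Roughly what I mean by those queues, as a sketch; the names, queue
size, and retry policy are made up for illustration:

/* A spinlock-protected queue of block pointers between an app thread
 * and a dedicated device-driver thread. The driver thread spins
 * rather than blocking, so no context switch on the fast path.
 * A real design would add batching and backpressure. */
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 256              /* power of two */

struct block {                        /* one block on the shared heap */
    void  *data;
    size_t len;
};

struct block_queue {
    pthread_spinlock_t lock;
    struct block *slots[QUEUE_SLOTS];
    unsigned head, tail;              /* head: next pop, tail: next push */
};

static void queue_init(struct block_queue *q)
{
    pthread_spin_init(&q->lock, PTHREAD_PROCESS_PRIVATE);
    q->head = q->tail = 0;
}

/* App thread hands a block to the driver thread. Returns 0 if full,
   in which case the caller retries or backs off. */
static int queue_push(struct block_queue *q, struct block *b)
{
    int ok = 0;
    pthread_spin_lock(&q->lock);
    if (q->tail - q->head < QUEUE_SLOTS) {
        q->slots[q->tail % QUEUE_SLOTS] = b;
        q->tail++;
        ok = 1;
    }
    pthread_spin_unlock(&q->lock);
    return ok;
}

/* Driver thread polls for the next block; NULL means empty. */
static struct block *queue_pop(struct block_queue *q)
{
    struct block *b = NULL;
    pthread_spin_lock(&q->lock);
    if (q->head != q->tail) {
        b = q->slots[q->head % QUEUE_SLOTS];
        q->head++;
    }
    pthread_spin_unlock(&q->lock);
    return b;
}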
> If you cared, and weren't using SSD, it wouldn't be that difficult to
> do the same for storage. You'd just need a layout and logic that
> allow effective use of whatever block the drive decided to send you.
> Networking is also theoretically not difficult; I just don't want to
> work on that kind of plumbing when I have much shinier and more novel
> items on my queue.