[FoRK] Multicore, async segmented sequential models, Re: outsourcing back end

J. Andrew Rogers andrew at jarbox.org
Wed May 8 11:29:09 PDT 2013


On May 8, 2013, at 2:15 AM, Stephen D. Williams <sdw at lig.net> wrote:
> 
> Node and Nginx get it right in a lot of ways.  Ideally, the kernel would package multiple requests and IO events on multiple connections into a message stream that could be processed in far fewer IOs, but otherwise, the model is not bad.  (AOL essentially had communications concentrators at the edges that turned all communication into packets in the message flow.  Packets were transported as streams of bytes on TCP/IP connections.  So if messages were 60 or 200 bytes (easily possible given the app styles), many could fit in a single 30K TCP/IP buffer read.)


I think it is possible to hack something approximately like this for network I/O on Linux, though I could be wrong. For all other I/O and related events it can be done efficiently using the native APIs.
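
For datagram sockets, Linux's recvmmsg() gets you part of the way: one syscall returns a batch of messages. For TCP you are stuck framing messages out of a large buffer read yourself, which is essentially what the AOL concentrators did. A rough sketch, with arbitrary sizes and no error handling:

#include <sys/socket.h>   // recvmmsg() is glibc/Linux; needs _GNU_SOURCE in C
#include <cstdio>

constexpr unsigned kBatch = 64;
constexpr size_t kMsgSize = 2048;

int drain_socket(int sock) {
    static char bufs[kBatch][kMsgSize];
    mmsghdr msgs[kBatch] = {};
    iovec iovs[kBatch];
    for (unsigned i = 0; i < kBatch; ++i) {
        iovs[i] = { bufs[i], kMsgSize };
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    // One syscall returns up to kBatch datagrams.
    int n = recvmmsg(sock, msgs, kBatch, MSG_DONTWAIT, nullptr);
    for (int i = 0; i < n; ++i)
        printf("msg %d: %u bytes\n", i, msgs[i].msg_len);
    return n;  // number of messages consumed in a single I/O
}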

A single core can saturate 10 GbE for many workloads these days. The way I've usually seen it done is that a single thread handles the network I/O, batching up and offloading any "heavy" work to dedicated cores that only see packed message buffers.
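
A minimal sketch of that split, with made-up names and sizes: the main loop below stands in for the network thread and hands packed batches to a worker over a single-producer/single-consumer ring, so two atomic indices are all the synchronization needed.

#include <atomic>
#include <array>
#include <cstddef>
#include <cstdio>
#include <thread>

struct Batch { int n_msgs; /* packed message bytes would follow */ };

template <std::size_t N>
struct SpscRing {
    std::array<Batch*, N> slots{};
    std::atomic<std::size_t> head{0}, tail{0};

    bool push(Batch* b) {  // called by the I/O thread only
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N) return false;  // full
        slots[t % N] = b;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
    Batch* pop() {  // called by the worker thread only
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return nullptr;  // empty
        Batch* b = slots[h % N];
        head.store(h + 1, std::memory_order_release);
        return b;
    }
};

int main() {
    SpscRing<1024> ring;
    std::thread worker([&] {
        for (int seen = 0; seen < 100; ) {
            Batch* b = ring.pop();
            if (!b) continue;  // nothing ready; real code would back off
            ++seen;
            delete b;
        }
        std::puts("worker consumed 100 batches");
    });
    for (int i = 0; i < 100; ++i)  // stand-in for the network read loop
        while (!ring.push(new Batch{60})) { /* ring full: spin */ }
    worker.join();
}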

The asymmetry of core specialization means you never see all of the cores running at 100%, but the measured throughput is often much higher than when you try to balance the load to achieve 100% utilization on all cores.


> Message passing should be done with atomic, non-blocking, spinlock or similar statistically lock-free methods.


Ironically, lock-free methods do not scale because they still use locks: a contended atomic operation serializes cores on a single cache line, which is a lock in everything but name. You still need to intentionally minimize the number of message passing operations for best performance. If your performance noise floor is quite low, the cost of these methods is noticeable.
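
To spell out the irony, here is a classic "lock-free" Treiber-style stack push as a minimal sketch. Under contention, the CAS retry loop is a fight for exclusive ownership of one cache line, which is the same serialization a spinlock buys you:

#include <atomic>

struct Node { Node* next = nullptr; int payload = 0; };

std::atomic<Node*> head{nullptr};

void push(Node* n) {
    Node* old = head.load(std::memory_order_relaxed);
    do {
        n->next = old;  // 'old' is refreshed by each failed CAS below
    } while (!head.compare_exchange_weak(old, n,
                                         std::memory_order_release,    // on success
                                         std::memory_order_relaxed));  // on failure
}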


> Obviously, you want to optimize the number of threads.  Sometimes this is one per core, sometimes 2 per core, depending on what it takes to get full utilization.


SMT implementations like HyperThreading are much better than they used to be. Coincidentally, the kinds of optimizations enabled by the thread-per-core async model tend to do a much better job of saturating the execution ports in the core, which renders SMT less effective.
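
For reference, the pinning itself is trivial on Linux; the hard part is knowing which logical CPUs are SMT siblings, since the mapping varies by machine (see /sys/devices/system/cpu/cpu*/topology/thread_siblings_list). A Linux-specific sketch:

#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

void pin(std::thread& t, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

int main() {
    // Counts logical CPUs, so SMT siblings are included here.
    unsigned n = std::thread::hardware_concurrency();
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([] { /* per-core event loop goes here */ });
        pin(workers.back(), i);
    }
    for (auto& t : workers) t.join();
}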


> GUI systems, and async segmented sequential systems like Node and nginx, handle the scheduling for you for the most part.


Yes, but these are trivial schedules because there is little interaction between events. As popular as this async scheduled model has become, few programmers seem to grok scheduling unless it happens to be trivial. Robustness and throughput involve complex tradeoffs; designing a system that will not spontaneously descend into a pathological equilibrium, or even analyzing a system for potential equilibrium transitions, can be a challenge.

The scheduler is where you deal with all of the greedy routing problems, Nash equilibria, etc. Even for simple systems, a pure async model that handles network I/O, disk I/O, request handling, and so on can quickly start to look like a non-trivial distributed system from a scheduling standpoint. In for a penny, in for a pound.
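
A toy model of how a "trivial" policy goes pathological: two event queues, and a greedy loop that always serves the network first. Under sustained network load the disk completion waits arbitrarily long; a weighted round-robin guarantees progress on every queue. Entirely hypothetical code, but real loops over epoll or kqueue have the same shape:

#include <cstdio>
#include <deque>
#include <functional>

struct Loop {
    std::deque<std::function<void()>> net, disk;

    void run_greedy() {  // "network always wins"
        while (!net.empty() || !disk.empty()) {
            if (!net.empty()) { net.front()(); net.pop_front(); }
            else              { disk.front()(); disk.pop_front(); }
        }
    }
    void run_weighted(int net_budget) {  // net gets at most net_budget turns per round
        while (!net.empty() || !disk.empty()) {
            for (int i = 0; i < net_budget && !net.empty(); ++i) {
                net.front()(); net.pop_front();
            }
            if (!disk.empty()) { disk.front()(); disk.pop_front(); }
        }
    }
};

int main() {
    Loop loop;
    for (int i = 0; i < 1000; ++i)
        loop.net.push_back([] { /* handle packet */ });
    loop.disk.push_back([] { std::puts("disk completion finally served"); });
    loop.run_greedy();  // the one disk event waits behind all 1000 net events
}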


Tangent:

This suggests an attack vector against software systems designed around flawed, non-trivial schedulers: construct a load that triggers a transition to a pathological equilibrium. It would be analogous to the attacks based on differential cryptanalysis of the hash functions built into programming languages (e.g. http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/).
