[FoRK] Joyent, cloud service evolution (J. Andrew Rogers)

J. Andrew Rogers andrew at jarbox.org
Wed Jun 22 01:10:16 PDT 2016


> On Jun 21, 2016, at 11:46 AM, Marty Halvorson <marty at halvorson.us> wrote:
> 
> After reading all this, I wonder what you'll think of how the CERN LHC collects data?
> 
> Each experiment generates tens of trillions of bits per second.  A reduction of up to 2 orders of magnitude is done in hardware, and the result is sent via very high speed pipes to a huge number of PCs, where a further reduction of 1 or 2 orders of magnitude is done.  Finally, the data is stored for research purposes, usually done on collections of supercomputers.


LHC has the disadvantage of being old in computer hardware terms, though I assume they are constantly upgrading it. However, scaling this kind of pipelined, sequential data reduction is relatively simple because there is no coordination or computation across streams, particularly since the real analysis happens on other systems at a later time.
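To make that concrete, here is a minimal Python sketch of the pattern: every stream goes through the same sequential reduction stages and the streams never talk to each other, so you scale by adding workers. The stage logic, the reduction factors, and the toy event records are all made up for illustration; this is not the LHC's actual trigger code.

from multiprocessing import Pool

def hardware_trigger(events):
    # Stage 1: drop roughly 99% of events (the "2 orders of magnitude" in hardware).
    return [e for e in events if e["energy"] > 0.99]

def software_filter(events):
    # Stage 2: drop most of what remains on commodity machines.
    return [e for e in events if e["interesting"]]

def reduce_stream(stream):
    # Each stream runs the same sequential pipeline; the survivors are
    # simply written out for offline analysis later.
    return software_filter(hardware_trigger(stream))

if __name__ == "__main__":
    # Streams are independent, so throughput scales with worker count.
    streams = [[{"energy": i / 1000.0, "interesting": i % 3 == 0}
                for i in range(1000)] for _ in range(8)]
    with Pool() as pool:
        reduced = pool.map(reduce_stream, streams)
    print(sum(len(s) for s in reduced), "events kept")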

In 2016, a decent top-of-rack switch can move more than a terabit per second all by itself. In terms of throughput, you can drive a few petabytes per day of parsing, processing, indexing, storage, etc. through a single rack of inexpensive servers. I believe this exceeds the LHC's current average requirements (peak may be higher). The big limitation is storage density. If you are doing real-time operational analysis (LHC is not), you’ll only fit around a petabyte in a single rack if you are using cheap storage. For most workloads, assuming your software is decent, you’ll be completely bandwidth bound. I assume the LHC is compute-bound due to the unique nature of the processing it does; the throughput is pretty low for the quantity of hardware they use.
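The arithmetic behind those numbers, as a quick sanity check (the 25% sustained-utilization figure is just an assumption for illustration, not a measured value):

TBIT_PER_SEC = 1e12          # bits per second for a 1 Tb/s switch
SECONDS_PER_DAY = 86_400
PETABYTE = 1e15              # decimal petabyte, in bytes

line_rate_per_day = (TBIT_PER_SEC / 8) * SECONDS_PER_DAY
print(line_rate_per_day / PETABYTE)                       # ~10.8 PB/day at line rate

sustained_fraction = 0.25    # assume the software sustains 25% of line rate
print(line_rate_per_day * sustained_fraction / PETABYTE)  # ~2.7 PB/day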

Of course, LHC doesn’t have an SLA that requires executing ad hoc queries against the entire data volume while real-time ingestion is occurring, returning results that reflect all ingested data with only milliseconds or seconds of latency. Almost all IoT systems at this scale have an SLA like that, and most large-scale data platforms can’t support it. It is purely a software limitation; the hardware can certainly support it for most workloads.
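As a toy, single-machine illustration of what that SLA means: one thread ingests records while another runs queries that must see the freshest data within a bounded delay. All of the names and numbers below are invented for the sketch; a real platform does this across many nodes with indexing and partitioning, but the point stands that it is a software design problem rather than a hardware one.

import threading, time, collections

store = collections.deque()          # shared, append-mostly store
lock = threading.Lock()
stop = threading.Event()

def ingest():
    # Continuous ingestion: roughly 1000 toy records per second.
    seq = 0
    while not stop.is_set():
        with lock:
            store.append({"seq": seq, "ts": time.time()})
        seq += 1
        time.sleep(0.001)

def query_latest_staleness():
    # Stand-in for an "ad hoc query": how stale is the newest visible record?
    with lock:
        newest = store[-1] if store else None
    return None if newest is None else time.time() - newest["ts"]

writer = threading.Thread(target=ingest, daemon=True)
writer.start()
time.sleep(0.1)                      # let some data arrive
for _ in range(5):
    staleness = query_latest_staleness()
    print("visible staleness (ms):",
          None if staleness is None else round(staleness * 1000, 3))
    time.sleep(0.05)
stop.set()
writer.join(timeout=1)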
