[FoRK] Joyent, cloud service evolution (J. Andrew Rogers)
Stephen D. Williams
sdw at lig.net
Wed Jun 22 01:27:06 PDT 2016
On 6/21/16 9:52 PM, Ken Meltsner wrote:
> The complicated part is when the huge stream of data can't be
> processed in smaller, independent streams. Counting the number of
> failed logins per second for some huge service, perhaps, if there's an
> attack going on using many different IDs and endpoint addresses at the
> same time. [Might not be the best example, but all I could come up
> with off the top of my head.]
Embarrassingly parallel, at least in the fan-in sense: you can pass the failed logins through communication concentrators until the
summary server is receiving relatively few sums from the next nodes in the tree. I know this works very well because, for example,
Buddylist was, through a network of communication concentrators, interacting with the GUIs of 800K out of 3M users at once from a
single process on a CPU that was maybe half the speed of one of the cores of your phone. It was all about communication concentrators
in the tree delivering bulk messages over TCP/IP. The transaction rate wasn't that high, but even that couldn't work with simple methods.
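To make the fan-in concrete, here is a minimal sketch of counting failed logins per second through tiers of concentrators. It is
only an illustration, not how Buddylist was built: the names concentrate and fan_in, the fan_in_width parameter, and the use of a
Counter keyed by epoch second are all assumptions of mine.

```python
from collections import Counter
from typing import Iterable, List


def concentrate(batches: Iterable[Counter]) -> Counter:
    """One concentrator: merge per-second failure counts from its children."""
    total: Counter = Counter()
    for batch in batches:
        total.update(batch)  # adds counts key by key
    return total


def fan_in(leaf_counts: List[Counter], fan_in_width: int) -> Counter:
    """Reduce leaf counts through tiers of concentrators to one summary."""
    if fan_in_width < 2:
        raise ValueError("fan_in_width must be at least 2")
    level = leaf_counts
    while len(level) > 1:
        # Each tier forwards one small sum per group of children,
        # so the summary server only ever sees a handful of messages.
        level = [concentrate(level[i:i + fan_in_width])
                 for i in range(0, len(level), fan_in_width)]
    return level[0] if level else Counter()


# Hypothetical front-end nodes, each mapping epoch second -> failed logins.
leaves = [
    Counter({100: 3, 101: 1}),
    Counter({100: 2}),
    Counter({101: 4}),
    Counter({102: 7}),
    Counter({100: 1}),
]
summary = fan_in(leaves, fan_in_width=2)
```

Because addition is associative, the tree shape does not affect the totals; the width only trades tree depth against messages per node.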
> Traditional databases for big concurrent systems like airline
> reservations worked extra hard not to sell the same seat twice on a
> given plane; I believe most modern ecommerce systems are "optimistic"
> -- assume enough widgets are available and apologize later if you run
> out for a couple of customers -- but this simplifies locking and
> significantly improves performance. Insisting on fully transactional,
> distributed locks made a two-server DB that I used to work on (not at
> the current employer) slower than a single server; for some
> applications, even three servers would still be slower than a single
> one because the distributed locks were so expensive.
This is also embarrassingly parallel. If you don't restrict yourself to a single Oracle database (which you probably would, because
Oracle licenses are, or at least were, very expensive), then you might realize that a plane can only have hundreds of
passengers. The per-seat locking could be tracked in RAM, with each transaction also logged to a database for reloading.
So, from experience, I know that a really slow CPU with a process limited to 1GB of RAM can handle 800K seats. Without anything
exotic, a small, cheap, and scalable set of commodity servers running Linux can easily handle the "don't sell a seat twice"
problem. You could run MySQL, Redis, custom code, etc. Divide out memory, transaction rate, and CPU load, allocate the resulting
number of servers, and distribute load over them.
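As a rough sketch of what I mean by per-seat locking in RAM with a log for reloading, assuming one process per shard and flights
spread over shards by a stable hash. SeatMap, replay, and shard_for are hypothetical names, and a line-oriented text log stands in
for whatever database you would really log to.

```python
import json
import threading
import zlib
from typing import IO, Dict, Iterable, Optional, Tuple


class SeatMap:
    """In-memory seat ownership for the flights assigned to one server."""

    def __init__(self, log: Optional[IO[str]] = None) -> None:
        self._owner: Dict[Tuple[str, str], str] = {}
        self._lock = threading.Lock()
        self._log = log  # append-only log used to rebuild state after restart

    def reserve(self, flight: str, seat: str, passenger: str) -> bool:
        """Return True if the seat was free and is now held by passenger."""
        key = (flight, seat)
        with self._lock:
            if key in self._owner:
                return False  # already sold: never sell a seat twice
            if self._log is not None:
                # Log before acknowledging so a reload sees every sale.
                self._log.write(json.dumps([flight, seat, passenger]) + "\n")
                self._log.flush()
            self._owner[key] = passenger
            return True

    @classmethod
    def replay(cls, lines: Iterable[str]) -> "SeatMap":
        """Rebuild the in-memory map from previously logged reservations."""
        seats = cls()
        for line in lines:
            flight, seat, passenger = json.loads(line)
            seats._owner[(flight, seat)] = passenger
        return seats


def shard_for(flight: str, n_shards: int) -> int:
    """Pick the server for a flight; crc32 is stable across processes."""
    return zlib.crc32(flight.encode("utf-8")) % n_shards
```

Since all seats of a flight live on one shard, no distributed lock is needed; the only contention is a local mutex held for a few
dictionary operations.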
> You can assume that the databases will be "eventually consistent" and
> get much better performance, especially if it's unlikely that multiple
> users will be competing for the same items; at the bleeding edge (not
> sure how much has been published), the databases proceed
> optimistically, but can back out of transactions that failed due to
> conflicts relatively long after those were committed.
There are a number of ways you can avoid relying on races, classic ACID contention points, etc. The classic ACID requirement is for
updating a bank balance: updating a single record in place after a change. If you instead just record +/- transactions, you can always
compute the balance as of n seconds ago, where n is the max consistency lag. You could also write periodic sums to start from, each
usable once it is n seconds old. Banking already has the situation where transactions might be in flight outside of the system, so it
actually doesn't change things much.
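A minimal sketch of the +/- transaction idea, under the assumption that every entry arrives within max_lag seconds of its own
timestamp. Ledger, record, balance_as_of, and checkpoint are names I made up for illustration, and amounts are integer cents.

```python
import bisect
from typing import List, Tuple


class Ledger:
    """Append-only +/- entries; balances are read max_lag seconds behind."""

    def __init__(self, max_lag: float) -> None:
        self.max_lag = max_lag
        self._entries: List[Tuple[float, int]] = []  # sorted by timestamp
        self._checkpoint_time = float("-inf")
        self._checkpoint_sum = 0  # periodic sum to start from

    def record(self, timestamp: float, amount: int) -> None:
        """Add a credit (+) or debit (-); entries may arrive out of order."""
        bisect.insort(self._entries, (timestamp, amount))

    def _settled_count(self, cutoff: float) -> int:
        # Number of retained entries with timestamp <= cutoff.
        return bisect.bisect_right(self._entries, (cutoff, float("inf")))

    def balance_as_of(self, now: float) -> int:
        """Balance as of now - max_lag, when no older entry is still in flight."""
        cutoff = now - self.max_lag
        if cutoff < self._checkpoint_time:
            raise ValueError("cutoff is older than the last periodic sum")
        settled = self._entries[:self._settled_count(cutoff)]
        return self._checkpoint_sum + sum(amount for _, amount in settled)

    def checkpoint(self, now: float) -> None:
        """Fold settled entries into the periodic sum and drop them."""
        cutoff = now - self.max_lag
        k = self._settled_count(cutoff)
        self._checkpoint_sum += sum(amount for _, amount in self._entries[:k])
        del self._entries[:k]
        self._checkpoint_time = max(self._checkpoint_time, cutoff)


ledger = Ledger(max_lag=5)
ledger.record(1, 10000)
ledger.record(3, -3000)
ledger.record(9, 5000)
ledger.record(2, 1000)  # late arrival, still inside the lag window
```

No record is ever updated in place, so writers never contend on a balance row; readers simply accept an answer that is n seconds old.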
> That's my limited understanding of the situation -- most of the
> methods JAR deprecates are pretty close to the state of the art for
> mid-range commercial products, so I'm curious how far the new stuff
> has gotten beyond the lab -- does something like
> https://www.cockroachlabs.com/ count as The Old Show or Next
> Ken Meltsner
> On Tue, Jun 21, 2016 at 5:50 PM, Stephen D. Williams <sdw at lig.net> wrote:
>> That's what I expect in a lot of cases: tiers of communications concentrators / condensers / filters, then parallel I/O channels to
>> some kind of storage.
>> In some cases, you may not even want to store this kind of data on secondary storage: fan it into high speed data channels into a
>> big, fast memory. Then do operations on that. As an image even. So you can use GPUs. Replicate it a few times for different
>> kinds of analysis.
>> On 6/21/16 2:46 PM, Marty Halvorson wrote:
>>> After reading all this, I wonder what you'll think of how the CERN LHC collects data?
>>> Each experiment generates tens of trillions of bits per second. A reduction of up to 2 orders of magnitude is done in hardware,
>>> the result is sent via very high speed pipes to a huge number of PCs where a further reduction of 1 or 2 orders of magnitude is done.
>>> Finally, the data is stored for research purposes usually done on collections of supercomputers.
Stephen D. Williams sdw at lig.net stephendwilliams at gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer