[FoRK] Joyent, cloud service evolution (J. Andrew Rogers)

Gregory Alan Bolcer greg at bolcer.org
Wed Jun 22 10:20:27 PDT 2016


Data comes in all shapes and sizes.  One of the best breakdowns I've seen.
http://labs.sogeti.com/three-data-scientists-share-six-insights-big-data-analytics/

As for small databases?  Go big or go home. ;-)

On Tue, Jun 21, 2016 at 9:52 PM, Ken Meltsner <meltsner at alum.mit.edu> wrote:

> The complicated part is when the huge stream of data can't be
> processed in smaller, independent streams.  Counting the number of
> failed logins per second for some huge service, perhaps, if there's an
> attack going on using many different IDs and endpoint addresses at the
> same time.  [Might not be the best example, but all I could come up
> with off the top of my head.]
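
A minimal sketch of the failed-login example above, assuming events arrive in time order as (timestamp, user_id, endpoint) tuples. The names FailedLoginRate and first_alarm are hypothetical, made up for illustration. The point is that the counter has to see the whole stream: split by user or by endpoint, every partition below would report a single failure.

```python
from collections import deque


class FailedLoginRate:
    """Count failed logins in a sliding window across ALL users and endpoints."""

    def __init__(self, window_seconds=1.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest first

    def record(self, timestamp):
        """Record one failed login; return how many fall inside the window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def first_alarm(events, threshold, window_seconds=1.0):
    """Return the timestamp at which the global count first exceeds threshold.

    events: iterable of (timestamp, user_id, endpoint), in time order.
    Returns None if the threshold is never exceeded.
    """
    rate = FailedLoginRate(window_seconds)
    for timestamp, _user_id, _endpoint in events:
        if rate.record(timestamp) > threshold:
            return timestamp
    return None


# Four different users from four different addresses within half a second.
attack = [
    (0.0, "ann", "10.0.0.1"),
    (0.1, "bob", "10.0.0.2"),
    (0.2, "cy", "10.0.0.3"),
    (0.3, "dee", "10.0.0.4"),
]
alarm_at = first_alarm(attack, threshold=3)
```

A single in-memory deque obviously does not scale to a "huge service"; it only shows why the count is global state rather than something each shard can compute alone.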
>
> Traditional databases for big concurrent systems like airline
> reservations worked extra hard not to sell the same seat twice on a
> given plane; I believe most modern ecommerce systems are "optimistic"
> -- assume enough widgets are available and apologize later if you run
> out for a couple of customers -- which simplifies locking and
> significantly improves performance.  Insisting on fully transactional,
> distributed locks made a two-server DB that I used to work on (not at
> the current employer) slower than a single server; for some
> applications, even three servers would still be slower than a single
> one because the distributed locks were so expensive.
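
A rough sketch of the "optimistic" ecommerce approach described above, assuming a toy in-memory store. OptimisticInventory and its methods are hypothetical names, not any real product's API. Orders are accepted with no lock and no stock check; a later reconcile pass fills what it can and lists who gets the apology.

```python
class OptimisticInventory:
    """Accept orders without locking stock, then reconcile afterwards."""

    def __init__(self, stock):
        self.stock = dict(stock)  # item -> units on hand
        self.pending = []         # (customer, item) in arrival order

    def order(self, customer, item):
        # No lock and no stock check: always say yes immediately.
        self.pending.append((customer, item))
        return True

    def reconcile(self):
        """Fill pending orders in arrival order.

        Returns (fulfilled, apologies), each a list of (customer, item).
        """
        fulfilled, apologies = [], []
        for customer, item in self.pending:
            if self.stock.get(item, 0) > 0:
                self.stock[item] -= 1
                fulfilled.append((customer, item))
            else:
                apologies.append((customer, item))
        self.pending = []
        return fulfilled, apologies


inventory = OptimisticInventory({"widget": 2})
for customer in ["ann", "bob", "cy"]:
    inventory.order(customer, "widget")
fulfilled, apologies = inventory.reconcile()
```

The airline-style alternative would check and decrement stock under a lock inside order(), which is exactly the step that gets expensive once the lock is distributed.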
>
> You can assume that the databases will be "eventually consistent" and
> get much better performance, especially if it's unlikely that multiple
> users will be competing for the same items; at the bleeding edge (not
> sure how much has been published), the databases proceed
> optimistically, but can back out of transactions that failed due to
> conflicts relatively long after those were committed.
>
> That's my limited understanding of the situation -- most of the
> methods JAR deprecates are pretty close to the state of the art for
> mid-range commercial products, so I'm curious how far the new stuff
> has gotten beyond the lab -- does something like
> https://www.cockroachlabs.com/ count as The Old Show or Next
> Generation?
>
> Ken Meltsner
>
> On Tue, Jun 21, 2016 at 5:50 PM, Stephen D. Williams <sdw at lig.net> wrote:
> > That's what I expect in a lot of cases: tiers of communications
> > concentrators / condensers / filters, then parallel I/O channels to
> > some kind of storage.
> >
> > In some cases, you may not even want to store this kind of data on
> > secondary storage: fan it through high-speed data channels into a
> > big, fast memory.  Then do operations on that.  As an image even.  So
> > you can use GPUs.  Replicate it a few times for different
> > kinds of analysis.
> >
> > sdw
> >
> > On 6/21/16 2:46 PM, Marty Halvorson wrote:
> >> After reading all this, I wonder what you'll think of how the CERN LHC
> >> collects data?
> >>
> >> Each experiment generates tens of trillions of bits per second.  A
> >> reduction of up to 2 orders of magnitude is done in hardware;
> >> the result is sent via very high speed pipes to a huge number of PCs,
> >> where a further reduction of 1 or 2 orders of magnitude is done.
> >> Finally, the data is stored for research, which is usually done on
> >> collections of supercomputers.
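
Back-of-the-envelope arithmetic for the reduction stages described above. The 1e13 bits/s starting point is an assumption (the low end of "tens of trillions"), and reduced_rate is a made-up helper; the figures are illustrative, not CERN's published numbers.

```python
def reduced_rate(bits_per_second, orders_of_magnitude):
    """Data rate after cutting it by the given number of orders of magnitude."""
    return bits_per_second / 10 ** orders_of_magnitude


raw = 1e13  # ASSUMED raw rate: 10 trillion bits/s per experiment

# Stage 1: hardware reduction of up to 2 orders of magnitude.
after_hardware = reduced_rate(raw, 2)

# Stage 2: the PC farm removes a further 1 or 2 orders of magnitude.
stored_high = reduced_rate(after_hardware, 1)
stored_low = reduced_rate(after_hardware, 2)

# Convert the rates that reach storage from bits/s to bytes/s.
stored_high_bytes = stored_high / 8
stored_low_bytes = stored_low / 8
```

Under those assumptions, roughly 1e9 to 1e10 bits/s (about 125 MB/s to 1.25 GB/s) reaches storage out of the original 1e13 bits/s.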
> >>
> >> Thanks,
> >>
> >> Marty
> >
> >
> >
> > _______________________________________________
> > FoRK mailing list
> > http://xent.com/mailman/listinfo/fork
>
>
>
> --
> After 30+ years of email, I have used up my supply of clever .sig material.



-- 
greg at bolcer.org, http://bolcer.org, c: +1.714.928.5476

