[FoRK] An interesting offshoot from the iPad discussion

J. Andrew Rogers andrew at ceruleansystems.com
Tue Feb 2 22:44:35 PST 2010

On Feb 1, 2010, at 12:35 PM, Bill Stoddard wrote:
> I'm still having a difficult time getting my head around doing 'enterprisey' (read 'profitable') things in the public cloud. Perhaps you have a really big pile of bits you'd like to do some analytics on... rent some cloud space, crunch some bits, collect results, rm -fr everything.   I just can't imagine persistently keeping bits that form the backbone of your business outside the firewall. Hit-n-run analytics maybe?

We need to redefine what a "cloud" is.

Let's start by having a cloud that can seamlessly scale a single system image across an arbitrary number of machines, not the current "I'm running WinXP on a hypervisor" fetish or the largely useless "I've distributed a simple hash table" fad.  The former is running a non-cloud in someone else's basement and the latter has so little value for analytics that I can't remember why I mentioned it.

Above, I've tacitly identified why the cloud sucks.

The obvious point is that you can't do anything in it that you can't do somewhere else; it competes on price, and that is rarely a path to something recognizable as "success". Analytics is driving a lot of the growth of large-scale data infrastructures, and sharded models like MapReduce are useless for almost all of the analytics anyone would actually care about. We have "big data", but what we really need to make clouds useful is "big analytics".  The software du jour is in dire need of a computer science overhaul in that regard.
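A minimal sketch of one reason sharded models strain on analytics. This is an illustration only, not an example taken from the post: the shard contents are made up, and the choice of a median as the "analytic" is an assumption. A sum can be computed shard by shard and combined exactly, while a median cannot be recovered from per-shard medians.

```python
from statistics import median

# Hypothetical data set split across three shards (values are made up).
shards = [
    [1, 2, 3],
    [4, 5, 100],
    [6, 7, 8, 9, 10],
]

all_values = [v for shard in shards for v in shard]

# A sum decomposes: per-shard partial sums combine into the exact answer.
sharded_sum = sum(sum(shard) for shard in shards)
global_sum = sum(all_values)

# A median does not: the median of per-shard medians is not, in general,
# the median of the whole data set.
median_of_medians = median(median(shard) for shard in shards)
global_median = median(all_values)

print(sharded_sum == global_sum)          # the two sums agree
print(median_of_medians, global_median)   # the two medians differ
```

An exact median is still computable on sharded data, but only by moving or coordinating values across shards, which is the kind of cost the sharded model is supposed to avoid.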

The subtle point is that even if we did have "big analytics" in a real cloud, the value of a semi-public cloud is that it avoids the horrendous expense of backhauling the myriad quasi-realtime data sets that the analytics will need in the near future in order to be valuable. Having every third-rate or even first-rate company replicate its own version of exabytes of reality is a non-starter: grossly inefficient and politically implausible. If analytic processes could seamlessly bridge the public-private sphere, it would be vastly more efficient than trying to suck the universe into a laughably tiny rack of servers.  Allowing this will require semantics and protocols that are richer and more clever than what passes for a "service" today, but not by much. Calling back to the previous paragraph, it would take better algorithms and data structures and little more.
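A back-of-the-envelope sketch of the backhaul cost. The figures are assumptions chosen for illustration, not numbers from the post: one decimal exabyte of data and a single dedicated 10 Gbit/s link running at full rate with no protocol overhead.

```python
# Assumed figures for illustration only.
EXABYTE_BYTES = 10**18                 # one decimal exabyte
LINK_BITS_PER_SECOND = 10 * 10**9      # assumed 10 Gbit/s dedicated link
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_years(num_bytes, bits_per_second):
    """Years needed to move num_bytes over a link at full line rate."""
    seconds = num_bytes * 8 / bits_per_second
    return seconds / SECONDS_PER_YEAR


years = transfer_years(EXABYTE_BYTES, LINK_BITS_PER_SECOND)
print(f"{years:.1f} years")  # prints "25.4 years"
```

Under those assumptions a single copy takes decades, and a quasi-realtime data set would be stale long before the copy finished.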

The economics of the "cloud" as currently defined don't pan out.  The economics of a true cloud are so strong that once implemented most everyone will be sucked in whether they like it or not.
