[FoRK] Grid Computing + Web Services

J. Andrew Rogers <andrew at ceruleansystems.com>, Mon Oct 30 22:20:23 PST 2006

On Oct 30, 2006, at 6:56 PM, Stephen D. Williams wrote:
> All auctions on a single instance?  What stops each auction from  
> being on its own server?  Each auction would reasonably have a  
> single instance, but you could partition auctions among servers in  
> any way that is convenient.  They may have a scalability problem  
> with a single auction that is extremely popular, but since they use  
> communication concentrators (web servers that run application  
> client code), and auctions are very simple things, I know they  
> could handle tens of thousands of transactions per second on a  
> reasonable machine.


Tens of thousands of transactions per second?  Either you are talking
about some very exotic hardware or really meant *transactions per
minute*, a unit that is also in common use (e.g. TPC benchmarks).
Current typical four-core server hardware will retire around 500
transactions per second sustained with a well-engineered app.
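
To put the two units side by side, here is the conversion as a tiny
sketch.  The 500/sec figure is the estimate above; the function name is
made up for illustration.

```python
SECONDS_PER_MINUTE = 60


def tps_to_tpm(tps):
    """Convert a sustained transactions-per-second rate to per-minute."""
    return tps * SECONDS_PER_MINUTE


# 500 transactions/sec is the sustained estimate quoted above.
sustained_tpm = tps_to_tpm(500)
print(sustained_tpm)  # 30000
```

So roughly 500 per second is already "tens of thousands" per minute,
which is why the unit matters here.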

By "single instance" I did not mean a database, I meant that you do  
not have a distributed write load for single objects (e.g. a single  
auction).  In effect, the auction synchronizes on a single physical  
row rather than multiple live copies of the same row.  Distributed  
instances can be done, but only for availability/durability because  
it does bad things to transaction throughput.
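
A minimal sketch of what "single instance" means here, assuming a
hypothetical list of servers and an in-memory auction object (none of
these names come from any real system).  Every write for a given
auction is routed to one owner, and bids on that auction are serialized
on one lock, standing in for the single physical row.

```python
import hashlib
import threading

# Hypothetical server names, for illustration only.
SERVERS = ["auction-0", "auction-1", "auction-2"]


def owner_of(auction_id):
    """Map an auction to exactly one server via a stable hash."""
    digest = hashlib.sha256(auction_id.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


class Auction:
    """One live copy of an auction; all bids synchronize on its lock."""

    def __init__(self, auction_id, opening_price):
        self.auction_id = auction_id
        self.high_bid = opening_price
        self.high_bidder = None
        self._lock = threading.Lock()

    def place_bid(self, bidder, amount):
        # Only one writer at a time may touch this auction's state.
        with self._lock:
            if amount <= self.high_bid:
                return False
            self.high_bid = amount
            self.high_bidder = bidder
            return True
```

Different auctions can live on different servers, but any one auction
has a single writer, so its throughput is bounded by that one instance.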


> Search indexes at eBay even near the beginning were internally  
> cached and refreshed periodically, on the order of minutes whenever  
> I checked it.


That makes sense, since strict consistency in the search/index  
servers would put a lot of extra load on the auction servers.  I did  
not remember them being very strictly consistent, which seems like a  
very economical constraint to lose considering that it does not  
materially affect app functionality.
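
As a rough illustration of that trade-off, here is a sketch of a search
index that serves from a cached snapshot and only reloads it every few
minutes.  The class name, the loader callback, and the five-minute
default are assumptions for the example, not a description of how eBay
built it.

```python
import time


class CachedIndex:
    """Serve searches from a snapshot that is refreshed periodically."""

    def __init__(self, load_snapshot, refresh_seconds=300.0,
                 clock=time.monotonic):
        self._load = load_snapshot      # callable returning a list of titles
        self._refresh = refresh_seconds
        self._clock = clock
        self._snapshot = load_snapshot()
        self._loaded_at = clock()

    def search(self, term):
        now = self._clock()
        # Reload only when the snapshot is older than the refresh interval;
        # in between, results may lag behind the auction servers.
        if now - self._loaded_at >= self._refresh:
            self._snapshot = self._load()
            self._loaded_at = now
        needle = term.lower()
        return [title for title in self._snapshot if needle in title.lower()]
```

The auction servers see one bulk read per interval instead of one read
per search, at the cost of results that can be minutes stale.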


>> There are two cases, and relatively common ones at that, where it  
>> gets ugly:
>>
>> - Data domains that do not have a trivial or "nice" decomposition  
>> or partitioning
>> - Applications that require strict consistency guarantees of  
>> various types from end-to-end
> At a high level, these are true.  The more I look at it, the more I  
> see that A) often mistakes have been made in data/semantic  
> architecture and B) there are often multiple ways to meet  
> consistency guarantees.


I don't disagree.  Probably the single best way to distribute loads  
in a consistent way is to implement something sort of like a
distributed multi-versioning protocol where A) any single user always  
gets a consistent state though not necessarily the same as other  
users, and B) the system at large can guarantee that a globally  
consistent state is eventually attainable.  However, these  
implementations get hairy once you start talking about really massive  
systems, reliable data, and availability guarantees.
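
A toy, single-process sketch of the multi-versioning idea in (A): each
reader pins a version and keeps seeing the state as of that version,
even while newer writes commit.  The class and method names are
hypothetical, and the sketch deliberately leaves out everything that
makes the distributed case hairy (replication, failure handling, and
the convergence needed for (B)).

```python
class VersionedStore:
    """Keep every committed version so readers can pin a snapshot."""

    def __init__(self):
        # key -> list of (commit_version, value), in ascending version order
        self._versions = {}
        self._latest = 0

    def commit(self, updates):
        """Apply a dict of updates as one new version; return its number."""
        self._latest += 1
        for key, value in updates.items():
            self._versions.setdefault(key, []).append((self._latest, value))
        return self._latest

    def snapshot(self):
        """Return the version a reader should pin for consistent reads."""
        return self._latest

    def read(self, key, at_version):
        """Return the newest value committed at or before at_version."""
        result = None
        for version, value in self._versions.get(key, []):
            if version > at_version:
                break
            result = value
        return result


store = VersionedStore()
store.commit({"high_bid": 100})
alice_view = store.snapshot()      # Alice pins version 1
store.commit({"high_bid": 120})
bob_view = store.snapshot()        # Bob pins version 2
```

Here Alice and Bob each see a self-consistent state, just not the same
one, which is the property described in (A).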


> Yes, that's a tough one, but fairly exclusive to financial market  
> systems, don't you think?  Still, you have described a push only  
> system which is fairly easy to replicate and coordinate so that  
> queries can be processed over striped servers.


Financial markets?  Not even close.  More like meteorological data,  
syndicated news, infectious disease data, etc.  All in near
real-time.  There are a huge number of applications that we *could*
build in theory that take advantage of vast, rich data sets lying
around if there were an infrastructure capable of supporting them.  For
many very good reasons, strict consistency and durability guarantees  
are required -- bad things could happen otherwise.  We use a lot of  
this type of data now for non-critical and/or non-real-time uses, but  
that completely ignores the arguably more important market for the  
same data when it is critical and real-time.

These are "push" type systems in a sense, but with millions of  
subscribers with very complex constraints on what they actually see  
and no trivial way of partitioning those constraints without a lot of  
seemingly unnecessary and expensive brute force.  It could also be  
framed as "pull" depending on how you want to look at it.
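
To make the brute-force point concrete, here is a sketch where each
subscriber's constraint is an arbitrary predicate and every event is
tested against every subscriber.  The event fields and subscriber names
are invented for the example.

```python
def deliver(event, subscriptions):
    """Return the subscribers whose constraint accepts this event.

    With arbitrary predicates there is nothing to index or partition on,
    so each event costs one check per subscriber.
    """
    return [name for name, accepts in subscriptions.items() if accepts(event)]


# Hypothetical subscribers with unrelated constraints.
subscriptions = {
    "coastal-alerts": lambda e: e["kind"] == "weather" and e["wind_kph"] > 90,
    "clinic-7": lambda e: e["kind"] == "disease" and e["region"] == "north",
    "newsroom": lambda e: e["kind"] in ("weather", "disease"),
}

event = {"kind": "weather", "wind_kph": 110, "region": "north"}
recipients = deliver(event, subscriptions)
```

With millions of subscribers and a high event rate, that per-event scan
is the expensive part, and the constraints rarely share a key that would
let you split the work cleanly across servers.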

Cheers,

J. Andrew Rogers

