Is Google really on a completely different plane? [was Re: [FoRK] languages]

Bill Stoddard bill
Fri Dec 9 06:35:25 PST 2005


J. Andrew Rogers wrote:
> On Dec 7, 2005, at 9:22 PM, Stephen D. Williams wrote:
> 
>> From my point of view, a lot of this seems to have to do with  
>> requiring filesystem or SQL ACID semantics vs. a usable but less  
>> difficult semantics.  GFS does the latter in a way that fits their  
>> processing model so it is implementable in reasonably efficient ways.
> 
> 
> 
> Yes, much of the magic is that they do not have to make the kinds of  
> strict guarantees a "proper" filesystem is supposed to be making.   One 
> can assure very high availability without communicating a  guarantee 
> back to the application, but having the filesystem  guarantee the 
> survivability (to some degree of certainty) of a  distributed filesystem 
> update along the lines of an fsync() is much  more expensive.
> 
> Google's applications all tend to be of a variety that do not need  
> (quasi-)deterministic guarantees routinely made by the filesystem, at  
> least not in the sense that they are forced to run distributed  
> transactions.  They write their own applications, so this is not a  
> problem.  Unfortunately, and often for good reason, many big storage  
> business applications are coded for filesystems with stricter semantics.
> 
> Which isn't to say that you cannot do a lot with this, and this is an
> active if secondary line of research and experimentation for me.
> 
> 
> 
>> If you look at something like Lustre ( http://lustre.org ), they  are 
>> working on some of the magical distributed features, but they  have to 
>> implement coherent filesystem semantics so the cool  features are 
>> mostly TBD.
> 
> 
> 
> Something I have noticed is that common network filesystems are  almost 
> universally biased toward one of two assumptions:  low-latency  
> high-bandwidth networks (cluster filesystems), or high-latency low- 
> bandwidth networks (classic distributed network filesystem  
> architectures).  Neither of these models produces optimal results for  
> the types of uses and networks I have in mind.
> 
> What is lacking are filesystems explicitly designed for the  assumption 
> of high latency,  high edge bandwidth,

Andrew, that would be called a 'truck', filled with 'data storage media of your choice'. Very high latency, 
very high bandwidth :)
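
For a rough sense of the numbers, here is a back-of-envelope sketch. The function name, the payload size, and the transit time are all made-up assumptions for illustration, not measurements of anything:

```python
def effective_bandwidth_gbps(payload_tb: float, transit_hours: float) -> float:
    """Average throughput of physically shipping storage media.

    payload_tb    -- total data on the truck, in decimal terabytes (assumed)
    transit_hours -- door-to-door delivery time, in hours (assumed)
    """
    bits = payload_tb * 1e12 * 8      # terabytes -> bits
    seconds = transit_hours * 3600    # hours -> seconds
    return bits / seconds / 1e9       # bits per second -> gigabits per second


# Hypothetical load: 1000 TB of media delivered in 24 hours.
# The "latency" is the full 24 hours before the first byte arrives.
print(f"{effective_bandwidth_gbps(1000, 24):.1f} Gbit/s")
```

The point is only that the average rate scales with how much you load, while the latency stays fixed at the delivery time.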

B


