Is Google really on a completely different plane? [was Re: [FoRK]
Fri Dec 9 06:35:25 PST 2005
J. Andrew Rogers wrote:
> On Dec 7, 2005, at 9:22 PM, Stephen D. Williams wrote:
>> From my point of view, a lot of this seems to have to do with
>> requiring filesystem or SQL ACID semantics vs. a usable but less
>> difficult semantics. GFS does the latter in a way that fits their
>> processing model so it is implementable in reasonably efficient ways.
> Yes, much of the magic is that they do not have to make the kinds of
> strict guarantees a "proper" filesystem is supposed to be making. One
> can assure very high availability without communicating a guarantee
> back to the application, but having the filesystem guarantee the
> survivability (to some degree of certainty) of a distributed filesystem
> update along the lines of an fsync() is much more expensive.
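
[Illustration of the fsync() point above, for a local POSIX-style file only. This is a minimal sketch: the function names write_buffered and write_durable are hypothetical, and it does not represent how GFS or any distributed filesystem works. The first write returns as soon as the data has been handed to the OS; the second also blocks until the OS reports the data flushed to the storage device, which is the guarantee that becomes expensive once several remote replicas have to confirm it.]

```python
import os
import tempfile


def write_buffered(path, data):
    """Write data and return once it has been handed to the OS.

    No durability is requested: the bytes may still sit in the page
    cache when this function returns.
    """
    with open(path, "wb") as f:
        f.write(data)


def write_durable(path, data):
    """Write data and block until the OS reports it flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its cache for this file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_buffered(os.path.join(d, "fast.dat"), b"cheap, no guarantee")
        write_durable(os.path.join(d, "safe.dat"), b"slower, guarantee returned")
```

[Both calls leave identical file contents behind; the only difference is what the caller is entitled to assume when the call returns.]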
> Google's applications all tend to be of a variety that do not need
> (quasi-)deterministic guarantees routinely made by the filesystem, at
> least not in the sense that they are forced to run distributed
> transactions. They write their own applications, so this is not a
> problem. Unfortunately, and often for good reason, many big storage
> business applications are coded for filesystems with stricter semantics.
> Which isn't to say that you cannot do a lot with this, and this is an
> active if secondary line of research and experimentation for me.
>> If you look at something like Lustre ( http://lustre.org ), they are
>> working on some of the magical distributed features, but they have to
>> implement coherent filesystem semantics so the cool features are
>> mostly TBD.
> Something I have noticed is that common network filesystems are almost
> universally biased toward one of two assumptions: low-latency
> high-bandwidth networks (cluster filesystems), or high-latency low-
> bandwidth networks (classic distributed network filesystem
> architectures). Neither of these models produces optimal results for
> the types of uses and networks I have in mind.
> What is lacking are filesystems explicitly designed for the assumption
> of high latency, high edge bandwidth,
Andrew, that would be called a 'truck', filled with 'data storage media of your choice'. Very high latency,
very high bandwidth :)
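
To put rough numbers on the truck, here is a back-of-the-envelope sketch. The payload size, transit time, and link speed are all made-up assumptions for illustration, not measurements, and the function names are hypothetical.

```python
def effective_bandwidth_bits_per_s(payload_bytes, transit_seconds):
    """Average throughput of physically shipping payload_bytes in transit_seconds."""
    return payload_bytes * 8 / transit_seconds


def breakeven_bytes(link_bits_per_s, transit_seconds):
    """Payload size above which the truck beats a network link of the given speed."""
    return link_bits_per_s * transit_seconds / 8


if __name__ == "__main__":
    # Assumed values only: 500 TB of media, 24 hours door to door, 1 Gbit/s link.
    payload = 500e12        # bytes (assumption)
    transit = 24 * 3600     # seconds (assumption)
    link = 1e9              # bits per second (assumption)

    bw = effective_bandwidth_bits_per_s(payload, transit)
    print(f"truck throughput: {bw / 1e9:.1f} Gbit/s, latency: {transit} s")
    print(f"break-even payload vs. link: {breakeven_bytes(link, transit) / 1e12:.1f} TB")
```

Under those assumptions the latency is a full day no matter how small the payload is, while the throughput scales with however much media fits in the truck.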