[FoRK] gmail

Luis Villa louie at ximian.com
Thu Apr 1 21:34:35 PST 2004


On Thu, 2004-04-01 at 20:39 -0800, Gordon Mohr wrote:
> daniel grisinger wrote:
> > James Tauber wrote:
> > 
> >> So is it official yet that Google's GMail is *not* a joke?
> > 
> > 
> > looks like it's real.  http://www.gmail.com/ responds, and
> > forbes magazine is reporting that while the lunar jobs
> > were a joke, the new mail service is not.
> > 
> > how they are going to manage the data storage requirements
> > is beyond me.  every million users who hit their storage
> > limit represent a full petabyte of data, at internet scale
> > that is going to add up very quickly.
> 
> It's highly compressible text, and I betcha the 1GB is
> measured in uncompressed data.
> 
> They can also take advantage of message body redundancy.
> If we're both Gmail customers, and I send you a message,
> both my 'Sent' message and your 'Inbox' message can be
> a single copy. If I send a list message to 100 Gmail
> customers, the savings are even greater.
> 
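
A minimal sketch of the single-copy idea described above, in Python. The
class and method names are hypothetical, and nothing here reflects how
Gmail actually stores mail; it only illustrates keeping one body per
content hash while each mailbox holds a reference to it.

```python
import hashlib


class SingleInstanceStore:
    """Toy mail store: one copy of each distinct body, many references."""

    def __init__(self):
        self.bodies = {}     # sha256 hex digest -> body bytes (stored once)
        self.mailboxes = {}  # (user, folder) -> list of digests

    def deliver(self, body, sender, recipients):
        """File one message under the sender's Sent and each recipient's Inbox."""
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)  # store the body only if new
        self.mailboxes.setdefault((sender, "Sent"), []).append(digest)
        for user in recipients:
            self.mailboxes.setdefault((user, "Inbox"), []).append(digest)
        return digest

    def stored_bytes(self):
        """Bytes actually kept on disk (one copy per distinct body)."""
        return sum(len(b) for b in self.bodies.values())

    def logical_bytes(self):
        """Bytes the users collectively 'see' across all mailboxes."""
        return sum(
            len(self.bodies[d]) for refs in self.mailboxes.values() for d in refs
        )
```

With a 1,000-byte list message sent to 100 users, `logical_bytes()`
counts 101 copies (100 Inbox plus 1 Sent) while `stored_bytes()` counts
only one.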
> And don't you think they're already dealing with such
> magnitudes of data with their web cache? It's been said
> Google has over 100,000 spinning commodity PCs as part
> of their operations. Even if each has only 250GB of disk,
> which seems plausible, they're already slinging 25
> petabytes or more.
> 
> At the Internet Archive, the current white-box servers
> have 4x300GB disks, for 1.2 terabytes per machine. As
> part of the IA "petabox" project [1], a 1U half-depth
> rackserver with 4 IDE drives is being developed. Hitachi
> this month began shipping 400GB IDE drives [2]. So a
> single 80-machine rack in the style suggested by the
> "Google Cluster Architecture" paper [3] could provide
> storage (if not full service) for over 100,000 Gmail
> users, even without compression or message-body sharing.
> 
> They've already got over a thousand similar racks, what's
> another thousand more to support a hundred million
> email accounts?
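
The figures quoted above can be checked with a few lines of Python. This
is only a back-of-envelope sketch: it assumes decimal units (1 PB =
1,000,000 GB), a 1 GB quota that every user fills, and no compression,
replication, or message-body sharing. The function names are made up for
illustration.

```python
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000  # decimal units, an assumption


def petabytes(units, gb_each):
    """Total storage in PB for `units` things of `gb_each` GB apiece."""
    return units * gb_each / GB_PER_PB


def rack_capacity_gb(machines=80, drives_per_machine=4, drive_gb=400):
    """Raw capacity of one rack: 80 machines x 4 drives x 400 GB."""
    return machines * drives_per_machine * drive_gb


def racks_needed(users, quota_gb=1):
    """Racks required if every user fills the quota (ceiling division)."""
    return -(-users * quota_gb // rack_capacity_gb())


print(petabytes(1_000_000, 1))         # a million full 1 GB mailboxes, in PB
print(petabytes(100_000, 250))         # 100,000 PCs with 250 GB each, in PB
print(rack_capacity_gb() / GB_PER_TB)  # one rack, in TB
print(racks_needed(100_000_000))       # racks for a hundred million accounts
```

Under these assumptions one rack holds 128 TB, enough for 128,000 full
mailboxes, and a hundred million full accounts fit in fewer than a
thousand such racks.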

The Google talk at the Ottawa Linux Symposium 2003 [1] was fascinating in
this respect. The speaker talked quite a bit about their hardware
setups. Their machines are in such bulk, and with such redundancy and
failover, that if a machine stops working, they not only don't repair
it, they just /leave it in the rack/. It's cheaper that way. So I don't
doubt that with a little data compression they could do this pretty
easily. The biggest cost is not hardware, it is floor space with a
reliable connection.

Luis

[1] Sadly, I can't find any slides online at this time.


