[FoRK] disk space costs less than bandwidth, and both cost less than

Eugen Leitl eugen at leitl.org
Tue Oct 26 14:02:03 PDT 2010

On Tue, Oct 26, 2010 at 03:38:07PM -0400, Kragen Javier Sitaker wrote:
> Pricewatch.com currently (2010-10-26) lists a 2TB drive (“Seagate
> ST32000542AS Seagate Barracuda LP ST32000542AS 2TB 5900 RPM 32MB Cache
> SATA 3.0Gb/s”) for US$120 with free shipping in the US, and that
> appears to be a typical price.  US$120 for two terabytes is
> US$7.5 × 10⁻¹² per bit.

I tend to regard disks, especially consumer disks, as an unreliable,
semi-discardable medium that requires burn-in and redundant assemblies
capable of tolerating 2-3 nonrecoverable device errors during
resilver/rebuild, with periodically scheduled diagnostics and
recovery, using redundancy and checksums that allow reconstruction
of lost or mutated data.

Nearline drives are 2-3x the cost of the cheapest consumer drives.
At that redundancy level, only about half of the raw capacity remains
usable. That would require a correction factor of 4-5 for your numbers.
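
A rough sketch of that correction, assuming decimal terabytes and taking the 2-3x price premium and the one-half usable capacity at face value. The function and parameter names are illustrative only. Multiplying the two stated ranges gives roughly 4x to 6x, which brackets the 4-5 figure above.

```python
# Back-of-the-envelope disk cost per usable bit (decimal terabytes assumed).
BITS_PER_TB = 8 * 10**12

def cost_per_bit(price_usd, capacity_tb, usable_fraction=1.0, price_multiplier=1.0):
    """US$ per usable bit after a price premium and a redundancy overhead."""
    usable_bits = capacity_tb * BITS_PER_TB * usable_fraction
    return price_usd * price_multiplier / usable_bits

# The quoted US$120 2TB consumer drive, no redundancy.
consumer = cost_per_bit(120, 2)

# Nearline drives at 2x and 3x the price, with half the capacity usable.
nearline_low = cost_per_bit(120, 2, usable_fraction=0.5, price_multiplier=2)
nearline_high = cost_per_bit(120, 2, usable_fraction=0.5, price_multiplier=3)

print(f"consumer: {consumer:.2e} US$/bit")
print(f"nearline, redundant: {nearline_low:.2e} to {nearline_high:.2e} US$/bit")
print(f"correction: {nearline_low / consumer:.1f}x to {nearline_high / consumer:.1f}x")
```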
> I pay AR$100 per month for my internet connection here in Argentina;
> last I checked, I could download stuff from abroad over it at 31
> kilobytes per second, although this varies considerably.  AR$100 is
> about US$25, so if I were downloading constantly at an average of 31

I'm paying around 30-40 EUR/month for a 6/100 MBit/s DOCSIS 3.0
cable modem domestically. You probably pay 2-3x that for
GBit/s at the colo (about 6 EUR/TByte, I believe). 10 GBit/s 
and more is available, but this is no longer consumer price range.

> kilobytes per second, I would be paying US$3.8 × 10⁻¹¹ per bit.  In
> practice, I don’t download at full speed 24/7, not least because the
> latency on the poorly-configured cable modem goes to hell, so I
> actually pay more for this.

I found investing in a little QoS (pfSense) is well worth it here.
> The interesting point about the above is that, for me, downloading
> some piece of data costs about five times more than buying disk space
> to store it.  If I bought that 2TB drive, it would take me 24 months

The situation is different for me, especially when keeping the disks
and the corresponding host system (rackmount, 0.25 EUR/kWh) in
operation 24/7/365, plus paying for hosting.

> of constant full-speed downloading to fill it, which would cost
> US$600.
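
The quoted bandwidth figures can be checked with a short sketch. It assumes decimal kilobytes and terabytes and an average-length month; the names are illustrative. Under those assumptions it lands near US$3.8e-11 per bit, about 24.5 months to fill the drive, and a little over US$600.

```python
SECONDS_PER_MONTH = 365.25 * 24 * 3600 / 12   # average month, an assumption

def bandwidth_cost_per_bit(monthly_usd, bytes_per_second):
    """US$ per bit when the link runs flat out for a whole month."""
    bits_per_month = bytes_per_second * 8 * SECONDS_PER_MONTH
    return monthly_usd / bits_per_month

def months_to_fill(capacity_bytes, bytes_per_second):
    """Months of constant full-speed downloading needed to fill a disk."""
    return capacity_bytes / (bytes_per_second * SECONDS_PER_MONTH)

rate = 31_000                       # 31 kilobytes/s, decimal assumed
monthly_usd = 25                    # AR$100 at the quoted exchange rate
per_bit = bandwidth_cost_per_bit(monthly_usd, rate)
months = months_to_fill(2 * 10**12, rate)

print(f"bandwidth: {per_bit:.2e} US$/bit")
print(f"fill time: {months:.1f} months, costing US${months * monthly_usd:.0f}")
print(f"bandwidth/disk cost ratio: {per_bit / 7.5e-12:.1f}")
```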

I could be, in theory, downloading about 320 k eBooks at the 
moment, which in total are supposed to take almost 4 TByte (4x 2 TByte or
8x 1 TByte nearline drives). I've heard of libraries which
contain about a million volumes, or 3x that much space.
> The Amazon “Swindle” (so-called because even after you buy it, Amazon

I found AMOLED displays very useful in practice, I presume I could
be quite happy with an iPad or an Android tablet (e.g. 10" Samsung
Galaxy). At the moment, I would probably go with the iPad as a 
reader and vademecum, though Android is getting stronger and
stronger as a contender.

> still controls it) and similar devices have removed the need to
> consume US$4 worth of paper (and US$40 or so worth of laser printer
> time, at least at the rates charged around here) to read the book
> comfortably, at least if you read substantially more than 30 books.
> (One downside of this is that Amazon, since they still control the
> device, can send your books to the memory hole if it decides it
> doesn’t like them, as they famously did with copies of _1984_.  For
> the time being, they probably can’t do the same with copies on your
> hard disk.)

Presumably, we can expect to see bad things with DRM using TPM, which
ties access to your data to a particular device, maybe with key escrow.
> The curious inversion that I’m in, where it costs more to fill the
> disk than to buy it, has not yet reached much of the US, and will take
> even longer to reach Japan and Korea.  However, it has already reached
> much of the world, and there’s no reason to expect the exponential
> growth lines to fail to cross everywhere the way they’ve already
> crossed here.  Disks continue to halve their cost per bit every 15

I don't think the scaling law is going to continue for much longer.
The density is already nearing physical limits (within an order of
magnitude or a bit more), and the number of platters is also limited
(never mind that resilver times and error rates are already
prohibitively high). This will only change with 3d molecular storage,
which is a disruptive technology, but is not anywhere near the
marketplace, or even in the lab, yet.

> months, while internet bandwidth continues to halve its cost per bit
> every 4 years or so.

I don't see why we wouldn't have domestic GBit or 10 GBit pretty soon.
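
For illustration only: if the quoted halving times (15 months for disk, 4 years for bandwidth) did hold, the bandwidth-to-disk cost ratio would itself double roughly every 22 months. The sketch below starts from the factor of five quoted earlier; both trends continuing is an assumption, and the names are hypothetical.

```python
def ratio_after(months, ratio_now=5.0, disk_halving=15.0, net_halving=48.0):
    """Bandwidth/disk cost-per-bit ratio after `months`, if both trends held."""
    return ratio_now * 2 ** (months / disk_halving - months / net_halving)

def ratio_doubling_time(disk_halving=15.0, net_halving=48.0):
    """Months for the ratio to double, given the two halving times."""
    return 1 / (1 / disk_halving - 1 / net_halving)

print(f"ratio doubles every {ratio_doubling_time():.1f} months")
for years in (1, 2, 4):
    print(f"after {years} years: {ratio_after(12 * years):.1f}x")
```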
> All of those together only add up to 74GB.  I don't know of any place

FLAC libraries are pretty large. A Blu-ray rip will easily be 30 GByte.
Purportedly, some people have multimedia libraries with hundreds of
HD movies. 3D content is coming. Higher resolutions are coming. 

> to download two terabytes of data.

I do. Oh, I do. Now a couple PBytes would be admittedly a problem
(Wait! No, I can fill PBytes up very easily. Volumetric data from
neural tissue scans).
> Possible consequences
> ---------------------
> The rapidly falling price of disk storage --- and the more slowly
> falling price of network bandwidth --- seems likely to have some
> interesting effects in the coming years.  
> First, perhaps the market for bigger and bigger disks will collapse,

Currently, we have the reverse situation. After a considerable hiatus,
3 TByte drives are finally here, and 4 TByte drives are close behind.
However, these are consumer drives, which cannot be used in redundant

> since most people don’t generate enough data locally to fill their

You can easily hook up drives via eSATA, 1-10 GBit Ethernet
or that cheap 40 GBit/s-1 TBit/s optical networking Intel is
promising, using good old sneakernet. Obviously, the limit
is the ~150 MByte/s speed of single spindles.
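
A minimal sketch of why the spindle is the bottleneck: the copy rate is the smaller of the link rate and the disk's sequential rate. It ignores protocol overhead and seeks, and the function name and defaults are illustrative assumptions.

```python
def copy_hours(capacity_bytes, link_bits_per_s, spindle_bytes_per_s=150e6):
    """Hours to copy a full disk, limited by the slower of link and spindle."""
    rate = min(link_bits_per_s / 8, spindle_bytes_per_s)
    return capacity_bytes / rate / 3600

two_tb = 2 * 10**12
for name, link in (("1 GBit", 1e9), ("10 GBit", 10e9), ("40 GBit", 40e9)):
    print(f"{name} link: {copy_hours(two_tb, link):.1f} h for 2 TByte")
```

Under these assumptions a 1 GBit link is slightly slower than the spindle, while anything faster than about 1.2 GBit/s no longer helps a single drive.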

> disks, or they do so only with the expectation of being able to share
> it over the internet with their friends and family and beyond.  We’re
> already seeing this to some extent as many computers have switched
> entirely to SSDs and no longer use spinning disks.

Some computers have switched to SSDs, mostly only system drives
and notebooks (where speed, battery life and shock immunity are
useful). For anything much over 100 GByte the hard drive still
has no alternative. Currently, the best filesystems allow
you to use hybrid device pools, utilizing the special properties
of solid-state to hide the deficiencies of hard drives.
> Second, perhaps secondary means of transferring data will gain more
> importance.  LAN parties, local wireless networks, and physically
> shipping disks from one place to another may become more widely used,

We routinely get packs of some 10x 2 TByte disks at work.
The hassle is customs, which insists that we ship the drives
back, not believing that we're interested in data on them, not
the drives themselves (which are basically just handy data
storage cartridges, only not nearly as robust as e.g. DLT tape).

> as it becomes comparatively more difficult to copy around
> high-resolution digital photographs, amateur movies, crawls of the
> entire World-Wide Web, and so on.
> Third, perhaps deletion of files will become less important --- and
> less easy in the user interface.  Certain kinds of files, such as the
> aforementioned high-resolution digital photographs, will still need to
> be deleted because they weren’t interesting enough to share.  But old
> versions of text documents, software, copies of Uncle Tom’s Cabin?
> Delete only for privacy and security reasons.

I no longer delete private data. By now I typically have a mess
of multiple copies of data spread over different systems and
filesystems. It's all somewhere out there, honest. And usually
I can even find it, thanks to locate & friends.
> Fourth, perhaps disks will be normally sold pre-filled with files ---
> movies, books, snapshots of Wikipedia, massive quantities of free
> software, and so on.

Don't see it happening, unless it's free content.
> Fifth, perhaps software to tell when you already have a file on your
> disk, and can thus avoid downloading it, will become more important.

The Genesis Library names the book scan files by the md5 hash
of the content. It's easy to see how to access them via
http://localhost:af/fe/c0/ff/ee etc., spreading them over directory
trees with only a few files in each hierarchy, naturally taking
care of collisions (of course you should still check by comparing
the files bit by bit, and alter the scan in some trivial fashion,
e.g. by altering a single pixel, in the case of those astronomically
rare collisions).
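
A minimal sketch of such a hash-named store, not the Genesis Library's actual layout: the four two-character directory levels, the function names, and the status strings are all assumptions made for illustration. A file whose hash is already present is compared byte for byte before being treated as a duplicate.

```python
import filecmp
import hashlib
import shutil
from pathlib import Path

def md5_hex(path, chunk_size=1 << 20):
    """Hex md5 of a file's content, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def shard_path(digest, levels=4):
    """Spread files over nested two-character directories: af/fe/c0/ff/<digest>."""
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return Path(*parts) / digest

def store(src, root):
    """Copy src into root under its content hash; return (path, status)."""
    src = Path(src)
    dest = Path(root) / shard_path(md5_hex(src))
    if dest.exists():
        # Same name means same hash; confirm by comparing the bytes.
        if filecmp.cmp(src, dest, shallow=False):
            return dest, "duplicate"
        return dest, "collision"   # caller must perturb the file and retry
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest, "stored"
```

Calling `store("scan.djvu", "library")` twice on the same file would report `"stored"` and then `"duplicate"`, so a download can be skipped whenever the hash path already exists.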

> Content-based naming schemes like the ones used in Git and BitTorrent
> could facilitate this enormously.  In some cases, these can be used to
> find when other computers physically near you have the files as well.
> (BitTorrent is a good example of this, although it has some trouble

A project worth tracking is http://tahoe-lafs.org/trac/tahoe-lafs

> with NAT.)

Well, the IPv6 thing is finally starting to happen now. My colo
now offers it natively, and technically it's part of the DOCSIS 3.0
spec. NAT is much too expensive at ISP scale, especially
if mandatory data retention legislation hits in your area.
> Sixth, perhaps software will become much more aggressive about using
> local disk to avoid downloading stuff over the network.

I see the opposite trend so far.
> Seventh, an increasing range of material would ideally be downloaded
> optimistically (“prefetched”), especially when the connection is idle.
> 21 seconds of my time waiting costs on the order of US$0.70; 21
> seconds of use of my internet connection costs US$0.0002.  So even if
> I only ever read one out of every 3500 things that was optimistically
> downloaded, I’m still better off.  Even at a much lower time
> opportunity cost, reading 1% of the prefetched text would make it a
> better deal.
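
The break-even arithmetic in the quoted paragraph can be sketched as follows, using only the figures given there (US$0.70 of waiting versus US$0.0002 of bandwidth per 21 seconds). The function names are illustrative.

```python
def break_even_hit_rate(wait_cost_usd, fetch_cost_usd):
    """Smallest fraction of prefetched items that must be read to pay off."""
    return fetch_cost_usd / wait_cost_usd

def min_wait_cost(fetch_cost_usd, hit_rate):
    """Lowest value of the waiting time at which prefetching still pays."""
    return fetch_cost_usd / hit_rate

rate = break_even_hit_rate(0.70, 0.0002)
print(f"break-even: 1 in {1 / rate:.0f} prefetched items")
print(f"at a 1% hit rate, waiting must be worth at least "
      f"US${min_wait_cost(0.0002, 0.01):.2f} per 21 seconds")
```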

Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

----- End forwarded message -----
