RE: Huge drivespace Re: computing budgeting (fwd)


From: Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Date: Fri Sep 08 2000 - 00:45:59 PDT


Joseph S. Barrera III writes:
> Some questions to think about:
>
> How do you back up a 10 TB disk?
 
Why, on yet another 10 TB disk. Or several of them. I don't use
tape; I back up HDs to HDs at home.
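
Roughly what I mean, as a Python sketch (the mount points are made up
and it assumes rsync is installed; just an illustration, not a recipe):

#!/usr/bin/env python
# Mirror one mounted disk onto another with rsync.
import subprocess
import sys

SOURCE = "/mnt/data/"      # hypothetical source disk mount point
TARGET = "/mnt/backup/"    # hypothetical backup disk mount point

def mirror(source, target):
    # -a preserves permissions/times, --delete keeps the mirror exact
    rc = subprocess.call(["rsync", "-a", "--delete", source, target])
    if rc != 0:
        sys.exit("rsync failed with exit code %d" % rc)

if __name__ == "__main__":
    mirror(SOURCE, TARGET)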
 
> If you forgot where you put something
> on a 10 TB disk, how long does it take
> to do a grep on the whole thing?
 
If locate is insufficient (I use it a lot), I'll have to use a
full-text index. It can occupy 15% of the total space; I don't care.

A global grep of a mere 60 GBytes is nonviable even now.
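
By full-text index I mean something like the toy inverted index below,
a Python sketch only (the corpus root and index file are hypothetical,
and a real indexer would be much smarter about tokenizing and storage):

# Map each word to the set of files containing it, so lookups never
# touch the raw data again.
import os, re, pickle

WORD = re.compile(r"[A-Za-z0-9]+")

def build_index(root):
    index = {}                     # word -> set of file paths
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                text = open(path, "r", errors="ignore").read()
            except OSError:
                continue
            for word in set(WORD.findall(text.lower())):
                index.setdefault(word, set()).add(path)
    return index

def lookup(index, word):
    return sorted(index.get(word.lower(), ()))

if __name__ == "__main__":
    idx = build_index("/home/eugene/text")      # hypothetical corpus root
    pickle.dump(idx, open("/var/tmp/ftindex.pkl", "wb"))
    print(lookup(idx, "hypermail"))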

> Let's say we're really optimistic
> and you can read your 10 TB disk
> at 100 MB/s. Then to read the whole
> thing, you need
>
> 10^7 MB / 10^2 MB/sec = 10^5 sec
>
> 10^5 sec / 3600 sec/hr = 27.8 hours
 
Great, a 100 MByte/s stream for >24 h sounds very good for data
acquisition.
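
Joe's arithmetic, redone in a few lines of Python for a couple of
sizes and transfer rates (the numbers are purely illustrative):

# Back-of-the-envelope linear scan times.
def scan_hours(size_tb, rate_mb_s):
    size_mb = size_tb * 10**6          # 1 TB = 10^6 MB (decimal units)
    return size_mb / rate_mb_s / 3600.0

for size in (1, 10):
    for rate in (30, 100):
        print("%2d TB at %3d MB/s: %6.1f hours"
              % (size, rate, scan_hours(size, rate)))
# 10 TB at 100 MB/s comes out to ~27.8 hours, as above.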

> ... so it takes more than a day
> to do a linear scan of the disk.

So we have to index incrementally, so that no total reindex is ever
necessary, only incorporation of diffs. And clearly fsck is a no-go,
so a journalling file system must be used.
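
Incremental indexing in its simplest form is just remembering mtimes
and handing only changed files to the indexer; a Python sketch (the
state file path and the reindex hook are made up):

import os, pickle

STATE = "/var/tmp/index_mtimes.pkl"    # hypothetical state file

def changed_files(root, state):
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if state.get(path) != mtime:
                state[path] = mtime
                yield path

def incremental_update(root, reindex):
    try:
        state = pickle.load(open(STATE, "rb"))
    except (OSError, EOFError):
        state = {}
    for path in changed_files(root, state):
        reindex(path)                   # feed only the diffs to the indexer
    pickle.dump(state, open(STATE, "wb"))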

> If you're grepping, or actually
> reading files from a file system,
> then it's going to take a lot longer.
 
Of course the real solution for this is parallelism. You can't expect
to process huge masses of data in a purely sequential fashion. Rotating
bits will have to go.
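
For instance, something like this Python sketch, fanning a search out
over several spindles at once (mount points and pattern are
hypothetical; real parallelism would also want the data striped to
match):

import multiprocessing, os, re

def search_disk(args):
    mountpoint, pattern = args
    rx = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                for line in open(path, "r", errors="ignore"):
                    if rx.search(line):
                        hits.append(path)
                        break
            except OSError:
                pass
    return hits

if __name__ == "__main__":
    disks = ["/mnt/d0", "/mnt/d1", "/mnt/d2", "/mnt/d3"]
    pool = multiprocessing.Pool(len(disks))
    for hits in pool.map(search_disk, [(d, "hypermail") for d in disks]):
        print("\n".join(hits))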

> Jim Gray has a talk on his website
> that goes into more detail about
> the challenges of huge disks.
> If I have time, I'll track it down.
>
> - Joe


