linkless index

From: Thomas P. Copley (tcopley@best.com)
Date: Wed Mar 07 2001 - 11:37:24 PST


Hi,

I am new to the list. Actually, I've been lurking for a couple of
weeks now. I am kind of an itinerant webmaster, working right now for a
non-profit toy info site <http://www.drtoy.com>. I've been wondering
whether there could be a business model based on creating a useful
linkless index for P2P searching. This article about Scoundrel gave me
the idea:

http://www.pigdog.org/auto/software_jihad/link/2021.html

Scoundrel: A New Concept for Searching P2P
By El Snatcher

One problem still haunts the P2P world: how the hell do you find
anything? Traditional search engines are impractical, because by the
time a P2P network has been spidered, the makeup of the network and
the content will probably have changed. Real-time keyword searches
are too slow. Yahoo-style index pages don't cut it. We need some
crazy new ideas. Enter Scoundrel...

Searching P2P networks sucks. With systems such as Napster and Gnutella
you put in keywords, or partial or full titles or artist names, and you
get a big list of crap back. There are always tons of redundant
results to sift through, and you have to decide which file to download.
Once you finally decide what to snarf, the file transfer bombs
out halfway through, so you have to try again. This all assumes that
you know exactly what you're searching for. Usually there is no
cross-indexing.

Then there's the "catch-as-catch-can" problem. Because P2P nodes are
connecting to and disconnecting from the network at unpredictable times,
the availability of resources is constantly in flux. You might be
able to find something one day, but the next day the same thing might
not be available. If you really want to find something, you may need
to search for it over and over again.

Some P2P systems barely have any way to search at all. To discover
what's on Freenet, for instance, you generally have to look through
big text files full of 'keys' (Freenet's version of hyperlinks).

So what can be done about this?

The Scoundrel Project has developed a system for automating the
process of searching, re-searching, and downloading things from P2P
networks. The core idea is to have what the author of Scoundrel calls
a "linkless index." This index, rather than being an index of what
actually IS on the network, is an index of what theoretically, or
ideally, SHOULD be on the network. It's "linkless" because the actual
index entries do not directly point to resources on the network.
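To make the "linkless" idea concrete, here is a minimal Python (3.9+) sketch of what one index entry might hold. The class and field names are hypothetical, not taken from Scoundrel's source. What matters is what is missing: there is no URL, host, or share path, only a description of what ought to exist, plus a way to turn that description into search terms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinklessEntry:
    """Describes something that *should* exist on the network.

    Deliberately holds no URL, host, or file path, so the entry never
    goes stale when the peers sharing a file disconnect. Frozen so
    entries are hashable and can live in a set-based wish list.
    """
    artist: str
    album: str
    tracks: tuple[str, ...] = field(default_factory=tuple)

    def queries(self) -> list[str]:
        """Turn the entry into one keyword query per track."""
        return [f"{self.artist} {title}" for title in self.tracks]
```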

The user can browse this index and select things that he or she would
like to retrieve, regardless of whether or not the files are actually
available on the network at that particular time. Later on, an agent
(bot, or whathaveyou) retrieves the files for that person. The agent
does all the shitwork of searching, re-searching, selecting the right
file, downloading, retrying, etc., all in the background, or while
they're away doing something fun. After all, that's what computers
should be doing -- tedious drudge work.
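The agent's drudge work described above can be sketched roughly as follows. This is a hypothetical Python illustration, not Scoundrel's code: `search` and `download` stand in for whatever actually talks to the network, the shape of a "hit" dict is assumed, and preferring the fastest peer is just one assumed way of "selecting the right file."

```python
import time
from typing import Callable, Iterable

# Hypothetical hooks: a real agent would talk to OpenNap servers here.
# A hit is assumed to be a dict like {"peer": "joe", "file": "x.mp3", "speed": 3}.
SearchFn = Callable[[str], list[dict]]   # query -> hits available right now
DownloadFn = Callable[[dict], bool]      # hit -> True if the transfer completed


def fetch_wanted(queries: Iterable[str], search: SearchFn,
                 download: DownloadFn, max_rounds: int = 5,
                 delay: float = 0.0) -> set[str]:
    """Search, pick a file, download, and re-search/retry what's left.

    Returns the queries that ended with a completed download.
    """
    pending = list(queries)
    failed: set[tuple[str, str]] = set()  # (peer, file) pairs that bombed out
    done: set[str] = set()
    for _ in range(max_rounds):
        retry = []
        for query in pending:
            # Ignore sources that already failed, then prefer the fastest peer.
            hits = [h for h in search(query)
                    if (h["peer"], h["file"]) not in failed]
            best = max(hits, key=lambda h: h["speed"], default=None)
            if best is not None and download(best):
                done.add(query)
                continue
            if best is not None:
                failed.add((best["peer"], best["file"]))
            retry.append(query)  # nothing usable this round; try again later
        pending = retry
        if not pending:
            break
        time.sleep(delay)  # peers come and go, so wait before re-searching
    return done
```

Remembering which (peer, file) pairs failed keeps the agent from hammering the same flaky source every round, and the pause between rounds is where the "search for it over and over again" chore gets absorbed.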

This idea has some interesting ramifications. Because the index is
disconnected from the actual files on the network, no checks need to
be performed to ensure that the index accurately reflects the contents
of the network. Thus the index can be more complete, can be maintained
independently, and can be kept up to date more conveniently. And with a big,
comprehensive index, it will be easier to have good cross-referencing.

What's more, the user is free to browse the index at high speeds,
selecting things willy-nilly like a kid in a candy store, without
waiting for searches to complete or downloads to finish, and without being
disappointed when a real-time search turns up an empty result
set. Everything is ultra-responsive, and it's a better user
experience.

Another implication of this strategy is that the network doesn't have
to be fast. Some P2P systems are 'blecherously' slow right now. This
will certainly change as the software develops and the systems get
more users, but right now the wait to get a file can be maddening.
But who cares if you're not doing all the waiting yourself?

There've also been strange and evil rumblings from the Digital
Millennium Copyright Act (DMCA) people about how innocent hyperlinks
to net resources containing copyrighted material will be considered
some kind of horrible copyright infringement themselves, punishable by
hanging and whatnot. This could put an ugly chill on the whole
Internet. A linkless index steps around this issue quite handily.

Although this "linkless index" strategy can be used for all kinds of
data, it is obviously well-suited for digital music trading
(e.g. MP3s), which seems to be the focus of many of the P2P projects
right now. So to build its proof-of-concept application, the
Scoundrel Project decided to use an existing, highly developed
database for its linkless index: Amazon.com. Amazon's music index is
huge, cross-indexed, chock-full of user reviews, and has all sorts of
handy features which make it great for browsing music titles.

Here's how the Scoundrel program works: You fire it up and configure
it to know about several OpenNap servers -- the open source clone of
the Napster system. Currently, Scoundrel only works with OpenNap.
Next, you use Scoundrel's built-in Web browser to navigate Amazon's
music section. Scoundrel watches as you browse, and when you visit
the description page for a CD, Scoundrel automatically picks up the
title, artist, and track listings. You are given an opportunity to
review the list of stuff that Scoundrel has created, and to modify and
delete items. When you are ready to have Scoundrel go to work for
you, you hit the "get'em" button, and it crawls the various OpenNap
servers looking for MP3s of the music you want. Then you can minimize
Scoundrel and play some Nethack or whatever, or you can continue to
browse Amazon for even more goodies.
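The "watch the browser and pick up title, artist, and track listings" step might look something like this, using Python's standard `html.parser`. The `cd-title`, `cd-artist`, and `cd-track` class names are invented for illustration (real store markup differs and changes often), and `build_jobs` is a hypothetical helper showing how the reviewed list could fan out as one query per track per configured OpenNap server.

```python
from html.parser import HTMLParser


class CdPageParser(HTMLParser):
    """Collect title, artist, and track names from a CD description page.

    The class names in FIELDS are invented for this sketch; a real
    scraper would have to match whatever markup the store serves.
    """

    FIELDS = {"cd-title": "title", "cd-artist": "artist", "cd-track": "track"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.artist = ""
        self.tracks: list[str] = []
        self._current = None  # field that the text we're inside belongs to

    def handle_starttag(self, tag, attrs):
        self._current = self.FIELDS.get(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        if self._current == "track":
            self.tracks.append(text)
        else:
            setattr(self, self._current, text)  # title or artist


def build_jobs(page: CdPageParser, servers: list[str]) -> list[tuple[str, str]]:
    """Hypothetical 'get'em' step: one (server, query) pair per track per server."""
    return [(server, f"{page.artist} {track}")
            for track in page.tracks for server in servers]
```

Because the parser only records text while it sits directly inside a tagged element, a formatting tag nested inside a track name would cut that name short; that is one of several simplifications here.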

This works surprisingly well. As a test, I chose a few CDs from
Amazon's "Top Sellers" list, set Scoundrel loose, and went to
breakfast. When I came back there were at least two complete CDs in
MP3 form on my hard drive, and several partial CDs. And Scoundrel was
still out busting ass for me. Anyone who has ever spent all night on
Napster trying to put together an entire track list of MP3s knows how
cool that is.

Scoundrel isn't perfect, and neither is the linkless index idea. Even
though there are copious widgets and screens ostensibly indicating
what the program is doing, it's hard to figure out. Sometimes it
seems to just hang, and sometimes it doesn't seem to search for all of
the things you tell it to. But after all, it is just a proof of
concept. The author calls it a "technology preview." And while a
comprehensive index of music files already exists, and there are other
databases for things such as movies (e.g. IMDB), how do we deal with P2P
resources that aren't already in a nice tidy index somewhere? And how
would you create an index for that stuff?

Another thing is that Scoundrel only runs on Windows. It's an open
source project under the GPL, but it's written in some sick language
like Delphi (which may be excusable considering that it's only a
prototype).

Despite these concerns, I give a big warm Beaujolais to the Scoundrel
Project!

There is one last intriguing thing to say about Scoundrel. The
author of the program is a mystery man. He remains anonymous to this
day. On March 1st, just after releasing the latest incarnation of
Scoundrel, he posted a message on the Scoundrel home page announcing
that he is abandoning all work on the project, and will never be heard
from again, although he hopes that others will continue work on the
project. This is from the Scoundrel web page:

"Well, so much for what Scoundrel has and has not done. As of today,
March 1st, 2001, I will no longer be able to continue development on
Scoundrel. I'll be disappearing from the face of the earth and will
not be reachable. I will not go into the reasons behind this."

Could it be that the big media companies got to him too? Is the RIAA
playing hardball behind the scenes? Will we ever know?

In the meantime, give Scoundrel a whirl.

###
-------------------------------------------------------------------
Thomas P. Copley E-mail: tcopley@best.com



This archive was generated by hypermail 2b29 : Fri Apr 27 2001 - 23:13:37 PDT