Haystack, a personal information repository.

I Find Karma (adam@cs.caltech.edu)
Wed, 5 Mar 97 03:34:52 PST


They're in mid-project, but who isn't?

> The Haystack project is aimed at the individual customization end of
> these more realistic ``living'' information retrieval systems. We are
> interested in building on customizable substrates, such as those
> provided by Harvest or Content Routing, to create a community of
> individual but interacting ``haystacks'': personal information
> repositories which archive not only base content but also user-specific
> meta-information, enabling them to adapt to the particular needs of
> their users.

Sounds like a noble goal.

> We believe that such a system will let us address several questions:
> How can individuals use an information retrieval system to organize
> their own personal collection of information?


> How might an information retrieval system learn from its users and
> evolve over time into a more effective system?

> As individuals build up their own collections and information
> retrieval systems, how can they search for information that might be
> located in others' collections, especially when such information is
> organized by information retrieval systems that may differ greatly from
> their own?

AltaVista search with host:xent.w3.org

> Our first step towards this goal has been to design a simple and
> convenient user interface to and annotation format for an information
> retrieval system. Our current annotations emphasize user-independent
> text meta-information, but the format for and structure of these
> annotations are intended to encompass hand-generated and automatic
> user-specific annotations. The annotations themselves are first-class
> documents in our system, so that, for example, search information can be
> reified and treated as an indexable object.

I like that annotations are first class objects. But where are the
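
To make "annotations are first-class documents" concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not Haystack's actual format: the names Doc, Haystack, annotates, and the query-N ids are all hypothetical. The point it shows is that annotations and even past searches go into the same index as base content.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    text: str
    annotates: Optional[str] = None  # id of the document this one annotates, if any


class Haystack:
    """Toy repository: base documents, annotations, and queries share one index."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # term -> set of document ids

    def add(self, doc):
        self.docs[doc.doc_id] = doc
        for term in doc.text.lower().split():
            self.index[term].add(doc.doc_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # Documents containing every query term (simple AND semantics).
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        # Reify the search as a document so later searches can find it too.
        self.add(Doc(f"query-{len(self.docs)}", query))
        return sorted(hits)


h = Haystack()
h.add(Doc("paper1", "content routing for information retrieval"))
h.add(Doc("note1", "useful survey of routing", annotates="paper1"))
first = h.search("routing")   # matches the paper and its annotation
second = h.search("routing")  # now also matches the earlier, reified query
```

The second search returns one more hit than the first, because the first search was itself stored and indexed.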


> Given that individuals are organizing the information they care about,
> it is natural to ask how one user can benefit from the work of other
> users. Consider that the typical way to search for a paper book is to
> ask one's office-neighbor for it. Analogously, we would like to let
> individuals search for information in other people's haystacks. Both to
> limit the costs of a search and to improve the filtering of what is
> returned, it is important for the system to learn over time which other
> individuals are most likely to have information that a given user finds
> relevant---these haystack ``neighbors'' are the systems that should be
> queried first and whose results should be most trusted.

This is cool. Trust networks are right on the ball.
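
One way the "learn which neighbors to query first" part could work, sketched in Python. This is a guess at the mechanism, not anything from the Haystack paper: the NeighborRanker class, the 0.5 prior, and the moving-average update are all assumptions chosen for illustration.

```python
class NeighborRanker:
    """Keeps a running trust score per neighbor (hypothetical scheme)."""

    def __init__(self, learning_rate=0.5, prior=0.5):
        self.learning_rate = learning_rate
        self.prior = prior  # score assumed for neighbors with no history
        self.trust = {}

    def record(self, neighbor, relevant_fraction):
        # Exponential moving average of how much of the neighbor's
        # returned material the user judged relevant (0.0 to 1.0).
        old = self.trust.get(neighbor, self.prior)
        self.trust[neighbor] = old + self.learning_rate * (relevant_fraction - old)

    def query_order(self, neighbors):
        # Most trusted neighbors are queried first.
        return sorted(neighbors,
                      key=lambda n: self.trust.get(n, self.prior),
                      reverse=True)


ranker = NeighborRanker()
ranker.record("alice", 1.0)  # everything alice returned was relevant
ranker.record("bob", 0.0)    # nothing bob returned was relevant
order = ranker.query_order(["bob", "carol", "alice"])
```

Here carol has no history, so she sits at the prior, between alice and bob.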

> Another opportunity that this linking of haystacks creates is in
> connecting individuals to other people who can address their information
> need. The information I have stored in my haystack is likely a good
> indicator of my knowledge and interests. A question that matches a lot
> of material in my haystack is likely to be a question I can usefully
> answer. The haystack system can therefore serve as an ``information
> brokerage'' connecting questioners to experts.

Much in the way that http://www.ffly.com/ isn't.
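
For what the brokerage idea might look like at its crudest, here is a Python sketch that ranks people by how many documents in their haystack share a term with the question. The function rank_experts and the sample data are hypothetical, and plain term overlap is my stand-in for whatever matching a real system would use.

```python
def rank_experts(question, haystacks):
    """Rank haystack owners by number of documents sharing a term with the question."""
    q_terms = set(question.lower().split())
    scores = {}
    for owner, docs in haystacks.items():
        scores[owner] = sum(1 for d in docs if q_terms & set(d.lower().split()))
    # Highest score first; ties broken alphabetically by owner name.
    return sorted(scores, key=lambda o: (-scores[o], o))


haystacks = {
    "ann": ["harvest indexing notes", "content routing paper"],
    "ben": ["recipe for bread"],
}
experts = rank_experts("how does content routing work", haystacks)
```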

> Sharing haystacks also raises the issue of generalizing from
> individuals' customization of their own haystacks to larger (pooled)
> data-sets. This provides another opportunity to test the adaptability of
> query strategies and a test of the generalization of the underlying
> learning algorithms.

So let me get this straight. Not only is the axiom "Links are
knowledge" true, but "Queries are knowledge" is true too?

This work sounds decent, but I couldn't get their software to run at
Caltech, so for now, I'll take their word for it.


