Re: Real-time distributed Web search (Gnutella knockoff?)


From: Lucas Gonze (lucas@gonze.com)
Date: Tue Jun 27 2000 - 17:41:59 PDT


If it's bad data, fewer users will connect to their node.
Standard capitalism is the solution, not technology.

Nicolas Popp wrote:
>
> Somebody pointed out to me that this approach was doomed.
>
> If you give destination sites control of the search query, many will lie.
> Most commercial sites want to hijack traffic from search engines and appear
> very high in a results list whether or not their site is actually relevant
> to the query. That's a real-world fact and that's actually why meta tags
> have failed (and are ignored by search engines these days).
>
> In other words, if you give control of the query to the Web sites that want
> to be found and lose control of the relevance ranking, you will return
> garbage...
>
> -Nico
>
> -----Original Message-----
> From: Rohit Khare
> To: FoRK@xent.com
> Sent: 00/06/27 4:56
> Subject: Real-time distributed Web search (Gnutella knockoff?)
>
> I'm not really thrilled, but it is interesting to see that both
> cachers (Dynamai) and searchers (here, Infrasearch) are beginning to
> cope with dynamically generated content. That said, the real promise
> of real-time indexing is real-time notification (iPal, anyone?) --
> Rohit
>
> gag: http://elitist.xcfventures.com/window.jpg
>
> briefly
>
> InfraSearch is 100% Gnutella. The only thing not Gnutella is your web
> browser.
>
> The fully-distributed system comprises several major components, each
> of which can be made redundant and load-balanced with extreme ease
> through Gnutella technology.
>
> It's entirely pay-as-you-go. No huge up-front expenditures. This
> entire prototype runs on a Pentium III machine costing only a few
> thousand dollars.
>
> We don't believe a million-dollar search server is a good use of your
> resources, so InfraSearch is designed to run on hardware that fits on
> your child's credit card.
>
> briefly
>
> Just launched your online store? Congratulations: You've created
> another black hole on the web.
>
> Fortunately, there's InfraSearch. It unlocks the door for users
> searching for what you offer. It lets them peer into your dynamic
> content.
>
> InfraSearch is not just another way to crawl and index. InfraSearch
> changes search from a passive thing into an active thing. It changes
> search from an HTML thing into a semantic thing. Best of all, it
> changes search into a thing you, the information provider, control.
>
> Search as infrastructure. Only InfraSearch does it.
>
>                                                     InfraSearch          Search engines
> search your database                                Yes                  No
> allows you to respond to queries the way you want   Yes                  No
> allows you to respond competitively                 Yes                  No
> search method                                       Whatever you choose  Crawling
> search data                                         Semantic             HTML
> who controls the search                             You do               They do
>
> InfraSearch can...
>
> InfraSearch enables information providers to answer searches. After
> all, who knows how to answer a question about news better than a news
> specialist? Who can answer a question about red roses better than a
> florist?
>
> So that's the short of it: Let the people who know the answers answer
> the way they want. Let the user experience begin at the search engine.
>
> search as infrastructure
>
> It's a strange idea to throw up a web site and hope the search
> crawlers come around and index it. It doesn't make sense to leave
> that to chance.
>
> What makes sense is taking charge and driving traffic by controlling
> the way searches are answered. You own the data, you should manage
> the searching. It's the only part of the user experience that
> companies don't even try to manage. It's just outsourced to search
> providers which have little interest in indexing one site better than
> the next. External visibility is the most overlooked part of any web
> site.
>
> You manage your database, you manage your web server farm, you manage
> the content production... You manage everything...except your
> external visibility. It's time to manage that too. Without it, your
> site is just another black hole.
>
> search engines can't...
>
> To current-generation search engines dynamic content (anything with a
> question mark in the URL) is invisible. Your favorite online store,
> your favorite online news source: invisible.
>
> Search engines appear to take about four weeks to crawl a URL you
> submit. Suppose war breaks out. Even if there is an obscure news site
> out there that a search engine can index, you won't find anything
> about that war for a month.
>
> Search engines index the words on a page, not their meaning. So when
> you search for "canon eos-3" on a search engine, you get a bunch of
> hits about the camera. Suppose Epinions.com could answer. They could
> tell you how much Epinions users liked it. Suppose Nikon could
> answer. They could tell you about their F-100 camera at the same
> price point.
>
> Remember: it's not because the search engines don't want to do all
> that. It's because they can't.
>
> technology
>
> Search engines work by "crawling". This technology has been perfected
> over the past six years. A search engine starts at some URL (or some
> set of URLs) and basically clicks on every link on every page that it
> can click on.
>
> The crawler stores every page it sees on a huge disk. The data is
> then indexed. In short, web search engines try to download the entire
> web and make sense of it. A lot has gone into the process to make it
> efficient and yield the most fruit, but there are inherent
> shortcomings.
>
> Part of the reason is that no matter how much technology is thrown at
> the effort to crawl the web, crawling is just too slow to keep up
> with the web's pace of change.
>
> More than that, crawling is outmoded because modern content providers
> use a growing amount of dynamic content: HTML pages with forms and
> URLs with question marks aren't crawled.
> Unfortunately sites' crown jewels are increasingly stored in their
> databases, hidden behind those strange URLs that crawlers are afraid
> to visit.
>
> InfraSearch works
>
> InfraSearch fixes all that.
>
> InfraSearch uses Gnutella distributed information search technology
> to distribute searches to information sources. InfraSearch Agents
> running at information providers provide an interface between
> InfraSearch.com and information providers' databases, flat files,
> HTML pages, or whatever. Information providers can make use of
> whatever data they want to answer queries however they want.
>
> ------------------------------------------------------------------------
>
> InfraSearch asks those who know and lets them answer as they want
> ------------------------------------------------------------------------
>
> If an information provider can answer your question, it will answer
> your question in its own special way.
>
> Search for "Mercedes-Benz E55" and maybe you'll get a result from BMW
> telling you about their new M5.
>
> Search for "MSFT". If a broker answers, it might look something like
> this:
>
> MSFT 70 +0.25 Quote is delayed at least 15 minutes.
>
> ------------------------------------------------------------------------
>
> dynamic URLs in hits?!
> ------------------------------------------------------------------------
>
> Another powerful thing InfraSearch can do is allow content providers
> to answer searches with fully dynamic URLs. Search for "mustang drag
> race" and get a customized link from Summit Racing to a page listing
> all the parts to make your 5.0 a 9.0 second car.
>
> ------------------------------------------------------------------------
>
> no dead links!
> ------------------------------------------------------------------------
>
> And...since answers are based on current data: no dead links!
>
> The information you want, up-to-date, and presented in a way which
> makes it easy to find exactly the information you were searching for.
>
> ------------------------------------------------------------------------
>
> prototype
> ------------------------------------------------------------------------
>
> InfraSearch is a prototype at this stage. You know what that means.
>
> It's the idea that counts. Real-time distributed search is the next
> thing, and InfraSearch is the first demonstration that it can do
> useful things.
>
> ------------------------------------------------------------------------
>
> COPYRIGHT (c) 2000 XCF Ventures. ALL RIGHTS RESERVED.
>
> All trademarks are the property of their respective owners.
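
For anyone who wants the "technology" section of the quoted page in concrete terms, here is a rough sketch of the crawl loop it describes: start at some URL, follow every link, and skip anything with a question mark in it. The toy link table, the example.com URLs, and the function names are all made up for illustration; no real crawler is this simple, and nothing here is fetched over the network.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://example.com/": [
        "http://example.com/about",
        "http://example.com/store?item=42",
    ],
    "http://example.com/about": [
        "http://example.com/",
        "http://example.com/news",
    ],
    "http://example.com/news": [],
    "http://example.com/store?item=42": ["http://example.com/secret"],
}


def crawl(seeds, fetch_links):
    """Breadth-first crawl from the seed URLs.

    URLs containing '?' are skipped, mirroring the quoted claim that
    crawlers do not follow dynamic URLs (an assumption of this sketch).
    """
    seen = set()
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        if url in seen or "?" in url:
            continue
        seen.add(url)
        order.append(url)
        queue.extend(fetch_links(url))
    return order


crawled = crawl(["http://example.com/"], lambda url: TOY_WEB.get(url, []))
print(crawled)
```

In this toy graph the page linked only from the store's dynamic URL is never visited, which is the "black hole" effect the quoted page complains about.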

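The "InfraSearch works" part of the quoted page boils down to fanning a query out to provider-run agents and collecting whatever each one chooses to say. Below is a minimal sketch of that idea under loud assumptions: the agent functions, their canned answers, and the search() helper are hypothetical, and plain function calls in a thread pool stand in for Gnutella's actual message routing, which this does not model.

```python
from concurrent.futures import ThreadPoolExecutor


def broker_agent(query):
    # Hypothetical provider: answers ticker symbols from a canned table.
    quotes = {"MSFT": "MSFT 70 +0.25"}
    return quotes.get(query.upper())


def florist_agent(query):
    # Hypothetical provider: only answers queries that mention roses.
    return "Red roses, one dozen" if "roses" in query.lower() else None


def search(query, agents):
    """Send the query to every agent in parallel and keep non-empty answers.

    Each agent decides for itself whether and how to answer; this function
    does no relevance ranking of its own.
    """
    def ask(item):
        name, agent = item
        return name, agent(query)

    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(ask, agents.items()))
    return {name: answer for name, answer in answers if answer is not None}


AGENTS = {"broker": broker_agent, "florist": florist_agent}
results = search("MSFT", AGENTS)
print(results)
```

Note that search() takes every answer at face value, so an agent that lies about relevance gets listed like any other. That is exactly the weakness Nico raises above, and the sketch makes no attempt to address it.
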
-- 
L.U.C.A.S.: Lifeform Used for Calculation and Accurate Sabotage



This archive was generated by hypermail 2b29 : Tue Jun 27 2000 - 17:56:48 PDT