RE: Real-time distributed Web search (Gnutella knockoff?)


From: Nicolas Popp (nico@realnames.com)
Date: Tue Jun 27 2000 - 19:09:52 PDT


 
I am not sure I understand your point. All I am saying is that the type of
distributed search that lets the destination control the answer to a query
based on inconsistent criteria (whichever criteria the destination decides
upon) is likely to produce low quality results. I am not saying that all
search engines do.
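
To make the abuse concrete, here is a rough sketch in Python (the node
names, page texts and scoring rules are all made up, not InfraSearch code):
each destination scores its own relevance, so a hijacking node can simply
claim the maximum.

    def honest_score(query, content):
        # fraction of query words actually present in the page text
        words = query.lower().split()
        return sum(w in content.lower() for w in words) / len(words)

    def hijack_score(query, content):
        return 1.0  # lies: always claims perfect relevance

    # hypothetical destinations: (name, scoring function, page text)
    nodes = [
        ("pasta-fan-page", honest_score, "recipes for pasta and soup"),
        ("traffic-hijacker", hijack_score, "buy cheap widgets now"),
    ]

    query = "pasta recipe ideas"
    results = sorted(((name, score(query, text)) for name, score, text in nodes),
                     key=lambda r: r[1], reverse=True)
    print(results)  # the hijacker sorts to the top

As long as the ranking signal comes from the answering site itself, nothing
stops the second node from doing exactly this.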

Google is the perfect example of a centralized search engine. They use
PageRank to assess the popularity of a page, and they only index the text
that they feel is relevant in a page (highlighted zones, hyperlink
titles, ...). The fact that Google had to rely on an "external" measure such
as connectivity to improve relevance seems to confirm my argument that
letting the destination determine relevance is not the smartest thing to
do...
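
For what it's worth, the idea behind PageRank fits in a few lines of Python.
This is only a toy power iteration over a made-up link graph, not Google's
implementation; the point is just that the score comes from who links to
you, not from what you claim about yourself.

    damping = 0.85
    links = {                       # page -> pages it links to (hypothetical)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):             # iterate until the ranks roughly stabilize
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new

    for p in sorted(rank, key=rank.get, reverse=True):
        print(p, round(rank[p], 3))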

>It's ironic that someone from RealNames would post such an assertion.

On the contrary. Because sites will not hesitate to hijack queries,
RealNames had to create a human editorial process (we call it adjudication)
to decide whether someone can actually get a keyword. This is expensive,
and believe me, we would rather trust our customers to pick the keywords
that they are really entitled to (instead of categorical terms that happen
to be popular queries). However, they don't! Query frequencies are heavily
skewed toward a few million generic terms, and everyone would like to be
listed when such queries occur.
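
As a back-of-the-envelope illustration (the numbers below are assumed, not
our real query logs), a Zipf-shaped distribution already shows how a small
head of generic terms carries most of the query volume:

    import math

    def harmonic(n):
        # approximation of sum(1/r for r in 1..n)
        return math.log(n) + 0.5772156649

    vocab = 50_000_000   # assumed number of distinct query terms
    head = 2_000_000     # the "few million" generic head terms
    share = harmonic(head) / harmonic(vocab)
    print("head terms carry about %d%% of queries" % round(share * 100))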

I like the distributed approach of InfraSearch. Nevertheless, I also think
that this architecture can be too easily abused, and hence will not produce
good results in the real world...

-Nico

-----Original Message-----
From: kragen@pobox.com
To: fork@kragen.dnaco.net
Sent: 00/06/27 18:34
Subject: Re: Real-time distributed Web search (Gnutella knockoff?)

Nicolas Popp writes:
> Somebody pointed out to me that this approach was doomed.
>
> If you give destination sites control of the search query, many will lie.
> Most commercial sites want to hijack traffic from search engines and appear
> very high in a results list whether or not their site is actually relevant
> to the query. That's a real-world fact, and that's actually why meta tags
> have failed (and are ignored by search engines these days).
>
> In other words, if you give control of the query to the Web sites that want
> to be found and lose control of the relevance ranking, you will return
> garbage...

You can use the same argument to prove that Usenet will consist only of
garbage, or that mailing lists will consist only of garbage, or that
the contents of your email box will consist only of garbage, or that
domain names will map only to garbage (in the absence of lawsuits).
It's part of the truth, but it's not all of it.

Google seems to have a fairly decent approach to finding relevant
pages: they show you pages containing your term (I don't remember
whether they use meta tags or not, but AltaVista did last time I
checked) and order them by quality, not relevancy.

It's ironic that someone from RealNames would post such an assertion.
:)

-- 
<kragen@pobox.com>       Kragen Sitaker
<http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either.  :)

