FoRK and Spam..

Dan Brickley danbri@w3.org
Tue, 19 Mar 2002 18:13:27 -0500 (EST)


On Mon, 18 Mar 2002, Gerald Oskoboiny wrote:

> On Mon, Mar 18, 2002 at 02:27:09PM -0500, Dan Brickley wrote:
> :
> > Coincidentally enough, I finally got around to moving to white-list based
> > filtering this weekend. I've a list of 'known senders' harvested from
> > various places (my sent-mail, addressbooks etc).
> >
> > I started out following Gerald's recipe:
> >
> > 	http://impressive.net/people/gerald/2000/12/spam-filtering.html
> >
> > ...and got to thinking about the possibility of white-list sharing, since
> > my 'unknown senders' folder was initially at least still getting lots of
> > false hits (mostly from people on mailing lists, but also from
> > occasional correspondents who are known in the Webby community, but not
> > in my sent-mail or addressbook).
>
> very good idea imho; should eventually evolve into a big web of trust
> thingy involving thousands of data sources... (and better tools so
> you can say "this message got through but is really spam; decrease
> the level of trust in the data source that told me it's legit"
> with a single keypress)

That would be fun. I think there are a few worries to address before this
takes off, and also that it might just be a stopgap and that PGP-signing
all email may be the only solution soon(ish).

> It is a tiny privacy hole in that it lets others find out that
> you have e.g. premium-subscribers@rdfporn.com on your whitelist.
> (I don't think I care about that, but some might.)

Good point. There are some other concerns I have. I've tried to detail
these in http://www.w3.org/2001/12/rubyrdf/util/foafwhite/intro (rough cut
writeup).

For example: if the FoRK mailing list membership (sha1-mangled) and
several others were made available in RDF/XML, it'd be really easy to run
tools that figure out the interests of list members by comparing with the
sha1 dumps of other mailing lists. Getting back from the scrambled
mailboxes to originals (within this limited domain) probably wouldn't be
hard either. So we could make it easier for spammers, recruiters etc to do
detailed profiles of us (and hence make their spam-target lists more
valuable).
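
A minimal sketch of the comparison described above, to make the risk
concrete. Everything in it is an assumption for illustration: the
example.org addresses, the two membership sets, the candidate list, and
the choice to hash the lowercased bare address (the real dumps may mangle
mailboxes differently).

```python
import hashlib


def mangle(mailbox: str) -> str:
    """Return the SHA-1 hex digest of a trimmed, lowercased mailbox string."""
    return hashlib.sha1(mailbox.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical sha1-mangled membership dumps for two mailing lists.
list_a = {mangle(m) for m in ["alice@example.org", "bob@example.org", "carol@example.org"]}
list_b = {mangle(m) for m in ["bob@example.org", "dave@example.org"]}

# Profiling step: equal hashes mean the same (still unknown) person is on both lists.
shared = list_a & list_b

# Reversal step: hash a list of candidate addresses (hypothetical here) and
# look the shared hashes up in it. Within a small community this is cheap.
candidates = ["alice@example.org", "bob@example.org", "erin@example.org"]
lookup = {mangle(c): c for c in candidates}
recovered = sorted(lookup[h] for h in shared if h in lookup)

print(len(shared), recovered)
```

The point is only that hashing hides a mailbox from someone who has no
guess at it; anyone holding a plausible candidate list can test every
guess directly.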

> > So I was thinking I'd have a little whitelist harvesting script(*) pull in
> > a few of these each day from friends and colleagues, making it that bit
> > less likely that folk from (mumble) "the web community" would find their
> > messages languishing in my unknown-senders folder.
>
> This might make it cost-effective for me to start maintaining and
> using blacklists as well as whitelists; also needed would be a
> "refilter new mail in this mailbox" script (easily doable.)

That'd be handy. I've not tried blacklists yet. I installed the Perl stuff
(Razor etc) but it had some secret protocol for talking to their services
instead of just pulling data via HTTP. Blacklist data changes faster than
whitelist, too...

> > How's that sound? Anybody fancy trying this?
>
> I'm all over it! (time/attention permitting; keep bugging me ;)

Bug! :)

Dan


ps. http://www.w3.org/2001/12/rubyrdf/util/foafwhite/intro has links to
three RDF/XML files if people want to play with this stuff

> > (*) rough-cut Ruby code that implements much of this (requires external
> > RDF parser) is at http://www.w3.org/2001/12/rubyrdf/util/foafwhite/foafwhite.rb
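
For readers who want the shape of the idea without fetching the script,
here is a rough sketch of a shared-whitelist check. It is not the code in
foafwhite.rb: the function names, the example addresses, and the hashing
of the lowercased bare address are all assumptions made for illustration,
and the RDF/XML parsing step is left out entirely.

```python
import hashlib


def mangle(mailbox: str) -> str:
    """Return the SHA-1 hex digest of a trimmed, lowercased mailbox string."""
    return hashlib.sha1(mailbox.strip().lower().encode("utf-8")).hexdigest()


def merge_whitelists(*sources: set) -> set:
    """Union several sets of sha1-mangled mailboxes into one whitelist."""
    return set().union(*sources)


def is_known_sender(sender: str, whitelist: set) -> bool:
    """True if the sender's mangled mailbox appears in the merged whitelist."""
    return mangle(sender) in whitelist


# Hypothetical inputs: my own list plus one harvested from a colleague.
mine = {mangle("friend@example.org")}
colleague = {mangle("webby-person@example.org")}
whitelist = merge_whitelists(mine, colleague)

for sender in ["Webby-Person@example.org", "stranger@example.net"]:
    folder = "inbox" if is_known_sender(sender, whitelist) else "unknown-senders"
    print(sender, "->", folder)
```

Because only hashes are exchanged, the merge never needs the colleague's
plaintext addresses; the incoming sender is hashed and compared instead.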