[FoRK] large scale dataset mailing list/resources?

Reza B'Far <reza at voicegenesis.com> on Sat Feb 23 21:50:56 PST 2008

Hi Luis:

I spent the past year and a half at Oracle (well, we got bought by Oracle)
solving a similar problem... Ken is quite right in that "Most straight RDF
triplestores seem to hit the wall at millions of triples"... however, there
is actually a pretty elegant solution to this that combines distributed
ontology techniques with a MapReduce-like technique... folks that know what
these two things are can probably interpolate the solution fairly
obviously...

Another alternative to get to billions of triples is Oracle 11g RDF Store :)
(that's a blatant plug)...

First solution, IMHO, is better... the second one is quicker.


-----Original Message-----
From: fork-bounces at xent.com [mailto:fork-bounces at xent.com]On Behalf Of
Luis Villa
Sent: Wednesday, February 20, 2008 12:32 PM
To: Friends of Rohit Khare
Subject: Re: [FoRK] large scale dataset mailing list/resources?


On Wed, Feb 20, 2008 at 3:02 PM, Jeff Bone <jbone at place.org> wrote:
>
>  On Feb 20, 2008, at 9:20 AM, Luis Villa wrote:
>
>  > Hey, all-
>  >
>  > A friend is working on a fairly large-scale data project- will
>  > probably top out in the neighborhood of 5M records (but potentially 25-50
>  > times that if it really takes off), each of which is both a lot of text
>  > to be analyzed (5-50K words, with link and potentially grammar
>  > analysis) and an associated pdf (original source material.) Goal is to
>  > do good search and probably eventual statistical analysis for
>  > research. (No prizes for guessing what this is if you've been
>  > following my blog ;)
>
>  This is big?

Big enough to make doing it on a single machine much less responsive
than he'd like. Or to put it another way: his data sets are growing
larger and harder to parse faster than his machines are growing bigger
and faster at parsing, so things are getting more complicated.

>  > Currently search is Apache Solr-powered; he's considering moving to an
>  > RDF store
>
>  Yeah, good luck w/ that! ;-)

Yeah, I didn't want to tell him that flat out, since it isn't really
my project on the technology side, but I'm hoping to nudge him away.

>  Random and tangential, but anybody seen this:
>
>    http://blog.freebase.com/?p=108

Eeenteresting.

Luis
_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork

