[FoRK] large scale dataset mailing list/resources?
Reza B'Far
<reza at voicegenesis.com> on
Sat Feb 23 21:50:56 PST 2008
Hi Luis:
I spent the past year and a half at Oracle (well we got bought by Oracle)
solving a similar problem... Ken is quite right in that "Most straight RDF
triplestores seem to hit the wall at millions of triples"... however, there
is actually a pretty elegant solution to this that combines distributed
ontology techniques with MapReduce-like techniques... folks that know what
these two things are can probably interpolate the solution fairly
obviously...
Another alternative to get to billions of triples is Oracle 11g RDF Store :)
(that's a blatant plug)...
First solution, IMHO, is better... the second one is quicker.
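For the curious, a rough sketch of the general shape of the first option, under loud assumptions: triples are partitioned by predicate as a toy stand-in for splitting an ontology across nodes, each partition is matched independently (the "map"), and the partial results are merged (the "reduce"). The data and the names partition, map_match and query are hypothetical, made up for illustration; this is not the Oracle implementation or any particular product's API.

```python
import zlib
from collections import defaultdict
from functools import reduce

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("doc1", "cites", "doc2"),
    ("doc1", "author", "alice"),
    ("doc2", "author", "bob"),
    ("doc3", "cites", "doc2"),
]


def partition(triples, n_shards):
    """Assign each triple to a shard by predicate (assumed partitioning scheme)."""
    shards = defaultdict(list)
    for s, p, o in triples:
        # crc32 is stable across runs, unlike Python's built-in str hash.
        shards[zlib.crc32(p.encode("utf-8")) % n_shards].append((s, p, o))
    return shards


def map_match(shard, pattern):
    """Map step: triples in one shard that match the pattern (None = wildcard)."""
    return [
        t for t in shard
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def query(shards, pattern):
    """Run the map step on every shard, then reduce by concatenating results."""
    partials = (map_match(shard, pattern) for shard in shards.values())
    return sorted(reduce(lambda a, b: a + b, partials, []))


if __name__ == "__main__":
    shards = partition(TRIPLES, n_shards=3)
    print(query(shards, (None, "cites", "doc2")))
```

In a real deployment each shard would sit on its own machine and the map step would run in parallel; here everything runs in one process just to show the data flow.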
-----Original Message-----
From: fork-bounces at xent.com [mailto:fork-bounces at xent.com]On Behalf Of
Luis Villa
Sent: Wednesday, February 20, 2008 12:32 PM
To: Friends of Rohit Khare
Subject: Re: [FoRK] large scale dataset mailing list/resources?
On Wed, Feb 20, 2008 at 3:02 PM, Jeff Bone <jbone at place.org> wrote:
>
> On Feb 20, 2008, at 9:20 AM, Luis Villa wrote:
>
> > Hey, all-
> >
> > A friend is working on a fairly large-scale data project- will
> > probably top out in the neighborhood of 5M records (but potentially 25-50
> > times that if it really takes off), each of which is both a lot of text
> > to be analyzed (5-50K words, with link and potentially grammar
> > analysis) and an associated pdf (original source material.) Goal is to
> > do good search and probably eventual statistical analysis for
> > research. (No prizes for guessing what this is if you've been
> > following my blog ;)
>
> This is big?
Big enough to make doing it on a single machine much less responsive
than he'd like. Or to put it another way: his data sets are growing
larger and harder to parse faster than his machines are growing bigger
and faster at parsing, so things are getting more complicated.
> > Currently search is Apache Solr-powered; he's considering moving to an
> > RDF store
>
> Yeah, good luck w/ that! ;-)
Yeah, I didn't want to tell him that flat out, since it isn't really
my project on the technology side, but I'm hoping to nudge him away.
> Random and tangential, but anybody seen this:
>
> http://blog.freebase.com/?p=108
Eeenteresting.
Luis
_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork