[FoRK] Q re: ConceptNet (also FluidDB)
Stephen D. Williams
sdw at lig.net
Fri Oct 23 15:48:23 PDT 2009
Damien Morton wrote:
> The only way a triple store can work is to use the triple-store as a
> conceptual model while building out indices and denormalisations based on
> the queries applied to the store.
> There was an in-memory C++ data store from a dude in Russia that did
> something like that - the queries needed to be all provided at compile time
> and a data-store suitable for those queries was created as part of the
> compilation process.
That's a big part of my design for a solution, except that the
adaptation happens at run time, not compile time. It behaves like an
idealized triple store, but can choose among multiple internal
representations depending on what optimizes best for the queries it
sees. It goes along with a binary RDF interchange format, inspired by
my W3C EXI participation but with many additional ideas. This is the
kind of thing that would be good to do as an open source project: too
hard to get something commercial to a revenue-producing stage, but
needed for many things. The hard part seems to be more at the query /
inference level than in optimizing storage, though perhaps that is just
because there is more complexity there and I haven't tried to solve it
yet.
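A minimal sketch of the run-time idea, in Python. Everything here is
hypothetical (the class name, the threshold, the choice of a plain list
as the base representation); it is not my actual design, only an
illustration of a store that looks like a flat triple store from the
outside and builds an index for a query shape once that shape has been
seen often enough.

```python
from collections import Counter, defaultdict


class AdaptiveTripleStore:
    """Flat list of triples plus indexes built lazily from observed queries."""

    def __init__(self, threshold=3):
        self.triples = []                # base representation: a plain list
        self.threshold = threshold       # hypothetical tuning knob
        self.pattern_counts = Counter()  # how often each query shape was seen
        self.indexes = {}                # bound positions -> {key: [triples]}

    def add(self, triple):
        self.triples.append(triple)
        # Every index that already exists must be maintained on write.
        for positions, index in self.indexes.items():
            index[tuple(triple[i] for i in positions)].append(triple)

    def query(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        positions = tuple(i for i, v in enumerate(pattern) if v is not None)
        if not positions:
            return list(self.triples)
        self.pattern_counts[positions] += 1
        if (positions not in self.indexes
                and self.pattern_counts[positions] >= self.threshold):
            self._build_index(positions)
        key = tuple(pattern[i] for i in positions)
        index = self.indexes.get(positions)
        if index is not None:
            return list(index.get(key, []))
        # No index yet for this shape: fall back to a full scan.
        return [t for t in self.triples
                if all(t[i] == pattern[i] for i in positions)]

    def _build_index(self, positions):
        index = defaultdict(list)
        for t in self.triples:
            index[tuple(t[i] for i in positions)].append(t)
        self.indexes[positions] = index
```

The caller never sees which path answered the query, which is the
point: the triple store stays the conceptual model while the internal
representation follows the workload.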
Note though that you can get pretty far with a straight triple store by
observing a few things. First, the actual triples only need to be
tuples of 3 integers. All strings live in a separate string table /
inverted index / regexp-scannable store. They are stripped/mapped on
input and restored on output, which could even be done on a separate
tier / cloud or at the client. If you use a variable-length integer
format, that typically means 6-9 bytes per triple in main memory. The
single "table" of triples then gets indexed 6 times, once for each
ordering of a pair of triple elements pointing to the third. This is
the expensive part of writes, especially if you also do any clustering
and stats maintenance.
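To make that concrete, here is a small Python sketch under stated
assumptions: the names (StringTable, TripleStore, find_third) are made
up for illustration, the indexes are plain in-memory dicts, and the
variable-length integer is a LEB128-style encoding with 7 payload bits
per byte, which is one common choice and not necessarily the one any
particular store uses.

```python
from collections import defaultdict
from itertools import permutations


class StringTable:
    """Maps strings to small integer ids and back."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def id_of(self, s):
        return self._ids.get(s)  # None if the string was never stored

    def lookup(self, i):
        return self._strings[i]


def varint(n):
    """Encode a non-negative int with 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encoded_size(triple):
    """Bytes needed to store one integer triple as three varints."""
    return sum(len(varint(x)) for x in triple)


class TripleStore:
    # The six orderings of (subject, predicate, object) positions.
    ORDERS = list(permutations(range(3)))

    def __init__(self):
        self.strings = StringTable()
        self.triples = set()
        # One index per ordering: (first, second) -> set of third.
        self.indexes = {order: defaultdict(set) for order in self.ORDERS}

    def add(self, s, p, o):
        t = tuple(self.strings.intern(x) for x in (s, p, o))
        if t in self.triples:
            return
        self.triples.add(t)
        for order in self.ORDERS:  # six index writes per new triple
            a, b, c = (t[i] for i in order)
            self.indexes[order][(a, b)].add(c)

    def find_third(self, order, first, second):
        """order=(0, 1, 2) means subject, predicate -> objects."""
        key = (self.strings.id_of(first), self.strings.id_of(second))
        if None in key:
            return []
        hits = self.indexes[order].get(key, ())
        return sorted(self.strings.lookup(i) for i in hits)


store = TripleStore()
store.add("cat", "isa", "animal")
store.add("dog", "isa", "animal")
print(store.find_third((1, 2, 0), "isa", "animal"))
```

With this encoding an id below 2^14 takes 2 bytes and an id below 2^21
takes 3, so three ids per triple lands in the 6-9 byte range. The loop
in add() is where the write cost shows up: every new triple touches all
six indexes.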
After that base, the fun starts. I'm a year or two out of date on
current techniques; however, it seemed like few projects were getting
very creative on the scalability side. There are some nice commercial
products, although most of what I see is a lot of advancement on the
query / reasoning side.
> On Sat, Oct 24, 2009 at 8:46 AM, J. Andrew Rogers <
> andrew at ceruleansystems.com> wrote:
>> On Oct 23, 2009, at 2:13 PM, Stephen D. Williams wrote:
>>> Scalability is an issue. On the other hand, most scalability issues have
>>> a solution. Certainly simple, flat triple stores aren't going to do it. I
>>> introduced chunkiness to one of my designs (it had temporal versioning).
>>> Other ideas include certain kinds of clustering, denormalization-like
>>> constructs, etc.
>>> How would you characterize the scalability problems that you have seen?
>>> What fundamental issue was involved?
>> To be clear, I've never used the various triple stores (of which there are
>> myriad designs) out there. I do work with people for whom it is an important
>> The fundamental issue is dynamic analytic performance at non-trivial
>> scales. One could say something similar about all databases, for very
>> similar technical reasons, but the limitations manifest much earlier in
>> graph databases.
>> FoRK mailing list