[FoRK] Q re: ConceptNet (also FluidDB)

Stephen D. Williams sdw at lig.net
Fri Oct 23 15:48:23 PDT 2009


Damien Morton wrote:
> The only way a triple store can work is to use the triple-store as a
> conceptual model while building out indices and denormalisations based on
> the queries applied to the store.
> There was an in-memory C++ data store from a dude in russia that did
> something like that - the queries needed to be all provided at compile time
> and a data-store suitable for those queries was created as part of the
> compilation process.
>   
That's part of my design for a solution, except at run time, not compile 
time.  That is a huge part of the solution.  It behaves as an idealized 
triple store, but can choose between multiple internal representations 
depending on what it is optimizing for.  It is paired with a binary RDF 
interchange format, inspired by my W3C EXI participation but with many 
additional ideas.  This is the kind of thing that would be good to do as 
an open source project.  It would be too hard to get something commercial 
to a revenue-producing stage, but it is needed for many things.  The hard 
part seems to be more at the query / inference level than in optimizing 
storage, perhaps just because there is more complexity there and I 
haven't tried to solve it yet.
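The idea of a store that looks like a plain triple store but builds indexes at run time from the queries it actually sees can be illustrated with a minimal sketch. It assumes Python and a deliberately simple policy that is not necessarily the design described above: a two-bound pattern is answered by scanning the flat set of triples until that access pattern has been requested `threshold` times, after which a (pair -> third) index is built for it and kept current on later writes. `AdaptiveTripleStore`, `third`, and `threshold` are hypothetical names.

```python
from collections import Counter


class AdaptiveTripleStore:
    """Hypothetical sketch: a flat set of triples that lazily grows a
    (pair -> third) index for each access pattern once it becomes "hot"."""

    POS = {"s": 0, "p": 1, "o": 2}

    def __init__(self, threshold: int = 3):
        self.triples = set()           # the idealized, unindexed triple set
        self.threshold = threshold     # queries before an index is built (assumed policy)
        self.pattern_counts = Counter()
        self.indexes = {}              # (i, j) -> {(t[i], t[j]): set of t[k]}

    def add(self, s: str, p: str, o: str) -> None:
        t = (s, p, o)
        if t in self.triples:
            return
        self.triples.add(t)
        # Keep every index built so far consistent with the new triple.
        for (i, j), index in self.indexes.items():
            k = 3 - i - j
            index.setdefault((t[i], t[j]), set()).add(t[k])

    def _build_index(self, i: int, j: int) -> None:
        k = 3 - i - j
        index = {}
        for t in self.triples:
            index.setdefault((t[i], t[j]), set()).add(t[k])
        self.indexes[(i, j)] = index

    def third(self, **bound: str) -> set:
        """Bind exactly two of s, p, o; return the values of the third position."""
        if len(bound) != 2:
            raise ValueError("bind exactly two of s, p, o")
        (n1, v1), (n2, v2) = bound.items()
        i, j = self.POS[n1], self.POS[n2]
        k = 3 - i - j
        if (i, j) not in self.indexes:
            self.pattern_counts[(i, j)] += 1
            if self.pattern_counts[(i, j)] < self.threshold:
                # Cold access pattern: answer with a full scan.
                return {t[k] for t in self.triples if t[i] == v1 and t[j] == v2}
            # Pattern has become hot: switch it to an indexed representation.
            self._build_index(i, j)
        return set(self.indexes[(i, j)].get((v1, v2), ()))
```

Callers see the same results whichever representation answers the query; only the cost changes. A real system would also weigh write cost and memory before building an index, which this sketch ignores.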

Note, though, that you can get pretty far with a straight triple store by 
observing a few things.  First, the actual triples only need to be sets 
of 3 integers.  All strings live in a separate string table / inverted 
index / regexp-scannable store.  They are stripped/mapped to integers on 
input and restored on output, which could even be done on a separate 
tier / cloud or at the client.  If you use a variable-length integer 
format, that typically means 6-9 bytes per triple in main memory.  The 
single "table" of triples then gets indexed 6 times, once for each 
ordered pair of triple elements, with each index pointing to the third 
element.  This is the expensive part of writes, especially if you also 
do any clustering and stats maintenance.
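A minimal in-memory sketch of that layout, assuming Python: strings are interned into integer IDs, a triple's size is measured as three unsigned LEB128-style varints, and each of the six ordered pairs of positions gets its own index pointing to the third. The names (`TripleStore`, `varint`, `encoded_size`, `third`) are hypothetical; plain dicts and sets stand in for whatever sorted or clustered structures a real store would use, and the string table is a bare dict rather than an inverted or regexp-scannable index.

```python
from itertools import permutations


def varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, with the high
    bit set on every byte except the last."""
    if n < 0:
        raise ValueError("unsigned varint only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encoded_size(triple) -> int:
    """Bytes needed to store one integer triple as three varints."""
    return sum(len(varint(x)) for x in triple)


# All six orderings of the positions (0=s, 1=p, 2=o). For an ordering (i, j, k),
# the index maps the ordered pair (t[i], t[j]) to the set of values t[k].
ORDERS = list(permutations(range(3)))


class TripleStore:
    def __init__(self):
        self.ids = {}         # string -> integer ID (the separate string table)
        self.strings = []     # integer ID -> string, for restoring output
        self.triples = set()  # the single "table" of integer triples
        self.indexes = {order: {} for order in ORDERS}

    def _intern(self, s: str) -> int:
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

    def add(self, s: str, p: str, o: str) -> None:
        t = (self._intern(s), self._intern(p), self._intern(o))
        self.triples.add(t)
        # Every write updates all six indexes: the expensive part of writes.
        for order in ORDERS:
            i, j, k = order
            self.indexes[order].setdefault((t[i], t[j]), set()).add(t[k])

    def third(self, **bound: str) -> set:
        """Given exactly two bound positions (keywords s, p, o), return the
        matching values of the remaining position, mapped back to strings."""
        pos = {"s": 0, "p": 1, "o": 2}
        if len(bound) != 2:
            raise ValueError("bind exactly two of s, p, o")
        (n1, v1), (n2, v2) = bound.items()
        i, j = pos[n1], pos[n2]
        k = 3 - i - j
        a, b = self.ids.get(v1), self.ids.get(v2)
        if a is None or b is None:
            return set()  # an unknown string cannot appear in any triple
        hits = self.indexes[(i, j, k)].get((a, b), set())
        return {self.strings[x] for x in hits}
```

With IDs in the hundreds to low millions, each varint takes 2 or 3 bytes, which is where a figure like 6-9 bytes per triple comes from; for example, `encoded_size((200, 5000, 20000))` is 7. With hash maps, the (s, p) and (p, s) indexes hold the same information; keeping all six orderings pays off once the indexes are sorted and can serve prefix scans on either leading element.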

After that base, the fun starts.  I'm a year or two out of date on 
current techniques; however, it seemed that few projects were getting 
very creative on the scalability side.  There are some nice commercial 
products, although most of the advancement I see is on the 
query / reasoning side.

Stephen
> On Sat, Oct 24, 2009 at 8:46 AM, J. Andrew Rogers <
> andrew at ceruleansystems.com> wrote:
>
>   
>> On Oct 23, 2009, at 2:13 PM, Stephen D. Williams wrote:
>>
>>     
>>> Scalability is an issue.  On the other hand, most scalability issues have
>>> a solution.  Certainly simple, flat triple stores aren't going to do it.  I
>>> introduced chunkiness to one of my designs (it had temporal versioning.)
>>>  Other ideas include certain kinds of clustering, denormalization-like
>>> constructs, etc.
>>>
>>> How would you characterize the scalability problems that you have seen?
>>>  What fundamental issue was involved?
>>>
>>>       
>> To be clear, I've never used the various triple stores (of which there are
>> myriad designs) out there. I do work with people for whom it is an important
>> problem.
>>
>> The fundamental issue is dynamic analytic performance at non-trivial
>> scales. One could say something similar about all databases, for very
>> similar technical reasons, but the limitations manifest much earlier in
>> graph databases.
>>
>> _______________________________________________
>> FoRK mailing list
>> http://xent.com/mailman/listinfo/fork
>>
>>     
>   


