[FoRK] Multicore, async segmented sequential models

J. Andrew Rogers andrew at jarbox.org
Fri May 10 15:58:38 PDT 2013

On May 10, 2013, at 3:11 PM, "Stephen D. Williams" <sdw at lig.net> wrote:
> You started out sounding like you disagreed with my main premise and then concluded in the same general direction that SQL is broken, even if various frankenDBMS products have enough bolted on to solve a wider range of problems.

SQL is just a query language. It is largely divorced from database engine implementation.

> The way that triplestores work is interesting, and I can see a lot of takeoffs to improve chunkiness (a la document databases), etc.  Each constant is in the constants table and every triple/quad consists of 3 IDs.  Then these are indexed for every possible combination: SPO, POS, OPS, OSP, etc.

The problem is that this organization does not scale: a design that scales allows neither secondary indexes nor edge cutting. For ad hoc queries, the joins will start to destroy usable performance before your data set fills the RAM of an average laptop. And the write performance will be what you would expect for something with six secondary indexes.
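
To make the write cost concrete, here is a toy sketch of the layout described in the quoted text: a constants table mapping each value to an integer ID, and every triple stored once per S/P/O ordering. It is only an illustration under stated assumptions, not any particular product's implementation; the names TripleStore, add, match, and index_writes are hypothetical, and sorted in-memory lists stand in for real on-disk indexes.

```python
import bisect
from itertools import permutations

# All six orderings of subject/predicate/object: SPO, SOP, PSO, POS, OSP, OPS.
ORDERS = ["".join(p) for p in permutations("SPO")]


class TripleStore:
    """Toy dictionary-encoded triple store (hypothetical, for illustration only)."""

    def __init__(self):
        self.ids = {}          # constant -> integer ID (the "constants table")
        self.constants = []    # integer ID -> constant
        self.indexes = {order: [] for order in ORDERS}  # sorted lists of ID tuples
        self.index_writes = 0  # counts physical index insertions

    def _intern(self, value):
        # Assign the next integer ID the first time a constant is seen.
        if value not in self.ids:
            self.ids[value] = len(self.constants)
            self.constants.append(value)
        return self.ids[value]

    def add(self, s, p, o):
        row = dict(zip("SPO", (self._intern(s), self._intern(p), self._intern(o))))
        # One logical write becomes one insertion per ordering.
        for order, index in self.indexes.items():
            key = tuple(row[c] for c in order)
            pos = bisect.bisect_left(index, key)
            if pos == len(index) or index[pos] != key:
                index.insert(pos, key)
                self.index_writes += 1

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) constants matching the pattern; None is a wildcard."""
        bound = {}
        for col, value in zip("SPO", (s, p, o)):
            if value is not None:
                if value not in self.ids:
                    return  # unknown constant, so nothing can match
                bound[col] = self.ids[value]
        # Pick the ordering whose leading columns are exactly the bound ones,
        # so the lookup is a prefix range scan on a sorted index.
        order = next(x for x in ORDERS if set(x[:len(bound)]) == set(bound))
        prefix = tuple(bound[c] for c in order[:len(bound)])
        index = self.indexes[order]
        i = bisect.bisect_left(index, prefix)
        while i < len(index) and index[i][:len(prefix)] == prefix:
            row = dict(zip(order, index[i]))
            yield tuple(self.constants[row[c]] for c in "SPO")
            i += 1


store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("alice", "likes", "carol")
print(store.index_writes)              # 3 triples x 6 orderings = 18
print(list(store.match(p="knows")))
```

Every pattern with bound columns is answered by a prefix scan on one of the six orderings, which is where the flexibility comes from, and every insert pays for all six. A multi-hop query still has to join the results of several such scans, which this sketch does not attempt.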

> You can do both full-text searches of the constant space and relational / graph search in any direction for any relationship chain.  More indexing than a typical RDBMS, and granularity is too high generally, but totally flexible and, for some tasks, actually seems efficient.  Since it can be highly compressed and/or minimized, in memory for instance, there are some interesting cases.

This only offers advantages under a narrow set of constraints:

- Data set is tiny
- Static data set
- Queries are complex ad hoc graph analysis

There is a prodigious literature covering all of these types of designs, and countless systems have been built on these principles.

> Another way of putting it is that you need a model that allows configuration, manually or automatically, so that you can vary between maximum flexibility and power vs. various types of performance optimization.  It doesn't seem likely to get everything at once, but it does seem possible to have everything available with little or no code changes, simply requiring a reconfiguration of data / index / memory.  

"simply requiring a reconfiguration of data / index / memory"

Yeah, well, that is the real trick now, isn't it? In real systems, no one is willing to pay the extraordinary cost of that operation. In a distributed system it would be straight pathological.

> The key question is whether this is best done by a multibase, with various modes bolted on,

No, for well-understood computer science reasons. 

> or some unifying but tunably flexible solution.

Yes, this is how it has to be done. So why has no one built such a solution?
