[FoRK] Multicore, async segmented sequential models
sdw at lig.net
Thu May 9 21:45:46 PDT 2013
On 5/9/13 7:27 PM, J. Andrew Rogers wrote:
> On May 9, 2013, at 6:56 PM, Stephen Williams <sdw at lig.net> wrote:
>> SQL has a few good concepts, but pretty much all implementations mimic the major mistakes made in implementing the model, such as fixed format tables.
> Fixed format tables? What does that mean?
First, delete all data definition statements.
Next, fully qualify the name of each "column" on every "row" insert.
Insert N rows with every row potentially having a different set of column names.
Make up a new column name, insert a row containing it, query on that column.
Allow indexing on any field name, even if not present in all rows. Support dynamic automatic indexing based on queries.
Adding a new field for new rows? Just start inserting rows using that column.
Apparently they couldn't imagine how to do that efficiently enough in the 70's...
But we are no longer IN the 70's.
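
A minimal sketch of the steps above, in Python for brevity. Everything here is an assumption made for illustration: the class name FlexTable, the in-memory dict-per-row layout, and the build-an-index-on-first-query policy are hypothetical, not any existing engine's API. Values are assumed to be hashable.

```python
from collections import defaultdict


class FlexTable:
    """Hypothetical schemaless table: each row is a dict of field -> value."""

    def __init__(self):
        self.rows = []      # row id is the position in this list
        self.indexes = {}   # field name -> {value -> set of row ids}

    def insert(self, **fields):
        # No data definition step: the row brings its own column names.
        row_id = len(self.rows)
        self.rows.append(dict(fields))
        # Keep already-built indexes current for the fields this row has.
        for name, index in self.indexes.items():
            if name in fields:
                index[fields[name]].add(row_id)
        return row_id

    def _build_index(self, name):
        # Index only the rows that actually contain the field.
        index = defaultdict(set)
        for row_id, row in enumerate(self.rows):
            if name in row:
                index[row[name]].add(row_id)
        self.indexes[name] = index

    def where(self, name, value):
        # Query-driven indexing: the first query on a field builds its index.
        if name not in self.indexes:
            self._build_index(name)
        return [self.rows[i] for i in sorted(self.indexes[name].get(value, ()))]


t = FlexTable()
t.insert(sku="T-1", kind="tire", diameter_in=17)
t.insert(sku="S-1", kind="screw", thread="M4", length_mm=12)
t.insert(sku="B-1", kind="book", pages=320)
# Make up a new column name, insert a row containing it, query on that column.
t.insert(sku="T-2", kind="tire", diameter_in=17, run_flat=True)
run_flat_items = t.where("run_flat", True)
seventeen_inch = t.where("diameter_in", 17)
```

The point is only that nothing in insert or where needs a table definition; a real engine would of course persist rows and pick smarter index structures.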
Want exactly the same optimized storage that you could get with fixed format tables? Just do it for the rows that share
the same fields, or use a hybrid with common fields plus an overflow / alternate area, or whatever.
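
As a rough sketch of "do it for the rows that share the same fields": group rows by their exact set of field names and store each group column-wise, the way a fixed format table would. The function name pack_by_shape and the tuple-of-sorted-names key are assumptions for illustration only; the hybrid common-fields-plus-overflow layout is not shown.

```python
def pack_by_shape(rows):
    """Group rows by their exact field set and store each group as columns."""
    groups = {}  # sorted tuple of field names -> {field name -> list of values}
    for row in rows:
        shape = tuple(sorted(row))
        columns = groups.setdefault(shape, {name: [] for name in shape})
        for name in shape:
            columns[name].append(row[name])
    return groups


rows = [
    {"sku": "T-1", "diameter_in": 17},
    {"sku": "S-1", "thread": "M4"},
    {"sku": "T-2", "diameter_in": 18},
]
packed = pack_by_shape(rows)
```

Each group is effectively a small fixed format table that the engine created on its own, so the dense layout is still available wherever rows happen to agree.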
Don't think this is a real problem? "Build a database for a hypermarket (Super Walmart / Target / Meijer / Fry's) with a row
for each item, having a searchable representation for each salient attribute. Metrics for a tire, screw, high-heeled shoe, book,
steak, shirt, etc. Now do it in one inventory table. And among the 30,000 different types of items, there is a daily turnover
of about 200 new types of items. No, the store cannot hire a DBA." The solutions to this with SQL are bleak: incomplete,
irregular, inefficient, and otherwise ugly. Someone at AIG once told me that, just for insurance, they worked with 5000
different tables, each with a unique schema. That doesn't even begin to consider arbitrary nesting, graph information, and
the related rich queries and semantics.
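
For a concrete taste of "bleak", here is one common SQL workaround, an entity-attribute-value side table, sketched with Python's built-in sqlite3 module. The choice of this particular workaround, the table and column names, and the sample items are all my assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_attr (item_id INTEGER, attr TEXT, value TEXT);
""")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "tire"), (2, "screw"), (3, "shirt")])
conn.executemany("INSERT INTO item_attr VALUES (?, ?, ?)", [
    (1, "diameter_in", "17"), (1, "width_mm", "225"),
    (2, "thread", "M4"), (2, "length_mm", "12"),
    (3, "size", "L"), (3, "color", "blue"),
])

# Each attribute in the predicate costs another self-join, and every value
# is untyped text that has to be cast back before it can be compared.
query = """
    SELECT i.name
    FROM item i
    JOIN item_attr a ON a.item_id = i.id AND a.attr = 'thread' AND a.value = 'M4'
    JOIN item_attr b ON b.item_id = i.id AND b.attr = 'length_mm'
                     AND CAST(b.value AS INTEGER) <= 20
"""
matches = [name for (name,) in conn.execute(query)]
```

Two attributes already need two joins and a cast; the query model itself is fine, it is the fixed format underneath that forces the contortion.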
> SQL has quite a few defects, some of which explain the myriad dialects. There are features added in ugly, inconsistent ways for backward compatibility with some existing database too lame to do things the correct way. Parts of the standard exist primarily to paper over deficiencies in early database implementations and hardware environments.
> My favorite is that parts of the standard make tacit assumptions about the underlying data structures and algorithms in the implementation that are not required to implement the functionality. As in, those parts do not make sense or in some cases cannot even be mapped to the equivalent functionality in the context of alternative implementations.
True. Fixed format tables (the assumption that all rows have exactly the same fields of the same types, with data conversion
needed for any change) are one of the worst.
The crazy thing is that nothing in the query model implies a need for this. It is just a tradition: a simplifying limitation,
adopted to ease the work of implementers, that leads to a massive loss of flexibility.
> I've been waiting for a NoSQL project to implement the functionality of SQL with a cleaner design, but they mostly just seem to implement relatively small subsets.
While we shouldn't stop at SPARQL, it is a fairly powerful alternative to SQL, especially with arbitrary semantics possible in
the engine SPARQL is querying.