[FoRK] Big changes (personal) ahead: soliciting help from the FoRK braintrust

Stephen D. Williams sdw at lig.net
Wed Mar 25 16:48:19 PDT 2015


On 3/25/15 3:59 PM, Ken Meltsner wrote:
> Yes, I did use Titan for some tests; the lack of a "Titan for Dummies"
> style book probably doomed the attempt, to be honest -- too much of a
> gap between the technology and the engineers who would have had to
> implement the system.
>
> I've seen the other solutions to the "inventory problem."  XML (or
> JSON) can work quite well, especially in the databases that support
> XPath in views.  IIRC both SQL Server and Oracle can do this.  I
> haven't tried an attribute:value list along with full text -- that's
> probably adequate for most applications, though.
>
> I did see one interesting approach, originally for data warehousing,
> but I think it'd apply to the inventory problem as well:
> "hyper-normalized" schemas with one table per attribute/key.  The

I've worked with a hyper-normalized system a lot.  It was a web app for recruiting automation that needed to let users define the fields
they cared about.  That was mapped onto tables of metadata and data in MSSQL (yes, on Windows NT and IIS, with a custom scripting language
written in C++, sort of like PHP).  There was a table of table definitions, with one row per defined column, and a table with one row for
each value in each row of each user-defined table.  Another table held permissions: who was allowed CRUD on each table and potentially on
each virtual row.  Then, while doing 6 nested queries to piece the rows back together and honoring security, we needed to page through a
potentially large range of records, with or without a cursor, etc.  What a mess.  SQL Server actually served it far better than I would
have predicted.  I got to know the pros and cons of that model well.  It's a little like doing assembly programming with SQL.

That was my consistently 300-hour-a-month contract job, and then I wasn't paid for almost 3 months of work, amounting to $88K.  I gained
12 lbs. in 3 months, missed most of my children's summer visit, etc.  I worked at home, with all of the joys of remoting into NT to try to
get things to restart without a power cycle, MS SourceSafe being abused by everyone at the company (oh the irony in that name), people
stomping on each other's code, duplicates of old versions everywhere, 5000 email messages in 4 months, etc.
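For anyone who hasn't met this model, here is a minimal sketch of the metadata-plus-values layout described above, assuming Python's
built-in sqlite3 rather than MSSQL.  The table names (field_def, cell_value) and the fetch_rows helper are hypothetical.  It reassembles
rows with one conditional aggregate per field instead of nested queries, and it leaves out the permission table and paging entirely.

```python
import sqlite3

# Hypothetical EAV ("hyper-normalized") schema, names are illustrative only:
#   field_def:  one row per user-defined column of a virtual table
#   cell_value: one row per value in each virtual row
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_def (
    field_id   INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    field_name TEXT NOT NULL
);
CREATE TABLE cell_value (
    row_id   INTEGER NOT NULL,
    field_id INTEGER NOT NULL REFERENCES field_def(field_id),
    value    TEXT,
    PRIMARY KEY (row_id, field_id)
);
""")

# A user-defined "candidate" table with three fields.
conn.executemany(
    "INSERT INTO field_def VALUES (?, ?, ?)",
    [(1, "candidate", "name"), (2, "candidate", "skill"), (3, "candidate", "city")],
)
# Two virtual rows stored one value per physical row; row 2 has no city.
conn.executemany(
    "INSERT INTO cell_value VALUES (?, ?, ?)",
    [(1, 1, "Ada"), (1, 2, "C++"), (1, 3, "Austin"),
     (2, 1, "Grace"), (2, 2, "COBOL")],
)

def fetch_rows(conn, table_name, fields):
    """Pivot EAV cells back into ordinary rows for one virtual table.

    One MAX(CASE ...) column per requested field; a missing value comes back
    as None.  Field names are bound as parameters and the column aliases are
    positional, so user-defined names never end up in the SQL text.
    """
    cols = ", ".join(
        f"MAX(CASE WHEN f.field_name = ? THEN v.value END) AS c{i}"
        for i in range(len(fields))
    )
    sql = (
        f"SELECT v.row_id, {cols} "
        "FROM cell_value v JOIN field_def f ON f.field_id = v.field_id "
        "WHERE f.table_name = ? "
        "GROUP BY v.row_id ORDER BY v.row_id"
    )
    return conn.execute(sql, (*fields, table_name)).fetchall()

rows = fetch_rows(conn, "candidate", ["name", "skill", "city"])
for r in rows:
    print(r)
# (1, 'Ada', 'C++', 'Austin')
# (2, 'Grace', 'COBOL', None)
```

Every "column" is just a row in cell_value, which is what makes user-defined fields cheap to add and every read a pivot.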

> queries are hairy but can be generated automatically, and apparently
> the typical query planner doesn't choke on them.  One approach is Data
> Vault Modeling, but I found the material on Anchor Modeling more
> comprehensible.  I haven't tried either one for real.  Both support
> models with changing attributes, an important distinction given that
> most relational database models live in the eternal "now;" versioned
> object graphs, of course, not included.
>
> http://www.anchormodeling.com/

Isn't that just a graph database or semantic RDF triplestore with different terms?  Anchors, ties, knotted attributes?

10+ years ago we talked to a guy with a startup in the UK who was building a graph database.  He had never heard of the semantic web or
RDF, but he had a nice database, GUI, and editor implementing much of the same conceptual space.  But he was stuck with a corporate
per-node pricing structure ($30K/CPU) that no one could swallow for something as new and different as that.

>
> What I really hate is when objects are serialized to (effectively)
> opaque blobs, but that's not an issue I should encounter again.

With RDBMSen, before their recent features, that was a useful option.  In one case, where we needed ultimate security, hence signed (and
possibly encrypted) documents stored as blobs, and didn't really need the data to be normalized, it worked great.  I solved the DB
problems by just repeating certain fields as actual columns for indexing.
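A minimal sketch of that blob-plus-promoted-columns arrangement, assuming Python's sqlite3.  The doc table, its owner/created columns,
and the store/load_by_owner helpers are hypothetical, and HMAC-SHA256 stands in for whatever real signature scheme was used (public-key
signatures are more likely in practice).  The database never parses the blob; only the copied-out columns are indexed.

```python
import hashlib
import hmac
import json
import sqlite3

# Hypothetical key; HMAC is only a stdlib stand-in for a real signature.
KEY = b"hypothetical-demo-key"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (
    doc_id  INTEGER PRIMARY KEY,
    body    BLOB NOT NULL,   -- signed document, stored opaquely
    sig     BLOB NOT NULL,
    owner   TEXT NOT NULL,   -- copied out of the blob only for indexing
    created TEXT NOT NULL
);
CREATE INDEX doc_owner ON doc(owner);
""")

def store(conn, document):
    """Serialize and sign the document, duplicating a few fields as columns."""
    body = json.dumps(document, sort_keys=True).encode("utf-8")
    sig = hmac.new(KEY, body, hashlib.sha256).digest()
    cur = conn.execute(
        "INSERT INTO doc (body, sig, owner, created) VALUES (?, ?, ?, ?)",
        (body, sig, document["owner"], document["created"]),
    )
    return cur.lastrowid

def load_by_owner(conn, owner):
    """Select via the indexed column, then verify and decode each blob."""
    out = []
    for body, sig in conn.execute(
        "SELECT body, sig FROM doc WHERE owner = ? ORDER BY doc_id", (owner,)
    ):
        expected = hmac.new(KEY, body, hashlib.sha256).digest()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("signature check failed")
        out.append(json.loads(body))
    return out

store(conn, {"owner": "alice", "created": "2015-03-25", "items": [1, 2]})
store(conn, {"owner": "bob", "created": "2015-03-24", "items": []})
print(load_by_owner(conn, "alice"))
# [{'created': '2015-03-25', 'items': [1, 2], 'owner': 'alice'}]
```

The cost of this design is that the promoted columns must be kept in sync with the blob by the application; the database cannot check it.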

>
> Ken Meltsner

sdw

>
>
>
> On Wed, Mar 25, 2015 at 5:41 PM, Stephen D. Williams <sdw at lig.net> wrote:
>> So you did use Titan?
>>
>> You can map anything into SQL, just like you can map anything into a
>> filesystem, but the model is annoying for various reasons.
>>
>> On 3/25/15 10:09 AM, Ken Meltsner wrote:
>>> For me, when I evaluated Titan Aurelius, the big draw was the ability
>>> to deal with relatively complex DAGs, but to be honest, if you can get
>>> around the relatively cumbersome SQL, you can do the same thing with a
>>> stock RDBMS.  [It's not that it's impossible to implement hierarchical
>>> and DAG models in SQL, it's just that previous products did it badly
>>> IMHO.]
>>>
>>> I also liked the property graph approach, but if you don't need a
>>> flexible schema, that's not an advantage.  For us, putting together
>>> service models from various hardware and software configuration items
>>> was simplified with property graphs -- the two most common approaches
>>> are to have a separate table for each object type (or a common table
>>> and an extra property table per class), or to use a vertical model
>>> similar to triples.  The former is annoyingly inflexible, the latter
>>> seems to run into performance issues.  I suppose we could have used
>>> one table shared for all of the objects and a vertical table for the
>>> extra properties, but that means some properties become second class
>>> citizens for purposes of queries and such.
>>
>> I think of that as the inventory problem.  I often point it out when talking
>> about the failings of RDBMS systems.  The funny thing is that SQL,
>> especially the query side, doesn't really require such a simplistic data
>> representation system.  Each row could have whatever columns are needed,
>> indexed accordingly.  Another solution is to have a blob of XML, JSON, or
>> similar that holds the attributes.  Create an index based on
>> attribute name + value, which is more or less a full-text kind of solution.
>> It seems you can get pretty far with a "document" or object store (S3-like,
>> not old Objectivity serialization style) with the right indexing.
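A rough sketch of the attribute name + value index idea for the inventory problem, kept in memory and assuming Python.  AttrIndex and
its methods are hypothetical; a real store would keep the postings on disk.  Each item carries only the attributes it actually has, and
a query is an intersection of posting sets.

```python
import json
from collections import defaultdict

class AttrIndex:
    """Hypothetical in-memory inverted index keyed on (attribute, value)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> original JSON text
        self.postings = defaultdict(set)  # (attr, encoded value) -> doc ids

    @staticmethod
    def _key(attr, value):
        # Encode the value as JSON so 1, "1" and true do not collide.
        return (attr, json.dumps(value))

    def add(self, doc_id, json_text):
        self.docs[doc_id] = json_text
        for attr, value in json.loads(json_text).items():
            # Only scalar values are indexed in this sketch.
            if value is None or isinstance(value, (str, int, float, bool)):
                self.postings[self._key(attr, value)].add(doc_id)

    def find(self, **criteria):
        """Ids of documents matching every attr=value pair (an AND query)."""
        if not criteria:
            return set()
        sets = [self.postings.get(self._key(a, v), set())
                for a, v in criteria.items()]
        return set.intersection(*sets)

idx = AttrIndex()
idx.add(1, '{"type": "disk", "capacity_gb": 500, "vendor": "Seagate"}')
idx.add(2, '{"type": "cable", "length_m": 2, "connector": "RJ45"}')
idx.add(3, '{"type": "disk", "capacity_gb": 1000, "rpm": 7200}')
print(sorted(idx.find(type="disk")))                   # [1, 3]
print(sorted(idx.find(type="disk", capacity_gb=500)))  # [1]
```

Adding a new kind of item needs no schema change, only new postings.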
>>
>> Graph databases are potentially the worst case, since they deal with everything
>> as atoms of information, so what would be a large row or document is many
>> separate items.  But good ones store everything very compressed, in column
>> databases or whatever, so potentially a lot more can happen in memory.  I've
>> toyed around with designs that try to do both: blobs of graphs, including
>> delta graphs, that are binary and directly traversable.  Long ago, I
>> designed something like Spark using versioned graphs in blobs, i.e. an
>> object database.  Really an object / metadata database.  I'm not sure that
>> can be made as general as a generalized graph database, but it seems like it
>> would scale much better.  Hadoop et al. seem to bear that out.
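A rough sketch of a "binary and directly traversable" graph blob, assuming a CSR (compressed sparse row) layout of little-endian 32-bit
words.  The layout, pack_graph, and GraphBlob are hypothetical illustrations, not the design mentioned above, and versioning and delta
graphs are not shown.  Traversal reads words straight out of the bytes with struct.unpack_from rather than rebuilding node objects.

```python
import struct
from collections import deque

# Hypothetical blob layout (little-endian uint32 words):
#   [n_nodes][n_edges][offsets: n_nodes + 1][targets: n_edges]
# Neighbors of node i are targets[offsets[i]:offsets[i + 1]].

def pack_graph(n_nodes, edges):
    """Pack a directed edge list into a CSR-style binary blob."""
    adj = [[] for _ in range(n_nodes)]
    for src, dst in edges:
        adj[src].append(dst)
    offsets, targets = [0], []
    for nbrs in adj:
        targets.extend(nbrs)
        offsets.append(len(targets))
    fmt = f"<2I{n_nodes + 1}I{len(targets)}I"
    return struct.pack(fmt, n_nodes, len(targets), *offsets, *targets)

class GraphBlob:
    """Traverses the packed bytes in place; no per-node objects are built."""

    def __init__(self, blob):
        self.blob = blob
        self.n_nodes = self._word(0)
        self._off_base = 2                 # word index of offsets[0]
        self._tgt_base = 3 + self.n_nodes  # word index of targets[0]

    def _word(self, k):
        return struct.unpack_from("<I", self.blob, 4 * k)[0]

    def neighbors(self, node):
        lo = self._word(self._off_base + node)
        hi = self._word(self._off_base + node + 1)
        return [self._word(self._tgt_base + j) for j in range(lo, hi)]

    def bfs(self, start):
        """Breadth-first visiting order from start."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in self.neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order

blob = pack_graph(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
g = GraphBlob(blob)
print(g.neighbors(0))  # [1, 2]
print(g.bfs(0))        # [0, 1, 2, 3, 4]
```

Because the blob is plain bytes, it could be stored as a database blob or memory-mapped from a file and traversed without a parse step.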
>>
>>> We didn't use or need Faunus, the analytics add-on to Titan, so I
>>> can't say whether that would have changed our decision.
>>>
>>> Ken Meltsner
>>
>> sdw
>>
>>>
>>> On Wed, Mar 25, 2015 at 11:54 AM, Lucas Gonze <lucas.gonze at gmail.com>
>>> wrote:
>>>> On Mon, Mar 23, 2015 at 1:22 PM, Stephen D. Williams <sdw at lig.net> wrote:
>>>>
>>>>> On 3/23/15 9:40 AM, Lucas Gonze wrote:
>>>>>
>>>>>> My current team just unhooked Neo4J. It seemed like a good idea but in
>>>>>> practice added more complexity than it removes.
>>>>>>
>>>>> What did you switch to?
>>>>>
>>>> Mongo and Postgres.
>>>>
>>>>
>>>>
>>>>> Did your problem need a graph?
>>>>>
>>>> The product requires spidering all the various sites where a band has a
>>>> presence and linking the data from those sites. We link up their content
>>>> from Soundcloud, Twitter, FB, YouTube, as well as image vendors like Getty
>>>> and AP. It's hard not to see this as a graph problem.
>>>>
>>>> But in practice we really weren't using graph algorithms.
>>>>
>>>> Maybe that will change once I have been here longer and get around to
>>>> the architecture of the crawler and knowledge graph.
>>>>
>>>>
>>>>
>>>>> Often, an app only needs to do graph processing in memory.  Object or
>>>>> document databases, which include RDBMSes that can handle blobs well, can
>>>>> be a better alternative.  Or Hadoop / Spark for batch processing.
>>>>>
>>>> Yup. That's us to a T.
>>>>


-- 
Stephen D. Williams sdw at lig.net stephendwilliams at gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer


