[FoRK] Big changes (personal) ahead: soliciting help from the FoRK braintrust

Stephen D. Williams sdw at lig.net
Wed Mar 25 15:41:50 PDT 2015


So you did use Titan?

You can map anything into SQL, just like you can map anything into a filesystem, but the model is annoying for various reasons.

On 3/25/15 10:09 AM, Ken Meltsner wrote:
> For me, when I evaluated Titan (from Aurelius), the big draw was the ability
> to deal with relatively complex DAGs, but to be honest, if you can get
> around the relatively cumbersome SQL, you can do the same thing with a
> stock RDBMS.  [It's not that it's impossible to implement hierarchical
> and DAG models in SQL, it's just that previous products did it badly
> IMHO.]
>
> I also liked the property graph approach, but if you don't need a
> flexible schema, that's not an advantage.  For us, putting together
> service models from various hardware and software configuration items
> was simplified with property graphs.  In an RDBMS, the two most common
> approaches are to have a separate table for each object type (or a
> common table plus an extra property table per class), or to use a
> vertical model similar to triples.  The former is annoyingly
> inflexible; the latter seems to run into performance issues.  I
> suppose we could have used one table shared by all of the objects and
> a vertical table for the extra properties, but that makes some
> properties second-class citizens for purposes of queries and such.
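Ken's point that a stock RDBMS can handle DAGs can be illustrated with a recursive common table expression. This is a minimal sketch, assuming a hypothetical `items`/`edges` schema in an in-memory SQLite database via Python's standard `sqlite3` module; it is not how Titan or Ken's system worked.

```python
import sqlite3

# Hypothetical schema: one row per configuration item, one row per DAG edge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id TEXT PRIMARY KEY, kind TEXT);
    CREATE TABLE edges (parent TEXT REFERENCES items(id),
                        child  TEXT REFERENCES items(id));
    CREATE INDEX edges_parent ON edges(parent);
""")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    ("svc", "service"), ("app", "software"), ("db", "software"),
    ("host1", "hardware"), ("host2", "hardware"),
])
# "db" is reachable via two paths (svc->db and svc->app->db): a DAG, not a tree.
conn.executemany("INSERT INTO edges VALUES (?, ?)", [
    ("svc", "app"), ("svc", "db"), ("app", "db"),
    ("app", "host1"), ("db", "host2"),
])

def descendants(root):
    """Return every item reachable from root, each listed once, sorted by id."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT child FROM edges WHERE parent = ?
            UNION
            SELECT e.child FROM edges e JOIN reach r ON e.parent = r.id
        )
        SELECT id FROM reach ORDER BY id
    """, (root,))
    return [r[0] for r in rows]

print(descendants("svc"))  # ['app', 'db', 'host1', 'host2']
```

Using `UNION` rather than `UNION ALL` discards duplicate rows, so a node reachable by several paths, like `db` here, is reported once and not expanded again.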

I think of that as the inventory problem.  I often point it out when talking about the failings of RDBMSs.  The funny thing
is that SQL, especially the query side, doesn't really require such a simplistic data representation.  Each row could have
whatever columns it needs, with indexes to match.  Another solution is to keep the attributes in a blob of XML, JSON, or similar,
then create an index on attribute name + value, which is more or less a full-text kind of solution.  It seems you can get
pretty far with a "document" or object (S3-like, not old Objectivity serialization style) plus the right indexing.
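The blob-plus-index idea can be sketched as follows, assuming hypothetical `docs` and `attrs` tables in SQLite: each document keeps whatever attributes it has as JSON, and a separate (name, value) index answers attribute lookups without any per-type schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);        -- JSON blob per item
    CREATE TABLE attrs (name TEXT, value TEXT, doc_id INTEGER);   -- inverted index
    CREATE INDEX attrs_name_value ON attrs(name, value);
""")

def put(doc):
    """Store a document and index each of its top-level scalar attributes."""
    cur = conn.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attrs VALUES (?, ?, ?)",
        [(k, str(v), doc_id) for k, v in doc.items()
         if not isinstance(v, (dict, list))],
    )
    return doc_id

def find(name, value):
    """Return all documents whose attribute `name` equals `value` (compared as text)."""
    rows = conn.execute("""
        SELECT d.body FROM attrs a JOIN docs d ON d.id = a.doc_id
        WHERE a.name = ? AND a.value = ? ORDER BY d.id
    """, (name, str(value)))
    return [json.loads(r[0]) for r in rows]

# Items of different kinds carry different attributes; no schema change needed.
put({"type": "router", "vendor": "Cisco", "ports": 48})
put({"type": "server", "vendor": "Dell", "ram_gb": 256})
put({"type": "router", "vendor": "Juniper"})

print([d["vendor"] for d in find("type", "router")])  # ['Cisco', 'Juniper']
```

Only top-level scalar attributes are indexed here, and values are compared as strings. Postgres offers a built-in counterpart in `jsonb` columns with GIN indexes.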

Graph databases are potentially the worst case, since they treat everything as atoms of information: what would be one large row or
document becomes many separate items.  But good ones store everything in a highly compressed form, in column stores or whatever, so
potentially a lot more can stay in memory.  I've toyed around with designs that try to do both: blobs of graphs, including delta graphs,
that are binary and directly traversable.  Long ago, I designed something like Spark using versioned graphs in blobs, i.e. an object
database.  Really an object / metadata database.  I'm not sure that can be made as general as a generalized graph database, but it seems
like it would scale much better.  Hadoop et al. seem to bear that out.
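One way to make a graph blob "binary and directly traversable" is a compressed sparse row (CSR) layout. This is a hypothetical format chosen for illustration, not the design described above: a header, node offsets, then edge targets, all little-endian 32-bit integers, read in place with `struct.unpack_from` instead of being deserialized. Versioning and delta graphs are left out.

```python
import struct

HEADER = struct.Struct("<II")  # node count n, edge count m
U32 = struct.Struct("<I")

def pack_graph(n, edges):
    """Serialize a directed graph on nodes 0..n-1 as header + (n+1) offsets + m targets."""
    adj = [[] for _ in range(n)]
    for src, dst in edges:
        adj[src].append(dst)
    offsets, targets = [0], []
    for nbrs in adj:
        targets.extend(sorted(nbrs))
        offsets.append(len(targets))
    return (HEADER.pack(n, len(targets))
            + struct.pack(f"<{n + 1}I", *offsets)
            + struct.pack(f"<{len(targets)}I", *targets))

def neighbors(blob, node):
    """Read one node's out-edges straight from the blob without unpacking the rest."""
    n, _m = HEADER.unpack_from(blob, 0)
    off_base = HEADER.size
    tgt_base = off_base + (n + 1) * U32.size
    start = U32.unpack_from(blob, off_base + node * U32.size)[0]
    end = U32.unpack_from(blob, off_base + (node + 1) * U32.size)[0]
    return list(struct.unpack_from(f"<{end - start}I", blob, tgt_base + start * U32.size))

def reachable(blob, root):
    """Depth-first traversal that touches only the parts of the blob it visits."""
    seen, stack = {root}, [root]
    while stack:
        for nxt in neighbors(blob, stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)

blob = pack_graph(5, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(len(blob), neighbors(blob, 0), reachable(blob, 0))  # 48 [1, 2] [0, 1, 2, 3]
```

Because the blob is plain bytes, it could be stored in an RDBMS blob column or an S3-style object and traversed after a single read.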

>
> We didn't use or need Faunus, the analytics add-on to Titan, so I
> can't say whether that would have changed our decision.
>
> Ken Meltsner

sdw

>
>
> On Wed, Mar 25, 2015 at 11:54 AM, Lucas Gonze <lucas.gonze at gmail.com> wrote:
>> On Mon, Mar 23, 2015 at 1:22 PM, Stephen D. Williams <sdw at lig.net> wrote:
>>
>>> On 3/23/15 9:40 AM, Lucas Gonze wrote:
>>>
>>>> My current team just unhooked Neo4J. It seemed like a good idea but in
>>>> practice added more complexity than it removes.
>>>>
>>> What did you switch to?
>>>
>>
>> Mongo and Postgres.
>>
>>
>>
>>> Did your problem need a graph?
>>>
>>
>> The product requires spidering all the various sites where a band has a
>> presence and linking the data from those sites. We link up their content
>> from Soundcloud, Twitter, FB, YouTube, as well as image vendors like Getty
>> and AP. It's hard not to see this as a graph problem.
>>
>> But in practice we really weren't using graph algorithms.
>>
>> Maybe that will change once I have been here longer and get around to
>> the architecture of the crawler and knowledge graph.
>>
>>
>>
>>> Often, an app only needs to do graph processing in memory.  Object or
>>> document databases, which include RDBMSs that can handle blobs well, can
>>> be a better alternative.  Or Hadoop / Spark for batch processing.
>>>
>>
>> Yup. That's us to a T.
>
>
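For a crawler like the one Lucas describes, "graph processing in memory" can be as simple as loading the linked records from a document store and grouping them into connected components. The profile IDs and the `links` field below are made-up examples, not Lucas's actual data model.

```python
from collections import defaultdict, deque

# Hypothetical records as they might come back from a document store:
# each crawled profile lists the other profiles it links to.
profiles = [
    {"id": "soundcloud:acme-band", "links": ["twitter:acmeband"]},
    {"id": "twitter:acmeband", "links": ["youtube:AcmeBandVEVO"]},
    {"id": "youtube:AcmeBandVEVO", "links": []},
    {"id": "twitter:other", "links": []},
]

def link_groups(records):
    """Group profiles connected by links in either direction (BFS per component)."""
    graph = defaultdict(set)
    for rec in records:
        graph.setdefault(rec["id"], set())  # keep profiles with no links
        for other in rec["links"]:
            graph[rec["id"]].add(other)
            graph[other].add(rec["id"])
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return sorted(groups)

print(link_groups(profiles))
# [['soundcloud:acme-band', 'twitter:acmeband', 'youtube:AcmeBandVEVO'], ['twitter:other']]
```

Nothing here needs a graph database: the store only has to return the records, and the graph exists just long enough to compute the groups.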


-- 
Stephen D. Williams sdw at lig.net stephendwilliams at gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer


