[FoRK] Big changes (personal) ahead: soliciting help from the FoRK braintrust
meltsner at alum.mit.edu
Wed Mar 25 15:59:41 PDT 2015
Yes, I did use Titan for some tests; the lack of a "Titan for Dummies"
style book probably doomed the attempt, to be honest -- too much of a
gap between the technology and the engineers who would have had to
implement the system.
I've seen the other solutions to the "inventory problem." XML (or
JSON) can work quite well, especially in the databases that support
XPath in views. IIRC both SQL Server and Oracle can do this. I
haven't tried an attribute:value list along with full text -- that's
probably adequate for most applications, though.
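
For illustration only, here is a rough sketch of the attribute:value idea: items kept as JSON blobs, plus an index keyed on (attribute, value) pairs. The item data and the names build_index and find are hypothetical, not taken from any product mentioned in this thread, and the full-text side is left out.

```python
import json
from collections import defaultdict

# Hypothetical inventory items stored as JSON blobs (illustrative data only).
BLOBS = {
    1: '{"type": "server", "vendor": "Dell", "ram_gb": 64}',
    2: '{"type": "switch", "vendor": "Cisco", "ports": 48}',
    3: '{"type": "server", "vendor": "HP", "ram_gb": 128}',
}

def build_index(blobs):
    """Map each (attribute, value) pair to the set of item ids that carry it."""
    index = defaultdict(set)
    for item_id, blob in blobs.items():
        for attr, value in json.loads(blob).items():
            # Values are indexed as strings so mixed types share one index.
            index[(attr, str(value))].add(item_id)
    return index

def find(index, **criteria):
    """Return the ids of items matching every attribute=value criterion."""
    sets = [index.get((attr, str(value)), set()) for attr, value in criteria.items()]
    return set.intersection(*sets) if sets else set()

INDEX = build_index(BLOBS)
```

Something like find(INDEX, type="server", vendor="HP") intersects the per-pair id sets, so every attribute is equally queryable without a fixed schema.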
I did see one interesting approach, originally for data warehousing,
but I think it'd apply to the inventory problem as well:
"hyper-normalized" schemas with one table per attribute/key. The
queries are hairy but can be generated automatically, and apparently
the typical query planner doesn't choke on them. One approach is Data
Vault Modeling, but I found the material on Anchor Modeling more
comprehensible. I haven't tried either one for real. Both support
models with changing attributes, an important distinction given that
most relational database models live in the eternal "now" (versioned
object graphs, of course, not included).
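
To make the one-table-per-attribute idea concrete, here is a minimal sketch using SQLite. It is only loosely inspired by Anchor Modeling and does not follow the official notation of either method; the table names, the changed_at column, and the snapshot_sql helper are all assumptions made up for this example. It shows the generated (and hairy) query that reassembles an item as of a given date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchor table: identity only, one row per inventory item.
    CREATE TABLE item (item_id INTEGER PRIMARY KEY);
    -- One table per attribute; each row records when the value took effect.
    CREATE TABLE item_vendor (item_id INTEGER, vendor TEXT, changed_at TEXT,
                              PRIMARY KEY (item_id, changed_at));
    CREATE TABLE item_ram_gb (item_id INTEGER, ram_gb INTEGER, changed_at TEXT,
                              PRIMARY KEY (item_id, changed_at));
    INSERT INTO item VALUES (1), (2);
    INSERT INTO item_vendor VALUES (1, 'Dell', '2014-01-01'), (2, 'Cisco', '2014-06-01');
    INSERT INTO item_ram_gb VALUES (1, 64, '2014-01-01'), (1, 128, '2015-02-01');
""")

# Hypothetical attribute list; in practice this would come from a model definition.
ATTRIBUTES = ["vendor", "ram_gb"]

def snapshot_sql(attributes):
    """Generate a query returning each item's attribute values as of :as_of."""
    cols = ", ".join(f"{a}.{a}" for a in attributes)
    joins = "\n".join(
        f"LEFT JOIN item_{a} AS {a} ON {a}.item_id = item.item_id "
        f"AND {a}.changed_at = (SELECT MAX(changed_at) FROM item_{a} "
        f"WHERE item_id = item.item_id AND changed_at <= :as_of)"
        for a in attributes
    )
    return f"SELECT item.item_id, {cols} FROM item\n{joins}\nORDER BY item.item_id"

def snapshot(as_of):
    """Run the generated query for one ISO date string."""
    return conn.execute(snapshot_sql(ATTRIBUTES), {"as_of": as_of}).fetchall()
```

Adding an attribute means adding a table and a list entry rather than altering a wide table, and an attribute with no value on the requested date comes back as NULL.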
What I really hate is when objects are serialized to (effectively)
opaque blobs, but that's not an issue I should encounter again.
On Wed, Mar 25, 2015 at 5:41 PM, Stephen D. Williams <sdw at lig.net> wrote:
> So you did use Titan?
> You can map anything into SQL, just like you can map anything into a
> filesystem, but the model is annoying for various reasons.
> On 3/25/15 10:09 AM, Ken Meltsner wrote:
>> For me, when I evaluated Titan Aurelius, the big draw was the ability
>> to deal with relatively complex DAGs, but to be honest, if you can get
>> around the relatively cumbersome SQL, you can do the same thing with a
>> stock RDBMS. [It's not that it's impossible to implement hierarchical
>> and DAG models in SQL, it's just that previous products did it badly.]
>> I also liked the property graph approach, but if you don't need a
>> flexible schema, that's not an advantage. For us, putting together
>> service models from various hardware and software configuration items
>> was simplified with property graphs -- the two most common approaches
>> are to have a separate table for each object type (or a common table
>> and an extra property table per class), or to use a vertical model
>> similar to triples. The former is annoyingly inflexible, the latter
>> seems to run into performance issues. I suppose we could have used
>> one table shared for all of the objects and a vertical table for the
>> extra properties, but that means some properties become second class
>> citizens for purposes of queries and such.
> I think of that as the inventory problem. I often point it out when talking
> about the failings of RDBMS systems. The funny thing is that SQL,
> especially the query side, doesn't really require such a simplistic data
> representation system. Each row could have whatever columns are needed,
> indexing accordingly. Another solution is to have a blob that is XML or
> JSON or similar that holds the attributes. Create an index based on
> attribute name + value, which is more or less a full-text kind of solution.
> You can get pretty far, it seems, with a "document" or object (S3 like, not
> old Objectivity serialization style) with the right indexing.
> Graph databases are potentially the worst case since they deal with everything
> as atoms of information, so what would be a large row or document is many
> separate items. But good ones store everything very compressed, in column
> databases or whatever, so a lot more can happen in memory potentially. I've
> toyed around with designs that try to do both: Blobs of graphs, including
> delta graphs, that are binary and directly traversable. Long ago, I
> designed something like Spark using versioned graphs in blobs, i.e. an
> object database. Really an object / metadata database. Not sure that can
> be made as general as a generalized graph database but it seems like it
> would scale much better. Hadoop et al. seem to verify that.
>> We didn't use or need Faunus, the analytics add-on to Titan, so I
>> can't say whether that would have changed our decision.
>> Ken Meltsner
>> On Wed, Mar 25, 2015 at 11:54 AM, Lucas Gonze <lucas.gonze at gmail.com> wrote:
>>> On Mon, Mar 23, 2015 at 1:22 PM, Stephen D. Williams <sdw at lig.net> wrote:
>>>> On 3/23/15 9:40 AM, Lucas Gonze wrote:
>>>>> My current team just unhooked Neo4J. It seemed like a good idea but in
>>>>> practice added more complexity than it removes.
>>>> What did you switch to?
>>> Mongo and Postgres.
>>>> Did your problem need a graph?
>>> The product requires spidering all the various sites where a band has a
>>> presence and linking the data from those sites. We link up their content
>>> from Soundcloud, Twitter, FB, YouTube, as well as image vendors like
>>> and AP. It's hard not to see this as a graph problem.
>>> But in practice we really weren't using graph algorithms.
>>> Maybe that will change once I have been here longer and get around to
>>> architecture of the crawler and knowledge graph.
>>>> Often, an app only needs to do graph processing in memory. Object or
>>>> document databases, which include RDBMS's that can handle blobs well,
>>>> can be a better alternative. Or Hadoop / Spark for batch processing.
>>> Yup. That's us to a T.
>>> FoRK mailing list
> Stephen D. Williams sdw at lig.net stephendwilliams at gmail.com LinkedIn:
> V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
> AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
> Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
After 30+ years of email, I have used up my supply of clever .sig material.