[FoRK] Big Data

geege schuman geege4 at gmail.com
Sat Feb 4 11:53:05 PST 2012


Like Tableau?
On Feb 4, 2012 1:51 PM, "J. Andrew Rogers" <andrew at jarbox.org> wrote:

>
> On Feb 4, 2012, at 7:40 AM, Gregory Alan Bolcer wrote:
> >
> > For human data, there are only so many ways you can process it. Most of
> > the time, more data only means more cost. The dark matter is the space in
> > between, aka the correlations.
>
>
> Most people are using the wrong tools and the wrong data. Data analysis is
> being done on people-centric data, like social graphs, because the
> applications are people-centric. Using a social graph as the primary key of
> human behavior is defective because (1) there are widespread
> inconsistencies in the key set and (2) a vast number of entities that
> influence human behavior cannot be meaningfully represented in a
> human-centric data model.
>
> This argument has started to gain currency. So what should replace it? The
> primary key of reality: space and time. If you can track arbitrary entities
> and features in space-time, then you can infer most of the other
> relationships that we use in behavioral data models. It also provides the
> base model into which all data sources can be organized, so you do not have
> the impedance mismatch you see between data models for, say, satellite
> imagery and social graphs. The beauty of this model is that it can be used
> to analyze the behavior of arbitrary systems, not just human behavior.
>
> Once you have this type of analytical model, you need to be able to
> parallelize joins and transitive closures to make it useful.
>
>
> And therein lies the problem. Most people doing "big data" are using very
> primitive distributed computing technologies like Hadoop, which handles
> neither space-time data models nor graph analytics well.
>
>
> > Anyone who knows anything about highly correlated human data knows it
> > doesn't map well to divide-and-conquer approaches. Techniques for mapping
> > non-d-a-cq-bd are definitely ripe for some IP.
>
>
> I think it would be more accurate to say that it does not map well to
> *naive* divide-and-conquer approaches. You won't get there using simple
> hash or range partitioning. There is already quite a bit of IP around more
> capable techniques, but I have not seen any of it in open source or in the
> literature.
>
>
> --
> J. Andrew Rogers
> Twitter: @jandrewrogers
>
>
>
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork
>
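
Rogers's claim that relationships can be inferred once entities are keyed by space and time can be made concrete with a minimal sketch. Assumptions that are not from the thread: each observation is a hypothetical (entity, x, y, t) tuple on a flat plane, space-time is bucketed into fixed cells of `cell_size` meters by `window` seconds, and two entities are treated as related when they share a cell at least `min_count` times. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cell_key(x, y, t, cell_size, window):
    """Map a point to a discrete space-time cell, the shared 'primary key'."""
    return (math.floor(x / cell_size), math.floor(y / cell_size), math.floor(t / window))


def co_occurrences(observations, cell_size=100.0, window=600.0):
    """Count how often each pair of distinct entities shares a space-time cell.

    Simplification: entities just across a cell boundary are not matched.
    """
    cells = defaultdict(set)
    for entity, x, y, t in observations:
        cells[cell_key(x, y, t, cell_size, window)].add(entity)
    counts = defaultdict(int)
    for entities in cells.values():
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return dict(counts)


def inferred_links(observations, min_count=2, **cell_params):
    """Treat entity pairs that repeatedly share a cell as related."""
    pairs = co_occurrences(observations, **cell_params)
    return {pair for pair, n in pairs.items() if n >= min_count}


# Hypothetical observations: (entity, x meters, y meters, t seconds).
obs = [
    ("alice", 10, 10, 0), ("bob", 20, 30, 100),            # cell (0, 0, 0)
    ("alice", 510, 10, 3600), ("bob", 530, 40, 3700),      # cell (5, 0, 6)
    ("alice", 900, 900, 9000), ("carol", 950, 920, 9100),  # cell (9, 9, 15)
    ("carol", 10, 10, 5000),                               # cell (0, 0, 8)
]
print(inferred_links(obs))  # {('alice', 'bob')}
```

Grouping by cell key is what makes this join easy to partition; a more realistic version would also compare neighboring cells so that pairs straddling a boundary are not missed.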

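Rogers doesn't say how he would parallelize joins and transitive closures. One standard way to expose the joins, sketched here under the assumption of a plain directed edge relation, is semi-naive evaluation (the usual Datalog technique): each round joins only the newly derived pairs against the base edges, and that join can be hash-partitioned on its key. The function names are hypothetical. Note that this is exactly the kind of simple hash partitioning Rogers says breaks down on highly correlated data: on clustered or skewed graphs a few partitions can end up with most of the work, and the number of rounds grows with the length of the longest shortest path.

```python
from collections import defaultdict


def partitioned_join(delta, succ, num_partitions):
    """Join delta(x, y) with edges(y, z) on y, split into independent partitions.

    delta is hash-partitioned on the join key y; a worker handling one
    partition would only need the edges whose source hashes to the same
    partition. Here the partitions are processed sequentially.
    """
    parts = [[] for _ in range(num_partitions)]
    for x, y in delta:
        parts[hash(y) % num_partitions].append((x, y))
    derived = set()
    for part in parts:
        derived |= {(x, z) for x, y in part for z in succ.get(y, ())}
    return derived


def transitive_closure(edges, num_partitions=4):
    """Semi-naive transitive closure of a directed edge relation."""
    succ = defaultdict(set)  # base edges indexed by source, the join key
    for a, b in edges:
        succ[a].add(b)
    closure = set(edges)
    delta = set(edges)
    while delta:
        # Only pairs derived in the previous round are joined again.
        delta = partitioned_join(delta, succ, num_partitions) - closure
        closure |= delta
    return closure


print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```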

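Rogers's remark about simple hash and range partitioning can be illustrated with a toy measurement: how many pairs of adjacent space cells end up on different partitions, since each such pair would typically require cross-partition communication in a spatial join. The 16 x 16 grid, the four partitions, the MD5-based hash, and all function names are hypothetical. The Z-order (Morton) curve used here is a well-known, published locality-preserving technique, not the unpublished methods Rogers alludes to; it is still range partitioning, just over a better key, and it still cuts neighbors along partition boundaries.

```python
import hashlib

BITS = 4                  # hypothetical grid: 2**4 = 16 cells per side
SIDE = 1 << BITS
PARTITIONS = 4
DOMAIN = SIDE * SIDE      # number of cells, also the key range for both keys


def morton(x, y, bits=BITS):
    """Interleave the bits of x and y into a Z-order (Morton) key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hash_partition(x, y):
    """Assign a cell by hashing its coordinates (scatters neighbors)."""
    digest = hashlib.md5(f"{x},{y}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


def row_major_partition(x, y):
    """Range-partition on a row-major key: each partition is a run of rows."""
    return (y * SIDE + x) * PARTITIONS // DOMAIN


def zorder_partition(x, y):
    """Range-partition on the Morton key: with 4 partitions, 8 x 8 quadrants."""
    return morton(x, y) * PARTITIONS // DOMAIN


def cut_edges(assign):
    """Count horizontally or vertically adjacent cell pairs split across partitions."""
    cut = 0
    for y in range(SIDE):
        for x in range(SIDE):
            if x + 1 < SIDE and assign(x, y) != assign(x + 1, y):
                cut += 1
            if y + 1 < SIDE and assign(x, y) != assign(x, y + 1):
                cut += 1
    return cut


for name, fn in [("hash", hash_partition), ("row-major range", row_major_partition),
                 ("Z-order range", zorder_partition)]:
    print(f"{name}: {cut_edges(fn)} of {2 * SIDE * (SIDE - 1)} adjacent pairs cut")
# row-major range: 48 of 480; Z-order range: 32 of 480
```

With a well-mixed hash and four partitions, two neighboring cells land on different partitions about three times in four, so hash partitioning cuts most of the 480 pairs. That is one concrete sense in which naive divide-and-conquer scatters correlated data.
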