[FoRK] what to do with the PRISM database

J. Andrew Rogers andrew at jarbox.org
Fri Jul 12 09:54:54 PDT 2013

On Jul 12, 2013, at 2:04 AM, Damien Morton <dmorton at bitfurnace.com> wrote:
> On Thu, Jul 11, 2013 at 3:08 PM, J. Andrew Rogers <andrew at jarbox.org> wrote:
>> On a micro level, constructing exquisitely detailed behavior models of
>> individuals and manipulating their decisions. People are aware of neither
>> the extent of the influence of environmental cues on their decisions nor
>> the specific mapping of cues to the decisions they make. While this has
>> always been a point of exploitation, computers are much better at it than
>> people are. Marketers are not exploiting it now because their tool chains
>> are inadequate for the purpose.
> You are talking about targeted advertising with the goal of priming people
> to certain kinds of [political] behaviour and thinking?

Not necessarily targeted advertising, though it could be. More along the lines of targeted environmental cues that will bias your decisions in a material way. There is interesting research on this. Marketers and casinos already exploit it in narrow ways. Steering a sequence of decisions toward a desired outcome can be used for all manner of things. Politics, profit, blackmail, whatever. Targeting an individual is different from targeting a population.

The challenge is that decisions are highly contextual, and transient environmental cues impart significant bias. To exploit this generally in an automated fashion, you not only need to track the behavior of a person over long periods of time, you also need to track an enormous amount of contextual environmental data surrounding the thousands of decisions they make every day. Then, as you track an individual in their environment in real time, you can estimate the probability of the choice they will make by default in that context, and identify the factors that would most strongly bias that decision in either direction.
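The estimation step described above can be illustrated with a toy model. The sketch below assumes a logistic choice model over context features; the feature names, weights, and the `choice_probability` and `best_cue` helpers are all hypothetical and stand in for whatever model would actually be learned from long-term tracking. It computes the default probability of a choice in the current context, then tries each candidate cue and reports the one that shifts that probability furthest in the desired direction.

```python
import math

# Hypothetical logistic choice model: weights map context features
# (environmental cues, internal state) to their assumed influence on
# whether a person picks option A. Not a model from the post.
def choice_probability(weights, bias, context):
    """Return P(option A) for a context given as {feature: value}."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in context.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_cue(weights, bias, context, candidate_cues, target=1):
    """Add one unit of each candidate cue to the context in turn and return
    (baseline probability, best cue, shift achieved).
    target=1 pushes toward option A, target=0 pushes away from it.
    Returns None as the cue if no candidate moves P(A) the desired way."""
    base = choice_probability(weights, bias, context)
    best, best_shift = None, 0.0
    for cue in candidate_cues:
        nudged = dict(context)
        nudged[cue] = nudged.get(cue, 0.0) + 1.0
        p = choice_probability(weights, bias, nudged)
        shift = p - base if target == 1 else base - p
        if shift > best_shift:
            best, best_shift = cue, shift
    return base, best, best_shift

if __name__ == "__main__":
    # Made-up weights, for illustration only.
    weights = {"hunger": 1.2, "time_pressure": 0.8,
               "fast_music": 0.4, "warm_lighting": -0.3}
    context = {"hunger": 1.0}
    cues = ["time_pressure", "fast_music", "warm_lighting"]
    print(best_cue(weights, -0.5, context, cues, target=1))
    print(best_cue(weights, -0.5, context, cues, target=0))
```

A logistic model is only a stand-in here; the point is the two steps in the paragraph above: a contextual prediction of the default choice, followed by a search over cues for the largest marginal bias.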

>> Part of the imagination gap is that the vast majority of people in
>> industry are using crude and extremely limited platforms like Hadoop that
>> are incapable of being used for applications like these. It is projecting
>> the inadequacies of their tool chain onto the possibility space.
> if not hadoop, what?

It depends on what you want to do. The trendiness of Hadoop, despite being pretty poorly designed for most analytic purposes, has retarded the development and uptake of competent open source alternatives. Hadoop is the MySQL 4 of big data.

Generally though, so-called 3rd generation distributed computing platforms that can do the fancy analytics I mentioned have the following characteristics:

- Continuous online ingest of data concurrent with ad hoc analytic queries (i.e. no batch mode)
- Full processing and disk storage of real-time data streams at machine-generated data rates (i.e. written in C/C++)
- Minimal data motion, compute always moves to the data in situ (contrary to Hadoop's modus operandi)
- Good support for spatio-temporal types, operators, and data models (no popular big data platform does this)
- Relational and spatial join operators that parallelize (i.e. practical data fusion)
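The last items in the list, parallel spatio-temporal joins with minimal data motion, can be illustrated with grid partitioning, one common technique for such joins (not necessarily what any of the platforms alluded to here use). The sketch below assumes observations are `(id, x, y, t)` tuples and finds pairs from two sets that lie within distance `r` and time window `w` of each other; the function names are hypothetical. Both inputs are bucketed into cells of size r by r by w, and each cell of the first input is joined only against the 27 neighboring cells of the second, so every cell is an independent unit of work.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def cell_of(p, r, w):
    """Grid cell of an (id, x, y, t) observation: size r in space, w in time."""
    return (math.floor(p[1] / r), math.floor(p[2] / r), math.floor(p[3] / w))

def bucket(points, r, w):
    """Partition observations by grid cell."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, r, w)].append(p)
    return cells

def join_cell(cell, a_points, b_cells, r, w):
    """Join one cell of A against the 3x3x3 block of B cells around it.
    Any match is at most one cell away on each axis, so this is complete,
    and each A point lives in exactly one cell, so no pair is emitted twice."""
    cx, cy, ct = cell
    out = []
    for dx, dy, dt in product((-1, 0, 1), repeat=3):
        for b in b_cells.get((cx + dx, cy + dy, ct + dt), ()):
            for a in a_points:
                if (math.hypot(a[1] - b[1], a[2] - b[2]) <= r
                        and abs(a[3] - b[3]) <= w):
                    out.append((a[0], b[0]))
    return out

def spatiotemporal_join(a_points, b_points, r, w, workers=4):
    """Return sorted (a_id, b_id) pairs within distance r and time window w."""
    a_cells = bucket(a_points, r, w)
    b_cells = bucket(b_points, r, w)  # read-only, shared by all tasks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = [pool.submit(join_cell, cell, pts, b_cells, r, w)
                 for cell, pts in a_cells.items()]
        return sorted(pair for f in tasks for pair in f.result())

if __name__ == "__main__":
    a = [("a1", 0.0, 0.0, 0.0), ("a2", 10.0, 10.0, 5.0)]
    b = [("b1", 0.5, 0.5, 0.2), ("b2", 10.2, 9.9, 100.0),
         ("b3", 3.0, 0.0, 0.0), ("b4", 9.5, 10.4, 5.9)]
    print(spatiotemporal_join(a, b, r=1.0, w=1.0))
```

Python threads are used only to show that cells are independent tasks; because of the GIL they give no CPU speedup here. In a distributed engine, one would expect the same decomposition to let each node join the cells it already stores and pull in only neighboring cells from B, which is the "compute moves to the data" property from the list.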

You can see why open source hasn't really picked up on doing this. The barest of minimally viable kernels for this is *at least* 100k LoC of pretty fussy C/C++. There are a couple of commercial kernels that have been floating around in closed betas for some time now that can do most or all of this, though; using one saves the pain of building single-use analytical engines.
