[FoRK] Human behavior is 93% predictable

Jebadiah Moore jebdm at jebdm.net
Tue Mar 2 16:08:10 PST 2010

On Tue, Mar 2, 2010 at 5:36 PM, J. Andrew Rogers <andrew at ceruleansystems.com> wrote:

> To be clear, I am not saying you can't be unpredictable, just that it is
> much, much harder than most people intuit.

Sure, I definitely agree with you on that.  Plus, being unpredictable enough
to avoid detection/whatever probably limits you too much for it to be
worthwhile in most circumstances.

I think I'm mostly agreeing with you, just putting emphasis on the possible
rather than the probable.  Nonetheless, here are some clarifications about my
line of thinking, which will hopefully reveal that I understand what you're
saying (though they may instead make my blunders more painfully obvious, in
which case I beg the favor of rectification, although it's understandable if
the task seems too tedious):

> > 1) RNG
> >
> > Presumably any particular model isn't going to fully model the universe at
> > particle level.  Instead, it's going to approximate human actions at some
> > particular level of atomicity in some set of dimensions--probably location,
> > relation of location to person (at home, at work, at friend's, etc.), basic
> > state (eating, working, socializing, sex, etc.).
> While the information is discrete, states like you outline above are not.
> Looking at it as a database of time-place-activity logs is incorrect. The
> models work at a different level of abstraction, being pretty purely
> information theoretic.  You can pull surprising patterns out of very diffuse
> bits.

The models may work on the basis of info theory, but like you said the
information itself (the data collection) will be discrete, and at some level
it'll have to be a database of some sort of logs.  And when you build a
model on such information, you're making predictions about a universe which
works on the basis of recorded locations and computer logs, rather than
particle interactions or the gods' dice or whatever.  That hypothetical universe is
of course likely to model the things you're interested in fairly accurately,
but there are naturally going to be differences, some of which might be
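As a rough illustration of the point about discrete logs, here is a minimal Python sketch of a first-order Markov predictor trained on a log of coarse states like the ones listed above. It's an assumption for illustration only, not how any real surveillance model works; the state labels and function names are hypothetical. Note that a state which never appears in the log simply doesn't exist in the model's universe.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_markov(log: list[str]) -> dict[str, Counter]:
    """Count observed transitions state -> next_state in a discrete log."""
    transitions: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(log, log[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(transitions: dict[str, Counter], state: str) -> str | None:
    """Most frequently observed successor of `state`, or None if never logged."""
    counts = transitions.get(state)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def accuracy(transitions: dict[str, Counter], log: list[str]) -> float:
    """Fraction of transitions in `log` that the model predicts correctly."""
    pairs = list(zip(log, log[1:]))
    hits = sum(predict_next(transitions, cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Hypothetical training log of coarse location states.
model = train_markov(["home", "work", "home", "work", "home", "work", "home", "bar", "home"])
print(predict_next(model, "home"))   # work
print(predict_next(model, "beach"))  # None: outside the logged universe
print(accuracy(model, ["home", "work", "home", "bar", "home"]))  # 0.75
```

The trip to the bar is the one transition the model gets wrong, even though it has seen it before: the routine dominates the counts.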

> > 2) Breaking the model
> The model is high-order and inductive, you can't reason about the model
> this way. To break it requires having a copy of the (constantly updated)
> data that created the model. Attempting to break the model becomes part of
> the model.

I think I understand what you're saying here (the model isn't fixed, it
reacts/learns to new patterns and whatnot), but every model at some level is
flawed; induction can't be perfect.  Compressed sensing won't work well with
truly dense images (ones that aren't sparse in any basis), music discovery doesn't work well if your tastes are
particularly eclectic or mixed or based on dimensions the service doesn't
observe, and spam filtering is bad at dealing with cases where you want some
of the mail from certain sources on certain topics (you're interested in a
company enough to subscribe to their newsletter, but you're only interested
in a small part of the stuff they send).
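A toy sketch of the spam-filtering case, using made-up mail and hypothetical function names: a filter whose only feature is the sender has to treat all of a company's newsletters the same way, while one that also observes the topic can separate them. This is a deliberately simple majority-vote filter, not how any real spam filter works.

```python
from collections import Counter, defaultdict

# Hypothetical labelled mail: (sender, topic, wanted). The user subscribes to
# one company's newsletter but only cares about one topic in it.
MAIL = [
    ("acme", "api-changes", True),
    ("acme", "promotions", False),
    ("acme", "promotions", False),
    ("acme", "events", False),
    ("acme", "api-changes", True),
    ("friend", "personal", True),
]


def sender_only(sender, topic):
    """Abstraction that ignores the topic dimension entirely."""
    return (sender,)


def sender_and_topic(sender, topic):
    """Abstraction that also observes the topic dimension."""
    return (sender, topic)


def train(mail, features):
    """Majority vote on `wanted` for each feature key the filter can see."""
    votes = defaultdict(Counter)
    for sender, topic, wanted in mail:
        votes[features(sender, topic)][wanted] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def accuracy(mail, features):
    """Train on `mail`, then score the filter on the same messages."""
    model = train(mail, features)
    hits = sum(model[features(s, t)] == wanted for s, t, wanted in mail)
    return hits / len(mail)


print(accuracy(MAIL, sender_only))       # 4 of 6 correct
print(accuracy(MAIL, sender_and_topic))  # 6 of 6 correct
```

The sender-only filter drops exactly the messages the user cares about; no amount of extra training data fixes that, because the deciding dimension isn't observed.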

In all these cases, the models get the 90% right but can't deal with the
last 10%--which is often the really interesting part.  The model has to have
abstractions somewhere, and the choice of abstraction will determine which
(proverbial, since the number will vary) 10% is hard, whether the
abstractions are chosen by a human or the model itself.  In the case of mass
models of human behavior, the hard 10% is most likely going to be deviant
behavior, which in the case of surveillance is also the interesting 10%.  Of
course, the model can adapt, and maybe it can even do so well enough to beat
the well-informed surveillee.
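To make "the model can adapt" concrete, here is a toy online predictor, a hypothetical stand-in for the far higher-order models under discussion. Each observation is scored against the current prediction and then folded into the counts, so a surveillee who changes habits is just supplying more training data.

```python
from collections import Counter, defaultdict


class OnlineMarkov:
    """Toy first-order next-state predictor that keeps learning.

    Each observation is first scored against the current prediction and then
    added to the transition counts, so deviations feed back into the model.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)  # state -> Counter of next states
        self.prev = None

    def predict(self):
        """Most frequent successor of the previous state, or None if unknown."""
        successors = self.counts.get(self.prev)
        if not successors:
            return None
        return successors.most_common(1)[0][0]

    def observe(self, state):
        """Return whether `state` was predicted, then update the counts."""
        hit = self.predict() == state
        if self.prev is not None:
            self.counts[self.prev][state] += 1
        self.prev = state
        return hit


# Hypothetical routine change: work days give way to gym days.
model = OnlineMarkov()
old_routine = ["home", "work"] * 4
new_routine = ["home", "gym"] * 6
hits = [model.observe(s) for s in old_routine + new_routine]
print(hits[9], hits[-1])  # False True
```

The first gym visit is a miss, but once the new routine has repeated enough times the model predicts it.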

But the really interesting behavior in humans is also going to be hard to
detect, because it's subtle; the majority of people doing intellectual work,
for instance, probably have wildly similar habits and so on, but they think
about a wide range of things, and many could get away with participating in
subversive activities without any model noticing--that is, unless the model
is able to deal with the subtleties of human language, in which case the AI
race is over and avoiding detection will involve dodging sensors.

That said, perhaps the actual number for the 10% can be reduced to
acceptably low rates.  93% accuracy (which is obviously a long shot) is
dodgeable.  If you could build a 99.999% reliable model, that's still a lot
of errors given scale, but perhaps it would be prohibitively difficult to
actually engineer an error, and the errors would all be random.
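A back-of-the-envelope sketch of the scale argument. The population and the one-prediction-per-person-per-day rate are hypothetical assumptions, and the arithmetic assumes errors are independent and random, which is exactly the case where they'd be hard to engineer.

```python
from fractions import Fraction


def expected_errors(predictions: int, accuracy: str) -> Fraction:
    """Expected wrong predictions, assuming each is independently right with
    probability `accuracy` (a decimal string, parsed exactly by Fraction)."""
    return predictions * (1 - Fraction(accuracy))


# Hypothetical scale: 300 million people, one prediction per person per day.
DAILY_PREDICTIONS = 300_000_000

print(expected_errors(DAILY_PREDICTIONS, "0.93"))     # 21000000
print(expected_errors(DAILY_PREDICTIONS, "0.99999"))  # 3000
```

Even at five nines, that's thousands of misses a day; the open question is whether any of them could be steered.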

As for having access to the data, many of the people (I'm thinking
espionage) who would be truly both interested in and capable of fooling the
algorithm could likely get access.  Hell, it might even be
publicly available (imagine that).  But the average foil-crowner would
probably be up a creek.

Jebadiah Moore
