James and Psychohistory (was Rumble!)
jamesr at best.com
Sun Sep 28 18:24:06 PDT 2003
On 9/27/03 10:24 AM, "jbone at place.org" <jbone at place.org> wrote:
> In all seriousness, I do happen to think human societies, markets, and
> overall economies are a lot more deterministic and predictable than
> most folks would allow. I expect that the kind of thing James is
> talking about will indeed eventually be practical, and it will probably
> work in the general manner he has vaguely described. However I'm very
> dubious about the claims that this is being done on any kind of a large
> scale now, if for no other reason than computational difficulty and the
> need for massive parallelism on a scale that just doesn't exist today
> in order to make things tractable.
The current limitations are twofold. First, the model resolution achievable
for a given time window is a complex function of working RAM, and this kind
of modeling takes piles of it. Second, and easily as important, is the
availability of a lot of high-quality data sets. Both of these situations
are improving rapidly, especially with respect to data sources.
This kind of thing, at least as far as I'm concerned, is bound purely by
memory latency and bandwidth at runtime and doesn't require more
computational power per se. We often substitute CPU for good internal
representation because we have a lot more CPU to burn than memory, but CPU
substitution frequently scales so badly in this regard that it doesn't get
you much more for the effort, and poor representation starts to cost you
more memory anyway when things get complicated.
Tractability is a gradient. We can make good models now, but they are
fundamentally limited in scope and resolution by the resources available to
them. As computers get bigger (in the Kolmogorov sense) these models will
continue to improve in capability, but to a certain extent the returns
diminish. A lot of the really useful, robust patterns converge on relatively
"small" machines, but you beat everyone else in a market by shaving off a
few more percentage points, which adds up to a lot of money (hence the race
for bigger and better). And sometimes useful high-order patterns converge
only on larger machines and never rise above the noise floor on smaller
ones, leaving a tantalizing ghost to chase.
> I know for a fact that brokerages etc. do best-fit rule system
> generation from historical market data, but that's far from the kind of
> thing James is talking about: you can automatically breed optimum
> trade strategies from that kind of thing, but that's not predicting
> long-term cycles. The best-of-breed solutions I'm aware of even try to
> capture social context, usually by data mining and extraction of social
> "features" from historical news flow.
Yes, this is very popular and many companies use this method, and it isn't
a terribly poor method in terms of bang for the computational buck in the
basic data modeling case. The problem is that a lot of people are doing it,
and in roughly the same way. More importantly, those algorithms are poor
and/or very expensive at high-order pattern discovery, which puts a limit on
I know people with patents on using social feature extraction on historical
news flow for market prediction, but the approach isn't as useful in
practice as it could be in theory. The two limitations are different sides
of the same coin.
These kinds of data sources are uneven, unreliable, and routinely subject to
manipulation. In practice, the kinds of modeling algorithms that have been
applied to this data are not really capable of the deep high-order feature
extraction that would allow an algorithm to effectively see through this
kind of noise.
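The "breed optimum trade strategies" approach from the quoted text can be made concrete with a toy genetic algorithm. This is a minimal sketch under loudly stated assumptions, not any brokerage's actual method: the price series is a seeded synthetic random walk standing in for historical data, the rule family (a moving-average crossover with hypothetical window parameters `short` and `long_`) is an arbitrary illustrative choice, and fitness is plain in-sample return. It shows the shape of the technique, a search over rule space for the best fit to past data, and also why it is not long-term cycle prediction: it only ranks rules by how well they fit the series it was given.

```python
import random


def synthetic_prices(n, seed=0):
    """Hypothetical stand-in for historical data: a seeded multiplicative random walk."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0005, 0.01)))
    return prices


def fitness(rule, prices, prefix, start):
    """In-sample return of a hypothetical moving-average crossover rule (short, long_).

    The rule holds the asset for the next step whenever the short moving average
    is above the long one, and stays out of the market otherwise.
    """
    short, long_ = rule
    equity = 1.0
    for t in range(start, len(prices) - 1):
        # prefix[i] holds sum(prices[:i]), so each moving average costs O(1).
        ma_short = (prefix[t + 1] - prefix[t + 1 - short]) / short
        ma_long = (prefix[t + 1] - prefix[t + 1 - long_]) / long_
        if ma_short > ma_long:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def random_rule(rng, max_window):
    short = rng.randint(1, max_window - 1)
    return (short, rng.randint(short + 1, max_window))


def mutate(rule, rng, max_window):
    """Nudge both windows, then clamp so that 1 <= short < long_ <= max_window."""
    short = min(max(1, rule[0] + rng.randint(-3, 3)), max_window - 1)
    long_ = min(max(short + 1, rule[1] + rng.randint(-3, 3)), max_window)
    return (short, long_)


def crossover(a, b):
    """Take the short window from one parent and the long window from the other."""
    child = (a[0], b[1])
    return child if child[0] < child[1] else a


def evolve(prices, max_window=50, pop_size=20, generations=15, seed=1):
    rng = random.Random(seed)
    prefix = [0.0]
    for p in prices:
        prefix.append(prefix[-1] + p)
    start = max_window - 1  # evaluate every rule over the same window

    def score(rule):
        return fitness(rule, prices, prefix, start)

    population = [random_rule(rng, max_window) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; elites survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng, max_window))
        population = parents + children
    best = max(population, key=score)
    return best, score(best)


prices = synthetic_prices(500)
rule, in_sample_return = evolve(prices)
print(rule, round(in_sample_return, 4))
```

Because the top half of each generation survives unchanged, the best in-sample fitness can never decrease from one generation to the next. Whether the evolved rule does well on data it has not seen is a separate question this sketch does not address.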
From my vantage point, the problem was reduced many years ago to one of
knowledge representation rather than one of data source selection.
Selecting various kinds of data streams to analyze is a cheap way of dancing
around the underlying problem, and while it may have offered significant
gains when people were getting their data on tape dumps from limited
sources, it matters less in the current data deluge. But in a data deluge,
the poor representational efficiency of common algorithms exacts a steep
price in tractability, which limits the amount of value that can practically
be extracted from all this data. No point flogging that all-but-dead horse.
> The macro-economic cycle
> prediction stuff I'm aware of is painfully inadequate, on the order of
> accuracy of, say, prediction of specific weather at a specific
> location this time next year. (I.e., generalities only are practical.)
Weather modeling is comparable. We can only predict slow, coarse patterns a
long way out, and predicting a stock price a year out is only marginally
more tractable than precisely predicting the weather in your neighborhood
exactly one year out. The data stream isn't rich enough, and even if it
were, crunching the numbers on it wouldn't be tractable anyway.
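The weather comparison can be illustrated with the logistic map, a standard toy chaotic system swapped in here purely for illustration; it is not a model of weather, markets, or anything the author describes, and the parameter r = 3.9 and the starting values are arbitrary choices. Two runs whose starting states differ by one part in ten billion agree closely for the first few steps and then become unrelated, yet a coarse statistic such as the long-run average level comes out nearly the same for both. That is the sense in which slow, coarse patterns stay predictable while precise long-range values do not.

```python
def logistic_trajectory(x0, r=3.9, steps=20000):
    """Iterate the logistic map x -> r * x * (1 - x), a toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


# Two "forecasts" whose starting states differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Precise values: nearly identical early on, unrelated later.
early_gap = max(abs(p - q) for p, q in zip(a[:6], b[:6]))
late_gap = max(abs(p - q) for p, q in zip(a[100:200], b[100:200]))

# Coarse statistic: the long-run average level, skipping the first 1000 steps.
mean_a = sum(a[1000:]) / len(a[1000:])
mean_b = sum(b[1000:]) / len(b[1000:])

print(f"early gap {early_gap:.1e}, late gap {late_gap:.2f}")
print(f"long-run means {mean_a:.3f} vs {mean_b:.3f}")
```

Using r = 3.9 rather than 4 sidesteps a floating-point corner case at r = 4, where a value that rounds to exactly 1.0 sends the orbit to 0 and keeps it there.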
Long-term prediction is so coarse that it is barely useful. It lets you
pick an outlook assumption and judge broad risk, but it doesn't really tell
you how that outlook will be reflected in the various sectors of the market
and economy. That part still has to be picked up, to a great extent, from
shorter-term cues.
jamesr at best.com
More information about the FoRK