jbone at place.org
Fri Oct 23 13:15:35 PDT 2009
Further clarification for Stephen...
> With a rich version of neural nets, I think you start getting close
> to the kind of structure that is equivalent to the results of
> automatic training of Markov / Bayesian networks
So to some extent we're talking past each other, largely because the
terminological rigor in the field in general has become so sloppy that
anything vaguely connectionist-looking is called a "neural network."
So fair enough on that point.
Back to a couple of your specific examples: there are even
substantial differences between ANNs and e.g. Bayesian networks. The
former, as mentioned, can *only* learn a classification n-line and,
unless recurrent, can learn "moment in time" snapshot-patterns but not
patterns about things that develop over time. (Don't get hung up
about the recurrency issue; even recurrent ANNs with the typical
model and learning algorithms aren't that powerful. I'm not talking
about the XOR problem, and no, Minsky and Papert weren't "discredited"
in this; their analysis was sound, it was just over-interpreted by
them and everyone else.)
What an ANN encodes, after the learning phase (in the case of
supervised learning, the most common case for the real-world use of
these things) is really nothing more than a classifying function: how
to fit an n-line to divide the world and make an either-or decision
about the inputs with respect to each output. *At best* you could say
that the semantic value of the weightings learned encodes some opaque
and formal model of the target space with respect to the learned input
examples. It's *at best* correlative.
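For concreteness, here's a minimal sketch of that claim (the function names, learning rate, and toy data are mine, purely illustrative): a single-layer perceptron learns nothing more than a separating hyperplane and emits an either-or decision about each input.

```python
def train_perceptron(examples, epochs=20, lr=1):
    """examples: list of (input_vector, label) pairs, label in {0, 1}."""
    n = len(examples[0][0])
    w = [0] * n
    b = 0
    for _ in range(epochs):
        for x, y in examples:
            # The prediction is a hard yes/no: which side of the plane?
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            # Nudge the plane toward misclassified points.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Linearly separable toy data: logical AND.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)

def classify(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

All the network "knows" after training is the weight vector `w` and bias `b`: an opaque, correlative encoding of where the plane sits, nothing more.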
Bayesian networks do something slightly different. They are
*explicitly* probabilistically causal with respect to the conditional
dependencies between their entities and *explicitly* probabilistic with
respect to what they're learning; what they're learning is, in effect,
an n-dimensional probability density function relating inputs to
outputs. This is *far* more general than e.g. the traditional ANN
(regardless of the ANN's topology.) If you like, you can give it a
similar geometric interpretation: let's say the data is described by
3 features / dimensions; if so, then the ANN learns to fit a plane
between examples in 3-space to classify the data. *Roughly*
analogously, what the Bayesian network learns to do is more abstract
and powerful: it learns to build a kind of "fuzzy field" which can be
used to split the feature space --- and this can be said with
certainty to be a more robust "model" of the observations than e.g.
the model encoded in an ANN's weights. More powerful math, therefore
a more powerful model.
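To make the contrast concrete, here's a minimal sketch of a three-node Bayesian network (Rain -> WetGrass <- Sprinkler; the structure and all probability numbers are mine, for illustration only). It stores an explicit factored joint distribution and can answer arbitrary conditional queries by enumeration, something an ANN's weight vector can't do directly.

```python
from itertools import product

# Prior probability tables (assumed numbers).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
# Conditional table: P(wet | rain, sprinkler).
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    # The joint factors along the network's conditional dependencies.
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

def query_rain_given_wet():
    # P(rain=True | wet=True): sum the joint over the hidden variable,
    # then normalize. This is inference by enumeration.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den
```

The same tables answer any other query over these variables; nothing was trained for this particular question, which is the sense in which the BN is a genuine model of the observations rather than a single classifying function.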
Your further example in your equivalenc^H^H^H^H "similarity" class is
Markov models. Markov models can be understood as a weaker variant
encoding of the kinds of conditional dependencies that you see in a
full Bayesian network. There's *actually* a pretty good
correspondence between the two, though the Bayesian networks and
learning algorithms over them are more abstract. (Consider whether a
Bayesian network can "learn" a Markov model. Then consider the
converse. Consider in the context of generalization over unseen data.)
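A rough sketch of that correspondence (the helper name and toy sequence are mine): a first-order Markov model is just a Bayesian network whose only conditional dependency is P(state_t | state_t-1), and "learning" it amounts to counting transitions. A general BN learner restricted to the chain topology would recover the same table; the converse restriction does not hold.

```python
from collections import Counter, defaultdict

def learn_markov(sequence):
    """Maximum-likelihood transition table from an observed state sequence."""
    counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    # Normalize counts into conditional probabilities P(cur | prev).
    return {s: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for s, ctr in counts.items()}

seq = list("AABABBBAAB")
model = learn_markov(seq)
```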
That said, it's all about what you're actually attempting to do.
Surprisingly many real-world phenomena can be well understood (at
least to the level of decent prediction) without even directly
modeling any sort of conditional dependency, logical entailment, etc.
HTMs are just biologically-inspired, turbocharged BNs at some level.
They employ an online-learning algorithm, some more complex and
layered topology, and --- critically --- an a priori semantics imposed
implicitly by the learning method, one which considers spatial and
temporal aspects of its inputs per se. That's a reasonable thing to
do from a biological metaphor perspective: it's like differentiating
inputs and regions of the network based on which sense is providing
the data, which of course the human neocortex and support systems
*do.* But that's a far cry from what you see in the usual ANN or BN,
so I would say that an HTM is a *highly advanced* form of BN, almost
to the point of no longer really being a BN. (You certainly could use
an HTM where a BN would work, but why would you want to? And there
are many things for which an HTM may be suited that would be entirely
unsuitable applications for a BN.)
> PGMs can embody semantics, including formal, mathematical systems,
> but with full probability partial knowledge implication.
If you've got semantics, then your system isn't "formal."
(But yes, I understand what you're attempting to say here.)