[FoRK] mappings

Jeff Bone jbone at place.org
Fri Oct 23 13:15:35 PDT 2009

Further clarification for Stephen...

> With a rich version of neural nets, I think you start getting close  
> to the kind of structure that is equivalent to the results of  
> automatic training of Markov / Bayesian networks

So to some extent we're talking past each other, largely because the  
terminological rigor in the field in general has become so sloppy that  
anything vaguely connectionist-looking is called a "neural network."   
So fair enough on that point.

Back to a couple of your specific examples:  there are even  
substantial differences between ANNs and e.g. Bayesian networks.  The  
former, as mentioned, can *only* learn a classification n-line and,  
unless recurrent, can learn "moment in time" snapshot-patterns but not  
patterns about things that develop over time.  (Don't get hung up  
about the recurrency issue;  even recurrent ANNs with the typical  
model and learning algorithms aren't that powerful.  I'm not talking  
about the XOR problem, and no, Minsky and Papert weren't "discredited"  
in this;  their analysis was sound, it was just over-interpreted by  
them and everyone else.)

What an ANN encodes, after the learning phase (in the case of  
supervised learning, the most common case for the real-world use of  
these things) is really nothing more than a classifying function:  how  
to fit an n-line to divide the world and make an either-or decision  
about the inputs with respect to each output.  *At best* you could say  
that the semantic value of the weightings learned encodes some opaque  
and formal model of the target space with respect to the learned input  
examples.  It's *at best* correlative.
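
To make the "fit an n-line" picture concrete, here is a minimal sketch of
the simplest case: a single-layer perceptron learning a separating plane
over 3 features.  The data points, function names and learning rate are
all made up for illustration;  this is not any particular library's API.

```python
# Single-layer perceptron: learns weights w and bias b so that the plane
# w . x + b = 0 splits two classes of 3-feature points.

def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule: nudge the plane toward each misclassified point."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return w, b

def classify(w, b, x):
    """Either-or decision: which side of the learned plane is x on?"""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical toy data: class +1 near (1, 1, 1), class -1 near (-1, -1, -1).
SAMPLES = [(1.0, 1.2, 0.8), (0.9, 1.1, 1.3), (1.4, 0.7, 1.0),
           (-1.0, -0.8, -1.2), (-1.3, -1.1, -0.9), (-0.7, -1.4, -1.0)]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(SAMPLES, LABELS)
```

All that survives training is w and b:  an opaque set of numbers that
happens to separate the examples, with no statement of why.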

The Bayesian networks do something slightly different.  They are  
*explicitly* probabilistically causal with respect to the conditional  
dependencies between their entities and *explicitly* probabilistic with  
respect to what they're learning;  what they're learning is, in effect,  
an n-dimensional probability density function relating inputs to  
outputs.  This is *far* more general than e.g. the traditional ANN  
(regardless of the ANN's topology.)  If you like, you can give it a  
similar geometric interpretation:  let's say the data is described by  
3 features / dimensions;  if so, then the ANN learns to fit a plane  
between examples in 3-space to classify the data.  *Roughly*  
analogously, what the Bayesian network learns to do is more abstract  
and powerful:  it learns to build a kind of "fuzzy field" which can be  
used to split the feature space --- and this can be said with  
certainty to be a more robust "model" of the observations than e.g.  
the model encoded in an ANN's weights.  More powerful math, .: more  
powerful model.
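
As a rough sketch of what "explicitly probabilistic" means in practice,
the following learns the conditional probability tables of a tiny
three-node network (rain -> wet <- sprinkler) from counts and then
answers a query by enumeration.  The network, the observation counts
and the function names are hypothetical, chosen only to illustrate.

```python
from collections import Counter
from itertools import product

# Hypothetical observations of (rain, sprinkler, wet), each 0 or 1.
OBSERVATIONS = (
    [(0, 0, 0)] * 40 + [(0, 1, 1)] * 18 + [(0, 1, 0)] * 2 +
    [(1, 0, 1)] * 27 + [(1, 0, 0)] * 3 + [(1, 1, 1)] * 10
)

def learn_cpts(data):
    """Estimate P(rain), P(sprinkler) and P(wet=1 | rain, sprinkler) from counts."""
    n = len(data)
    p_rain = sum(r for r, _, _ in data) / n
    p_sprinkler = sum(s for _, s, _ in data) / n
    seen = Counter((r, s) for r, s, _ in data)
    wet = Counter((r, s) for r, s, w in data if w == 1)
    # Laplace smoothing keeps every table entry strictly between 0 and 1.
    p_wet = {(r, s): (wet[(r, s)] + 1) / (seen[(r, s)] + 2)
             for r, s in product((0, 1), repeat=2)}
    return p_rain, p_sprinkler, p_wet

def joint(cpts, r, s, w):
    """Joint probability factorized along the network's edges."""
    p_rain, p_sprinkler, p_wet = cpts
    pr = p_rain if r else 1 - p_rain
    ps = p_sprinkler if s else 1 - p_sprinkler
    pw = p_wet[(r, s)] if w else 1 - p_wet[(r, s)]
    return pr * ps * pw

def posterior_rain_given_wet(cpts):
    """P(rain=1 | wet=1), summing the joint over the unobserved sprinkler."""
    num = sum(joint(cpts, 1, s, 1) for s in (0, 1))
    den = sum(joint(cpts, r, s, 1) for r in (0, 1) for s in (0, 1))
    return num / den

cpts = learn_cpts(OBSERVATIONS)
posterior = posterior_rain_given_wet(cpts)
```

Unlike the weights of an ANN, every learned number here is a readable
conditional probability, and the same model answers queries in either
direction (cause to effect or effect to cause).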

Your further example in your equivalenc^H^H^H^H "similarity" class is  
Markov models.  Markov models can be understood as a weaker variant  
encoding of the kinds of conditional dependencies that you see in a  
full Bayesian network.  There's *actually* a pretty good  
correspondence between the two, though the Bayesian networks and  
learning algorithms over them are more abstract.  (Consider whether a  
Bayesian network can "learn" a Markov model.  Then consider the  
converse.  Consider in the context of generalization over unseen data.)
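
One way to see the correspondence:  a first-order Markov chain is a
Bayesian network whose only edges run from each state to the next.  A
minimal sketch, assuming a made-up weather sequence and hypothetical
helper names, with transitions estimated by plain counting:

```python
from collections import defaultdict

def learn_transitions(sequence):
    """Estimate P(next | current) by counting adjacent pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    return {prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
            for prev, nxt in counts.items()}

def sequence_probability(transitions, initial, sequence):
    """P(x0) * product of P(x_t | x_{t-1}): the chain-structured factorization."""
    p = initial.get(sequence[0], 0.0)
    for prev, cur in zip(sequence, sequence[1:]):
        p *= transitions.get(prev, {}).get(cur, 0.0)
    return p

# Hypothetical observations: S = sunny, R = rainy.
WEATHER = list("SSRSSSRRSS")
transitions = learn_transitions(WEATHER)

# Probability of "sunny, sunny, rainy", assuming we start on a sunny day.
p_ssr = sequence_probability(transitions, {"S": 1.0}, list("SSR"))
```

Note that this unsmoothed model assigns probability zero to any
transition it never observed, which bears directly on the
generalization question above.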

That said, it's all about what you're actually attempting to do.   
Surprisingly many real-world phenomena can be well-understood (at  
least to the level of decent prediction) without even directly  
modeling any sort of conditional dependency, logical entailment, etc.

HTMs are just a biologically-inspired, turbocharged BN at some level.   
They employ an online-learning algorithm, some more complex and  
layered topology, and --- critically --- an a priori semantics imposed  
implicitly by the learning method, one which considers spatial and  
temporal aspects of its inputs per se.  That's a reasonable thing to  
do from a biological metaphor perspective:  it's like differentiating  
inputs and regions of the network based on which sense is providing  
the data, which of course the human neocortex and support systems  
*do.*  But that's a far cry from what you see in the usual ANN or BN,  
so I would say that an HTM is a *highly advanced* form of BN, almost  
to the point of no longer really being a BN.  (You certainly could use  
an HTM where a BN would work, but why would you want to?  But there  
are many things for which an HTM may be suited that would be entirely  
unsuitable applications for a BN.)

Small nit:

> PGMs can embody semantics, including formal, mathematical systems,  
> but with full probability partial knowledge implication.

If you've got semantics, then your system isn't "formal."  
Definitionally.  ;-)

(But yes, I understand what you're attempting to say here.)

