[FoRK] mappings

Stephen Williams sdw at lig.net
Thu Oct 22 20:40:10 PDT 2009


Jeff Bone wrote:
>
> re:  ConceptNet etc...
>
> Stephen quotes me and says:
>
>>> Neural networks and other similar systems are something else entirely,
>>> though, and while there's a mapping here it's a bit elusive.
>>> Spreading activation in semantic networks with fuzzy, defeasible
>>> semantics seems like a pretty rich topic at present.
>>
>> I'm glad you now see a mapping / equivalence.
>
> Slow your roll, there, Stephen.  I did *not* say there was an 
> equivalence and a mapping is not an equivalence.  Neural networks (the 
> perceptron / single-layer / ANN kind) do one thing and one thing 
> only:  they statistically learn a discrimination surface in n-feature 
> space.  That's all they do and it's all they *can* do, and there are 
> hard computational limits on what that enables.  These limits were 
> conclusively demonstrated by Minsky and Papert et al. in the late 60s 
> (cf. their book Perceptrons) though they both overstated the 
> implications of these limits *and* were largely the genesis, through 
> other people misinterpreting their work, for the almost-complete 
> disappearance of ANN and connectionist-type models from AI and 
> computer science research for a decade and a half.  (Which is a damn 
> shame, of course.)

Minsky and Papert were essentially wrong.  Sure, they proved all kinds 
of things about a two-layer network (input plus output, i.e. a 
single-layer perceptron with no hidden layer).  So what?  That is like 
saying you can't build a spaceship with one valve.  What they did show 
was that you can't compute XOR with those two layers.  How did it not 
occur to them to add a third layer and additional information flow?
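
To make that concrete, here is a throwaway sketch (mine, with 
hand-picked weights and no training at all) of XOR falling out of a 
single hidden layer:

# Minimal illustrative sketch: XOR with one hidden layer of two units
# and a step activation.  A single-layer perceptron cannot represent
# this function; hand-picked weights for two hidden units are enough.

def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden unit 1 fires for "a OR b", hidden unit 2 for "a AND b".
    h1 = step(1.0 * a + 1.0 * b - 0.5)   # OR
    h2 = step(1.0 * a + 1.0 * b - 1.5)   # AND
    # The output fires for "OR and not AND", i.e. XOR.
    return step(1.0 * h1 - 2.0 * h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # last column: 0, 1, 1, 0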

> More complex neural network wiring schemes have some different 
> properties than e.g. the pure perceptron, but it's still the case that 
> what they do is build classification or discrimination surfaces or 
> predict values according to either a learned linear relationship or 
> (in the case of recurrent networks) some essentially tail-recursive 
> algorithm.
> What's going on w/ a neural network is *not* semantics;  it's formal, 
> it's math, and it's not even particularly topological in itself.  
> Hence the "something else entirely, though."  The "while there's a 
> mapping here it's a bit elusive" comment regards the stuff that's 
> being done on the frontier of connectionist research, *not* the 
> traditional ANN but rather its extrapolations in things like 
> hierarchical temporal memories and Geoffrey Hinton's work.  I'm *not* 
> stipulating any change of position regarding any earlier debate we 
> had;  your insistence that any of these things have any particular 
> *equivalence* is about as useful as 

OK.  Me neither.  ;-)  I think I made it clear before that I wasn't 
referring to the particular classical idea of a neural network.  I'm 
thinking more of designs that could be called "hierarchical temporal 
memories", etc., and about the likely way that, in real brains, neurons 
actually change their structure, connectivity, and weighting, and have 
a very large fanout; that there is competing hypothesis testing and 
selection going on; and so on.  With a rich version of neural nets, I 
think you start getting close to the kind of structure that is 
equivalent to the results of automatically training Markov / Bayesian 
networks.  However, even with a current hidden-layer feedback network, 
there is some similarity.  I didn't mean that they could be mapped 
completely.  You can't even do that completely between Bayesian and 
Markov probability graphs, and they are pretty similar.

There are a number of ways to use Bayesian and Markov network ideas to 
do reasoning, precise and not, and machine learning to train them.  
Probabilistic graphical models, probabilistic logic networks, 
probabilistic relational reasoning...  I'll just call the whole group 
of concepts / algorithms Probabilistic Graphical Models (PGMs) to be 
generic.  It's the term Koller uses.

"Semantics" in a PGM are just probability distributions over the 
possible values of unknown variables, along with a reasoning algorithm 
that prunes the work space as it goes, computing the remaining 
probabilities in any direction given whatever variable values are 
known.  If you have a PGM reasoner output the most probable value of an 
unknown variable given certain inputs, the use is a lot like an NN and 
the "semantics" could be similar.
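
As a toy illustration (my own made-up numbers, nothing from any real 
system), here is the usual rain/sprinkler/wet-grass network queried "in 
any direction" by brute-force enumeration; the "semantics" are nothing 
more than the conditional probability tables plus the loop that sums 
them out:

# Toy Bayesian network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
# All numbers below are invented for illustration.

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R=True)
               False: {True: 0.4,  False: 0.6}}   # P(S | R=False)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(W=True | S, R)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    pw = P_wet[(s, r)] if w else 1.0 - P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * pw

def p_rain_given_wet():
    # Enumerate the unobserved variable (Sprinkler) and normalize.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False)
                                for s in (True, False))
    return num / den

print(p_rain_given_wet())   # ~0.36 with these made-up tables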

Maybe the semantics are in the training?  The simplest (but so 
inefficient that it is not done) way to train a PGM is to start by 
assuming that every variable is dependent on every other variable: a 
full-mesh network.  You then process a training set, determining for 
each possible dependency whether there is a valid probability 
relationship or not.  For those that aren't valid over the entire 
training set, you prune the connection and simplify the model.  For a 
node that unifies a probability effect, you trim the direct links and 
rely on that node.  If you have specified the required intermediate 
variables, that should suffice.  If not, then you need something like 
NN hidden layers, which either have to be manually specified or 
auto-discovered.
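
Something like the following caricature (mine; real structure learning 
uses proper conditional-independence tests or score-based search) shows 
the full-mesh-then-prune idea: start with every pairwise edge and drop 
the ones whose observed dependence is near zero over the training set:

# Start with a full mesh over discrete variables, then prune edges
# whose empirical mutual information is negligible.  Illustration only.

from collections import Counter
from itertools import combinations
from math import log

def mutual_information(samples, i, j):
    n = len(samples)
    pi, pj, pij = Counter(), Counter(), Counter()
    for row in samples:
        pi[row[i]] += 1
        pj[row[j]] += 1
        pij[(row[i], row[j])] += 1
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def prune_full_mesh(samples, num_vars, threshold=0.01):
    # Begin with every possible edge; keep only the "valid" dependencies.
    edges = combinations(range(num_vars), 2)
    return {(i, j) for (i, j) in edges
            if mutual_information(samples, i, j) > threshold}

# samples: one tuple of discrete values per training case.
data = [(0, 0, 0), (1, 1, 0), (1, 1, 1), (0, 0, 1)] * 25
print(prune_full_mesh(data, 3))   # keeps the (0, 1) edge, drops the rest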

My point was that starting with a raw NN and training it, and starting 
with the equivalent of a full-mesh PGM and training it, look the same 
from the outside.  On the inside, current methods for both seem to be 
searching for the same goal: high-fidelity probabilistic answers to 
unknowns given partial knowledge.  The strategies are completely 
different, and the degree of structure is different, but it is a 
similar search.

PGMs can embody semantics, including formal, mathematical systems, but 
with fully probabilistic implication from partial knowledge.  NNs have 
less form and some interesting characteristics.  PGMs already ate 
expert systems, logical reasoning, and fuzzy logic.  I think the 
distinctiveness of NNs should be added to the PGM collective.  And then 
add in genetic programming (GP, not GA).

If you start with a hand-picked Bayesian/Markov graph of optimal 
variables, it doesn't seem anything like an NN.  If, instead, you start 
with a set of variables, assume the equivalent of a full mesh, and then 
train by something that approximates computing all possible probability 
relationships and trimming the dependencies that don't have any 
strength, it seems a lot closer.

Even somewhat dumb hybrids could be interesting.  For instance, assume 
that you have a good PGM training method, but there is no obvious way 
to find which internal variable nodes you should have.  It might be 
faster to determine the key hidden relationships by training an NN and 
then analyzing the activation paths from certain variables to internal 
nodes.  You then create those as nodes in the PGM and train that.
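
A hand-wavy sketch of that hybrid, assuming the NN was already trained 
elsewhere and all we look at is its input-to-hidden weight matrix; the 
grouping rule and threshold are arbitrary and only there to show the 
shape of the idea:

# Given an input-to-hidden weight matrix W[h][i] from a trained NN,
# group the inputs that most strongly drive each hidden unit and
# propose one latent PGM variable per group.  Purely illustrative.

def propose_latent_nodes(W, input_names, strength=0.5):
    proposals = []
    for h, weights in enumerate(W):
        # Inputs whose connection to hidden unit h is strong in magnitude.
        drivers = [name for name, w in zip(input_names, weights)
                   if abs(w) >= strength]
        if len(drivers) >= 2:              # worth a mediating variable
            proposals.append(("latent_%d" % h, drivers))
    return proposals

# Hypothetical weights: hidden unit 0 is driven by income and debt,
# hidden unit 1 mostly by age alone.
W = [[0.9, -0.8, 0.1],
     [0.05, 0.1, 0.7]]
names = ["income", "debt", "age"]
print(propose_latent_nodes(W, names))
# [('latent_0', ['income', 'debt'])] -> add as an intermediate PGM variable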

> saying that hash tables and lists are "equivalent" because they are 
> both examples of data structures. ;-)

I don't think that was what I was doing.  That was what you thought I 
was doing.  I'm thinking more of the algorithms, use, and capability, 
which can all be similar, and then musing that the representations must 
be at some less-than-random equivalence distance.

sdw
>
> Just to clarify.
>
> jb
>
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork


