[FoRK] mappings
Stephen Williams
sdw at lig.net
Thu Oct 22 20:40:10 PDT 2009
Jeff Bone wrote:
>
> re: ConceptNet etc...
>
> Stephen quotes me and says:
>
>>> Neural networks and other similar systems are something else entirely,
>>> though, and while there's a mapping here it's a bit elusive.
>>> Spreading activation in semantic networks with fuzzy, defeasible
>>> semantics seems like a pretty rich topic at present.
>>
>> I'm glad you now see a mapping / equivalence.
>
> Slow your roll, there, Stephen. I did *not* say there was an
> equivalence and a mapping is not an equivalence. Neural networks (the
> perceptron / single-layer / ANN kind) do one thing and one thing
> only: they statistically learn a discrimination surface in n-feature
> space. That's all they do and it's all they *can* do, and there are
> hard computational limits on what that enables. These limits were
> conclusively demonstrated by Minsky and Papert et al. in the late 60s
> (cf. their book Perceptrons) though they both overstated the
> implications of these limits *and* were largely the genesis, through
> other people misinterpreting their work, for the almost-complete
> disappearance of ANN and connectionist-type models from AI and
> computer science research for a decade and a half. (Which is a damn
> shame, of course.)
Minsky and Papert were essentially wrong in emphasis. Sure, they proved all
kinds of things about two-layer networks (i.e. no hidden layer). So what?
That is like saying you can't build a spaceship with one valve. What they
did show was that you can't build XOR with two layers. How did it not occur
to them that adding a third layer, and the additional information flow
through it, removes that limit?
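To make that concrete: XOR is not linearly separable, so no single
discrimination surface (no hidden layer) can compute it, but one hidden
layer is enough. A minimal sketch with hand-picked weights (not trained
ones), just to show the third layer doing the work:

```python
def step(x):
    """Heaviside step activation."""
    return 1 if x > 0 else 0

def xor_mlp(x1, x2):
    """XOR via one hidden layer with hand-chosen weights."""
    h1 = step(x1 + x2 - 0.5)   # hidden unit 1: fires on OR(x1, x2)
    h2 = step(x1 + x2 - 1.5)   # hidden unit 2: fires on AND(x1, x2)
    # Output: OR-and-not-AND, which is exactly XOR.
    return step(h1 - h2 - 0.5)
```

Running it over all four inputs reproduces the XOR truth table, which is
precisely what no two-layer perceptron can do.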
> More complex neural network wiring schemes have some different
> properties than e.g. the pure perceptron, but it's still the case that
> what they do is build classification or discrimination surfaces or
> predict values according to either a learned linear relationship or
> (in the case of recurrent networks) some essentially tail-recursive
> algorithm.
> What's going on w/ a neural network is *not* semantics; it's formal,
> it's math, and it's not even particularly topological in itself.
> Hence the "something else entirely, though." The "while there's a
> mapping here it's a bit elusive" comment regards the stuff that's
> being done on the frontier of connectionist research, *not* the
> traditional ANN but rather its extrapolations in things like
> hierarchical temporal memories and Geoffrey Hinton's work. I'm *not*
> stipulating any change of position regarding any earlier debate we
> had; your insistence that any of these things have any particular
> *equivalence* is about as useful as
OK. Me neither. ;-) I think I made it clear before that I wasn't
referring to a particular classical idea of a neural network. I'm
thinking more of designs that could be called "hierarchical temporal
memories", etc. — thinking about the likely way that, in real brains,
neurons actually change their structure, connectivity, and weighting,
have very large fanout, and run competing hypothesis testing and
selection in parallel. Etc. With a rich version of neural nets, I think
you start getting close to the kind of structure that is equivalent to
the result of automatically training a Markov / Bayesian network.
However, even with a current hidden-layer feedback network, there is
some similarity. I didn't mean that they could be mapped completely;
you can't even do that completely between Bayesian and Markov
probability graphs, and they are pretty similar.
There are a number of ways to use Bayesian and Markov network ideas to
do reasoning, precise and approximate, and machine learning to train
them: probabilistic graph models, probabilistic logic networks,
probabilistic relational reasoning... I'll just call the whole group of
concepts / algorithms Probabilistic Graphical Models (PGMs), to be
generic. It's the term Koller used.
"Semantics" in a PGM are just probability distributions of possible
values of unknown variables along with a reasoning algorithm that prunes
the work space as it goes, computing remaining probabilities in any
direction given any known variable values. If you have a PGM reasoner
output the most probable value of an unknown variable given a certain
input, the use is a lot like a NN and the "semantics" could be similar.
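As a toy illustration of "output the most probable value of an unknown
variable given a certain input": brute-force enumeration over a tiny
hypothetical rain/sprinkler/wet-grass Bayesian network. The structure and
all the probabilities here are made up for the example; real PGM reasoners
use much smarter inference than enumeration.

```python
# Hypothetical CPTs for a 3-node network: Rain -> WetGrass <- Sprinkler.
P_RAIN = {1: 0.2, 0: 0.8}
P_SPRINKLER = {1: 0.1, 0: 0.9}
P_WET = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}  # P(W=1 | R, S)

def joint(r, s, w):
    """Joint probability of one full assignment."""
    pw = P_WET[(r, s)]
    return P_RAIN[r] * P_SPRINKLER[s] * (pw if w == 1 else 1.0 - pw)

def most_probable_rain(wet):
    """Most probable Rain value given observed WetGrass, by summing
    out Sprinkler and comparing the two remaining scores."""
    scores = {r: sum(joint(r, s, wet) for s in (0, 1)) for r in (0, 1)}
    return max(scores, key=scores.get)
```

Query in one direction (wet grass suggests rain despite the low prior) or
the other (dry grass suggests no rain) — the same model answers both, which
is the "any direction given any known variable values" property.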
Maybe the semantics are in the training? The simplest (but so inefficient
that it is not done) way to train a PGM is to assume at the start that
every variable is dependent on every other variable — a full-mesh
network. You then process a training set, determining for each possible
dependency whether there is a valid probability relationship or not.
For those that aren't valid over the entire training set, you prune the
connection and simplify the model. For a node that unifies a probability
effect, you trim the direct links and rely on that node. If you have
specified the required intermediate variables, that should suffice. If
not, then you need something like NN hidden layers, which either have to
be manually specified or auto-discovered.
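A crude sketch of the full-mesh-then-prune idea, using pairwise mutual
information as a stand-in for "valid probability relationship" (real
structure learning uses conditional independence tests or score-based
search, and handles more than pairwise effects):

```python
from math import log2
from collections import Counter

def mutual_information(samples, i, j):
    """Empirical mutual information (bits) between columns i and j."""
    n = len(samples)
    ci = Counter(s[i] for s in samples)
    cj = Counter(s[j] for s in samples)
    cij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in cij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((ci[a] / n) * (cj[b] / n)))
    return mi

def prune_full_mesh(samples, names, threshold=0.1):
    """Start from a full mesh over all variables; keep only the edges
    whose empirical dependence exceeds the threshold."""
    edges = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if mutual_information(samples, i, j) > threshold:
                edges.add((names[i], names[j]))
    return edges
```

On a toy training set where B copies A and C is independent noise, only
the A-B edge survives the pruning pass; the full mesh collapses to the
real dependency structure.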
My point was that starting with a raw NN and training it, and starting
with the equivalent of a full-mesh PGM and training it, look the same
from the outside. On the inside, current methods for both seem to be
searching for the same goal: high-fidelity probabilistic answers to
unknowns given partial knowledge. The strategies are completely
different, and the degree of structure is different; however, it is a
similar search.
PGMs can embody semantics, including formal, mathematical systems, but
with fully probabilistic partial-knowledge implication. NNs have less
structure and some interesting characteristics. PGMs already ate expert
systems, logic reasoning, and fuzzy logic. I think the distinctiveness
of NNs should be added to the PGM collective. And then add in genetic
programming (GP, not GA).
If you start with a hand-picked Bayesian/Markov graph of optimal
variables, it doesn't seem anything like a NN. Instead, if you start
with a set of variables, assume the equivalent of full-mesh to start,
then train by something that approximates computing all possible
probability relationships and then trims all dependencies that don't
have any strength, it seems a lot closer.
Even somewhat dumb hybrids could be interesting. For instance, assume
that you have a good PGM training method but no obvious way to find
which internal variable nodes you should have. It might be faster to
determine the key hidden relationships by training a NN and then
analyzing the activation paths from certain variables to internal
nodes. You then create those as nodes in the PGM and train that.
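One crude way to sketch that hybrid: take the input-to-hidden weight
matrix of an already-trained NN and, for each hidden unit, group the
inputs that drive it strongly. Each group is a candidate intermediate
variable to introduce into the PGM before training it. The weight matrix
below is hypothetical, hand-written for the example rather than taken
from any real trained network:

```python
# Rows = input variables, columns = hidden units (hypothetical weights
# standing in for a trained network's first layer).
W = [
    [2.1, 0.1],   # x0
    [1.9, 0.0],   # x1
    [0.05, 1.8],  # x2
    [0.1, 2.2],   # x3
]

def candidate_groups(weights, frac=0.5):
    """For each hidden unit, return the input indices whose absolute
    weight is at least `frac` of that unit's strongest input weight."""
    n_hidden = len(weights[0])
    groups = []
    for h in range(n_hidden):
        col = [abs(row[h]) for row in weights]
        cutoff = frac * max(col)
        groups.append([i for i, w in enumerate(col) if w >= cutoff])
    return groups
```

Here the analysis recovers {x0, x1} and {x2, x3} as the two hidden
groupings — exactly the sort of auto-discovered intermediate variables
the PGM would otherwise need specified by hand.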
> saying that hash tables and lists are "equivalent" because they are
> both examples of data structures. ;-)
I don't think that's what I was doing; that's what you thought I was
doing. I'm thinking more of the algorithms, use, and capability, which
can all be similar, and then musing that the representations must
therefore have some less-than-random equivalence distance.
sdw
>
> Just to clarify.
>
> jb
>
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork