[FoRK] Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts

Stephen D. Williams sdw at lig.net
Fri Nov 14 12:19:16 PST 2014

On 11/14/14, 10:22 AM, J. Andrew Rogers wrote:
>> On Nov 14, 2014, at 3:20 AM, Stephen D. Williams <sdw at lig.net> wrote:
>> Since we haven't seen the architecture of Watson et al, and we haven't been able to test it yet, we can't be sure whether it can handle it or not.
> I know the Watson team is aware of this theoretical limitation because they asked me if I could fix it. (I preferred to do something else.)

I would have been very attracted to that kind of problem, although it would be a challenge to compete with the talent pool they 
should be attracting.

> This, by the way, is a great litmus test for machine learning systems and their designers. You ask them if they can express an apparently mundane type of reasoning that they lack the computer science to express. IBM Research passes that test but most computer scientists do not (a pervasive issue in the field of AI).
> If you recall, not three weeks ago you were using the existence of specialized neurons for spatial-like processing to argue for pervasive specialization. At the time, I pointed out this is unsurprising because some kinds of reasoning we take for granted are not expressible without certain kinds of operators that don’t work with a graph-like data representation. Same story here.

Concentrate on one mechanism, make it work well, understand why it works and what its limitations are, consider all the ways it 
could be applied, and determine next steps and possible evolutionary paths. Then look around and consider other, completely 
different mechanisms.  It used to be that some people wanted a single solution to win out.  By now, the success of ensemble 
families, and the fact that certain algorithms win at certain tasks while being uncompetitive at others, should have convinced 
people that you need a variety of methods to try.  A priori proofs of which one is better are less reliable than competently 
trying all of them.  The fact that we're finding more specialization in neural systems, while also examining plasticity and 
flexible general systems more closely, should reinforce this.
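The "variety of methods" point can be sketched in a few lines. The three toy classifiers below are invented stand-ins (none of this is from a real system): each rule wins on some inputs and loses on others, and a simple majority vote does better than committing to any single one a priori.

```python
# Three hypothetical classifiers, each competitive only on some inputs.
def rule_a(x):
    return 1 if x > 5 else 0       # fires on large values

def rule_b(x):
    return 1 if x % 2 == 0 else 0  # fires on even values

def rule_c(x):
    return 1 if x < 8 else 0       # fires on small values

def majority_vote(classifiers, x):
    """Combine weak, specialized methods instead of picking one winner."""
    votes = sum(c(x) for c in classifiers)
    return 1 if votes > len(classifiers) / 2 else 0

ensemble = [rule_a, rule_b, rule_c]
print([majority_vote(ensemble, x) for x in [2, 6, 9]])  # -> [1, 1, 0]
```

No single rule produces that output sequence on its own; the ensemble's behavior is something none of its members has.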

> The difference is that almost all algorithms and data structures in computer science are built from graph-like representational primitives. A set of data structures and algorithms built from non-integer, non-graph (abstract) primitives is not something you can just lookup or download but they are necessary.

From a certain point of view, everything looks like a graph, a matrix, triples or quads, etc.  Currently, we build everything in the 
logical, structured world and encapsulate the NN and other fuzzy bits.  Eventually, we'll likely package up the hard algorithms as 
reusable elements in a soft architecture.  There's little point in recapitulating quicksort in a neural net (other than for fun, 
proof of concept, or experience); better to teach those soft systems how to invoke quicksort.
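That architecture can be sketched as a router over hard algorithms. Everything here is a toy assumption: `soft_router` is a hand-written stand-in for a trained model, and the tool table is invented for illustration.

```python
# Hard, exact algorithms packaged as reusable tools.
HARD_ALGORITHMS = {
    "sort": sorted,                  # no point re-learning quicksort
    "reverse": lambda xs: xs[::-1],
}

def soft_router(task_description):
    """Stand-in for a learned classifier that maps a fuzzy request to a tool.
    A real system would train this; here it is a trivial keyword rule."""
    return "sort" if "order" in task_description else "reverse"

def solve(task_description, data):
    # The "soft" part only chooses; the "hard" part does the exact work.
    tool = HARD_ALGORITHMS[soft_router(task_description)]
    return tool(data)

print(solve("put these in order", [3, 1, 2]))  # -> [1, 2, 3]
```

The design choice is the one argued above: the learned component is confined to deciding *which* exact algorithm to invoke, never to reproducing the algorithm itself.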

We have a variety of mechanisms for probabilistic and machine learning runtime reasoning, i.e. the running of a program.  The real 
problem is training.  Whatever method can be trained the fastest, cheapest, and most accurately, and somehow verified / visualized / 
tested, will likely determine what type of runtime systems we have.  Bayesian and hidden Markov belief networks can provably make 
exactly correct decisions when trained correctly.  There are full proofs available, as you know.  But efficient training still 
seemed like an open problem when I had my stint with Prof. D. Koller.  Current neural net systems have largely solved the training 
problem in practice.

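The "provably correct when trained correctly" point can be made concrete with exact inference in a tiny HMM via the forward algorithm: given the true parameters, the filtered posterior is exact, not approximate. The two-state weather model and its numbers below are invented purely for illustration.

```python
# A minimal two-state HMM: hidden weather, observed umbrella use.
states = ["rainy", "sunny"]
start = {"rainy": 0.5, "sunny": 0.5}
trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
         "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit = {"rainy": {"umbrella": 0.9, "none": 0.1},
        "sunny": {"umbrella": 0.2, "none": 0.8}}

def forward(observations):
    """Exact filtered posterior P(state_T | obs_1..T), forward algorithm."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    z = sum(alpha.values())          # normalize the joint into a posterior
    return {s: alpha[s] / z for s in states}

post = forward(["umbrella", "umbrella"])
print(post)  # rainy is (exactly) the more probable state
```

With correct parameters, these probabilities are the exact decision-theoretic answer; the hard part, as argued above, is getting those parameters trained efficiently in the first place.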
As I've said before, after Prof. Andrew Ng's Machine Learning course I had a strong feeling of equivalence between Bayesian / HMM 
belief networks and neural nets.  I'm not competent to explore a formal proof, and I have more pressing things to do than pursue 
other equivalence work, but I can feel it.  One interesting path might be to automate creation of an HMM belief network that 
embodies the knowledge in a trained NN, possibly using genetic programming to find hidden variables.  A plate model or some other 
subdivided, recurrent HMM to represent an RNN would then be interesting.  A different path would be to try to automatically pare 
the NN down to the minimal set of weights that encodes the same knowledge, basically optimizing the NN.  At the limit, this would 
seem to be mappable to an equivalent HMM.
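The "pare the NN down to a minimal set" step can be sketched as magnitude pruning. The tiny hand-written network and its weights below are invented for illustration (a real pipeline would retrain after pruning); the point is just that near-zero weights can be removed with essentially no change in behavior.

```python
import math

def forward(x, w_hidden, w_out):
    """Tiny one-hidden-layer net: tanh hidden units, linear output."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
              for row in w_hidden]
    return sum(wo * h for wo, h in zip(w_out, hidden))

# Hand-written weights; the second hidden unit is effectively dead.
w_hidden = [[1.2, -0.8],
            [0.001, 0.002],   # near-zero: contributes almost nothing
            [-0.5, 0.9]]
w_out = [0.7, 0.003, -1.1]

def prune(weights, eps=0.01):
    """Zero out weights whose magnitude is below eps."""
    return [[w if abs(w) >= eps else 0.0 for w in row] for row in weights]

x = [0.5, -1.0]
dense = forward(x, w_hidden, w_out)
sparse = forward(x, prune(w_hidden), prune([w_out])[0])
print(abs(dense - sparse) < 1e-3)  # outputs are nearly identical
```

The pruned network is the smaller "equivalent set" in miniature; mapping such a minimal network onto an HMM, as speculated above, would be the (open) next step.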

If I could focus on NNs full time, I'd experiment with different types of connections, different types of memory, and ways of embedding various types of simulators and other building blocks.  Then I'd work on a metaphor hierarchy system for abstraction.  That is an abyss that I'd love to fall into some day.

