After leaving IBM Research, where he worked on the SyNAPSE neuromorphic silicon chip, Greg Corrado became one of the founding members and the Co-technical Lead of Google’s Large Scale Deep Neural Networks Project. He is now a Senior Research Scientist at Google, and in this talk he gives some insight into the project. Corrado has also undertaken extensive research in artificial intelligence, computational neuroscience, and scalable machine learning, and has worked on brain-inspired computing at Google for some time.
Google’s current work with Deep Learning started in 2011 as a fun project in which Corrado was involved. Whilst it was not Google’s first foray into Deep Learning, it did produce the first success, and it has since grown into a team of more than forty world-class engineers, including Geoff Hinton.
Corrado starts by explaining broadly how their system, called DistBelief, works. A neural network that needs to be trained is partitioned into chunks: “we partition so that we can map, say four chunks onto four separate machines and those machines have maybe sixteen CPUs, then that collection of machines collaborates in the process of doing the training automatically.” In the example given, this means sixty-four CPUs would be working on the same network.
That would normally be quite impressive, but Corrado says, “that is still not parallelised enough for the kind of thing that we want to do so the way that we add another layer of parallelism is that we take this four partition version of the model and we make multiple replicas of it so we now have three copies of the same neural network and they’re training on slightly different amounts of data.”
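The partitioning idea described above can be illustrated with a minimal sketch. It is not DistBelief's implementation: the function names (`partition`, `layer_forward`, `partitioned_forward`) are hypothetical, the "machines" are simulated one after another in a single process, and the network is reduced to one layer of weighted sums. It only shows that splitting a layer's units into four chunks and computing each chunk separately gives the same result as computing the whole layer at once.

```python
def partition(n_units, n_machines):
    """Split unit indices 0..n_units-1 into contiguous, near-equal chunks."""
    base, extra = divmod(n_units, n_machines)
    chunks, start = [], 0
    for machine in range(n_machines):
        size = base + (1 if machine < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def layer_forward(weights, inputs, unit_ids):
    """Compute the outputs of only the units owned by one (simulated) machine."""
    return [sum(w * x for w, x in zip(weights[u], inputs)) for u in unit_ids]


def partitioned_forward(weights, inputs, n_machines=4):
    """Run each chunk separately and stitch the outputs back together.

    In a real system each chunk would live on its own machine; here the
    chunks are processed sequentially purely for illustration.
    """
    outputs = []
    for unit_ids in partition(len(weights), n_machines):
        outputs.extend(layer_forward(weights, inputs, unit_ids))
    return outputs
```

With the numbers from the talk, four such chunks on machines with sixteen CPUs each give sixty-four CPUs per copy of the model, and three replicas of that arrangement would use 3 × 64 = 192 CPUs.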
The Slideshare below is from another Google engineer, Quoc Le, but gives added context on DistBelief to complement the slides that Corrado uses in his presentation.
To synchronise what the different copies of the network are learning from their different pieces of data, DistBelief uses a separate set of machines: “the communication that allows this synchronisation is actually really simple where what happens is each version of the neural network downloads the current version of the parameters (state) of the machine learning system and it looks at its training data and computes how it thinks it should change and sends that back to the parameter server.”
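The download, compute, and send-back cycle in that quote can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `ParameterServer` class, the two-parameter linear model, the synthetic data, and the learning rate are hypothetical and not taken from the talk. The replicas also take turns in a single loop, so the sketch does not reproduce the concurrency (or the resulting stale parameters) of a real distributed system.

```python
import random


class ParameterServer:
    """Holds the shared parameters and applies updates sent by replicas."""

    def __init__(self, n_params, learning_rate):
        self.params = [0.0] * n_params
        self.learning_rate = learning_rate

    def download(self):
        # A replica fetches a copy of the current parameters.
        return list(self.params)

    def apply_update(self, gradient):
        # A replica sends back how it thinks the parameters should change.
        for i, g in enumerate(gradient):
            self.params[i] -= self.learning_rate * g


def compute_gradient(params, batch):
    """Mean squared-error gradient for the toy model y = w * x + b."""
    w, b = params
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = (w * x + b) - y
        grad_w += 2 * error * x
        grad_b += 2 * error
    return [grad_w / len(batch), grad_b / len(batch)]


def train(n_replicas=3, steps=2000, seed=0):
    rng = random.Random(seed)
    # Synthetic data drawn from y = 3x + 1 (an assumption for this sketch).
    xs = [rng.uniform(-1, 1) for _ in range(300)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    # Each replica trains on its own shard of the data.
    shards = [data[i::n_replicas] for i in range(n_replicas)]
    server = ParameterServer(n_params=2, learning_rate=0.05)
    for _ in range(steps):
        replica = rng.randrange(n_replicas)       # pick which replica acts next
        params = server.download()                # 1. download current parameters
        batch = rng.sample(shards[replica], 10)   # 2. look at local training data
        server.apply_update(compute_gradient(params, batch))  # 3. send update back
    return server.params
```

Calling `train()` returns the learned `[w, b]` pair, which should end up close to the values used to generate the data. The design point is that replicas never talk to each other directly; all coordination goes through the server's shared parameters.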
Corrado says that in 2012, using this approach, they used over 10,000 CPUs to train a single neural network, and that their infrastructure has allowed them to push more data through a single neural network than anyone ever has.
Having the tools to create such large networks does not necessarily mean they need to be used in every case, and Corrado also talks about when it is appropriate to recommend this type of network at C-level: “...as a researcher when do I recommend Deep Learning to someone at C Level? The first question I ask myself is ‘is this a machine perception app?’, is it hearing or vision or photo recognition? If it’s anything like that then we should try Deep Learning right away... the next question you ask yourself is ‘do you have a mountain of data, particularly labelled data so we can apply the supervised approach that is working so well right now?’ and if you do, my answer still isn’t to try Deep Learning right away. Actually you should probably benchmark a Machine Learning system that’s based on a simpler more well understood technology like logistic regression and then try Deep Learning next quarter.”
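As a rough illustration of the kind of simple baseline Corrado suggests benchmarking first, here is a minimal logistic regression trained with plain batch gradient descent. The toy dataset, the hyperparameters, and all function names are assumptions made for this sketch; a real benchmark would use the actual labelled data and most likely an established library rather than hand-written code.

```python
import math
import random


def predict_probability(weights, bias, features):
    """Logistic (sigmoid) output for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(examples, epochs=200, learning_rate=0.5):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n_features = len(examples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in examples:
            error = predict_probability(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(model, examples):
    """Fraction of examples whose thresholded prediction matches the label."""
    weights, bias = model
    correct = 0
    for features, label in examples:
        predicted = predict_probability(weights, bias, features) >= 0.5
        correct += int(predicted == bool(label))
    return correct / len(examples)


def make_toy_data(n, seed):
    """Hypothetical labelled data: label is 1 when x1 + x2 > 0, else 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([x1, x2], 1 if x1 + x2 > 0 else 0))
    return data
```

A baseline like this would be trained on one split, for example `model = train_logistic_regression(make_toy_data(200, seed=0))`, and scored on a held-out split with `accuracy(model, make_toy_data(200, seed=1))`. That held-out score is the number a later deep network would have to beat to justify its extra cost.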
In summary, the talk supports the notion that Deep Neural Networks can produce machines that learn to become smart from examples, patterns, or outcomes. They typically require extremely large datasets, but have also worked well in sparse data applications.