Deep learning is all about big data and about tuning virtual neurons to adapt to a particular dataset. You could say that a deep architecture is an 'idiot savant' in its domain, yet one able to genuinely generalize, much as we do when learning something new.
Let's take an easy-to-understand classification problem as an example:
Is the animal in the picture a cat or a dog? In the video you see dots representing neurons. The data (the image) is connected to the bottom, and the decision happens in the top layer (two neurons: one for the cat decision, one for the dog).
The intermediate neural layers are tuned to link the input at the bottom to the decision at the top. This tuning amounts to making some neurons very active and others more silent. That's how it happens in the brain, and also how it happens in deep learning. Exactly how our brain learns is not yet fully understood, but we know for certain that information is stored in the neural weights (the synapses of biological neurons). Each neural layer can be seen as transforming the data in a geometric space. A more technical and interesting explanation was posted here by Francois Chollet.
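To make the "weights store the information" idea concrete, here is a minimal sketch (not the article's actual engine) of one neural layer as a geometric transformation: a weighted sum followed by a squashing non-linearity. The weight matrix `W` and bias `b` are invented numbers for illustration; learning means adjusting exactly these values.

```python
import math

def layer(x, W, b):
    # Each output neuron: a weighted sum of the inputs, shifted by a
    # bias, then bent by tanh. Geometrically: rotate/scale/shift the
    # input space, then squash it non-linearly.
    return [math.tanh(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

# Two inputs feeding two neurons; the weights below are made up.
W = [[0.8, -0.3],
     [0.1,  0.9]]
b = [0.0, -0.2]

print(layer([1.0, 2.0], W, b))
```

Making a neuron "more active" or "more silent" during training corresponds to nudging the entries of `W` and `b` up or down.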
Some years ago, a vision engineer would have started analyzing this classification problem by carefully selecting cat and dog features. The typical ears-nose-eyes combination of a cat would be a strong identifier, so the engineer would hand-craft a software filter to detect those particularities.
With the advent of deep learning (strongly promoted by the ImageNet competition), that feature selection is no longer performed by engineers, but by the neurons themselves, through techniques based on gradient descent (backpropagation). The first part of the architecture groups cat and dog features together autonomously, guided by correct labels (in the case of supervised learning), thus building the decision function without manual feature engineering. Engineering effort can then focus on the design of the artificial brains themselves. This is the business case Robovision is focusing on: making those brains so powerful and generic that they need only a little tweaking to solve a new challenge (RVAI: the RoboVision AI engine).
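A toy sketch of what "learning from labels by gradient descent" means in practice: a single decision neuron learns to separate two made-up feature clusters, stand-ins for "cat-like" and "dog-like" features. The data, learning rate, and epoch count are all invented for illustration; a real vision network does the same kind of weight updates, just across millions of weights.

```python
import math, random

random.seed(0)
# Label-0 samples cluster near (0, 0), label-1 samples near (2, 2).
data = [([random.gauss(0, 0.3), random.gauss(0, 0.3)], 0) for _ in range(20)] \
     + [([random.gauss(2, 0.3), random.gauss(2, 0.3)], 1) for _ in range(20)]

w, b = [0.0, 0.0], 0.0
lr = 0.5  # learning rate: how hard each error nudges the weights

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))        # sigmoid: output in (0, 1)

for _ in range(200):                      # gradient-descent passes
    for x, y in data:
        err = predict(x) - y              # gradient of the loss w.r.t. z
        w[0] -= lr * err * x[0]           # step weights against the error
        w[1] -= lr * err * x[1]
        b    -= lr * err

accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
print(accuracy)
```

The weights start at zero and end up encoding the boundary between the two clusters, with no human ever specifying what the distinguishing features are.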
In the video above, the connections between the layers are tuned by the data itself, by comparing the desired output with the actual output. The figure below (a bit too technical, but I just like to mention it) depicts the essence of deep learning: the non-linearities in the first stacked layers, just like the bipolar and ganglion cells stacked after the retina, are part of the mechanism that aggregates features in the visual stream (top part: biology; bottom part: a deep neural architecture used in our engines).
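Why do those non-linearities between stacked layers matter so much? A small illustration (an assumed example, not taken from the figure): two purely linear layers compose into a single linear map, so without a non-linearity in between, depth buys no extra expressive power.

```python
def matmul(A, B):
    # Plain matrix multiplication on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Two made-up linear layers and an input vector (as a column).
W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [1.0, 1.0]]
x = [[3.0], [4.0]]

# Applying W2 after W1, layer by layer ...
two_layers = matmul(W2, matmul(W1, x))
# ... gives exactly the same result as their product applied once:
one_layer = matmul(matmul(W2, W1), x)

print(two_layers, one_layer)
```

Inserting a non-linearity (like the tanh or sigmoid used earlier) between the layers breaks this collapse, which is what lets stacked layers build up progressively more abstract features.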