What Is Deep Learning? Definition, Techniques, and Use Cases


Recent artificial intelligence (AI) breakthroughs have come from Deep Learning, a subset of machine learning that uses artificial neural networks to crunch data and perform tasks such as object detection and speech recognition.

Deep Learning has created a revolution that powers self-driving cars, gives machines the ability to describe the contents of images, and much more. This article explores the origins of Deep Learning, the architectures of deep neural networks, and the current state-of-the-art use cases. It also introduces the frameworks used to create deep neural networks that are accessible to anyone.

Table of Contents:

1. What Is Deep Learning?
1.1 Origins
1.2 Features and Layered Abstractions
1.3 GPUs to the Rescue

2. Deep Learning Techniques
2.1 Neurons to Multilayer Networks
2.2 Training

3. Deep Learning Use Cases
3.1 Image/Object Recognition
3.2 Image/Video Captioning
3.3 Speech Recognition
3.4 Language Translation
3.5 Automatic Text Summarization
3.6 Question and Answer Systems

4. Deep Learning vs. Neural Networks

5. Types of Deep Neural Networks
5.1 Recurrent Neural Networks
5.2 Convolutional Neural Networks
5.3 Long Short-Term Memory Networks

6. Deep Learning Frameworks

7. Advancements in Deep Learning

8. Conclusion

1. What Is Deep Learning?

Deep Learning is a subfield of machine learning (ML) and represents a set of neural network architectures that solve complex, cutting-edge problems. These architectures (or models) go by names such as convolutional neural network (CNN) and long short-term memory (LSTM) network, among others.

Deep Learning architectures are called deep because they have many layers, and that depth matters, as you’ll see in this section. Deep Learning refers to neural network architectures that include many layers and can learn (through training) to map an input, such as an image, to one or more outputs, such as a classification. The classification could represent whether the image contains a cat or does not. As companies experimented with data-intensive problems such as speech-to-text and computer vision, data scientists turned to deep learning to solve business problems that couldn’t be tackled with conventional machine learning algorithms.

1.1 Origins
Deep Learning research began in the 1960s, but its benefits have only been realized in the past decade. The first working deep neural network was developed in 1967 and consisted of eight layers. It wasn’t until Yann LeCun developed a deep neural network for recognizing handwritten postal codes in 1989 that the power of this new model became obvious. His model required three days of training with test images using the standard back-propagation algorithm (a popular and commonly used supervised learning algorithm for multilayer neural networks).

Deep Learning research continued to lead to state-of-the-art results in a variety of problem domains, from speech recognition to object detection in images and even natural language processing (NLP). New architectures were constructed to address new and varying problems. By the early 2010s, the impact and benefit of Deep Learning had become clear.

1.2 Features and Layered Abstractions
The depth of the deep neural network is important because in the case of convolutional neural networks (CNNs), it provides the basis for abstractions. Figure 1 provides a high-level view of a CNN that is trained to identify whether a picture contains a cat. The network is split into two distinct parts, identified as the set of convolutional layers and the classification layer. We’ll explore these in more detail later, but the convolutional layers filter and detect features.

Figure 1: How Deep Neural Networks Encode Features

Early in the network, these features consist of edges. Later in the network, features are combined into higher-level features (such as ears or eyes). These abstractions of features result in a high-level feature detector that classifies whether the necessary features are present to identify whether a cat is present within the image.

1.3 GPUs to the Rescue
Training Deep Learning networks is a computationally intensive task, and it has sparked advancements in hardware. The availability of compute power has been a primary driver behind the adoption of deep learning models, which is how Nvidia’s GPUs, often dubbed the workhorse of deep learning, rose to prominence. GPUs were initially built for rendering high-fidelity graphics but soon edged out CPUs for this workload, thanks to their built-in parallelism, which lets them perform tasks at scale.

Here’s a high-level overview of the key difference between GPUs and CPUs: GPUs are highly parallel and include thousands of individual processing units called cores, whereas CPUs typically have no more than four or eight. With their many cores, GPUs can perform many vector math operations (the basis for neuron processing) simultaneously, which greatly improves the speed at which neural networks are trained and deployed. When GPUs were first used in Deep Learning architectures, they decreased the training time for complex networks from weeks to days (calculating a billion vector operations per pass). GPUs are now the go-to processors for deep learning architectures, and innovation continues to optimize deep learning even further.

Learn More: What is Machine Learning?

2. Deep Learning Techniques

Deep Learning represents a set of architectures built on the ideas of neural networks. Neural networks are computational structures—networks of computing elements that can be adjusted through training, and then applied to problems. Let’s step back to the origin of Deep Learning and explore the fundamentals of neural networks.

2.1 Neurons to Multilayer Networks
A neural network is a network of neurons that together implement a mathematical function. An input (typically a vector) is fed into the network, and the neurons in each layer calculate their outputs, layer by layer, until the output layer is reached. This process is called feedforward and represents the execution of the network. Each neuron is fed from the input or the prior layer through a weight that individually scales that particular input. Each input multiplied by its weight is summed with the others, and the result is passed through an activation function to determine the neuron’s output (Figure 2).

Figure 2: A Single Neuron and Its Mathematical Equivalent

The activation function can be any of a variety of different types (step function, sigmoid, etc.) and is commonly chosen for the type of network and problem. For multilayer networks, the activation function is selected to introduce nonlinearity. Introducing nonlinearity enables multilayer networks to solve relatively complex problems with a small number of neurons.
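To make the neuron computation concrete, here is a minimal Python sketch of a single neuron; the input values, weights, and choice of a sigmoid activation are illustrative assumptions, not values from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])   # input vector
w = np.array([0.4, 0.7, -0.2])   # one weight per input
b = 0.1                          # bias term

# Weighted sum of the inputs, passed through the activation function
output = sigmoid(np.dot(w, x) + b)
print(output)   # a value between 0 and 1
```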

Figure 3: A Multilayer (3-Layer) Neural Network

The network operates in a simple fashion: the inputs are applied, and then each layer of the network is computed, starting with the input layer. Because the input layer feeds the hidden layer, the hidden layer is computed next. This behavior is called feedforward because computation moves forward and no cycles or loops form in the network.
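The feedforward pass of a small multilayer network can be sketched in a few lines of NumPy. The layer sizes and random weights below are illustrative; a real network would use trained weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
W_hidden = rng.normal(size=(3, 4))   # weights: 3 inputs -> 4 hidden neurons
W_output = rng.normal(size=(4, 2))   # weights: 4 hidden -> 2 output neurons

x = np.array([0.5, -1.2, 0.3])       # input layer
h = sigmoid(x @ W_hidden)            # hidden layer, computed from the inputs
y = sigmoid(h @ W_output)            # output layer, computed from the hidden layer
print(y)
```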

2.2 Training
The weights of the network are individually defined and provide the basis for mapping the inputs to the outputs. Defining these weights is the job of training, which occurs over many iterations that tune the weights until the network maps inputs to outputs with an acceptable level of error.

Multilayer networks commonly use a class of algorithms called back-propagation to adjust the weights of the network. They do this by taking a sample from the training set, applying the input, calculating the output, and then comparing it with the expected output (the forward pass). The difference between the actual and expected output is the error. This error is used to adjust the weights of the network, starting with the output layer and moving to the input layer in a backward pass. Over many training samples, this back-propagation of the error minimizes the error across the overall training set.

The process of applying a training sample to the network, checking the answer, and then adjusting the network accordingly is the basis for supervised learning. It’s supervised because the training set includes the desired behavior of the network.
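Below is a minimal, self-contained sketch of this training loop: a two-layer network learning the classic XOR function with back-propagation and gradient descent. The hidden-layer size, learning rate, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set: inputs and expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 0.5                                         # learning rate

for epoch in range(5000):
    # Forward pass: compute each layer in turn
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    err = out - y                                # error at the output

    # Backward pass: propagate the error from output toward input
    d_out = err * out * (1 - out)                # sigmoid derivative
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))   # should approach [0, 1, 1, 0]
```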

Learn More: Top 10 Machine Learning Algorithms

3. Deep Learning Use Cases

This section explores some of the state-of-the-art use cases being applied to Deep Learning today.

3.1 Image/Object Recognition
One of the first success stories of Deep Learning was handwritten number recognition for postal codes, applied in the 1990s. The variability of handwriting made this a difficult problem, but one that CNNs proved able to solve. With this success, deep neural networks were applied to object detection (such as attaching a bounding box to faces within an image) and object recognition (identifying a particular person from their face). In 2012, Google taught a deep neural network to recognize cats in YouTube videos, with substantially greater accuracy (reported as roughly 70% better) than the approaches that preceded it.

3.2 Image/Video Captioning
When objects could be identified in images and videos, the next logical step was to provide a short summary (or caption) of an image or video. This required an ensemble of methods: a CNN to identify and recognize the objects within an image or video, followed by an LSTM (recurrent) network to emit a sequence of natural-language words describing that input.

This particular use case required large amounts of data with many captioned images (each with multiple captions) used for training. This particular problem was solved with crowdsourcing, enabling the public to provide captions for training purposes.
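As a rough sketch of how such an ensemble fits together, the hypothetical Keras model below feeds precomputed CNN image features into an LSTM that predicts the next caption word; the vocabulary size, feature dimension, and layer widths are all assumptions:

```python
import tensorflow as tf

vocab_size, max_len, feat_dim = 5000, 20, 2048   # assumed sizes

img_feats = tf.keras.Input(shape=(feat_dim,))    # CNN features, precomputed
words = tf.keras.Input(shape=(max_len,))         # caption so far, as token ids

# Seed the LSTM's state with a projection of the image features
state = tf.keras.layers.Dense(256, activation="relu")(img_feats)
x = tf.keras.layers.Embedding(vocab_size, 256)(words)
x = tf.keras.layers.LSTM(256)(x, initial_state=[state, state])

# Predict the next word of the caption
next_word = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)
model = tf.keras.Model([img_feats, words], next_word)
```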

3.3 Speech Recognition
Recognizing speech has been a holy grail for machine learning since the field began. Not surprisingly, deep neural networks have advanced it, increasing recognition accuracy and enabling models to operate directly on audio rather than on intermediate representations.

Two key methods applied in automatic speech recognition include CNNs and recurrent neural networks (RNNs), given the sequence-based nature of speech. Connectionist temporal classification has also been used for training both RNNs and LSTM networks, given the variance in the timing of speech.

3.4 Language Translation
The state of the art in machine translation lies with deep learning. In 2014, the first scientific papers on language translation with neural networks appeared. Since then, competitions have emerged that pit researchers and their algorithms against one another on translation problems, and the winning solutions are predominantly Deep Learning based.

Today, the Google Translate service is powered by a bidirectional RNN that operates on complete sentences. In this approach, two independent RNNs are stacked on top of one another and process the input sequence in opposite directions, giving the model context from both the past and the future of the sequence. Solutions of this type are now referred to as neural machine translation.
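A minimal sketch of such a bidirectional encoder in Keras might look like the following; the vocabulary size and layer widths are assumptions, and a production translation system would add a decoder and attention on top:

```python
import tensorflow as tf

vocab_size = 8000   # assumed vocabulary size
inputs = tf.keras.Input(shape=(None,))                  # a sentence as token ids
x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)

# Two LSTMs read the sentence in opposite directions; their outputs are
# concatenated, so every position sees both past and future context.
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=True))(x)

encoder = tf.keras.Model(inputs, x)
encoder.summary()
```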

3.5 Automatic Text Summarization
Text summarization refers to the ability to automatically produce a shorter summary of a longer text. This has been done in two ways: extractive, where key sentences are pulled from the original text to form the summary, and abstractive, where a new summary is generated from the original text.

LSTM networks have been used successfully in this area in an encoder/decoder model. In this model, an encoder (one independent LSTM network) accepts the input text, and a decoder LSTM network builds the summary as an independent sequence. This encoder/decoder architecture is ideal when the input and output sequence lengths differ (as they do for an original text and its summary).
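Here is a hedged Keras sketch of this encoder/decoder pattern: the encoder LSTM reads the input text and hands its final states to a decoder LSTM that generates the summary. Vocabulary sizes and dimensions are illustrative:

```python
import tensorflow as tf

src_vocab, tgt_vocab = 10000, 10000   # assumed vocabulary sizes

# Encoder: reads the full document, keeps only its final states
enc_in = tf.keras.Input(shape=(None,))
enc_emb = tf.keras.layers.Embedding(src_vocab, 256)(enc_in)
_, h, c = tf.keras.layers.LSTM(256, return_state=True)(enc_emb)

# Decoder: generates the (shorter) summary, seeded with the encoder's states
dec_in = tf.keras.Input(shape=(None,))
dec_emb = tf.keras.layers.Embedding(tgt_vocab, 256)(dec_in)
dec_out = tf.keras.layers.LSTM(256, return_sequences=True)(
    dec_emb, initial_state=[h, c])
probs = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
```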

3.6 Question and Answer Systems
Question and answer (Q&A) systems represent an old and well-researched problem, but Deep Learning has advanced the state-of-the-art for this useful capability. Q&A systems simulate human conversation while providing meaningful answers to the questions humans pose.

Recurrent-style deep neural networks have been applied here. Although they require a significant amount of training, these architectures have been successfully applied to this sequence (question) to sequence (answer) problem.

Learn More: What is Natural Language Processing?

4. Deep Learning vs. Neural Networks

A key differentiating feature of deep neural networks is depth: the number of layers, in addition to the breadth (the number of processing elements within each layer). But deep neural networks have also evolved beyond the typical multilayer networks that brought us to this point; their architectures differ from those of their multilayer ancestors. CNNs, which work well with image data, sample and pool pixels from the image for processing. RNNs, which are ideal for sequence data such as text, consider not just an input but the inputs that precede and follow it.

The structure of neurons has also changed as deep learning has evolved. Rather than just summed weighted products passed through an activation function, neurons in newer deep learning networks, such as LSTM networks, include gates to regulate the flow of information and even to forget information.

5. Types of Deep Neural Networks

We’ve discussed some of the types of deep neural networks that are available today as well as the problems to which they’ve been applied. Now, let’s dig further into these architectures to see how they are decomposed in addition to the training methods used for each.

5.1 Recurrent Neural Networks
RNNs come in a variety of architectural styles, but all maintain internal state, which means they can be applied to problems in the time domain. As Figure 4 shows, the simple feedforward network is augmented with context neurons that are fed from the hidden layer and feed back into it on the next time step (an Elman network). This cycle provides a simple form of memory within the neural network, which makes RNNs ideal for time-series prediction.

Figure 4: Unrolling a Recurrent Neural Network

Training an RNN requires a variant of back-propagation called back-propagation through time. This is a generalization of the standard back-propagation algorithm typically used in multilayer neural networks.
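The recurrence itself is simple to express. The NumPy sketch below runs an Elman-style network over a short sequence, with the hidden state carrying memory between time steps; all sizes and weights are illustrative (training via back-propagation through time is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 1               # illustrative sizes
Wxh = rng.normal(size=(n_in, n_hidden))       # input -> hidden
Whh = rng.normal(size=(n_hidden, n_hidden))   # recurrent (context) weights
Why = rng.normal(size=(n_hidden, n_out))      # hidden -> output

def step(x, h_prev):
    """One time step: the hidden state feeds back into itself."""
    h = np.tanh(x @ Wxh + h_prev @ Whh)       # memory of prior steps lives in h
    y = h @ Why
    return y, h

h = np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):          # a 5-step input sequence
    y, h = step(x, h)
    print(y)
```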

5.2 Convolutional Neural Networks
A CNN is a deep neural network architecture that excels at image classification tasks. It operates through a series of convolution and max-pooling operations over an input (see Figure 5). A convolution applies a small filter across regions of the input image, producing a smaller matrix (a feature map) that captures where the filter’s pattern appears; this step is performed over many regions of the input image. The next step is max-pooling, which reduces the dimensionality of the image further by returning only the maximum value in each region of its input (the feature map from the convolution layer). This process repeats for some number of layers until the classification layer is reached. The final max-pooling layer feeds a fully connected neural network (each output class feeds from the outputs of the final max-pooling layer), and this classification layer uses the resulting high-level features to determine the class of the given input image.

Figure 5: The Convolutional Neural Network

Back-propagation is commonly used to train the CNN through supervised learning (adjusting the weights of the network as a function of the classification error).
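For illustration, here is what a small CNN of this shape looks like in Keras; the image size, filter counts, and two-class output (cat / no cat) are assumptions chosen to mirror the running example:

```python
import tensorflow as tf

# A minimal CNN for 64x64 RGB images and two classes; all sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # detect low-level features
    tf.keras.layers.MaxPooling2D(),                    # keep the max of each 2x2 region
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # combine into higher-level features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),      # fully connected classifier
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```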

5.3 Long Short-Term Memory Networks
LSTM networks are RNNs with internal memory. An LSTM block is used to build a network that exists in one dimension (as shown in Figure 6) or in multiple dimensions, where each block feeds blocks to the right and above. The LSTM block includes three gates that regulate the flow of information inside the block. These gates are the input gate, which controls how new information flows into the block; the forget gate, which controls when the internal memory is purged; and the output gate, which is used to compute the output of the block. The connections between blocks and the gates are weighted, making them adjustable during training.

Figure 6: Long Short-Term Memory Network and Block

LSTM networks, like other RNNs, are ideal for sequence problems. One of the most interesting applications of LSTM networks has been constructing natural-language descriptions of images (fed by features from a CNN). LSTM networks can be trained using back-propagation through time with supervised learning.
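The gate logic can be sketched directly in NumPy. The step function below implements the standard LSTM equations (input, forget, and output gates plus a candidate memory); the dimensions and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 4, 8
rng = np.random.default_rng(0)
# One weight-matrix pair and bias per gate (i, f, o) plus the candidate (g)
W = {k: rng.normal(size=(n_in, n_hid)) for k in "ifog"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}

def lstm_step(x, h_prev, c_prev):
    i = sigmoid(x @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate: admit new info
    f = sigmoid(x @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate: purge memory
    o = sigmoid(x @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate: expose memory
    g = np.tanh(x @ W["g"] + h_prev @ U["g"] + b["g"])   # candidate memory content
    c = f * c_prev + i * g          # internal memory cell
    h = o * np.tanh(c)              # block output
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(3, n_in)):   # a short input sequence
    h, c = lstm_step(x, h, c)
print(h)
```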

Learn More: Top 10 Python Libraries for Machine Learning

6. Deep Learning Frameworks

Just as there is a diversity of deep neural network architectures to apply, a spectrum of deep learning frameworks can be used to train and deploy the solution. These frameworks also commonly provide libraries and tools that can be used throughout the development cycle. For example, you’ll find tools to cleanse and prepare your data for training and validation as well as tools to audit your model in production. These frameworks also provide in-depth tutorials to help get you up to speed quickly so that you can build and deploy your deep learning solution.

Table 1 lists some of the major frameworks and the deep learning architectures that can be built from them.

Table 1: Deep Learning Frameworks

| Framework | Description | Languages |
| --- | --- | --- |
| TensorFlow | The leading framework, with the largest user base; supports all leading network architectures along with server and mobile device deployment | Python, C++, R |
| Caffe | Popular framework with a focus on CNN and RNN architectures; includes a “Model Zoo” of pretrained models that can be used immediately | C, C++, Python, MATLAB |
| PyTorch | A port of the Torch framework that runs directly on Python; supports a variety of deep neural network models, including CNNs, RNNs, generative adversarial networks, and LSTM networks | Python, C, C++, Java |
| Keras | An open-source neural network library written in Python; supports standard neural network architectures as well as CNNs and RNNs | Python |
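To give a sense of how little code these frameworks require, here is a hypothetical three-layer network expressed in PyTorch; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# A complete three-layer network in a few lines; sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 2),
)
print(model(torch.randn(1, 8)))   # forward pass on a random input
```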

7. Advancements in Deep Learning

Deep Learning has advanced the state-of-the-art in a range of business problems. In one important area, CNNs have achieved the accuracy of board-certified dermatologists in classifying skin lesion images as benign or malignant. As deep learning architectures and their training methods continue to evolve, they will be applied to new problems and advance the field further.

One key challenge in deep learning is training. GPUs have reduced the time required to train deep neural networks, but as new problem domains emerge, so do more complex architectures that require larger data sets. Hardware innovation continues in these areas, with new GPU designs focused on deep learning tasks. Increasing GPU core counts and memory bandwidth directly benefits deep neural network training, and new technologies that let GPUs communicate with one another directly, rather than through their host systems, have also shown benefits for certain training workloads.

Where Deep Learning has commonly been restricted to servers with high-performance, multicore processors and GPUs, it is now extending its reach into small embedded devices such as smartphones. This evolution is advancing on two fronts: modifications to deep neural network algorithms that make them friendlier to resource-constrained embedded environments, and custom processors that run these workloads in a more power-efficient way.

A recent study found that training a single large deep neural network for an NLP application can emit as much carbon dioxide as five average U.S. cars do over their lifetimes. One important line of research to combat this is transfer learning, which reuses a deep neural network pretrained on a similar problem. The network is not required to train from the ground up; instead, it gets a head start and is then fine-tuned for the specific problem domain. This method has shown great promise in reducing training times as well as the amount of new data required for training.
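A minimal transfer-learning sketch in Keras might look like this: a network pretrained on ImageNet is frozen and reused as a feature extractor, with only a small new classification head trained for the target problem. The choice of MobileNetV2, the input size, and the two-class head are assumptions:

```python
import tensorflow as tf

# Reuse a network pretrained on ImageNet instead of training from scratch.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False   # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # new head; 2 classes assumed
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```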

Learn More: 15 Best Machine Learning Books for 2020

8. Conclusion

Deep Learning has moved beyond buzzword status and is being operationalized by businesses of all sizes. Deep Learning frameworks are making these architectures more accessible and popular. The next wave of innovation in Deep Learning will come from next-generation processors purpose-built for machine learning workloads.

Were you able to understand what Deep Learning is? Comment below or let us know on LinkedIn, Twitter, or Facebook. We’d love to hear from you!
