The “deep” in deep learning refers to the number of hidden layers involved in the design. Deep learning is a way of training artificial intelligence (AI) to recognize specific data, such as speech or faces, and to make predictions based on previous experience. It is a subfield of machine learning: where more traditional machine learning approaches organize data and send it through predefined algorithms, deep learning starts from basic algorithms to screen the data, and then trains the AI entity to “learn on its own” by finding patterns through “many” layers of processing.
Deep learning grew out of a shift in thinking that took place around the first “AI winter” (roughly the 1970s). The winter provided a break, and a restart, in terms of thinking about artificial intelligence. In the years that followed, neural networks with multiple layers developed as one method for training artificial intelligence, alongside other machine learning approaches, and deep learning eventually became a distinct practice within machine learning.
One of the earliest deep learning designs dates to 1979, when Kunihiko Fukushima developed the Neocognitron, a neural network widely regarded as a forerunner of the “convolutional neural network.” It combined convolutional layers with multiple pooling layers. His novel design allowed computers to “learn” and develop the ability to recognize visual patterns. Fukushima’s models were trained using a reinforcement strategy with recurring activation at multiple layers: connections gained strength (weight) over time as a pattern was repeated and reinforced.
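As a rough illustration of what “convolutional” and “pooling” layers do, here is a minimal sketch in plain Python. It is a generic example, not Fukushima’s design: the function names, the toy 5x5 image, and the edge-detecting kernel are all assumptions made for illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical kernel that responds where brightness rises from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 map, peaks at the edge
pooled = max_pool2d(feature_map)         # 2x2 summary of the map
```

The convolution marks where the pattern occurs, and the pooling step shrinks the map while keeping the strongest responses, which makes the result less sensitive to the exact position of the pattern.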
An artificial neural network is a collection of connected nodes, described as artificial neurons. The “connections” act as synapses, and function when an artificial neuron sends a signal to another neuron. The artificial neuron receiving the signal processes it, and then signals other artificial neurons connected to it. During the process, each neuron applies an activation function to its combined input, which “standardizes” the neuron’s output, for example by squeezing it into a fixed range.
Each connection (or synapse) between neurons carries a weight. This weight controls the importance and value of the input. The weights are initially set at random, and change with experience as the network trains.
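The behavior of a single artificial neuron can be sketched in a few lines of Python. This is a minimal sketch, assuming a sigmoid activation function and weights drawn uniformly from -1 to 1; the function names and input values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Activation function: squeezes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weight each incoming signal, add them up, then apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


rng = random.Random(0)                               # fixed seed for repeatability
inputs = [0.5, -1.2, 3.0]                            # signals from three other neurons
weights = [rng.uniform(-1.0, 1.0) for _ in inputs]   # random starting weights
bias = 0.0

signal = neuron_output(inputs, weights, bias)        # value passed on to the next neurons
```

Training consists of nudging `weights` and `bias` so that the outgoing signal becomes more useful; the neuron’s structure itself does not change.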
Deep learning uses thousands of interconnected artificial neurons, which are laid out in “multiple processing layers.” (Earlier, “shallow” neural networks typically used only one or two layers.) These multiple processing layers offer higher-level abstractions, better classifications, and more precise predictions. Deep learning provides an excellent tool for working with voice recognition, conversational skills, and big data.
Each layer of nodes/neurons trains using features coming from the previous layer’s output. As data advances through the neural net, more complex features can be recognized, since they aggregate and recombine features from the previous layer. Currently, neural networks come with three types of layers:
- The input layer receives data
- The hidden layers process data from the inputs
- The output layer provides responses and predictions
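The three kinds of layers listed above can be sketched as a simple forward pass. This is an illustrative example only: the layer sizes, the hand-picked weights, and the choice of ReLU and sigmoid activations are assumptions, not values from a trained network.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One layer: weights[k] holds the incoming weights of neuron k."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the data through each layer in turn; each layer reads the previous output."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values


layers = [
    # Hidden layer: 2 inputs -> 3 neurons
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.0, 0.0], relu),
    # Output layer: 3 hidden values -> 1 prediction
    ([[1.0, -1.0, 0.5]], [0.0], sigmoid),
]

prediction = forward([1.0, 2.0], layers)  # the input layer is just the raw data
```

Adding more hidden layers means adding more entries to `layers`; the `forward` loop stays the same.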
Neural networks are capable of learning in nonlinear ways, providing a significant advantage over earlier machine learning systems. This provides neural networks with the ability to locate subtle, potentially “confusing” features in an image (such as oranges on a tree, with some in sunlight, and others in shade). This “skill” is the result of using an activation layer, which is designed to exaggerate the “useful” details during the identification process.
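A classic way to see why nonlinearity matters is the XOR problem, which no purely linear model can solve. The sketch below uses a tiny network with hand-picked (not learned) weights and a ReLU activation; all names and values are assumptions chosen for illustration.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def xor_net(x1, x2, activation=relu):
    """Two hidden neurons followed by one output neuron, with fixed weights."""
    h1 = activation(x1 + x2)        # fires when at least one input is on
    h2 = activation(x1 + x2 - 1)    # fires only when both inputs are on
    return h1 - 2 * h2              # subtract the "both on" case


cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

with_activation = [xor_net(a, b) for a, b in cases]
without_activation = [xor_net(a, b, activation=identity) for a, b in cases]
```

With the activation in place the outputs follow the XOR pattern (0, 1, 1, 0). With the activation replaced by the identity function, the same weights collapse into a single linear formula and the pattern is lost.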
Artificial Neural Networks
Artificial neural networks are computer systems based loosely on the design of neural networks in the human brain. Though not yet as efficient as organic, living brains, these artificial networks operate in similar ways. The systems learn through experience, similar to the way a living brain does. They learn to accomplish tasks by comparing samples, typically without being programmed with task-specific rules.
One example would be image recognition, with neural networks training to identify images of dogs by viewing images tagged with “dog” or “no dog” labels, and using the results to identify dogs. Artificial neural networks begin at zero, with no data or understanding of a dog’s characteristics. Each system develops a base understanding of the relevant traits it is looking for.
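The idea of learning from labeled examples can be sketched with a single artificial neuron. This is a deliberately simplified sketch: real image recognition works on pixels and uses many layers, whereas the two numeric “features” per image, the sample values, and the training settings below are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias):
    """Estimated probability that the sample is a dog."""
    return sigmoid(sum(x * w for x, w in zip(features, weights)) + bias)


def train(samples, labels, epochs=2000, learning_rate=0.5):
    """Start from zero and adjust the weights a little after every example."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(features, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical features already extracted from each image.
samples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],   # tagged "dog"
           [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]   # tagged "no dog"
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(samples, labels)

is_dog = predict([0.85, 0.75], weights, bias) > 0.5      # unseen dog-like sample
is_not_dog = predict([0.15, 0.25], weights, bias) < 0.5  # unseen non-dog sample
```

Nothing about dogs is written into the code. The weights begin at zero and take on useful values only because the labeled samples keep correcting the neuron’s mistakes.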
There are many different types of neural networks. However, two have gained particular popularity: recurrent and feedforward. Feedforward neural networks send data in a single direction, and are generally considered the simplest kind of neural network. Data is sent from the input nodes through the hidden nodes and into the output nodes. Feedforward neural networks do not use loops or cycles.
Recurrent neural networks, on the other hand, use the connections between nodes (synapses) to allow data to flow “back and forth.” The recurrent neural network creates a directed cycle, which gets expressed as a “dynamic temporal behavior.” Basically, this means recurrent neural networks remember what they learned from previous inputs using a simple loop. The loop takes the data from the previous time step, then adds it to the current time step’s input. The recurrent neural network is capable of using its internal memory for processing the sequence of inputs. This form of neural network is very popular for handwriting recognition and speech recognition.
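The loop described above can be sketched with a single recurrent neuron. This is a minimal sketch, assuming a tanh activation and hand-picked weights; the function and parameter names are hypothetical.

```python
import math


def rnn_forward(sequence, w_input, w_hidden, bias=0.0):
    """Process a sequence one time step at a time, carrying a hidden state."""
    hidden = 0.0          # internal memory, empty at the start
    states = []
    for x in sequence:
        # Combine the current input with what was remembered from the last step.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# A single pulse at the first time step, followed by silence.
states = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5)
```

Even though the second and third inputs are zero, the hidden state stays above zero, because the first input is still echoing through the loop. The echo fades at every step, which is why a simple loop like this struggles to remember things for long.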
Deep Learning Algorithms
Deep learning algorithms commonly use a cascade of layers of nonlinear processing units. Each layer uses output coming from the previous layer as input. Deep learning also includes multiple levels of representations that correspond to varying levels of abstraction. These levels develop into a hierarchy of concepts.
An algorithm referred to as “feature extraction” provides another aspect of deep learning. This automatically builds meaningful “features” for learning and understanding. Training an AI entity in feature extraction requires three different kinds of samples, called “target,” “non-target,” and “confusers.” The target (for example, a car) is shown in several photos. The non-target images show no cars, and the confusers are images that might confuse the AI entity.
Deep learning training techniques have advanced the ability of AI entities to detect, recognize, categorize, and describe. Advancements taking place in the field of deep learning include:
- Algorithmic discoveries have improved the performance of deep learning techniques.
- New approaches have improved the recognition skills of various AIs.
- New kinds of neural networks work well for applications such as image classification and text translation.
- Significantly more data is available for building neural networks using many deep layers.
- Graphics processing units, combined with the cloud, make substantial computing power available for deep learning.
The Advantages of Deep Learning
Deep learning and the use of neural networks currently offer the best solutions for many of the problems that come up in image recognition, speech recognition, and natural language processing. Deep learning networks can also be applied successfully to big data, knowledge applications, and prediction.
Deep learning reduces the need for feature “engineering,” which is a very time-consuming process in the machine learning industry. Additionally, its architecture can be adjusted relatively easily to deal with new problems. Dealing with language, vision, and time series problems requires techniques such as recurrent neural networks, convolutional neural networks, and Long Short-Term Memory (LSTM) to process the data.
Most algorithms working with sequential data come with a memory that covers roughly the last 10 time steps. However, Long Short-Term Memory (invented by Sepp Hochreiter and Jürgen Schmidhuber in 1997) does not have the same limitation. It works with recurrent neural networks and lets the network pick up on activity from hundreds of time steps in the past and use it to make more accurate predictions. LSTM networks were largely overlooked for years, but their use continues to grow, and they are one of the reasons deep learning has become so successful.
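How an LSTM holds on to information can be sketched with a single scalar cell. This is a simplified sketch: the gate parameters are hand-picked rather than learned, the weights on the previous output are set to zero to keep the arithmetic easy to follow, and the names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell; params maps gate -> (w_x, w_h, b)."""
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    forget = gate("forget", sigmoid)          # how much old memory to keep
    write = gate("input", sigmoid)            # how much new information to store
    candidate = gate("candidate", math.tanh)  # the new information itself
    output = gate("output", sigmoid)          # how much memory to reveal

    c = forget * c_prev + write * candidate   # updated cell memory
    h = output * math.tanh(c)                 # value passed to the next step
    return h, c


# Hand-picked parameters: keep memory almost entirely, write only when x is on.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (10.0, 0.0, -5.0),
    "candidate": (10.0, 0.0, 0.0),
    "output": (0.0, 0.0, 10.0),
}

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 200:   # one pulse, then 200 silent time steps
    h, c = lstm_step(x, h, c, params)
```

After 200 silent steps the cell memory `c` is still close to the value written at the first step, because the forget gate stays nearly fully open. A plain recurrent loop would have let that signal fade long before.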