A Brief History of Neural Networks

In the last few decades, neural networks have evolved from an academic curiosity into a vast “deep learning” industry. Deep learning uses neural networks, a data structure design loosely inspired by the layout of biological neurons. These neural networks are constructed in layers, and the outputs of one layer are connected to the inputs of the next layer. Deep learning is a subdivision of machine learning; it allows computers to automatically recognize faces and transcribe spoken words into text, and allows self-driving cars to avoid objects on the street.

These accomplishments were made possible by the work of Walter Pitts, a mathematician, and Warren McCulloch, a neurophysiologist, and of Donald Hebb, a Canadian psychologist. In 1943, Pitts and McCulloch wrote a paper describing how neurons might work, and modeled a simple neural network using electrical circuits.

Hebb expanded on their ideas, writing The Organization of Behavior in 1949, in which he proposed that neural pathways strengthen with each successive use. He wrote: “When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell.”
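
Hebb's idea is often summarized today as a simple update rule: the strength of a connection grows in proportion to how active the two cells are at the same time. The sketch below is a common textbook simplification, not a formula taken from Hebb's book; the function name, the learning rate, and the activity values are all assumptions made for illustration.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Return the new connection strength after one joint activation.

    `pre` and `post` are the activity levels of the sending and receiving
    cells (hypothetical values between 0 and 1).
    """
    return weight + learning_rate * pre * post


# Repeated joint firing strengthens the connection a little each time.
weight = 0.0
for _ in range(5):
    weight = hebbian_update(weight, pre=1.0, post=1.0)
```

If the receiving cell stays silent (`post=0.0`), the weight is left unchanged, which matches the "with each successive use" wording above.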

Machine Learning and the Game of Checkers

In the 1950s, Arthur Samuel, an IBM researcher, developed one of the first computer programs capable of playing checkers. This checkers-playing program was also one of the earliest demonstrations of machine learning. Because his program had access to very little computer memory, Samuel used an early form of what is now called alpha-beta pruning, a search technique that skips any branch of the game tree that cannot change the final choice of move. Rather than searching every line of play to the end of the game, his design used a scoring function that evaluated the positions of the checkers, attempting to measure the chances each side had of winning. The program chose its moves with what is now known as the minimax strategy: pick the move that leads to the best score, assuming the opponent replies with their own best move.

The Perceptron

In 1957, Frank Rosenblatt, a researcher at Cornell Aeronautical Laboratory, merged Hebb’s model of brain cell activity with Samuel’s machine learning concepts and created the Mark 1 Perceptron. Drawing on neuroscience experiments that had been performed in the 1940s, Rosenblatt created a model of how the brain’s neurons worked.

The Mark 1 Perceptron was originally planned not as a program, but as a custom-built machine, though its software was first written for the IBM 704 (and designed for image recognition). This combination, quite fortunately, worked out, providing evidence that algorithms and software could be transferred to, and used effectively on, other similar computers; at the time, moving software between machines was far from routine. (It should be noted that Rosenblatt’s primary goal was not to build a computer that could recognize and classify images, but to gain insights into how the human brain worked.) The Perceptron neural network was originally programmed with two layers, the input layer and the output layer.
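
A two-layer Perceptron of this kind can be sketched in a few lines. The example below uses the classic perceptron learning rule as it is usually taught today, trained on the logical AND function; it is an illustrative assumption, not Rosenblatt's original code, and the function names, learning rate, and number of epochs are chosen only for the demonstration.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs exceeds zero, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Nudge the weights toward the target whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: the output is 1 only when both inputs are 1.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

After training, `predictions` matches the AND targets. The inputs feed directly into a single output unit, with no layers in between.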

Early Efforts of Deep Learning

The first general, working learning algorithm for a multilayer Perceptron used a “feedforward design” (compare this to backpropagation in the next section). The work was published by Alexey Ivakhnenko and V. G. Lapa in 1967, in Cybernetics and Forecasting Techniques. In 1971, Ivakhnenko published a paper, “Polynomial Theory of Complex Systems,” describing a deep network that had eight layers and was trained by the group method of data handling (GMDH).

Kunihiko Fukushima, working on computer vision, started developing the Neocognitron in 1979. He proposed a hierarchical, multilayered neural network. This was the first design of a deep learning model using a convolutional neural network. Fukushima’s design helped computers learn to recognize and identify visual patterns. It also allowed significant features to be fine-tuned by manually adjusting the “weight” of the desired connections.

Backpropagation

Although early versions of backpropagation were independently discovered several times in the 1960s, it was not implemented until 1970, by Seppo Linnainmaa. Paul Werbos was the first person in the United States to propose that backpropagation could be used for artificial neural networks, after researching it in depth for his 1974 dissertation. Backpropagation describes a method in which errors are computed at the output (not the input) and then distributed backward through the system’s layers, so that each connection can be adjusted in proportion to its share of the error. This technique has become a popular method for training deep neural networks.
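
The backward flow of error can be illustrated with a deliberately tiny network: one input, one hidden unit, and one output unit. This is a sketch under stated assumptions rather than any historical implementation; the sigmoid activation, the squared-error loss, the starting parameters, and the learning rate of 0.5 are all choices made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Run the input through the hidden unit and then the output unit."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Return the gradients of the loss for [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    # The error is measured at the output first...
    delta_out = (y - target) * y * (1.0 - y)
    # ...then passed backward through w2 to the hidden unit.
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    return [delta_hidden * x, delta_hidden, delta_out * h, delta_out]


initial_params = [0.5, -0.3, 0.8, 0.1]  # arbitrary starting values
x, target = 1.0, 0.0

params = list(initial_params)
for _ in range(200):
    grads = backprop(params, x, target)
    params = [p - 0.5 * g for p, g in zip(params, grads)]
```

Comparing `loss(initial_params, x, target)` with `loss(params, x, target)` shows the error shrinking as the weights are repeatedly adjusted against their gradients.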

Hidden Layers

The input layer is considered the first layer in a neural network, and the output layer is the last layer. The various layers between these two are generally referred to as the hidden layers. Each layer normally performs a simple computation, typically a weighted sum of its inputs passed through a single activation function.

The early designs of neural networks (such as the Perceptron) did not include hidden layers, only the two obvious ones (input and output). Not surprisingly, these simple systems could not compute very complex functions. The two-layer, input-output design was a leftover of the Perceptron, and severely limited the computers’ abilities.

It wasn’t until the 1980s that researchers realized adding just a few hidden layers could significantly enhance the abilities of their neural networks. This realization led to the development of ANNs (artificial neural networks). Unlike the early Perceptrons, ANNs use hidden layers to respond to complicated tasks. The more hidden layers a neural network has, the longer it takes to produce an output, but the added depth allows it to solve more complex problems. Hidden layers would become the foundation for deep learning.
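
A standard illustration of what a hidden layer adds is the XOR function ("one input or the other, but not both"), which a single threshold unit connected directly to the inputs cannot compute. The sketch below wires up one hidden layer by hand; the weights and biases are assumptions picked for clarity, not learned values, and the helper names are hypothetical.

```python
def step(z):
    """Threshold activation: fire (1) only when the input is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: a weighted sum per unit, then the activation."""
    return [step(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


# Hidden layer: the first unit acts as OR, the second as AND.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
# Output unit: fires when OR is on but AND is off.
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [-0.5]


def xor_network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


results = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Here `results` comes out as `[0, 1, 1, 0]`. The two hidden units first turn the raw inputs into intermediate features, and the output unit then combines them, which is the extra step an input-output design lacks.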

Deep Learning Becomes a Reality

In 1989, deep learning became a reality when Yann LeCun et al. applied the standard backpropagation algorithm (created in 1970) to a neural network. Their goal was to train the computer to recognize handwritten ZIP codes on mailed envelopes. This new system worked, and as a result, deep learning was born.

Deep learning is a subdivision of machine learning, and uses algorithms to process data as it attempts to imitate the human thinking process. Layers of algorithms are used to:

  • process data
  • understand human speech
  • visually recognize objects
  • do time-series analysis
  • diagnose medical issues

GPUs Offer Speed

Nvidia deserves the credit for creating the first “true” graphics processing unit (or GPU). It was released in 1999, and although there had been attempts in the 1980s to advance the quality of video output (MDA cards and VGAs), they couldn’t compare to the GPU. Nvidia’s research provided computational speeds that increased a thousandfold over a 10-year span (and continue to increase). In 2009, Nvidia’s hardware helped fuel what has been called the “big bang of deep learning,” when many successful deep learning neural networks were trained on Nvidia GPUs.

GPUs have become remarkably important in machine learning. They have roughly 200 times more processor cores than central processing units (CPUs). CPUs, while much more flexible, process this kind of data very slowly when compared to GPUs. GPUs trade that flexibility for speed and are designed for very specific uses. The two also differ in how they process data: CPUs perform procedures largely one at a time (which is slow), while GPUs run many operations in parallel (at the same time).

Deep learning algorithms are supported by neural networks. Training a neural network is computationally intense, and because these computations can be parallelized (run simultaneously), they benefit from hardware designed for parallel work. Graphics processing units (GPUs) were originally designed for use in the gaming industry, and have a high number of processing cores. They also carry their own dedicated on-board memory. GPUs are increasingly used in deep learning applications, where their ability to run computations in parallel dramatically accelerates neural network training.

Artificial Intelligence and Neural Networks

When combined, artificial intelligence, neural networks, and deep learning present an incredibly exciting opportunity to solve a variety of real-world problems. Although a humanlike thinking and decision-making form of artificial intelligence is still several years away, there have been some truly remarkable steps forward in the evolution of artificial intelligence, with neural networks and their associated algorithms providing the foundation. Neural networks still have a great deal of room for growth and development. As they advance, it is reasonable to expect them to support advances in artificial intelligence, as well.
