Deep Reinforcement Learning: What, Why, How

Reinforcement learning (RL), a “niche” machine learning technique, has come to prominence in recent years. In context-based decision-making, RL helps a machine learn which actions to take through trial and error, with the goal of finding the optimal course of action for a given situation.

The machine is trained through a reward/penalty-based feedback mechanism, the goal of which is to continuously improve the behavior of a machine or robot. RL is widely used across industry sectors such as energy, transportation, finance, and healthcare, particularly where automation involves multiple digital agents. It is also being used to train systems in gaming, robotics, natural language processing, and computer vision. Deep reinforcement learning (DRL) is a sub-field of RL that uses deep neural networks to represent the agent's policy or value estimates. This helps DRL tackle some of the limitations of traditional RL, such as handling very large state spaces like raw images.

What Is Deep Reinforcement Learning?

Let’s begin with the terminology. For readers unfamiliar with concepts such as “agent,” “state,” “action,” “reward,” and “environment,” the article The Very Basics of Reinforcement Learning explains the basic nuts and bolts of reinforcement learning and deep reinforcement learning. It also describes, in an easily digestible format, the “policy”: the strategy an agent uses to decide which action to take in each state.

Reinforcement learning is best understood in terms of an environment, an agent, states, actions, and rewards. The environment takes the agent’s current state and chosen action as input and returns the next state along with a “reward” or “penalty” that encourages desirable behavior. This guide describes how the environment acts as “a systematic guiding light,” accepting “actions” as inputs and outputting “rewards” or “penalties” that continuously improve the machine’s decision-making.
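To make the state/action/reward cycle concrete, here is a minimal Python sketch of the loop described above. The CorridorEnv class, its reset/step methods, and the reward values are hypothetical choices for illustration (loosely modeled on the reset/step convention popularized by toolkits such as OpenAI Gym), not code from any of the cited articles.

```python
import random
from typing import Callable, Tuple


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the last cell."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.state = 0

    def reset(self) -> int:
        """Start a new episode at cell 0 and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action (0 = left, 1 = right) to the current state.

        Returns the next state, a reward (positive) or penalty (negative),
        and whether the episode has ended.
        """
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # reward at the goal, small penalty per step
        return self.state, reward, done


def run_episode(env: CorridorEnv, policy: Callable[[int], int], max_steps: int = 100) -> float:
    """Agent-environment loop: observe the state, act, receive feedback, repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state: int) -> int:
    """Pure trial and error: ignore the state and pick an action at random."""
    return random.choice([0, 1])


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), lambda s: 1))  # always move right
    print(run_episode(CorridorEnv(), random_policy))
```

Comparing the always-right policy with random_policy shows why trial and error alone is slow: the random agent usually wanders back and forth, collecting extra step penalties before it reaches the goal.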

The InfoWorld article What is Deep Reinforcement Learning: The Next Step in AI and Deep Learning mentions that reinforcement learning is best suited to aiding decision-making in both supervised and unsupervised learning settings. Another fascinating application of reinforcement learning is found in edge applications, where robotics is combined with “contextual autonomy” to drive human-like machines.

Are Experts Skeptical about the Future of Deep Reinforcement Learning?

Research on reinforcement learning currently sits at the junction of “theory” and “experimental practice.” Researchers have attempted to show that RL and DRL are particularly useful in cases where a “model of the world” is unavailable. However, it is also well known that, in most cases, machine learning algorithms designed for a specific situation work better than general-purpose RL approaches. For now, AlphaGo is a convincing “proof of concept” for deep RL.

Himanshu Sahni’s post Reinforcement Learning Never Worked, and ‘Deep’ Only Helped a Bit, refers to a book on RL with many examples unique to Reinforcement Learning. The author indicates that in those problems where supervised, unsupervised, or deep learning fails, RL or DRL can probably help develop general models of the given problem.

The question is whether “general models” can work on highly domain-specific problems. One major limitation of general models is that their development assumes “an infinite number of agents with an infinite number of states and actions have been tried an infinite number of times.” In reality, such experimentation is rarely possible.

Consider robotics, where a robot is coached about the “right actions” over an extended period before it gets them right. Exploration is therefore implicitly tied to “extended rewards”: the agent may need to try many actions before any reward tells it that it is on the right track.
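The link between exploration and delayed rewards can be illustrated with tabular Q-learning, a standard model-free RL algorithm swapped in here as an example rather than taken from the cited post. In this minimal sketch, the six-cell corridor, the reward of 1 given only at the goal, and the hyperparameters (alpha, gamma, epsilon) are all hypothetical choices. Because the agent receives no reward until it stumbles onto the goal, epsilon-greedy exploration is what lets it discover, and then propagate backward, the value of moving right.

```python
import random
from typing import List, Tuple

N_STATES = 6      # hypothetical corridor: cells 0..5, goal at cell 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Delayed reward: nothing until the goal cell is reached."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, max_steps: int = 200, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon (or on a tie),
            # otherwise exploit the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Best learned action for each non-goal cell."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

After enough episodes, the greedy policy should choose 1 (move right) in every cell. The early episodes are essentially random walks, which is the "coached over an extended period" phase the paragraph above describes.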

How to Make Deep Reinforcement Learning Work

The general belief is that, given sufficient time, advanced ML researchers will succeed in making reinforcement learning and deep reinforcement learning work in real-world contextual environments. The article Deep Reinforcement Learning Doesn’t Work Yet offers some introspective, well-researched ideas that could make RL and DRL a practitioner’s science. The author outlines several plausible futures:

Make rewards more extensive, giving agents richer learning signals, so that they are more broadly applicable.
Use advanced hardware systems with more speed and processing power.
Take a model-based approach to teaching machines (AlphaGo is a successful example).
Use RL to fine-tune “supervised” or “unsupervised” models rather than to replace those traditional techniques.
Use techniques like imitation learning and inverse reinforcement learning to learn better reward functions.
Pursue transfer learning: its success is still uncertain, but it may be the future.
Build on prior experience.

Some Popular Applications of DRL

Tried and tested use cases of Deep Reinforcement Learning techniques include:

Digital assistants that interact with customers using text summaries and speech samples, and that improve over time;
Optimal policy development through trial-and-error methods in insurance or healthcare;
Training online agents to guide stock trading.

The Forbes post How Deep Reinforcement Learning Will Make Robots Smarter describes DRL training techniques as used in robotics. The author compares the training process of a robot to the learning process of a small child: in DRL, the robot is rewarded for positive behavior and penalized for negative behavior, much as children are taught. The post makes a convincing case for this kind of “positive reinforcement learning,” which Google and other tech giants have already put into practice.

The Lesser-Known Facts about DRL

Here is a quick introduction to some of the unique features of Reinforcement Learning and Deep Reinforcement Learning:

RL and DRL are advanced ML techniques that enable “agents” to learn through interactive trial-and-error “actions,” using feedback generated by past actions.
Both supervised learning and DRL learn a mapping from inputs to outputs, and “feedback” is generated by evaluating that mapping.
The most marked difference between the feedback mechanisms is that in supervised learning, feedback comes as the correct set of actions, while in RL or DRL, feedback comes in the form of “rewards” or “penalties.” In that sense, DRL encourages behavioral change rather than offering direct guidance.
The end goal of unsupervised learning is to find similarities and differences between data points, while in RL or DRL, the end goal is to find a course of action that maximizes the cumulative reward.
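The difference between the two kinds of feedback can be shown on a deliberately tiny, hypothetical one-step task with three possible actions. The REWARDS table, the CORRECT_ACTION label, and both helper functions are illustrative assumptions; real RL problems involve sequences of states and noisy rewards, which this sketch omits.

```python
from typing import Callable, Dict, Tuple

# Hypothetical one-step task: three actions, action 2 is the best.
REWARDS: Dict[int, float] = {0: 0.1, 1: 0.5, 2: 0.9}  # hidden from the RL learner
CORRECT_ACTION = 2  # what a supervised "teacher" would reveal


def learn_supervised(label: int) -> Tuple[int, int]:
    """Supervised feedback *is* the correct action, so no trial actions are needed.

    Returns (chosen action, number of actions tried).
    """
    return label, 0


def learn_from_rewards(reward_of: Callable[[int], float], n_actions: int) -> Tuple[int, int]:
    """RL-style feedback is only a score for the action actually taken.

    The learner never sees the correct answer, so it must try actions
    (trial and error) and keep the one that earned the highest reward.
    Returns (chosen action, number of actions tried).
    """
    best_action, best_reward, trials = -1, float("-inf"), 0
    for action in range(n_actions):
        reward = reward_of(action)
        trials += 1
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action, trials


if __name__ == "__main__":
    print(learn_supervised(CORRECT_ACTION))
    print(learn_from_rewards(REWARDS.__getitem__, len(REWARDS)))
```

Both learners end up choosing action 2, but the reward-driven learner had to try every action to get there. With noisy rewards it would also need repeated trials per action, which is one reason RL tends to need far more experience than supervised learning.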

The KDnuggets post 5 Things You Need to Know about Reinforcement Learning explains some lesser-known truths about RL and DRL.

Recent Breakthroughs in the Field of Deep Reinforcement Learning

Machine learning researchers have invested much time and effort in trying to give machines the ultimate gift: a “mind.” The results include machines beating humans at their own games and producing machine-generated art.

Here is a quick recap of some of the best discoveries in the AI world, spanning machine learning, deep learning, reinforcement learning, and deep reinforcement learning:

A game-development company launched a new platform to train digital agents through DRL-enabled custom environments.
OpenAI's Universe platform can train AI agents across many digital environments, such as games and websites.

The Towards Data Science article 14 Deep and Machine Learning Uses that made 2019 a new AI Age discusses the newest advances in detail. The DATAVERSITY® article Deep Learning and Analytics: What is the Intersection? explores the extent to which AI and related technologies have recently contributed to developing machines with human behavioral qualities.

Interested in Deep Reinforcement Learning?

Platforms for working with deep reinforcement learning include Roboschool, DeepMind Lab, and OpenAI Gym. For an overview of advanced ML practices used in industry, see the Smart Data Webinar: Machine Learning Update – An Overview of Technology Maturity.
