
The Fundamentals of Deep Reinforcement Learning

By Paramita Ghosh  /  June 27, 2018

Reinforcement Learning (RL), a once-“niche” Machine Learning technique, has surfaced in the last five years. In context-based decision making, Reinforcement Learning helps a machine choose actions through a trial-and-error approach, gradually arriving at the optimal model of behavior for a situation.

Furthermore, the machine is trained through a reward/penalty-based feedback mechanism, the goal of which is to continuously improve the behavior of a machine or robot. RL is widely used across industry sectors such as energy, transportation, finance, and healthcare, wherever automation involving multiple digital agents is concerned. Reinforcement Learning is currently being used to train systems in gaming, robotics, Natural Language Processing, and computer vision. Deep Reinforcement Learning (DRL) is a sub-field of RL, as Ruben Glatt explains in Quora, and it helps tackle some of the limitations of traditional RL.

What is Deep Reinforcement Learning?

Let’s begin with the terminology. For those unfamiliar with concepts such as “agent,” “state,” “action,” “rewards,” and “environment,” the article The Very Basics of Reinforcement Learning explains the basic nuts and bolts of Reinforcement Learning and Deep Reinforcement Learning. The guiding principles around these concepts — which form the “policy” — are also described in an easily digestible format.

Another introductory guide to DRL, A Beginner’s Guide to Deep Reinforcement Learning, points out that RL is best understood in an environment marked by states, agents, actions, and rewards. The environment acts as “a systematic guiding light”: it takes an agent’s “current state and action” as input and returns “rewards” or “penalties” as output, continuously steering the machine toward better decision-making.
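The loop that guide describes — an agent observes a state, takes an action, and the environment returns a reward — can be sketched in a few lines of Python. The corridor environment below is a hypothetical toy example, not taken from either guide:

```python
import random

# A toy environment: the agent starts at position 0 on a line of 5 cells
# and is rewarded for reaching the rightmost cell (position 4).
class LineWorld:
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def step(self, action):
        # action: +1 (move right) or -1 (move left)
        self.state = max(0, min(self.size - 1, self.state + action))
        reward = 1.0 if self.state == self.size - 1 else -0.1  # step penalty
        done = self.state == self.size - 1
        return self.state, reward, done

env = LineWorld()
total_reward, done = 0.0, False
while not done:
    action = random.choice([-1, 1])          # a random (untrained) agent
    state, reward, done = env.step(action)
    total_reward += reward
print(f"episode ended in state {state} with return {total_reward:.1f}")
```

A trained agent would replace `random.choice` with a learned policy; the point here is only the state-action-reward interface between agent and environment.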

In Forrester’s Artificial Intelligence Report Spawns 10 Hot Technologies, the author synthesizes the findings of a Forrester report on Artificial Intelligence technologies. In the post, Andrew Nicholas, Head of Procurement at Tungsten Network, notes that advanced Machine Learning (ML) algorithms have the potential to revolutionize “lifecycle management” in procurement businesses by enabling machines to learn directly from the available data rather than depending on procedural rules set by human programmers. The post stresses the need for strong networking between machines and business practitioners.

The InfoWorld article What is Deep Reinforcement Learning: The Next Step in AI and Deep Learning notes that Reinforcement Learning is well suited to aiding decision-making alongside both supervised and unsupervised learning methods. Another fascinating application of Reinforcement Learning is found in edge applications, where robotics is combined with “contextual autonomy” to drive humanized machines.

Are Experts Skeptical about the Future of Deep Reinforcement Learning?

The status of research on Reinforcement Learning is currently at a junction of “theory” and “experimental practice.” Researchers have attempted to prove that RL and DRL are particularly useful for use cases where a “model of the world” is unavailable. However, it is also well-known that situation-specific Machine Learning algorithms work better in most cases than world models of RL. Currently, AlphaGo is a convincing “proof of concept” for deep RL.

Himanshu Sahni’s post Reinforcement Learning Never Worked, and ‘Deep’ Only Helped a Bit, refers to a book on RL with many examples unique to Reinforcement Learning. The author indicates that in those problems where supervised, unsupervised, or deep learning fails, RL or DRL can probably help develop general models of the given problem.

The question is, can “general models” work in highly domain-specific problems? One huge limitation of general models is that during development users assume “an infinite number of agents with an infinite number of states and actions have been tried an infinite number of times.” In reality, such experimentation may not be possible.

Take the example of robotics, where a robot is coached about the “right actions” over an extended period of time before it gets an action right. Implicitly, then, the idea of exploration is tied up with delayed, “extended rewards.”
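This “extended rewards” idea is usually formalized as a discounted return: a reward that arrives many steps after the actions that earned it is credited back to those actions, shrunk by a discount factor. A minimal sketch, where both the trajectory and the discount factor are illustrative:

```python
def discounted_return(rewards, gamma=0.9):
    # Credit assignment over delayed rewards: work backwards from the end,
    # discounting each later reward by gamma per step of delay.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A trajectory where the only reward arrives at the very end:
print(discounted_return([0, 0, 0, 1]))  # ≈ 0.729, i.e. 0.9 ** 3
```

The first action in that trajectory still earns credit for the final reward, but only 0.9³ of it — which is exactly why long delays make exploration hard.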

How to Make Deep Reinforcement Learning Work

The general belief is that, given sufficient time, advanced ML researchers will succeed in making Reinforcement Learning and Deep Reinforcement Learning work in real contextual environments. The article Deep Reinforcement Learning Doesn’t Work Yet offers some highly introspective and well-researched ideas that can make RL and DRL a practitioner’s science. The author cites several plausible paths forward:

  • Make rewards more extensive so that they apply more universally.
  • Work on advanced hardware systems with more speed and processing power.
  • Take a model-based approach to teaching machines (AlphaGo is a successful model).
  • Use RL to tune “supervised” or “unsupervised” learning rather than to replace those traditional techniques.
  • Use techniques like imitation learning and inverse reinforcement learning to improve reward functions.
  • Accept that transfer learning, while currently uncertain, is the future.
  • Build on prior experiences.
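Of the techniques above, imitation learning is the easiest to illustrate. In its simplest form, behavioral cloning, it is just supervised learning on expert demonstrations. The corridor demonstrations below are a made-up example:

```python
from collections import Counter, defaultdict

# Behavioral cloning: the simplest form of imitation learning.
# An expert demonstrates the task; the learner fits a state -> action map.
expert_demos = [
    [(0, "right"), (1, "right"), (2, "right"), (3, "right")],
    [(1, "right"), (2, "right"), (3, "right")],
]

counts = defaultdict(Counter)
for episode in expert_demos:
    for state, action in episode:
        counts[state][action] += 1

# The cloned policy picks the action the expert chose most often in each state.
policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}
print(policy)  # every demonstrated state maps to "right"
```

The well-known weakness of this approach — and one reason the article above pairs it with inverse reinforcement learning — is that the cloned policy has no idea what to do in states the expert never visited.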

Some Popular Applications of DRL

Tried and tested use cases of Deep Reinforcement Learning techniques include:

  • Digital assistants that interact with customers by using text summaries and speech samples, and improve over time;
  • Optimal policy development through trial and error methods in insurance or healthcare;
  • Training online agents to guide stock trading.

The Forbes post How Deep Reinforcement Learning Will Make Robots Smarter describes the DRL training techniques used in robotics. The author compares the training process of a robot to the learning process of a small child: in DRL, the robot is rewarded for positive behavior and penalized for negative behavior, much the way humans train children. The post provides a convincing account of “positive reinforcement learning,” which has already been put into practice by Google and other tech giants.

Some Lesser-Known Facts About DRL

Here is a quick introduction to some of the unique features of Reinforcement Learning and Deep Reinforcement Learning:

  • RL and DRL are advanced ML techniques that enable “agents” to learn through interactive trial-and-error “actions,” using feedback generated by past actions.
  • In both supervised learning and DRL, input and output are compared before the “feedback” is generated.
  • The most marked difference between the feedback mechanism of “supervised” learning and that of RL or DRL is that in case of supervised learning, feedback comes as the correct action steps, while in the case of RL or DRL, feedback comes in the form of “rewards” or “penalties.” In that sense, DRL encourages behavioral changes instead of offering basic guidance.
  • The end goal of unsupervised learning is to determine similarities and dissimilarities between different data points, while in RL or DRL, the end goal is to determine a model course of action to maximize the rewards.
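The reward-maximizing update those bullets describe can be seen concretely in tabular Q-learning, a classic RL algorithm. The 5-cell corridor task below is a hypothetical example; the feedback is never a “correct action,” only a reward that nudges the value table:

```python
import random
random.seed(0)

# Tabular Q-learning on a 5-cell corridor: states 0..4, actions 0 (left),
# 1 (right). Reaching state 4 pays reward 1; every other step pays 0.
n_states, n_actions, goal = 5, 2, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == goal else 0.0), s2 == goal

for _ in range(500):                       # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, sometimes explore.
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Reward-driven update: move Q(s, a) toward r + gamma * max Q(s', .)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy action per non-goal state; after training this should be all
# 1s ("right"), learned purely from the terminal reward.
print([max(range(n_actions), key=lambda x: Q[s][x]) for s in range(goal)])
```

No one ever tells the agent that “right” is correct — the behavioral change emerges from rewards alone, which is exactly the contrast with supervised feedback drawn above.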

The KDnuggets post 5 Things You Need to Know about Reinforcement Learning explains some lesser-known truths about RL and DRL.

Recent Breakthroughs in the Field of Deep Reinforcement Learning

In 2017, ML researchers invested much time and effort to offer machines the ultimate gift — a “mind.” That year saw machines beating humans at their own games, as well as machine-generated art.

Here is a quick recap of some of the best discoveries in the AI world, which encapsulates Machine Learning, Deep Learning, Reinforcement Learning, and Deep Reinforcement Learning:

  • A game-development company launched a new platform to train digital agents through DRL-enabled custom environments.
  • The Universe platform can train any robotic agent across multiple digital channels.

The Forbes blog post 12 Amazing Deep Learning Breakthroughs of 2017 gives the full story. The DATAVERSITY® article Deep Learning Updates: Machine Learning, Deep Reinforcement Learning, and Limitations explores the extent to which AI and related technologies have recently contributed to the development of machines with human behavioral qualities.

Interested in Deep Reinforcement Learning?

Working Deep Reinforcement Learning platforms include Roboschool, DeepMind Lab, and OpenAI Gym. For an overview of advanced ML practices used in the industry, review Smart Data Webinar: Machine Learning Update – An Overview of Technology Maturity.

 

Photo Credit: MY stock/Shutterstock.com

About the author

Paramita Ghosh has over two and a half decades of business writing experience, much of it for the technology and business domains. She has written extensively for a broad range of industries, including data management and data technologies, and has also contributed to blended learning projects. She received her M.A. in English Literature in 1984 from Jadavpur University in India, and embarked on her career in the United States in 1989 after completing professional coursework. Having ghostwritten and authored hundreds of articles, blog posts, white papers, case studies, marketing content, and learning modules, Paramita has included authoring one or two books on the business of business writing among her post-retirement projects. She considers her professional strength to be “lifelong learning.”
