Reinforcement Learning Algorithms with Python

Andrea Lonza

Learn, understand, and develop smart algorithms for addressing AI challenges

About This Book

Develop self-learning algorithms and agents using TensorFlow and other Python tools, frameworks, and libraries

Key Features

  • Learn, develop, and deploy advanced reinforcement learning algorithms to solve a variety of tasks
  • Understand and develop model-free and model-based algorithms for building self-learning agents
  • Work with advanced Reinforcement Learning concepts and algorithms such as imitation learning and evolution strategies

Book Description

Reinforcement Learning (RL) is a popular and promising branch of AI that involves making smarter models and agents that can automatically determine ideal behavior based on changing requirements. This book will help you master RL algorithms and understand their implementation as you build self-learning agents.

Starting with an introduction to the tools, libraries, and setup needed to work in the RL environment, this book covers the building blocks of RL and delves into value-based methods, such as the application of Q-learning and SARSA algorithms. You'll learn how to use a combination of Q-learning and neural networks to solve complex problems. Furthermore, you'll study the policy gradient methods, TRPO, and PPO, to improve performance and stability, before moving on to the DDPG and TD3 deterministic algorithms. This book also covers how imitation learning techniques work and how DAgger can teach an agent to drive. You'll discover evolution strategies and black-box optimization techniques, and see how they can improve RL algorithms. Finally, you'll get to grips with exploration approaches, such as UCB and UCB1, and develop a meta-algorithm called ESBAS.

By the end of the book, you'll have worked with key RL algorithms to overcome challenges in real-world applications, and be part of the RL research community.

What you will learn

  • Develop an agent to play CartPole using the OpenAI Gym interface
  • Discover the model-based reinforcement learning paradigm
  • Solve the Frozen Lake problem with dynamic programming
  • Explore Q-learning and SARSA with a view to playing a taxi game
  • Apply Deep Q-Networks (DQNs) to Atari games using Gym
  • Study policy gradient algorithms, including Actor-Critic and REINFORCE
  • Understand and apply PPO and TRPO in continuous locomotion environments
  • Get to grips with evolution strategies for solving the lunar lander problem

Who this book is for

If you are an AI researcher, deep learning user, or anyone who wants to learn reinforcement learning from scratch, this book is for you. You'll also find this reinforcement learning book useful if you want to learn about the advancements in the field. Working knowledge of Python is necessary.

Section 1: Algorithms and Environments

This section is an introduction to reinforcement learning. It includes building the theoretical foundation and setting up the environment that is needed in the upcoming chapters.
This section includes the following chapters:
  • Chapter 1, The Landscape of Reinforcement Learning
  • Chapter 2, Implementing RL Cycle and OpenAI Gym
  • Chapter 3, Solving Problems with Dynamic Programming

The Landscape of Reinforcement Learning

Humans and animals learn through a process of trial and error. This process is based on our reward mechanisms, which provide a response to our behaviors. The goal of this process is, through multiple repetitions, to incentivize the repetition of actions that trigger positive responses and to disincentivize the repetition of actions that trigger negative ones. Through the trial and error mechanism, we learn to interact with the people and world around us, and to pursue complex, meaningful goals, rather than immediate gratification.
Learning through interaction and experience is essential. Imagine having to learn to play football by only looking at other people playing it. If you took to the field to play a football match based on this learning experience, you would probably perform incredibly poorly.
This was demonstrated throughout the mid-20th century, notably by Richard Held and Alan Hein's 1963 study on two kittens, both of which were raised on a carousel. One kitten was able to move freely (actively), whilst the other was restrained and moved following the active kitten (passively). When both kittens were introduced to light, only the kitten that was able to move actively developed functioning depth perception and motor skills, whilst the passive kitten did not. This was notably demonstrated by the absence of the passive kitten's blink reflex towards incoming objects. What this rather crude experiment demonstrated is that passive visual experience alone is not enough; physical interaction with the environment is necessary in order for animals to learn.
Inspired by how animals and humans learn, reinforcement learning (RL) is built around the idea of trial and error from active interactions with the environment. In particular, with RL, an agent learns incrementally as it interacts with the world. In this way, it's possible to train a computer to learn and behave in a rudimentary, yet similar way to how humans do.
This book is all about reinforcement learning. The intent of the book is to give you the best possible understanding of this field with a hands-on approach. In the first chapters, you'll start by learning the most fundamental concepts of reinforcement learning. As you grasp these concepts, we'll start developing our first reinforcement learning algorithms. Then, as the book progresses, you'll create more powerful and complex algorithms to solve more interesting and compelling problems. You'll see that reinforcement learning is very broad and that there exist many algorithms that tackle a variety of problems in different ways. Nevertheless, we'll do our best to provide you with a simple but complete description of all the ideas, alongside a clear and practical implementation of the algorithms.
To start with, in this chapter, you'll familiarize yourself with the fundamental concepts of RL, the distinctions between different approaches, and the key concepts of policy, value function, reward, and model of the environment. You'll also learn about the history and applications of RL.
The following topics will be covered in this chapter:
  • An introduction to RL
  • Elements of RL
  • Applications of RL

An introduction to RL

RL is an area of machine learning that deals with sequential decision-making, aimed at reaching a desired goal. An RL problem is constituted by a decision-maker, called an Agent, and the physical or virtual world with which the agent interacts, known as the Environment. The agent interacts with the environment by taking an Action, which has an effect on it. In response, the environment feeds back to the agent a new State and a Reward. These two signals are the consequences of the action taken by the agent. In particular, the reward is a value indicating how good or bad the action was, and the state is the current representation of the agent and the environment. This cycle is shown in the following diagram:
In this diagram, the agent is represented by PacMan, which, based on the current state of the environment, chooses which action to take. Its behavior influences the environment, such as its own position and that of the enemies, and these changes are returned by the environment in the form of a new state and a reward. This cycle is repeated until the game ends.
The ultimate goal of the agent is to maximize the total reward accumulated during its lifetime. Let's simplify the notation: if a_t is the action at time t and r_t is the reward at time t, then the agent will take the actions a_1, a_2, ..., a_T to maximize the sum of all rewards r_1 + r_2 + ... + r_T.
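To make this cycle and objective concrete, here is a minimal sketch, assuming the classic OpenAI Gym interface used throughout this book (newer Gym/Gymnasium releases return slightly different values from reset and step). It runs an agent that picks random actions in the CartPole environment and accumulates the rewards returned by the environment:

```python
import gym

# Create the CartPole environment, used as a running example in this book
env = gym.make('CartPole-v1')

# Reset the environment to obtain the initial state
state = env.reset()

total_reward = 0.0
done = False

while not done:
    # The agent chooses an action; a random one stands in for a learned policy
    action = env.action_space.sample()

    # The environment reacts, returning a new state and a reward
    state, reward, done, info = env.step(action)

    # The agent's goal is to maximize the sum of all rewards
    total_reward += reward

print('Total reward accumulated in this episode:', total_reward)
```

With a random policy the total reward will be low; the rest of the book is about learning policies that make this sum as large as possible.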
To maximize the cumulative reward, the agent has to learn the best behavior in every situation. To do so, the agent has to optimize for a long-term horizon while taking care of every single action. In environments with many discrete or continuous states and actions, learning is difficult because the agent has to account for each situation. To make the problem harder, RL can have very sparse and delayed rewards, making the learning process more arduous.
To give an example of an RL problem while explaining the complexity of a sparse reward, consider the well-known story of two siblings, Hansel and Gretel. Their parents led them into the forest to abandon them, but Hansel, who knew of their intentions, had taken a slice of bread with him when they left the house and managed to leave a trail of breadcrumbs that would lead him and his sister home. In the RL framework, the agents are Hansel and Gretel, and the environment is the forest. A reward of +1 is obtained for every crumb of bread reached and a reward of +10 is acquired when they reach home. In this case, the denser the trail of bread, the easier it will be for the siblings to find their way home. This is because to go from one piece of bread to another, they have to explore a smaller area. Unfortunately, sparse rewards are far more common than dense rewards in the real world.
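As a purely illustrative sketch of this difference, the following hypothetical reward functions (the positions and reward values are assumptions, not code from the book) encode a dense and a sparse version of the Hansel and Gretel reward:

```python
def dense_reward(position, breadcrumb_positions, home_position):
    # Dense reward: every breadcrumb reached gives +1 and reaching home gives +10,
    # so the siblings receive frequent feedback along the way.
    if position == home_position:
        return 10.0
    if position in breadcrumb_positions:
        return 1.0
    return 0.0


def sparse_reward(position, home_position):
    # Sparse reward: the only feedback is the +10 received upon reaching home,
    # which makes the learning problem much harder.
    return 10.0 if position == home_position else 0.0
```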
An important characteristic of RL is that it can deal with environments that are dynamic, uncertain, and non-deterministic. These qualities are essential for the adoption of RL in the real world. The following points are examples of how real-world problems can be reframed in RL settings:
  • Self-driving cars are a popular, yet difficult, concept to approach with RL. This is because of the many aspects to be taken into consideration while driving on the road (such as pedestrians, other cars, bikes, and traffic lights) and the highly uncertain environment. In this case, the self-driving car is the agent that can act on the steering wheel, accelerator, and brakes. The environment is the world around it. Obviously, the agent cannot be aware of the whole world around it, as it can only capture limited information via its sensors (for example, the camera, radar, and GPS). The goal of the self-driving car is to reach the destination in the minimum amount of time while following the rules of the road and without damaging anything. Consequently, the agent can receive a negative reward if a negative event occurs and a positive reward can be received in proportion to the driving time when the agent reaches its destination.
  • In the game of chess, the goal is to checkmate the opponent's king. In an RL framework, the player is the agent and the environment is the current state of the board. The agent is allowed to move the pieces according to the rules of the game. As a result of an action, the environment returns a positive or negative reward corresponding to a win or a loss for the agent; in all other situations, the reward is 0 and the next state is the state of the board after the opponent has moved (a minimal reward sketch follows this list). Unlike the self-driving car example, here the environment state equals the agent state. In other words, the agent has a perfect view of the environment.
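As a small illustration of the chess framing above, a reward function along these lines (the signature and values are illustrative assumptions) would return a non-zero signal only when the game ends:

```python
def chess_reward(game_over, agent_won):
    # The environment returns 0 for every intermediate position and only gives
    # feedback at the end of the game: a positive reward for a win and a
    # negative reward for a loss.
    if not game_over:
        return 0.0
    return 1.0 if agent_won else -1.0
```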

Comparing RL and supervised learning

RL and supervised learning are similar, yet different, paradigms to learn from data. Many problems can be tackled with both supervised learning and RL; however, in most cases, they are suited to solve different tasks.
Supervised learning learns to generalize from a fixed dataset consisting of a limited number of examples. Each example is composed of an input and the desired output (or label), which provides immediate learning feedback.
In comparison, RL is more focused on the sequential actions that can be taken in a particular situation. In this case, the only supervision provided is the reward signal; there is no correct action to take in a given circumstance, as there is in the supervised setting.
RL can be viewed as a more general and complete framework for learning. The major characteristics that are unique to RL are as follows:
  • The reward could be dense,...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Dedication
  4. About Packt
  5. Contributors
  6. Preface
  7. Section 1: Algorithms and Environments
  8. The Landscape of Reinforcement Learning
  9. Implementing RL Cycle and OpenAI Gym
  10. Solving Problems with Dynamic Programming
  11. Section 2: Model-Free RL Algorithms
  12. Q-Learning and SARSA Applications
  13. Deep Q-Network
  14. Learning Stochastic and PG Optimization
  15. TRPO and PPO Implementation
  16. DDPG and TD3 Applications
  17. Section 3: Beyond Model-Free Algorithms and Improvements
  18. Model-Based RL
  19. Imitation Learning with the DAgger Algorithm
  20. Understanding Black-Box Optimization Algorithms
  21. Developing the ESBAS Algorithm
  22. Practical Implementation for Resolving RL Challenges
  23. Assessments
  24. Other Books You May Enjoy