PyTorch 1.x Reinforcement Learning Cookbook

eBook - ePub

Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Yuxi (Hayden) Liu

  • 340 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About This Book

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Book Description

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.

With this book, you'll explore important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference learning, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving the multi-armed bandit problem with bandit algorithms and the cartpole problem with function approximation. You'll also learn how to use Deep Q-Networks to play Atari games, along with how to implement policy gradients effectively. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required, while experience with PyTorch is useful but not necessary.


Information

Year: 2019
ISBN: 9781838553234
Edition: 1
Publisher: Packt Publishing

Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. This chapter will start with the creation of a Markov chain and an MDP, which is the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then move on and apply two approaches to solving an MDP: value iteration and policy iteration. We will use the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming step by step.
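For reference, the Bellman expectation equation that policy evaluation iterates can be written as follows, using standard notation that the excerpt has not yet defined: \pi(a \mid s) is the policy, T(s, a, s') the transition probability, R(s, a, s') the reward, and \gamma the discount factor:

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]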
The following recipes will be covered in this chapter:
  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem

Technical requirements

You will need the following programs installed on your system to successfully execute the recipes in this chapter:
  • Python 3.6, 3.7, or above
  • Anaconda
  • PyTorch 1.0 or above
  • OpenAI Gym

Creating a Markov chain

Let's get started by creating a Markov chain, the foundation on which an MDP is built.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ... , sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. With the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 depends only on the state at t. Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix:

T = [[0.4, 0.6],
     [0.8, 0.2]]

Here, T[i][j] is the probability of moving from state si to state sj: for example, from s0 (study), the process continues studying with probability 0.4 and switches to sleep with probability 0.6.
In the next section, we will compute the transition matrix after k steps, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.
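To see the Markov property in action before we start, here is a minimal sketch (an illustration of the concept, not one of the book's steps) that samples a trajectory from this chain. Note that each next state is drawn using only the current state's row of T:

>>> import torch
>>> T = torch.tensor([[0.4, 0.6],
...                   [0.8, 0.2]])
>>> state = 0                 # start in s0 (study)
>>> trajectory = [state]
>>> for _ in range(10):
...     # the next state depends only on the current one: sample from row T[state]
...     state = torch.multinomial(T[state], 1).item()
...     trajectory.append(state)
...
>>> trajectory                # 11 states, e.g. [0, 1, 0, 0, 1, ...]; output varies per run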

How to do it...

To create a Markov chain for the study-and-sleep process and conduct some analysis on it, perform the following steps:
  1. Import the library and define the transition matrix:
>>> import torch
>>> T = torch.tensor([[0.4, 0.6],
... [0.8, 0.2]])
  2. Calculate the transition probability after k steps. Here, we use k = 2, 5, 10, 15, and 20 as examples:
>>> T_2 = torch.matrix_power(T, 2)
>>> T_5 = torch.matrix_power(T, 5)
>>> T_10 = torch.matrix_power(T, 10)
>>> T_15 = torch.matrix_power(T, 15)
>>> T_20 = torch.matrix_power(T, 20)
  3. Define the initial distribution of two states:
>>> v = torch.tensor([[0.7, 0.3]])
  4. Calculate the state distribution after k = 1, 2, 5, 10, 15, and 20 steps:
>>> v_1 = torch.mm(v, T)
>>> v_2 = torch.mm(v, T_2)
>>> v_5 = torch.mm(v, T_5)
>>> v_10 = torch.mm(v, T_10)
>>> v_15 = torch.mm(v, T_15)
>>> v_20 = torch.mm(v, T_20)

How it works...

In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that, after 10 to 15 steps, the transition probabilities converge. This means that, no matter which state the process starts in, the probabilities of being in s0 and s1 settle at 57.14% and 42.86%, respectively.
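This limit is the stationary distribution of the chain, and it can be computed directly rather than by raising T to a large power: it is the left eigenvector of T associated with eigenvalue 1. Here is a minimal sketch, assuming a PyTorch version that provides torch.linalg.eig (1.8 or later; earlier 1.x releases exposed the since-removed torch.eig instead):

>>> evals, evecs = torch.linalg.eig(T.t())      # right eigenvectors of T transposed = left eigenvectors of T
>>> idx = torch.argmin(torch.abs(evals - 1.0))  # locate the eigenvalue closest to 1
>>> pi = evecs[:, idx].real
>>> pi = pi / pi.sum()                          # normalize so the entries sum to 1
>>> pi
tensor([0.5714, 0.4286])

The result matches the rows that torch.matrix_power converged to: 4/7 ≈ 0.5714 and 3/7 ≈ 0.4286.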
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps, which is the product of the initial state distribution and the k-step transition matrix, that is, v_k = v T^k. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
tensor([[0.5701, 0.4299]])

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Getting Started with Reinforcement Learning and PyTorch
  7. Markov Decision Processes and Dynamic Programming
  8. Monte Carlo Methods for Making Numerical Estimations
  9. Temporal Difference and Q-Learning
  10. Solving Multi-armed Bandit Problems
  11. Scaling Up Learning with Function Approximation
  12. Deep Q-Networks in Action
  13. Implementing Policy Gradients and Policy Optimization
  14. Capstone Project – Playing Flappy Bird with DQN
  15. Other Books You May Enjoy