PyTorch 1.x Reinforcement Learning Cookbook

Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Yuxi (Hayden) Liu

About This Book

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Book Description

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.

With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is helpful but not necessary.


Information

Year
2019
ISBN
9781838553234

Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. This chapter will start with the creation of a Markov chain and an MDP, which is the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then move on and apply two approaches to solving an MDP: value iteration and policy iteration. We will use the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming step by step.
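As a quick refresher before we dive in, policy evaluation rests on the Bellman expectation equation, which, in one common formulation, expresses the value of a state under a policy π in terms of the values of its successor states (γ is the discount factor):
V_π(s) = Σ_a π(a|s) Σ_s' T(s, a, s') [R(s, a, s') + γ V_π(s')]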
The following recipes will be covered in this chapter:
  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem

Technical requirements

You will need the following programs installed on your system to successfully execute the recipes in this chapter:
  • Python 3.6, 3.7, or above
  • Anaconda
  • PyTorch 1.0 or above
  • OpenAI Gym

Creating a Markov chain

Let's get started by creating a Markov chain, on which the MDP is developed.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ... , sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. With the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 depends only on the state at t. Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix (rows are the current state; columns are the next state):

          study   sleep
study      0.4     0.6
sleep      0.8     0.2
In the next section, we will compute the transition matrix after k steps, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.
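In vector form, if v is a row vector holding the initial state distribution, the distribution after k steps is simply the product of v and the kth power of the transition matrix:
v_k = v T^k
This is exactly what the recipe below computes with torch.matrix_power and torch.mm.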

How to do it...

To create a Markov chain for the study-and-sleep process and conduct some analysis on it, perform the following steps:
  1. Import the library and define the transition matrix:
>>> import torch
>>> T = torch.tensor([[0.4, 0.6],
... [0.8, 0.2]])
  2. Calculate the transition probability after k steps. Here, we use k = 2, 5, 10, 15, and 20 as examples:
>>> T_2 = torch.matrix_power(T, 2)
>>> T_5 = torch.matrix_power(T, 5)
>>> T_10 = torch.matrix_power(T, 10)
>>> T_15 = torch.matrix_power(T, 15)
>>> T_20 = torch.matrix_power(T, 20)
  3. Define the initial distribution of two states:
>>> v = torch.tensor([[0.7, 0.3]])
  4. Calculate the state distribution after k = 1, 2, 5, 10, 15, and 20 steps:
>>> v_1 = torch.mm(v, T)
>>> v_2 = torch.mm(v, T_2)
>>> v_5 = torch.mm(v, T_5)
>>> v_10 = torch.mm(v, T_10)
>>> v_15 = torch.mm(v, T_15)
>>> v_20 = torch.mm(v, T_20)
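As a side note, Steps 2 to 4 can be collapsed into a single loop. The following sketch is equivalent to the code above and simply prints every result in one pass:
>>> for k in [1, 2, 5, 10, 15, 20]:
...     T_k = torch.matrix_power(T, k)  # k-step transition matrix
...     v_k = torch.mm(v, T_k)          # state distribution after k steps
...     print("Distribution after {} step(s):\n{}".format(k, v_k))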

How it works...

In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that, after 10 to 15 steps, the transition probabilities converge: the rows of the matrix become identical. This means that, no matter which state the process starts in, the probability of being in s0 after many steps is 57.14% and the probability of being in s1 is 42.86%.
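This limit is the chain's stationary distribution, and we can also derive it by hand: a distribution π is stationary if π = πT. Writing π = (p, 1 - p) for our matrix gives p = 0.4p + 0.8(1 - p), so p = 0.8 / 1.4 ≈ 0.5714 and 1 - p ≈ 0.4286, which matches the converged rows above.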
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps, which is the multiplication of the initial state distribution and the transition probability. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
te...

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Getting Started with Reinforcement Learning and PyTorch
  7. Markov Decision Processes and Dynamic Programming
  8. Monte Carlo Methods for Making Numerical Estimations
  9. Temporal Difference and Q-Learning
  10. Solving Multi-armed Bandit Problems
  11. Scaling Up Learning with Function Approximation
  12. Deep Q-Networks in Action
  13. Implementing Policy Gradients and Policy Optimization
  14. Capstone Project – Playing Flappy Bird with DQN
  15. Other Books You May Enjoy