PyTorch 1.x Reinforcement Learning Cookbook

Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Yuxi (Hayden) Liu

About This Book

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Book Description

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.

With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is helpful but not necessary.


Information

Year
2019
ISBN
9781838553234

Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. This chapter will start with the creation of a Markov chain and an MDP, which is the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then move on and apply two approaches to solving an MDP: value iteration and policy iteration. We will use the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming step by step.
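As a quick refresher before we dive in, policy evaluation rests on the Bellman expectation equation, which, in one common formulation, expresses the value of a state under a policy π in terms of the values of its successor states (γ is the discount factor):
V_π(s) = Σ_a π(a|s) Σ_s' T(s, a, s') [R(s, a, s') + γ V_π(s')]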
The following recipes will be covered in this chapter:
  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem

Technical requirements

You will need the following programs installed on your system to successfully execute the recipes in this chapter:
  • Python 3.6, 3.7, or above
  • Anaconda
  • PyTorch 1.0 or above
  • OpenAI Gym

Creating a Markov chain

Let's get started by creating a Markov chain, on which the MDP is developed.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ... , sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. With the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 depends only on the state at t. Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix (rows are the current state; columns are the next state):

          study   sleep
study      0.4     0.6
sleep      0.8     0.2
In the next section, we will compute the transition matrix after k steps, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.
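In vector form, if v is a row vector holding the initial state distribution, the distribution after k steps is simply the product of v and the kth power of the transition matrix:
v_k = v T^k
This is exactly what the recipe below computes with torch.matrix_power and torch.mm.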

How to do it...

To create a Markov chain for the study-and-sleep process and conduct some analysis on it, perform the following steps:
  1. Import the library and define the transition matrix:
>>> import torch
>>> T = torch.tensor([[0.4, 0.6],
... [0.8, 0.2]])
  2. Calculate the transition probability after k steps. Here, we use k = 2, 5, 10, 15, and 20 as examples:
>>> T_2 = torch.matrix_power(T, 2)
>>> T_5 = torch.matrix_power(T, 5)
>>> T_10 = torch.matrix_power(T, 10)
>>> T_15 = torch.matrix_power(T, 15)
>>> T_20 = torch.matrix_power(T, 20)
  3. Define the initial distribution of two states:
>>> v = torch.tensor([[0.7, 0.3]])
  4. Calculate the state distribution after k = 1, 2, 5, 10, 15, and 20 steps:
>>> v_1 = torch.mm(v, T)
>>> v_2 = torch.mm(v, T_2)
>>> v_5 = torch.mm(v, T_5)
>>> v_10 = torch.mm(v, T_10)
>>> v_15 = torch.mm(v, T_15)
>>> v_20 = torch.mm(v, T_20)
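As a side note, Steps 2 to 4 can be collapsed into a single loop. The following sketch is equivalent to the code above and simply prints every result in one pass:
>>> for k in [1, 2, 5, 10, 15, 20]:
...     T_k = torch.matrix_power(T, k)  # k-step transition matrix
...     v_k = torch.mm(v, T_k)          # state distribution after k steps
...     print("Distribution after {} step(s):\n{}".format(k, v_k))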

How it works...

In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that, after 10 to 15 steps, the transition probabilities converge: the rows of the matrix become identical. This means that, no matter which state the process starts in, the probability of being in s0 after many steps is 57.14% and the probability of being in s1 is 42.86%.
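This limit is the chain's stationary distribution, and we can also derive it by hand: a distribution π is stationary if π = πT. Writing π = (p, 1 - p) for our matrix gives p = 0.4p + 0.8(1 - p), so p = 0.8 / 1.4 ≈ 0.5714 and 1 - p ≈ 0.4286, which matches the converged rows above.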
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps, which is the multiplication of the initial state distribution and the transition probability. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
te...

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Getting Started with Reinforcement Learning and PyTorch
  7. Markov Decision Processes and Dynamic Programming
  8. Monte Carlo Methods for Making Numerical Estimations
  9. Temporal Difference and Q-Learning
  10. Solving Multi-armed Bandit Problems
  11. Scaling Up Learning with Function Approximation
  12. Deep Q-Networks in Action
  13. Implementing Policy Gradients and Policy Optimization
  14. Capstone Project – Playing Flappy Bird with DQN
  15. Other Books You May Enjoy