PyTorch 1.x Reinforcement Learning Cookbook

Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Yuxi (Hayden) Liu

  • 340 pages
  • English
  • ePub

About the Book

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Book Description

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent years. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as a preferred tool for training RL models because of its efficiency and ease of use.

With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference learning, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving the multi-armed bandit problem and the cartpole problem using multi-armed bandit algorithms and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required, while experience with PyTorch is useful but not necessary.


Information

Year: 2019
ISBN: 9781838553234
Edition: 1

Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. This chapter will start with the creation of a Markov chain and an MDP, which are at the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then apply two approaches to solving an MDP: value iteration and policy iteration, using the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming, step by step.
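As a preview, policy evaluation builds on the Bellman expectation equation. In one standard form (the notation used in later recipes may differ slightly), the value of a state s under a policy π satisfies:

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]$$

Here, T is the transition probability function, R is the reward function, and γ is the discount factor.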
The following recipes will be covered in this chapter:
  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem

Technical requirements

You will need the following programs installed on your system to successfully execute the recipes in this chapter:
  • Python 3.6, 3.7, or above
  • Anaconda
  • PyTorch 1.0 or above
  • OpenAI Gym

Creating a Markov chain

Let's get started by creating a Markov chain, the foundation upon which an MDP is developed.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ..., sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. With the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 depends only on the state at t. Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix:
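Based on the values used in the code throughout this recipe, the transition matrix is:

$$T = \begin{bmatrix} 0.4 & 0.6 \\ 0.8 & 0.2 \end{bmatrix}$$

That is, from study (s0, the first row), there is a 40% chance of continuing to study and a 60% chance of switching to sleep; from sleep (s1, the second row), there is an 80% chance of switching to study and a 20% chance of continuing to sleep.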
In the next section, we will compute the transition matrix after k steps, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.
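Concretely, the k-step transition matrix is the kth matrix power of T, and the state distribution after k steps is the initial distribution (written as a row vector) multiplied by it:

$$T_k = T^k, \qquad v_k = v \, T^k$$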

How to do it...

To create a Markov chain for the study-and-sleep process and conduct some analysis on it, perform the following steps:
  1. Import the library and define the transition matrix:
>>> import torch
>>> # Row i holds the transition probabilities out of state si:
>>> # row 0 is study (s0), row 1 is sleep (s1)
>>> T = torch.tensor([[0.4, 0.6],
...                   [0.8, 0.2]])
  2. Calculate the transition probability after k steps. Here, we use k = 2, 5, 10, 15, and 20 as examples:
>>> T_2 = torch.matrix_power(T, 2)
>>> T_5 = torch.matrix_power(T, 5)
>>> T_10 = torch.matrix_power(T, 10)
>>> T_15 = torch.matrix_power(T, 15)
>>> T_20 = torch.matrix_power(T, 20)
  3. Define the initial distribution over the two states (as a 1 × 2 row vector):
>>> v = torch.tensor([[0.7, 0.3]])
  4. Calculate the state distribution after k = 1, 2, 5, 10, 15, and 20 steps:
>>> v_1 = torch.mm(v, T)
>>> v_2 = torch.mm(v, T_2)
>>> v_5 = torch.mm(v, T_5)
>>> v_10 = torch.mm(v, T_10)
>>> v_15 = torch.mm(v, T_15)
>>> v_20 = torch.mm(v, T_20)

How it works...

In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that, after 10 to 15 steps, the transition probability converges. This means that, regardless of which state the process starts in, the probabilities of being in s0 (57.14%) and s1 (42.86%) become the same: the chain has reached its stationary distribution.
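This limit can be cross-checked without computing matrix powers. The stationary distribution π satisfies πT = π, so it is the eigenvector of the transposed transition matrix associated with the eigenvalue 1, normalized to sum to one. The following is a minimal sketch of this check (not part of the original recipe; it assumes PyTorch 1.8 or later, where torch.linalg.eig is available):
>>> evals, evecs = torch.linalg.eig(T.t())
>>> idx = torch.argmin((evals - 1).abs())  # index of the eigenvalue closest to 1
>>> pi = evecs[:, idx].real                # the corresponding eigenvector
>>> pi = pi / pi.sum()                     # normalize (this also fixes the sign)
>>> print(pi)
tensor([0.5714, 0.4286])
This matches the rows of T_20 shown previously.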
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps, which is the product of the initial state distribution and the k-step transition matrix. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
te...

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Getting Started with Reinforcement Learning and PyTorch
  7. Markov Decision Processes and Dynamic Programming
  8. Monte Carlo Methods for Making Numerical Estimations
  9. Temporal Difference and Q-Learning
  10. Solving Multi-armed Bandit Problems
  11. Scaling Up Learning with Function Approximation
  12. Deep Q-Networks in Action
  13. Implementing Policy Gradients and Policy Optimization
  14. Capstone Project – Playing Flappy Bird with DQN
  15. Other Books You May Enjoy