PyTorch 1.x Reinforcement Learning Cookbook

Over 60 recipes to design, develop, and deploy self-learning AI models using Python

Yuxi (Hayden) Liu

  1. 340 pages
  2. English
  3. ePub (mobile friendly)
  4. Available on iOS and Android

Book Information

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Book Description

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use.

With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state–action–reward–state–action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is helpful but not necessary.


Information

Year: 2019
ISBN: 9781838553234
Edition: 1

Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. This chapter will start with the creation of a Markov chain and an MDP, which is the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then move on and apply two approaches to solving an MDP: value iteration and policy iteration. We will use the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming step by step.
The following recipes will be covered in this chapter:
  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem

Technical requirements

You will need the following programs installed on your system to successfully execute the recipes in this chapter:
  • Python 3.6, 3.7, or above
  • Anaconda
  • PyTorch 1.0 or above
  • OpenAI Gym
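If you need to set any of these up, a typical installation with Anaconda might look like the following (the pytorch conda channel and the gym pip package are the standard distribution channels for these libraries, but check the official installation pages for the exact command for your platform):
conda install pytorch -c pytorch
pip install gym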

Creating a Markov chain

Let's get started by creating a Markov chain, on which the MDP is developed.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ... , sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. Under the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 depends only on the state at t. Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix, where row i holds the probabilities of transitioning from state si:
T = [[0.4, 0.6],
     [0.8, 0.2]]
That is, from study, the process moves to sleep with probability 0.6 and keeps studying with probability 0.4; from sleep, it moves back to study with probability 0.8 and keeps sleeping with probability 0.2.
In the next section, we will compute the transition matrix after k steps, which is simply the kth power of T, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.

How to do it...

To create a Markov chain for the study-and-sleep process and conduct some analysis on it, perform the following steps:
  1. Import the library and define the transition matrix:
>>> import torch
>>> T = torch.tensor([[0.4, 0.6],
...                   [0.8, 0.2]])
  2. Calculate the transition probability after k steps. Here, we use k = 2, 5, 10, 15, and 20 as examples:
>>> T_2 = torch.matrix_power(T, 2)
>>> T_5 = torch.matrix_power(T, 5)
>>> T_10 = torch.matrix_power(T, 10)
>>> T_15 = torch.matrix_power(T, 15)
>>> T_20 = torch.matrix_power(T, 20)
  3. Define the initial distribution of two states:
>>> v = torch.tensor([[0.7, 0.3]])
  4. Calculate the state distribution after k = 1, 2, 5, 10, 15, and 20 steps:
>>> v_1 = torch.mm(v, T)
>>> v_2 = torch.mm(v, T_2)
>>> v_5 = torch.mm(v, T_5)
>>> v_10 = torch.mm(v, T_10)
>>> v_15 = torch.mm(v, T_15)
>>> v_20 = torch.mm(v, T_20)
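As a quick sanity check that is not part of the original recipe, you can reproduce v_20 by rolling the initial distribution forward one step at a time; 20 single-step multiplications give the same result as multiplying by T_20 once:
>>> v_k = v
>>> for _ in range(20):
...     v_k = torch.mm(v_k, T)
...
>>> print(v_k)
tensor([[0.5714, 0.4286]])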

How it works...

In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that, after 10 to 15 steps, the transition probability converges. This means that, no matter what state the process starts in, it ends up in s0 with the same probability (57.14%) and in s1 with the same probability (42.86%); in other words, the chain reaches its stationary distribution.
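This limit can also be verified analytically: the converged distribution is the stationary distribution, pi, of the chain, which satisfies pi = pi * T (a row vector unchanged by one more transition step). For a two-state chain, this has a simple closed form; the following sketch, our addition rather than part of the recipe, recovers the same numbers:
>>> # For a two-state chain, pi0 = p10 / (p01 + p10),
>>> # where p01 = P(s0 -> s1) and p10 = P(s1 -> s0)
>>> p01, p10 = 0.6, 0.8
>>> pi0 = p10 / (p01 + p10)
>>> print(round(pi0, 4), round(1 - pi0, 4))
0.5714 0.4286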
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps, which is the product of the initial state distribution and the k-step transition matrix. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
tensor([[0.5701, 0.4299]])
The distributions after 10, 15, and 20 steps all print as tensor([[0.5714, 0.4286]]): whatever the initial distribution, the chain settles into the same stationary distribution.

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Getting Started with Reinforcement Learning and PyTorch
  7. Markov Decision Processes and Dynamic Programming
  8. Monte Carlo Methods for Making Numerical Estimations
  9. Temporal Difference and Q-Learning
  10. Solving Multi-armed Bandit Problems
  11. Scaling Up Learning with Function Approximation
  12. Deep Q-Networks in Action
  13. Implementing Policy Gradients and Policy Optimization
  14. Capstone Project – Playing Flappy Bird with DQN
  15. Other Books You May Enjoy