Hands-On Q-Learning with Python
eBook - ePub

Practical Q-learning with OpenAI Gym, Keras, and TensorFlow

Nazia Habib

  1. 212 pages
  2. English
  3. ePUB (available on the app)
  4. Available on iOS and Android

About This Book

Leverage the power of reward-based training for your deep learning models with Python

Key Features

  • Understand Q-learning algorithms to train neural networks using Markov decision processes (MDPs)
  • Study practical deep reinforcement learning using Q-Networks
  • Explore state-based learning driven by rewards rather than labeled examples

Book Description

Q-learning is a machine learning algorithm used to solve optimization problems in artificial intelligence (AI). It belongs to reinforcement learning, one of the most popular fields of study among AI researchers.

This book starts off by introducing you to reinforcement learning and Q-learning, in addition to helping you get familiar with OpenAI Gym as well as libraries such as Keras and TensorFlow. A few chapters into the book, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. This book will guide you in exploring use cases such as self-driving vehicles and OpenAI Gym's CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems. You will also explore how to use Q-learning and related algorithms in real-world applications such as scientific research. Toward the end, you'll gain a sense of what's in store for reinforcement learning.

By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.

What you will learn

  • Explore the fundamentals of reinforcement learning and the state-action-reward process
  • Understand Markov decision processes
  • Get well versed with libraries such as Keras and TensorFlow
  • Create and deploy model-free learning and deep Q-learning agents with TensorFlow, Keras, and OpenAI Gym
  • Choose and optimize a Q-Network's learning parameters and fine-tune its performance
  • Discover real-world applications and use cases of Q-learning

Who this book is for

If you are a machine learning developer, engineer, or professional who wants to delve into the deep learning approach to complex environments, then this is the book for you. Proficiency in Python programming and a basic understanding of decision-making in reinforcement learning are assumed.

Information

Year
2019
ISBN
9781789345759

Section 1: Q-Learning: A Roadmap

This section introduces the reader to reinforcement learning and Q-learning, and to the types of problems that can be solved with each. Readers will become familiar with OpenAI Gym as a tool for creating Q-learning projects and will build their first model-free Q-learning agent.
The following chapters are included in this section:
  • Chapter 1, Brushing Up on Reinforcement Learning Concepts
  • Chapter 2, Getting Started with the Q-Learning Algorithm
  • Chapter 3, Setting Up Your First Environment with OpenAI Gym
  • Chapter 4, Teaching a Smartcab to Drive Using Q-Learning

Brushing Up on Reinforcement Learning Concepts

In this book, you will learn the fundamentals of Q-learning, a branch of reinforcement learning (RL), and how to apply them to challenging real-world optimization problems. You'll design software that adapts its own behavior and improves its own performance in real time as it gathers experience.
In doing so, you will build self-learning intelligent agents that start with no knowledge of how to solve a problem and independently find optimal solutions to that problem through observation, trial and error, and memory.
RL is one of the most exciting branches of artificial intelligence (AI) and powers some of its most visible successes, from recommendation systems that learn from user behavior to game-playing machines that can beat any human being at chess or Go.
Q-learning is one of the easiest versions of RL to get started with, and mastering it will give you a solid foundation in your knowledge and practice of RL. Whether you work as a data scientist, machine learning engineer, or other practitioner in the data or AI space, you will find plenty of useful and practical resources to get you started.
We will cover the following topics in this introductory chapter:
  • Reviewing RL and the differences between reward-based learning and other types of machine learning
  • Learning what states are and what it means to take an action and receive a reward
  • Understanding how RL agents make decisions based on policies and future rewards
  • Discovering the two major types of model-free RL and diving deeper into Q-learning

What is RL?

An RL agent is an optimization process that learns from experience, using data it has collected from its environment through its own observations. It starts out with no explicit knowledge of the task, learns by trial and error what happens when it makes decisions, keeps track of the decisions that worked, and repeats those decisions under the same circumstances in the future.
In fields other than AI, such as operations research, RL is also referred to as (approximate) dynamic programming. It takes much of its basic operating structure from behavioral psychology, and many of its mathematical constructs, such as utility functions, are taken from fields such as economics and game theory.
Let's get familiar with some key concepts in RL:
  • Agent: This is the decision-making entity.
  • Environment: This is the world in which the agent operates, such as a game to win or task to accomplish.
  • State: This is where the agent is in its environment. When you define the states that an agent can be in, think about what it needs to know about its environment. For example, a self-driving car will need to know whether the next traffic light is red or green and whether there are pedestrians in the crosswalk; these are defined as state variables.
  • Action: This is the next move that the agent chooses to take.
  • Reward: This is the feedback that the agent gets from the environment for taking that action.
  • Policy: This is a function that maps the agent's states to its actions. For your first RL agent, this will be as simple as a lookup table, called the Q-table. It will operate as your agent's brain.
  • Value: This is the expected long-term reward of taking an action in a state, accounting for the future actions the agent could take afterward. It is separate from the immediate reward the agent gets for taking that action (the value is also commonly called the utility).
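To make the Q-table concrete, here is a minimal sketch in Python; the state and action names are hypothetical, chosen to echo the self-driving-car example used later in this chapter:

    import numpy as np

    # Hypothetical discrete states and actions for a tiny traffic example
    states = ["red_light", "green_light"]
    actions = ["move_forward", "stop_and_wait"]

    # The Q-table maps every (state, action) pair to an estimated value.
    # It starts at zero because the agent knows nothing about its environment yet.
    q_table = np.zeros((len(states), len(actions)))

    # Looking up the best known action for a state is a simple row lookup:
    state_index = states.index("red_light")
    best_action = actions[np.argmax(q_table[state_index])]
    print(best_action)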
The first type of RL agent that you will create is a model-free agent. A model-free RL agent does not know anything about a state that it has not seen, and so will not be able to estimate the value of the reward that it will receive from an unknown state. In other words, it cannot generalize about its environment. We will explore the differences between model-free learning and model-based learning in greater depth later in the book.
The two major model-free RL algorithms are called Q-learning and state-action-reward-state-action (SARSA). The algorithm that we will use throughout the book is Q-learning.
As we will see in the SARSA versus Q-learning – on-policy or off? section comparing the two algorithms, Q-learning can be treated as a variant of SARSA. We choose Q-learning as our introductory RL algorithm because it is relatively simple and straightforward to learn. As we build on and increase our RL skills, we can branch out into other algorithms that may be more complicated to learn but can give us better results. The short sketch that follows previews the key difference between the two.
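Here is a minimal, illustrative comparison of the one line that differs between the two update targets; the array sizes, hyperparameters, and transition values below are assumptions made for the sake of the example, not code from the book:

    import numpy as np

    # Illustrative setup: a tiny Q-table and one transition the agent just experienced.
    q_table = np.zeros((5, 4))       # 5 states, 4 actions (hypothetical sizes)
    reward, gamma = -10.0, 0.9       # reward just received and the discount factor
    next_state, next_action = 2, 1   # where the agent landed and what it will do next

    # Q-learning (off-policy): the target uses the best action available in the
    # next state, regardless of which action the agent will actually take.
    q_learning_target = reward + gamma * np.max(q_table[next_state])

    # SARSA (on-policy): the target uses the action the agent actually takes next,
    # chosen by its current (possibly exploratory) policy.
    sarsa_target = reward + gamma * q_table[next_state, next_action]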

States and actions

When first launched, your agent knows nothing about its environment and takes purely random actions.
As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty.
The car makes note of that penalty in the Q-table. The next time it encounters a red light, it looks at the Q-table when deciding what to do, and because the move-forward action in the state where it is stopped at a red light now has a lower reward value than any other action, it is less likely to decide to run the red light again.
Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time.
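A minimal sketch of how such a penalty or reward gets written into the Q-table using the standard Q-learning update rule; the state and action indices, learning rate, and discount factor below are illustrative assumptions rather than the book's code:

    import numpy as np

    q_table = np.zeros((2, 2))   # rows: states, columns: actions (tiny example)
    alpha, gamma = 0.1, 0.9      # learning rate and discount factor (assumed values)

    state, action = 0, 0         # e.g. "stopped at a red light", "move forward"
    reward = -10.0               # large penalty for running the light
    next_state = 1               # whatever state the agent ended up in

    # Standard Q-learning update: nudge the old estimate toward the observed reward
    # plus the discounted value of the best action available in the next state.
    q_table[state, action] += alpha * (
        reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
    )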
While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine).

[Figure: a sample finite state machine]
When we model a state-action function for any system, we decide the variables that we want to keep track of, and this lets us determine how many states the system can be in.
For example, a state variable for a vehicle might include information about what intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent this as a vector.
The possible actions for a self-driving vehicle agent can be: move forward one block, turn left, turn right, and stop and wait – and these actions are mapped to the appropriate values of the state variable.
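For instance, one way such a state vector and action set might be encoded in Python is sketched below; the field names and values are illustrative assumptions, not the representation the book will ultimately use:

    from collections import namedtuple

    # One possible encoding of the state variables described above.
    CarState = namedtuple("CarState", ["intersection", "light_is_green", "cars_nearby"])

    state = CarState(intersection=(3, 7), light_is_green=False, cars_nearby=True)

    # The discrete actions available to the agent in any state.
    actions = ["move_forward", "turn_left", "turn_right", "stop_and_wait"]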
Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.
In the model-free version of Q-learning, it's important to note that we do not learn an agent's policy explicitly. We only update the output values that we see as a result of that policy, which we are mapping to the state-action inputs. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm.

The decision-making process

A learning agent's high-level algorithm looks like the following:
  1. Take note of what state you're in.
  2. Take an action based on your policy and receive a reward.
  3. Take note of the reward you received by taking that action in that state.
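In code, one minimal sketch of this loop might look like the following; the environment name, the epsilon-greedy action choice, and the hyperparameter values are assumptions for illustration, and the snippet uses the classic OpenAI Gym reset/step API from the era of this book:

    import gym
    import numpy as np

    env = gym.make("Taxi-v3")                      # any discrete Gym environment
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.9, 0.1          # assumed hyperparameters

    state = env.reset()                            # 1. Take note of what state you're in
    done = False
    while not done:
        # 2. Take an action based on your policy (epsilon-greedy over the Q-table here)
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done, _ = env.step(action)   # ...and receive a reward
        # 3. Take note of the reward received for that action in that state
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state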
We can express this mathematically using a Markov decision process (MDP). We'll discuss MDPs in more detail throughout the book. For now, ...

Table of Contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Section 1: Q-Learning: A Roadmap
  7. Brushing Up on Reinforcement Learning Concepts
  8. Getting Started with the Q-Learning Algorithm
  9. Setting Up Your First Environment with OpenAI Gym
  10. Teaching a Smartcab to Drive Using Q-Learning
  11. Section 2: Building and Optimizing Q-Learning Agents
  12. Building Q-Networks with TensorFlow
  13. Digging Deeper into Deep Q-Networks with Keras and TensorFlow
  14. Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym
  15. Decoupling Exploration and Exploitation in Multi-Armed Bandits
  16. Further Q-Learning Research and Future Projects
  17. Assessments
  18. Other Books You May Enjoy
Citation Styles for Hands-On Q-Learning with Python

APA 6 Citation

Habib, N. (2019). Hands-On Q-Learning with Python (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/960445/handson-qlearning-with-python-practical-qlearning-with-openai-gym-keras-and-tensorflow-pdf

Chicago Citation

Habib, Nazia. 2019. Hands-On Q-Learning with Python. 1st ed. Packt Publishing. https://www.perlego.com/book/960445/handson-qlearning-with-python-practical-qlearning-with-openai-gym-keras-and-tensorflow-pdf.

Harvard Citation

Habib, N. (2019) Hands-On Q-Learning with Python. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/960445/handson-qlearning-with-python-practical-qlearning-with-openai-gym-keras-and-tensorflow-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Habib, Nazia. Hands-On Q-Learning with Python. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.