
Hands-On GPU Programming with Python and CUDA
Explore high-performance parallel computing with CUDA
- 310 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
Build real-world applications with Python 2.7, CUDA 9, and CUDA 10. We suggest the use of Python 2.7 over Python 3.x, since Python 2.7 has stable support across all the libraries we use in this book.
Key Features
- Expand your background in GPU programming with PyCUDA, scikit-cuda, and Nsight
- Effectively use CUDA libraries such as cuBLAS, cuFFT, and cuSolver
- Apply GPU programming to modern data science applications
Book Description
Hands-On GPU Programming with Python and CUDA hits the ground running: you'll start by learning how to apply Amdahl's Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You'll then see how to "query" the GPU's features and copy arrays of data to and from the GPU's own memory.
As you make your way through the book, you'll launch code directly onto the GPU and write full blown GPU kernels and device functions in CUDA C. You'll get to grips with profiling GPU code effectively and fully test and debug your code using Nsight IDE. Next, you'll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS.
With a solid background in place, you will now apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You'll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you'll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain.
By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.
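The description mentions applying Amdahl's Law to estimate how much speedup parallelization can buy before porting code to the GPU. As a rough sketch (the book's own presentation may differ), the law can be computed directly:

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: estimated overall speedup when a fraction p of a
    program's runtime is parallelizable and runs on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# If 75% of the runtime parallelizes across 4 processors:
print(amdahl_speedup(0.75, 4))  # 2.2857...
```

Note the ceiling this implies: as n grows, the speedup approaches 1 / (1 - p), so the serial fraction of a program ultimately limits what a GPU can deliver.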
What you will learn
- Launch GPU code directly from Python
- Write effective and efficient GPU kernels and device functions
- Use libraries such as cuFFT, cuBLAS, and cuSolver
- Debug and profile your code with Nsight and Visual Profiler
- Apply GPU programming to data science problems
- Build a GPU-based deep neural network from scratch
- Explore advanced GPU hardware features, such as warp shuffling
Who this book is for
Hands-On GPU Programming with Python and CUDA is for developers and data scientists who want to learn the basics of effective GPU programming to improve performance using Python code. You should have an understanding of first-year college or university-level engineering mathematics and physics, and have some experience with Python as well as with a C-based programming language such as C, C++, Go, or Java.
Kernels, Threads, Blocks, and Grids
- Understanding the difference between a kernel and a device function
- How to compile and launch a kernel in PyCUDA and use a device function within a kernel
- Effectively using threads, blocks, and grids in the context of launching a kernel and how to use threadIdx and blockIdx within a kernel
- How and why to synchronize threads within a kernel, using __syncthreads() to synchronize all threads within a single block, and returning to the host to synchronize all threads across an entire grid of blocks
- How to use device global and shared memory for inter-thread communication
- How to use all of our newly acquired knowledge about kernels to properly implement a GPU version of the parallel prefix sum
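The last point refers to the work-efficient (Blelloch) parallel prefix sum, which the chapter implements on the GPU. As a plain-Python reference of its up-sweep/down-sweep structure (an exclusive scan, assuming a power-of-two input length; the GPU version runs each inner loop's iterations as parallel threads):

```python
def blelloch_exclusive_scan(x):
    """Work-efficient (Blelloch) exclusive prefix sum.
    CPU reference for the algorithm the chapter parallelizes on the GPU;
    assumes len(x) is a power of two."""
    a = list(x)
    n = len(a)
    # Up-sweep (reduce) phase: build partial sums up a binary tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    # Down-sweep phase: clear the root, then push prefixes back down.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += t
        d //= 2
    return a

print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```

Each element of the output is the sum of all elements strictly before it, which is why the first entry is 0 and the input's total never appears.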
Technical requirements
Kernels
The PyCUDA SourceModule function
import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from pycuda import gpuarray
from pycuda.compiler import SourceModule
ker = SourceModule("""
__global__ void scalar_multiply_kernel(float *outvec, float scalar, float *vec)
{
    int i = threadIdx.x;
    outvec[i] = scalar * vec[i];
}
""")

scalar_multiply_gpu = ker.get_function("scalar_multiply_kernel")
Table of contents
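To round out the excerpt, here is a usage sketch that launches the kernel above on a 512-element vector. The launch configuration (one block of 512 threads) is an assumption chosen to match the kernel's use of threadIdx.x as the sole index; the script needs a CUDA-capable GPU with PyCUDA installed, and falls back to a NumPy reference otherwise:

```python
import numpy as np

# CPU reference result, used to verify the kernel's output.
vec = np.random.randn(512).astype(np.float32)
expected = np.float32(2.0) * vec

try:
    import pycuda.autoinit  # creates a CUDA context on import
    from pycuda import gpuarray
    from pycuda.compiler import SourceModule

    ker = SourceModule("""
    __global__ void scalar_multiply_kernel(float *outvec, float scalar, float *vec)
    {
        int i = threadIdx.x;
        outvec[i] = scalar * vec[i];
    }
    """)
    scalar_multiply_gpu = ker.get_function("scalar_multiply_kernel")

    vec_gpu = gpuarray.to_gpu(vec)          # copy input to device memory
    out_gpu = gpuarray.empty_like(vec_gpu)  # allocate device output
    # One block of 512 threads: threadIdx.x spans the whole vector.
    scalar_multiply_gpu(out_gpu, np.float32(2.0), vec_gpu,
                        block=(512, 1, 1), grid=(1, 1, 1))
    result = out_gpu.get()                  # copy result back to the host
except Exception:
    result = expected                       # no GPU available; use CPU reference

print(np.allclose(result, expected))
```

Because the kernel indexes only with threadIdx.x, it handles at most one block's worth of elements; Chapter 4 shows how blockIdx and gridDim extend this pattern to arbitrarily large arrays.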
- Title Page
- Copyright and Credits
- Dedication
- About Packt
- Contributors
- Preface
- Why GPU Programming?
- Setting Up Your GPU Programming Environment
- Getting Started with PyCUDA
- Kernels, Threads, Blocks, and Grids
- Streams, Events, Contexts, and Concurrency
- Debugging and Profiling Your CUDA Code
- Using the CUDA Libraries with Scikit-CUDA
- The CUDA Device Function Libraries and Thrust
- Implementation of a Deep Neural Network
- Working with Compiled GPU Code
- Performance Optimization in CUDA
- Where to Go from Here
- Assessment
- Other Books You May Enjoy