Distributed Computing with Python
Table of Contents
Distributed Computing with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. An Introduction to Parallel and Distributed Computing
Parallel computing
Distributed computing
Shared memory versus distributed memory
Amdahl's law
The mixed paradigm
Summary
2. Asynchronous Programming
Coroutines
An asynchronous example
Summary
3. Parallelism in Python
Multiple threads
Multiple processes
Multiprocess queues
Closing thoughts
Summary
4. Distributed Applications – with Celery
Establishing a multimachine environment
Installing Celery
Testing the installation
A tour of Celery
More complex Celery applications
Celery in production
Celery alternatives – Python-RQ
Celery alternatives – Pyro
Summary
5. Python in the Cloud
Cloud computing and AWS
Creating an AWS account
Creating an EC2 instance
Storing data in Amazon S3
Amazon elastic beanstalk
Creating a private cloud
Summary
6. Python on an HPC Cluster
Your typical HPC cluster
Job schedulers
Running a Python job using HTCondor
Running a Python job using PBS
Debugging
Summary
7. Testing and Debugging Distributed Applications
The big picture
Common problems – clocks and time
Common problems – software environments
Common problems – permissions and environments
Common problems – the availability of hardware resources
Challenges – the development environment
A useful strategy – logging everything
A useful strategy – simulating components
Summary
8. The Road Ahead
The first two chapters
The tools
The cloud and the HPC world
Debugging and monitoring
Where to go next
Index
Distributed Computing with Python
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2016
Production reference: 1060416
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-969-1
www.packtpub.com
Credits
Author
Francesco Pierfederici
Reviewer
James King
Commissioning Editor
Veena Pagare
Acquisition Editor
Aaron Lazar
Content Development Editor
Parshva Sheth
Technical Editor
Abhishek R. Kotian
Copy Editor
Neha Vyas
Project Coordinator
Nikhil Nair
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Disha Haria
Production Coordinator
Melwyn Dsa
Cover Work
Melwyn Dsa
About the Author
Francesco Pierfederici is a software engineer who loves Python. He has been working in the fields of astronomy, biology, and numerical weather forecasting for the last 20 years.
He has built large distributed systems that use tens of thousands of cores at a time and run on some of the fastest supercomputers in the world. He has also written a lot of applications of dubious usefulness that are nonetheless great fun. Mostly, he just likes to build things.
About the Reviewer
James King is a software developer with a broad range of experience in distributed systems. He is a contributor to many open source projects, including OpenStack and Mozilla Firefox. He enjoys mathematics, horsing around with his kids, games, and art.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser
Preface
Parallel and distributed computing is a fascinating subject that, only a few years ago, was accessible only to developers at a handful of large companies and national labs. Things have changed dramatically in the last decade or so, and now anyone can build small- and medium-scale distributed applications in a variety of programming languages, including, of course, our favorite one: Python.
This book is a very practical guide for Python programmers who are starting to build their own distributed systems. It starts by introducing the bare minimum of theory needed to understand parallel and distributed computing, laying the foundations required for the rest of the (more practical) chapters.
It then looks at some first examples of parallelism using nothing more than modules from the Python standard library. The next step is to move beyond the confines of a single computer and start using more and more nodes. This is accomplished using a number of third-party libraries, including Celery and Pyro.
The remaining chapters investigate a few deployment options for our distributed applications. The cloud and classic High Performance Computing (HPC) clusters, together with their strengths and challenges, take center stage.
Finally, the thorny issues of monitoring, logging, profiling, and debugging are touched upon.
All in all, this is very much a hands-on bo...