eBook - ePub

Hands-On Data Analysis with Pandas

Name: Hands-On Data Analysis with Pandas
Author: Stefanie Molin

A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition

Stefanie Molin

Buch teilen

788 Seiten
English
ePUB (handyfreundlich)
Über iOS und Android verfügbar

eBook - ePub

Hands-On Data Analysis with Pandas

A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition

Stefanie Molin

Angaben zum Buch

Buchvorschau

Inhaltsverzeichnis

Quellenangaben

Über dieses Buch

Get to grips with pandas by working with real datasets and master data discovery, data manipulation, data preparation, and handling data for analytical tasks

Key Features

Perform efficient data analysis and manipulation tasks using pandas 1.x
Apply pandas to different real-world domains with the help of step-by-step examples
Make the most of pandas as an effective data exploration tool

Book Description

Extracting valuable business insights is no longer a 'nice-to-have', but an essential skill for anyone who handles data in their enterprise. Hands-On Data Analysis with Pandas is here to help beginners and those who are migrating their skills into data science get up to speed in no time.This book will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data.This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision making – valuable knowledge that can be applied across multiple domains.

What you will learn

Understand how data analysts and scientists gather and analyze data
Perform data analysis and data wrangling using Python
Combine, group, and aggregate data from multiple sources
Create data visualizations with pandas, matplotlib, and seaborn
Apply machine learning algorithms to identify patterns and make predictions
Use Python data science libraries to analyze real-world datasets
Solve common data representation and analysis problems using pandas
Build Python scripts, modules, and packages for reusable analysis code

Who this book is for

This book is for data science beginners, data analysts, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. Data scientists looking to implement pandas in their machine learning workflow will also find plenty of valuable know-how as they progress.You'll find it easier to follow along with this book if you have a working knowledge of the Python programming language, but a Python crash-course tutorial is provided in the code bundle for anyone who needs a refresher.

]]>

Häufig gestellte Fragen

Wie kann ich mein Abo kündigen?

Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kündigen“ – ganz einfach. Nachdem du gekündigt hast, bleibt deine Mitgliedschaft für den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.

(Wie) Kann ich Bücher herunterladen?

Derzeit stehen all unsere auf Mobilgeräte reagierenden ePub-Bücher zum Download über die App zur Verfügung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die übrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.

Welcher Unterschied besteht bei den Preisen zwischen den Aboplänen?

Mit beiden Aboplänen erhältst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.

Was ist Perlego?

Wir sind ein Online-Abodienst für Lehrbücher, bei dem du für weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhältst. Mit über 1 Million Büchern zu über 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.

Unterstützt Perlego Text-zu-Sprache?

Achte auf das Symbol zum Vorlesen in deinem nächsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.

Ist Hands-On Data Analysis with Pandas als Online-PDF/ePub verfügbar?

Ja, du hast Zugang zu Hands-On Data Analysis with Pandas von Stefanie Molin im PDF- und/oder ePub-Format sowie zu anderen beliebten Büchern aus Computer Science & Data Processing. Aus unserem Katalog stehen dir über 1 Million Bücher zur Verfügung.

Information

Verlag

Packt Publishing

Jahr

2021

ISBN

9781800565913

Thema

Computer Science

Thema

Data Processing

Section 1: Getting Started with Pandas

Our journey begins with an introduction to data analysis and statistics, which will lay a strong foundation for the concepts we will cover throughout the book. Then, we will set up our Python data science environment, which contains everything we will need to work through the examples, and get started with learning the basics of pandas.

This section comprises the following chapters:

Chapter 1, Introduction to Data Analysis
Chapter 2, Working with Pandas DataFrames

Chapter 1: Introduction to Data Analysis

Before we can begin our hands-on introduction to data analysis with pandas, we need to learn about the fundamentals of data analysis. Those who have ever looked at the documentation for a software library know how overwhelming it can be if you have no clue what you are looking for. Therefore, it is essential that we master not only the coding aspect but also the thought process and workflow required to analyze data, which will prove the most useful in augmenting our skill set in the future.

Much like the scientific method, data science has some common workflows that we can follow when we want to conduct an analysis and present the results. The backbone of this process is statistics, which gives us ways to describe our data, make predictions, and also draw conclusions about it. Since prior knowledge of statistics is not a prerequisite, this chapter will give us exposure to the statistical concepts we will use throughout this book, as well as areas for further exploration.

After covering the fundamentals, we will get our Python environment set up for the remainder of this book. Python is a powerful language, and its uses go way beyond data science: building web applications, software, and web scraping, to name a few. In order to work effectively across projects, we need to learn how to make virtual environments, which will isolate each project's dependencies. Finally, we will learn how to work with Jupyter Notebooks in order to follow along with the text.

The following topics will be covered in this chapter:

The fundamentals of data analysis
Statistical foundations
Setting up a virtual environment

Chapter materials

All the files for this book are on GitHub at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition. While having a GitHub account isn't necessary to work through this book, it is a good idea to create one, as it will serve as a portfolio for any data/coding projects. In addition, working with Git will provide a version control system and make collaboration easy.

Tip

Check out this article to learn some Git basics: https://www.freecodecamp.org/news/learn-the-basics-of-git-in-under-10-minutes-da548267cc91/.

In order to get a local copy of the files, we have a few options (ordered from least useful to most useful):

Download the ZIP file and extract the files locally.
Clone the repository without forking it.
Fork the repository and then clone it.

This book includes exercises for every chapter; therefore, for those who want to keep a copy of their solutions along with the original content on GitHub, it is highly recommended to fork the repository and clone the forked version. When we fork a repository, GitHub will make a repository under our own profile with the latest version of the original. Then, whenever we make changes to our version, we can push the changes back up. Note that if we simply clone, we don't get this benefit.

The relevant buttons for initiating this process are circled in the following screenshot:

Figure 1.1 – Getting a local copy of the code for following along

Important note

The cloning process will copy the files to the current working directory in a folder called Hands-On-Data-Analysis-with-Pandas-2nd-edition. To make a folder to put this repository in, we can use mkdir my_folder && cd my_folder. This will create a new folder (directory) called my_folder and then change the current directory to that folder, after which we can clone the repository. We can chain these two commands (and any number of commands) together by adding && in between them. This can be thought of as and then (provided the first command succeeds).

This repository has folders for each chapter. This chapter's materials can be found at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition/tree/master/ch_01. While the bulk of this chapter doesn't involve any coding, feel free to follow along in the introduction_to_data_analysis.ipynb notebook on the GitHub website until we set up our environment toward the end of the chapter. After we do so, we will use the check_your_environment.ipynb notebook to get familiar with Jupyter Notebooks and to run some checks to make sure that everything is set up properly for the rest of this book.

Since the code that's used to generate the content in these notebooks is not the main focus of this chapter, the majority of it has been separated into the visual_aids package, which is used to create visuals for explaining concepts throughout the book, and the check_environment.py file. If you choose to inspect these files, don't be overwhelmed; everything that's relevant to data science will be covered in this bo...