Hands-on Data Analysis and Visualization with Pandas
eBook - ePub

Engineer, Analyse and Visualize Data, Using Powerful Python Libraries

About This Book

Get familiar with various Supervised, Unsupervised and Reinforcement learning algorithms

Key Features

  • Understand the types of Machine learning.
  • Get familiar with different Feature extraction methods.
  • Get an overview of how Neural Network Algorithms work.
  • Learn how to implement Decision Trees and Random Forests.
  • The book not only explains the Classification algorithms but also discusses the deviations/ mathematical modeling.

  • Description
    This book covers important concepts and topics in Machine Learning. It begins with Data Cleansing and presents an overview of Feature Selection. It then talks about training and testing, cross-validation, and Feature Selection. The book covers algorithms and implementations of the most common Feature Selection Techniques. The book then focuses on Linear Regression and Gradient Descent. Some of the important Classification techniques such as K-nearest neighbors, logistic regression, Naïve Bayesian, and Linear Discriminant Analysis are covered in the book. It then gives an overview of Neural Networks and explains the biological background, the limitations of the perceptron, and the backpropagation model. The Support Vector Machines and Kernel methods are also included in the book. It then shows how to implement Decision Trees and Random Forests. Towards the end, the book gives a brief overview of Unsupervised Learning. Various Feature Extraction techniques, such as Fourier Transform, STFT, and Local Binary patterns, are covered. The book also discusses Principal Component Analysis and its implementation.

  • What will you learn
  • Learn how to prepare Data for Machine Learning.
  • Learn how to implement learning algorithms from scratch.
  • Use scikit-learn to implement algorithms.
  • Use various Feature Selection and Feature Extraction methods.
  • Learn how to develop a Face recognition system.

  • Who this book is for
    The book is designed for Undergraduate and Postgraduate Computer Science students and for the professionals who intend to switch to the fascinating world of Machine Learning. This book requires basic know-how of programming fundamentals, Python, in particular.

  • Table of Contents
    1. An introduction to Machine Learning
    2. The beginning: Pre-Processing and Feature Selection
    3. Regression
    4. Classification
    5. Neural Networks- I
    6. Neural Networks-II
    7. Support Vector machines
    8. Decision Trees
    9. Clustering
    10. Feature Extraction
    Appendix
    A1. Cheat Sheets
    A2. Face Detection
    A3. Bibliography

  • About the Author
    Harsh Bhasin is an Applied Machine Learning researcher. Mr. Bhasin worked as Assistant Professor in Jamia Hamdard, New Delhi, and taught as a guest faculty in various institutes including Delhi Technological University. Before that, he worked in C# Client-Side Development and Algorithm Development.
    He has authored a few books including Programming in C#, Oxford University Press; Algorithms, Oxford University Press; Python Basics, Mercury; Python for Beginners, New Age International. Mr. Bhasin has authored a few papers published in renowned journals including Soft Computing, Springer, BMC Medical Informatics and Decision Making, AI and Society, etc. He is the reviewer of prominent journals and has been the editor of a few special issues. He has been a recipient of a distinguished fellowship.
    Outside work, he is deeply interested in Hindi Poetry, progressive era; Hindustani Classical Music, percussion instruments.
    His areas of interest include Data Structures, Algorithms Analysis and Design, Theory of Computation, Python, Machine Learning and Deep learning.

Frequently asked questions

Yes, you can access Hands-on Data Analysis and Visualization with Pandas by Purna Chander Rao, in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.

Information

Year
2020
ISBN
9789389845648

CHAPTER 1

Introduction to Data Analysis

Data analysis is both an art and a science: the practice of extracting insights from silos of data. This chapter introduces data and the components of its ecosystem, the stages of the data analysis process, the reasons Python suits data analysis, and the main data science libraries and modules along with their installation.

Structure

  • Inspiration for data analysis
  • What is data science?
  • Domain expertise
  • Maths and statistics
  • Artificial intelligence
  • Machine learning
  • Data infrastructure
  • Data analysis process
    • Business requirements
    • Data collection
    • Data cleansing
    • Data exploration and visualization
    • Data modeling
    • Model validation and testing
    • Deployment
  • Why Python for data analysis?
  • Python libraries for data analysis

Objective

This chapter guides you through the data analysis process and the disciplines that make it up, such as maths and statistics. The concepts covered here prepare you for the coming chapters, where they are applied as Python code using various data-related libraries.

Inspiration for data analysis

In this section, we cover the factors and trends that drive data analysis. In today's digitalized world, huge amounts of data are produced by IoT devices such as sensors, by diagnostic reports from the healthcare and wellness industry, by social network portals such as Facebook, YouTube, LinkedIn, and Instagram, and by e-commerce sites such as Alibaba, Amazon, and Flipkart. You add to this data whenever you upload audio or video, post a comment, add a like or an emoji, make a bank transaction online, withdraw money at an ATM kiosk, or buy something on an e-commerce site.
Raw data is not, by itself, useful information. Information is the result of processing: taking a certain set of data and extracting conclusions from it that can be used in different ways. This process of extracting information from raw data is data analysis. The analysis then becomes the foundation for building predictive models or drawing data visualization charts.
“Without Big data and analytics, companies are blind and deaf, wandering on to the web like deer on a freeway.”
- Geoffrey Moore, author and consultant

What is data science?

Data science is the study of data. It is a multidisciplinary field that combines maths, statistics, algorithms, domain expertise, processes, and systems to extract insights from data. This data may be structured, semi-structured, or unstructured. Figure 1.1 displays the different structures of data:
Figure 1.1

Structured data

  • Tabular rows and columns (Databases)
  • Data warehouses (DWH), such as Teradata systems, and BI systems
  • Text files such as comma-separated (.csv), tab-separated (.tsv).

Semi-structured data

  • Excel, XML, JSON, Logs.

Unstructured data

  • Audio, Video, Images.
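
The difference between structured and semi-structured data can be sketched with Python's standard library. This is a minimal illustration, not code from the book; the sample records, field names, and variable names are all hypothetical.

```python
import csv
import io
import json

# Structured: every row has the same columns (hypothetical sample data).
csv_text = "order_id,customer,amount\n1,asha,250.0\n2,ravi,120.5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records share a loose shape, but fields may be
# nested or missing (the second record has no "tags" field).
json_text = '[{"user": "asha", "tags": ["new", "mobile"]}, {"user": "ravi"}]'
events = json.loads(json_text)

# Fixed columns make tabular data easy to aggregate directly.
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured data needs a default for fields that may be absent.
tag_counts = [len(event.get("tags", [])) for event in events]

print(total_amount)
print(tag_counts)
```

Unstructured data such as audio, video, and images has no such fields at all, which is why it usually needs a feature extraction step before analysis.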

Domain expertise

Domain expertise, or domain knowledge, is expertise in a particular field such as healthcare, insurance, or banking. A domain expert may or may not have a technology background but has in-depth knowledge of a particular industry and of the trends and practices that affect it. Data analysis requires not only skill with tools and computational techniques but also a good understanding of the data itself. In short, the data analyst must know how to search not only for data but also for information, and how to treat that information to get valid insights from it.
For example, suppose you are asked to build an application for the e-commerce, banking, or insurance domain. The application has to fit the industry and its various dimensions. The technical team may not know the industry norms or which features the application needs; this is where the domain expert and domain knowledge come into the picture.

Maths and statistics

Mathematical statistics is the study of statistics from a mathematical point of view. Data analysis requires a good amount of maths, and a good knowledge of statistics as well, because statistical methods are applied to the analysis and interpretation of data. Python provides many libraries for solving mathematical and statistical problems, but you should still have a good idea of how those libraries work.
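
As a small illustration of why it helps to know what a library computes, the sketch below uses Python's built-in statistics module on a hypothetical list of daily sales figures. The data and variable names are assumptions made for this example.

```python
import statistics

# Hypothetical units sold per day; the last value is an outlier.
daily_sales = [12, 15, 11, 15, 40]

mean_sales = statistics.mean(daily_sales)      # average, pulled upward by the outlier
median_sales = statistics.median(daily_sales)  # middle value, robust to the outlier
mode_sales = statistics.mode(daily_sales)      # most frequent value
spread = statistics.stdev(daily_sales)         # sample standard deviation

print(mean_sales, median_sales, mode_sales, round(spread, 2))
```

The mean and the median disagree noticeably here, and knowing how each is computed tells you which one better describes a typical day.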

Artificial intelligence

Artificial intelligence is the intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. It is the study of training computers to do jobs that are otherwise done by humans. Artificial intelligence is treated here as a superset of data science and as one of the advanced concepts in data analysis. The term combines two words: artificial, meaning something that is human-made rather than natural, and intelligence, meaning the ability to think or understand.
AI is already widespread, and you interact with it daily. Here are a few examples of artificial intelligence:
  • Search engines like Google internally use gigantic algorithms to perform a better search.
  • Self-driving cars where the vehicles can completely navigate their way from one point to another.
  • Chatbots help as online messengers to assist customers immediately and effectively.
  • Voice searches on smartphones use AI to determine the best result for those long-tail keywords and conversational queries.
  • Online Ads use AI to target specific customers based on past behavior, interest, and search queries.

Machine learning

Machine learning is an algorithm-driven field of study that makes computers capable of learning from their own previous experience and improving their performance on a task. It is a subset of artificial intelligence in which machines learn by themselves without being explicitly programmed. Suppose you are asked to write speech recognition software that converts speech to text while accounting for accent, grammar, pronunciation, and vocabulary. Writing explicit rules for this would be a gigantic task; machine learning handles it far more readily.
Machine learning is commonly divided into three types, explained as follows:

Supervised learning

In supervised learning, we ask the machine questions, compare its answers with the actual answers, and instruct it to minimize the errors. Supervised machine learning is used for tasks such as the following:
  • Weather forecasting.
  • Detecting online frauds.
  • Market forecasting.
  • Image classification.

Unsupervised learning

In unsupervised learning, you give the machine large amounts of unlabeled data and instruct it to find patterns; based on these patterns, the machine accomplishes certain tasks. Unsupervised machine learning is used for tasks such as the following:
  • Build recommendation engines
  • Targeted marketing
  • Customer segmentation
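
Customer segmentation, for instance, can be sketched with a tiny one-dimensional k-means loop. This is a simplified illustration under stated assumptions: the spend figures are hypothetical, the number of segments is fixed at two, and the initial centers are simply the smallest and largest values.

```python
# Hypothetical annual spend per customer; no labels are provided.
spend = [100, 120, 110, 900, 950, 1000]

# Simple initial guesses for the two segment centers.
centers = [min(spend), max(spend)]

for _ in range(10):
    # Assignment step: each customer joins the nearest center.
    groups = [[], []]
    for value in spend:
        nearest = min(range(2), key=lambda i: abs(value - centers[i]))
        groups[nearest].append(value)
    # Update step: each center moves to the mean of its group.
    centers = [sum(g) / len(g) for g in groups]

print(centers)
print(groups)
```

Nobody told the program which customers are low or high spenders; the two segments emerge from the data alone.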

Reinforcement learning

In reinforcement learning, the machine is placed in an environment where something is happening. It receives a reward when it does what we want and a penalty when it performs incorrectly. We instruct the machine to maximize the reward, and eventually it learns the behavior we want. Reinforcement learning is applied to:
  • Games
  • Bidding and advertising
  • Training self-driven cars
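
The reward-driven loop can be sketched with a two-action "bandit" problem and an epsilon-greedy agent. This is one simple technique chosen for illustration, not the book's method; the reward probabilities, the exploration rate, and the step count are assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: chance of a reward for each action.
# These values are hidden from the agent.
reward_probability = [0.2, 0.8]

estimates = [0.0, 0.0]  # agent's running estimate of each action's reward
pulls = [0, 0]          # how many times each action has been tried
epsilon = 0.1           # fraction of steps spent exploring at random

for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(2)  # explore: try a random action
    else:
        action = max(range(2), key=lambda a: estimates[a])  # exploit the best guess
    # The environment responds with a reward (1.0) or nothing (0.0).
    reward = 1.0 if random.random() < reward_probability[action] else 0.0
    pulls[action] += 1
    # Incremental mean: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / pulls[action]

best_action = max(range(2), key=lambda a: estimates[a])
print(best_action, pulls)
```

The agent is never told which action is better; it discovers this by trying actions and keeping track of the rewards it receives.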

Data infrastructure

Generally, people use the word infrastructure for the things that support what they are doing. Roads used for transportation, sewage systems, and bridges are all considered infrastructure. The role of data infrastructure is to protect, preserve, process, move, secure, and serve data, along with the applications that deliver information services. Data infrastructure includes software, hardware, cloud or managed services, servers, storage, and so on.
The Big data world generates a humongous amount of information that needs to be processed. Sometimes ordinary desktop systems or servers don't have enough computational power to read, process, or analyze it. We need systems with plenty of RAM and enough disk space to store the data. Cloud platforms such as Amazon (AWS), GCP, and Azure help us meet these challenges through resource allocation and vir...

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. About the Author
  6. About the Reviewer
  7. Acknowledgement
  8. Preface
  9. Errata
  10. Table of Contents
  11. 1. Introduction to Data Analysis
  12. 2. JupyterLab
  13. 3. Python Overview
  14. 4. Introduction to Numpy
  15. 5. Introduction to Pandas
  16. 6. Data Analysis
  17. 7. Time Series Analysis
  18. 8. Introduction to Statistics
  19. 9. Matplotlib
  20. 10. Seaborn
  21. 11. Exploratory Data Analysis