Practitioner's Guide to Data Science

Name: Practitioner's Guide to Data Science
Author: Nasir Ali Mirza

Streamlining Data Science Solutions using Python, Scikit-Learn, and Azure ML Service Platform

Nasir Ali Mirza

About This Book

Covers Data Science concepts, processes, and real-world, hands-on use cases.

Key Features
• Covers the journey from a basic programmer to an effective Data Science developer.
• Applied use of Data Science native processes like CRISP-DM and Microsoft TDSP.
• Implementation of MLOps using Microsoft Azure DevOps.

Description
Thanks to the work presented in this book, the question "How is a Data Science project to be implemented?" has never had a more conceptually sound answer. This book provides an in-depth look at the current state of the world's data and the pivotal role Data Science plays in everything we do.

This book explains and implements the entire Data Science lifecycle using well-known data science processes such as CRISP-DM and Microsoft TDSP. It explains the significance of these processes in connection with the high failure rate of Data Science projects.

The book helps build a solid foundation in Data Science concepts and related frameworks. It teaches how to implement real-world use cases using data from the HMDA dataset. It explains the Azure ML Service architecture, its capabilities, and its implementation, preparing the DS team to implement MLOps. Along the way, the book also explains how to use Azure DevOps to make the process repeatable.

By the end of this book, you will have developed strong Python coding skills, gained a firm grasp of concepts such as feature engineering, learned to create insightful visualizations, and become acquainted with techniques for building machine learning models.

What you will learn
• Organize Data Science projects using CRISP-DM and Microsoft TDSP.
• Learn to acquire and explore data using Python visualizations.
• Become well versed in implementing data pre-processing and Feature Engineering.
• Understand algorithm selection, model development, and model evaluation.
• Get hands-on with Azure ML Service, its architecture, and its capabilities.
• Learn to use the Azure ML SDK and MLOps to implement real-world use cases.

Who this book is for
This book is intended for programmers who wish to pursue AI/ML development and build a solid conceptual foundation in, and familiarity with, the related processes and frameworks. Additionally, this book is an excellent resource for Software Architects and Managers involved in the design and delivery of Data Science-based solutions.

Table of Contents
1. Data Science for Business
2. Data Science Project Methodologies and Team Processes
3. Business Understanding and Its Data Landscape
4. Acquire, Explore, and Analyze Data
5. Pre-processing and Preparing Data
6. Developing a Machine Learning Model
7. Lap Around Azure ML Service
8. Deploying and Managing Models

Information

Publisher: BPB Publications
Year: 2022
ISBN: 9789391392871
Topic: Computer Science
Subtopic: Data Processing

CHAPTER 1 Data Science for Business

Data Science is an emerging requirement for successful businesses, and data strategy is becoming a critical factor in business success and growth. Businesses' dependence on data has moved well beyond traditional departmental or enterprise reporting. Competitive use of enterprise data has entered the realm of building and utilizing Data Science solutions to drive operational efficiency, to deliver context-based personalization of services and offerings, and to power decision support systems and automation assisted by Machine Learning (ML) and Artificial Intelligence (AI) models. This chapter provides an overview of Data Science and the scope of its application in business. It also touches upon its responsible use and implementation in line with ethical and legal principles.

Structure

In this chapter, we will discuss the following topics:

Application programmer to Data Science professional
What is Data Science?
Unprecedented scope of Data Science
Data Science application
Big Data, Data Mining (DM), Machine Learning (ML), Deep Learning (DL), Artificial Intelligence (AI), and Data Science
Legal, ethical, and security aspects of Data Science
Methodology used in organizing this book

Objectives

At the end of this chapter, you should be able to:

Describe and differentiate ML, DL, AI, Big Data, and Data Science.
Understand the scope and application of Data Science.
Explain the responsible use of Data Science.

Application programmer to Data Science professional

As application programmers, we deal with data through tasks such as recording data, adding validations, performing aggregations, storing and retrieving data efficiently, developing transactional reports, performing data fixes, and making sure data types are correct and optimal.

Data Science works with data at a different level than regular application development. It is about finding insights in the data that are not otherwise evident. We perform a statistical study of the data and explore the relations among the different data elements present in the dataset. As an example, in regular work with data, we capture, store, and report on sales data, whereas in Data Science the focus is on how sales data are influenced by related internal and external factors, such as how sales are affected by price, season, geography, customer demographics, promotional schemes, and competitors' offers.
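To make the contrast concrete, here is a minimal pandas sketch on a small hypothetical sales table (the columns `region`, `price`, `promo`, and `sales` and all of their values are invented for illustration). The first step is the kind of aggregate a transactional report produces; the second asks how sales move with price and promotion, which is closer to the Data Science question.

```python
import pandas as pd

# Hypothetical weekly sales records; every value here is invented for illustration.
df = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South", "South"],
    "price": [10, 12, 14, 16, 18, 20],
    "promo": [1, 0, 1, 0, 1, 0],  # 1 = a promotional scheme ran that week
    "sales": [170, 140, 150, 120, 130, 100],
})

# Application-programming view: report what happened (total sales per region).
report = df.groupby("region")["sales"].sum()
print(report)

# Data Science view: how strongly does each factor move together with sales?
# Pearson correlation ranges from -1 (perfect inverse) to +1 (perfect direct).
influence = df[["price", "promo", "sales"]].corr()["sales"].drop("sales")
print(influence)
```

In this toy table, price correlates negatively with sales and promotion positively. Correlation alone does not establish cause and effect; it points to relations worth testing further.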

While developing a Data Science solution, we start by studying the business goals, then define hypotheses and theories, and determine the data collection needs and how to obtain the required data (Data Engineering). We look at the range of values in the dataset, their mean, mode, median, standard deviation, variance, data distribution, and multicollinearity. We conduct visual and statistical exploration of the data to find correlations among the data elements. These correlations point to possible causes and effects, which are then examined using statistical procedures and measurements. Once the influential data elements (features) in the dataset are identified, the solution uses these features with appropriate algorithms to train the model(s) that build insights and predict future outcomes. Data Science solution development also involves refining and enriching raw data into more valuable features that make model training more efficient at determining data relations and their weightage. As part of solution development, we also need to deal with data problems such as mixed scales, skewness, and class imbalance.
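The exploration checks listed above can be sketched with pandas and scikit-learn. This is a minimal illustration, assuming a small hypothetical loan dataset (the `income`, `age`, and `approved` columns and their values are invented here, not taken from the book). It summarizes range, central tendency, and spread; prints feature correlations as a simple multicollinearity signal; and flags skewness, mixed scales, and class imbalance. The `log1p` transform and `StandardScaler` are common remedies chosen for illustration, not necessarily the book's prescribed steps.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: income is on a much larger scale than age and is
# right-skewed by one very high value; the label is imbalanced (8 vs 2).
df = pd.DataFrame({
    "income": [30_000, 32_000, 35_000, 35_000, 40_000,
               42_000, 45_000, 50_000, 60_000, 250_000],
    "age": [23, 25, 31, 35, 35, 41, 44, 47, 52, 60],
    "approved": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
})
features = df[["income", "age"]]

# Range, central tendency, and spread of each feature.
summary = pd.DataFrame({
    "min": features.min(),
    "max": features.max(),
    "mean": features.mean(),
    "median": features.median(),
    "mode": features.mode().iloc[0],  # first mode if several values tie
    "std": features.std(),
    "variance": features.var(),
})
print(summary)

# Pairwise correlation between features; values near +/-1 hint at multicollinearity.
print(features.corr())

# Skewness: values well above 0 indicate a long right tail.
skew_before = features["income"].skew()
skew_after = np.log1p(features["income"]).skew()  # log transform compresses the tail
print(f"income skew: {skew_before:.2f} -> {skew_after:.2f} after log1p")

# Mixed scales: standardize each feature to mean 0 and unit variance.
scaled = StandardScaler().fit_transform(features)

# Class imbalance: share of each label value.
class_share = df["approved"].value_counts(normalize=True)
print(class_share)
```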

Data Science is the next level of working with data, performing its statistical analysis and building predictive models.

The purpose of Data Science programming is to determine the patterns, correlations, and predictor weightages in existing data so that relevant models can be built to predict future outcomes. There are well-developed methods to evaluate the performance of these models, in terms of their correctness and error margins, when they are used with newly available data.
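As a minimal illustration of evaluating a model on data it has not seen, the scikit-learn sketch below fits a linear regression on synthetic data (the relationship y = 3x + 2 plus noise is an assumption made up for the example). It holds out a quarter of the rows to stand in for newly available data and reports the error margin as mean absolute error alongside R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on x plus random noise (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

# Hold out 25% of the rows to play the role of new, unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# The learned coefficient plays the role of the predictor's weightage.
print("coefficient:", model.coef_[0], "intercept:", model.intercept_)

# Evaluate only on the held-out rows.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)  # average size of the prediction error
r2 = r2_score(y_test, y_pred)              # share of variance explained (1.0 is perfect)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")
```

Scoring on the held-out split, rather than on the training rows, is what makes the reported error margin an estimate of performance on new data.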

Table 1.1 shows how the scope, purpose, and evaluation criteria differ when working with data as an application programmer and as a Data Science professional:

Aspect	Application programmer	Data Science professional
Scope	CRUD operations.	Statistical analysis.
Main purpose	Enable business transactions.	Make predictions and extract insights.
Evaluation	Efficient storage and retrieval.	Correctness and error margin in predictions and outcomes.

Table 1.1: Different aspects of application programming and Data Science solutioning

What is Data Science?

Data Science is the application of a scientific methodology to the study of data for the purpose of extracting insights and making predictions with trained models.

Its full scope comprises defining theories and hypotheses, data collection, statistical analysis, raw data enrichment, model building, optimization, and evaluation. During the study of data, the scientific methodology is applied to accept or reject hypotheses based on statistical calculations and measurements such as significance tests, confidence intervals, and measures of correctness and error in predictions.
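The sketch below shows what accepting or rejecting a hypothesis can look like in code, using SciPy on two small hypothetical samples of daily sales with and without a promotion (all numbers are invented). The null hypothesis is that the promotion makes no difference to mean sales; Welch's t-test supplies the significance test, and a 95% confidence interval brackets the size of the difference.

```python
import numpy as np
from scipy import stats

# Hypothetical daily sales (units) for 10 days without and 10 days with a promotion.
without_promo = np.array([98, 102, 95, 101, 99, 97, 103, 100, 96, 104], dtype=float)
with_promo = np.array([108, 112, 105, 110, 109, 107, 113, 111, 106, 114], dtype=float)

# Null hypothesis H0: the promotion does not change mean daily sales.
# Welch's t-test compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(with_promo, without_promo, equal_var=False)

alpha = 0.05  # significance level, chosen before looking at the data
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.2g}, reject H0: {reject_null}")

# 95% confidence interval for the difference in means, using the
# Welch-Satterthwaite degrees of freedom that the test above also uses.
v_with = with_promo.var(ddof=1) / len(with_promo)
v_without = without_promo.var(ddof=1) / len(without_promo)
diff = with_promo.mean() - without_promo.mean()
se = np.sqrt(v_with + v_without)
dof = (v_with + v_without) ** 2 / (
    v_with ** 2 / (len(with_promo) - 1) + v_without ** 2 / (len(without_promo) - 1)
)
ci_low, ci_high = stats.t.interval(0.95, dof, loc=diff, scale=se)
print(f"difference in means: {diff:.1f} units, 95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

Because the interval excludes zero and the p-value is below the chosen significance level, this toy data would lead us to reject the null hypothesis.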

Data scientists have a strong passion for devising interesting questions and obtaining the raw data to answer them. They are full of curiosity, inquisitiveness, and imagination. Their skills include statistical analysis and numerical interpretation of data, programming in languages such as Python and R, effective verbal and written communication, enough understanding of the business domain to carry out the Data Science work, and a research-oriented, investigative mindset.

As a Data Science professional, you will understand the business goals for conducting the Data Science work and review the business data landscape. You will devise a plan for collecting business data, prepare and clean this data, and perform its detailed exploration and analysis. Once the data's validity and reliability are established, you will build the models, then test, optimize, and evaluate them. Finally, you will deploy the model for use with newly available data.
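The workflow described above can be compressed into a short scikit-learn sketch. It is a toy built on stated assumptions, not the book's implementation: synthetic data from `make_classification` stands in for collected business data, `SimpleImputer` stands in for data cleaning, `GridSearchCV` performs a small optimization over the regularization strength `C`, and saving the fitted pipeline with `joblib` stands in for deployment.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collect: synthetic stand-in for business data, with about 5% of values missing.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Prepare and build: fill gaps, scale features, then fit a classifier.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# 3. Optimize: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 4. Evaluate on held-out data (GridSearchCV refits the best pipeline by default).
accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["model__C"], "test accuracy:", round(accuracy, 3))

# 5. Deploy (simplified): persist the fitted pipeline, reload it, and score new rows.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(search.best_estimator_, path)
deployed = joblib.load(path)
new_rows = X_test[:3]
print(deployed.predict(new_rows))
```

Keeping cleaning, scaling, and the model in one `Pipeline` means the exact same preparation steps travel with the model when it is saved and reused on new data.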

The unprecedented scope of Data Science

Data Science as an area of study and application has existed for a long time, albeit not under the same name. Since it is based on the concepts and principles of statistics and mathematics, it has long existed in academic and business fields. The unprecedented scope of Data Science application in the last couple of decades owes to the tremendous advancement in Computer Science and related technologies.

Earlier, there were limitations in capturing, storing, and processing data. All these aspects involved manual human labor, with very limited automation. During the last two to three decades (1990 onwards), there has been exponential advancement in automatic data capturing, storage, and processing technologies. For example, the number of IoT-connected devices was 15.4 billion in 2015, increased to 30.7 billion by 2020, and is estimated to reach 150 billion by 2025. The number of internet users was 16 million (0.4% of the world population) in 1995 and 4,833 million (62% of the world population) by 2020. The volume of data created in 2010 alone was 2 Zettabytes (ZB), approximately 2 trillion GB. The world's data amounted to 33 ZB in 2018 and is estimated to grow to 175 ZB by 2025.

On the one hand, storage technologies have progressed enormously; on the other, their cost has fallen so significantly that storage is now affordable for individuals and small businesses as well. Storage cost $100,000 per GB in 1985, $7.50 per GB in 2000, $0.038 per GB in 2015, and $0.01 per GB by 2020.
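To put the quoted figures side by side, the short Python calculation below converts them into compound annual growth rates and an overall cost-reduction factor. It uses only the numbers given in the two paragraphs above; the `cagr` helper is simply the standard compound-growth formula, written here for illustration.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text.
iot_2015_2020 = cagr(15.4, 30.7, 5)  # IoT-connected devices, billions
iot_2020_2025 = cagr(30.7, 150, 5)   # estimated
data_2018_2025 = cagr(33, 175, 7)    # world data, ZB (estimated)

# Storage cost fell from $100,000 per GB (1985) to $0.01 per GB (2020).
cost_reduction_factor = 100_000 / 0.01

print(f"IoT devices 2015-2020: {iot_2015_2020:.1%} per year")
print(f"IoT devices 2020-2025 (est.): {iot_2020_2025:.1%} per year")
print(f"World data 2018-2025 (est.): {data_2018_2025:.1%} per year")
print(f"Storage cost per GB fell by a factor of {cost_reduction_factor:,.0f}")
```

On these figures, the projected growth of the world's data works out to roughly 27% per year, and storage became about ten million times cheaper per GB between 1985 and 2020.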

Through automatic data capturing, a huge amount of data about our personal and public lives is recorded 24×7. In 2010, there were 298 data interactions per person per day, and by 2025 this is estimated to increase to 4,909 interactions per person per day. Many of the gadgets and appliances we use are IoT-enabled, meaning they generate and transmit data: refrigerators, washing machines, cookers, security cameras, elevators, the watches we wear, and the cell phones we use. Our communication over social media, our web surfing, and our likes, dislikes, and preferences all get recorded automatically. Cell phones and the applications running on them collect a variety of data about us round the clock. The Google Maps platform alone is used regularly by more than a billion people and 5 million active apps and websites, generating a very large volume of location data 24×7 that is of significant interest for multiple business use cases. Smartwatches and phones capture our health and fitness data. Our spending behavior, consumption and credit data, and travel data are all recorded automatically. In public life, data is captured about the vehicles we use and the routes we take, and public surveillance systems generate a large volume of audio-visual data. Knowledge produced in the past is being digitized with tools and technologies such as image-to-text conversion and searchable video content. Telemetry data from manufacturing, transportation, utility services, and similar sources continues to add a large volume of real-time data that is captured, transmitted, and saved automatically.

The global datasphere will grow from 33 ZB in 2018 to 175 ZB by 2025. Nearly 30% of the world’s data will need real-time processing. (Data Age 2025)

This level of automatic data capturing is matched by enormous growth in low-cost storage and compute power. These three factors (automatic data capturing, low-cost storage, and compute power), together with advances in other areas of Computer Science, have facilitated the application of Data Science virtually everywhere. The recent advent and rapid adoption of cloud computing has further removed physical barriers and taken automatic data capturing, storage, and processing to new heights.

Data Science application

Data Science applies to every field and situation where data either already exist or can be obtained, essentially every industry and subject. This is because Data Science deals with the study of data for the purpose of extracting meaningful insights, finding answers to interesting questions, and making predictions.

The advancement in automatic data capturing, low-cost storage, a...
