The Data Science Handbook
About This Book
A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline
Finding a good data scientist has been likened to hunting for a unicorn: the required combination of technical skills is simply very hard to find in one person. In addition, good data science is not just rote application of trainable skill sets; it requires the ability to think flexibly about all these areas and understand the connections between them. This book provides a crash course in data science, combining all the necessary skills into a unified discipline.
Unlike many analytics books, computer science and software engineering are given extensive coverage since they play such a central role in the daily work of a data scientist. The author also describes classic machine learning algorithms, from their mathematical foundations to real-world applications. Visualization tools are reviewed, and their central importance in data science is highlighted. Classical statistics is addressed to help readers think critically about the interpretation of data and its common pitfalls. The clear communication of technical results, which is perhaps the most undertrained of data science skills, is given its own chapter, and all topics are explained in the context of solving real-world data problems. The book also features:
• Extensive sample code and tutorials using Python™ along with its technical libraries
• Core technologies of "Big Data," including their strengths and limitations and how they can be used to solve real-world problems
• Coverage of the practical realities of the tools, keeping theory to a minimum; however, when theory is presented, it is done in an intuitive way to encourage critical thinking and creativity
• A wide variety of case studies from industry
• Practical advice on the realities of being a data scientist today, including the overall workflow, where time is spent, the types of datasets worked on, and the skill sets needed
The Data Science Handbook is an ideal resource for data analysis methodology and big data software tools. The book is appropriate for people who want to practice data science but lack the required skill sets. This includes software professionals who need to better understand analytics and statisticians who need to understand software. Modern data science is a unified discipline, and it is presented as such. This book is also an appropriate reference for researchers and entry-level graduate students who need to learn real-world analytics and expand their skill set.
FIELD CADY is the data scientist at the Allen Institute for Artificial Intelligence, where he develops tools that use machine learning to mine scientific literature. He has also worked at Google and several Big Data startups. He has a BS in physics and math from Stanford University, and an MS in computer science from Carnegie Mellon.
Chapter 1
Introduction: Becoming a Unicorn
Data science means doing analytics work that, for one reason or another, requires a substantial amount of software engineering skills.
1.1 Aren't Data Scientists Just Overpaid Statisticians?
1.2 How Is This Book Organized?
1.3 How to Use This Book?
- 1. You can read it cover-to-cover. If you do that, it should give you a self-contained course in data science that will leave you ready to tackle real problems. If you have a strong background in computer programming, or in mathematics, then some of it will be review.
- 2. You can use it to come up to speed quickly on a specific subject. I have tried to make the different chapters pretty self-contained, especially the chapters after the first section.
- 3. The book contains a lot of sample code, in pieces that are large enough to use as a starting point for your own projects.
1.4 Why Is It All in Python™, Anyway?
- 1. Python is the most popular language for data scientists. R is its only major competitor, at least when it comes to free tools. I have used both extensively, and I think that Python is flat-out better (except for some obscure statistics packages that have been written in R and that are rarely needed anyway).
- 2. I like to say that for any task, Python is the second-best language. It's a jack-of-all-trades. If you only need to worry about statistics, or numerical computation, or web parsing, then there are better options out there. But if you need to do all of these things within a single project, then Python is your best option. Since data science is so inherently multidisciplinary, this makes it a perfect fit.
1.5 Example Code and Datasets
- 1. As a data scientist, you need to be able to read longish pieces of code. This is a nonoptional skill, and if you aren't used to it, then this will give you a chance to practice.
- 2. I wanted to make it easier for you to poach the code from this book, if you feel so inclined.
Table of contents
- Cover
- Title Page
- Copyright
- Dedication
- Table of Contents
- Preface
- Chapter 1: Introduction: Becoming a Unicorn
- Part I: The Stuff You'll Always Use
- Part II: Stuff You Still Need to Know
- Part III: Specialized or Advanced Topics
- Index
- End User License Agreement