Data Science for Mathematicians

  1. 516 pages
  2. English

About This Book

Mathematicians have skills that, if deepened in the right ways, would enable them to use data to answer questions important to them and others, and to report those answers in compelling ways. Data science combines parts of mathematics, statistics, and computer science. Gaining such power and the ability to teach it has reinvigorated the careers of mathematicians. This handbook will help mathematicians better understand the opportunities presented by data science. As it applies to the curriculum, research, and career opportunities, data science is a fast-growing field. Contributors from both academia and industry present their views on these opportunities and how to take advantage of them.

Information

Year: 2020
ISBN: 9780429675676
Edition: 1
Chapter 1
Introduction
Nathan Carter
Bentley University
1.1 Who should read this book?
1.2 What is data science?
1.3 Is data science new?
1.4 What can I expect from this book?
1.5 What will this book expect from me?
This chapter serves as an introduction to both this text and the field of data science in general. Its first few sections explain the book’s purpose and context. Then Sections 1.4 and 1.5 explain which subjects will be covered and suggest how you should interact with them.
1.1 Who should read this book?
The job market continues to demand data scientists in fields as diverse as health care and music, marketing and defense, sports and academia. Practitioners in these fields have seen the value of evidence-based decision making and communication. Their demand for employees with those skills obligates the academy to train students for data science careers. Such an obligation is not unwelcome because data science has in common with academia a quest for answers.
Yet data science degree programs are quite new; very few faculty in the academy have a PhD in data science specifically. Thus the next generation of data scientists will be trained by faculty in closely related disciplines, primarily statistics, computer science, and mathematics.
Many faculty in those fields are teaching data-science-related courses now. Some do so because they like the material. Others want to try something new. Some want to use the related skills for consulting. Others just want to help their institution as it launches a new program or expands to meet increased demand. Three of my mathematician friends have had their recent careers shaped by a transition from pure mathematics to data science. Their stories serve as examples of this transition.
My friend David earned his PhD in category theory and was doing part-time teaching and part-time consulting using his computer skills. He landed a full-time teaching job at an institution that was soon to introduce graduate courses in data science. His consulting background and computing skills made him a natural choice for teaching some of those courses, which eventually led to curriculum development, a new job title, and grant writing. David is one of the authors of Chapter 8.
Another friend, Sam, completed a PhD in probability and began a postdoctoral position in that field. When his institution needed a new director of its data science master's program, his combination of mathematical background and programming skills made him a great internal candidate. Now that he is in that role, his teaching, expository writing, and career as a whole are largely focused on data science. Sam is the author of Chapter 9.
The third and final friend I’ll mention here, Mahesh, began his career as a number theorist and his research required him to pick up some programming expertise. Wanting to learn a bit more about computing, he saw data science as an exciting space in which to do so. Before long he was serving on a national committee about data science curricula and spending a sabbatical in a visiting position where he could make connections to data science academics and practitioners. Mahesh is the other author of Chapter 8.
These are just the three people closest to me who have made this transition. As you read this, stories of your own friends or colleagues may come to mind. Even if you don't know a mathematician-turned-data-scientist personally, most mathematicians are familiar with Cathy O’Neil, author of the famous book Weapons of Math Destruction [377], who left algebraic geometry to work in various applied positions and has authored several books on data science.
In each of these stories, a pure mathematician with some computer experience made a significant change to their career by learning and doing data science, a transition that is feasible because a mathematical background is excellent preparation for it. Eric Place summarized the state of data science by saying, “There aren't any experts; it’s just who’s the fastest learner.”
But mathematicians who want to follow a path like that of David, Sam, or Mahesh have had no straightforward way to get started. Those three friends cobbled together their own data science educations from books, websites, software tutorials, and self-imposed project work. This book is here so you don't have to do that, but can learn from their experiences and those of others. If you have a mathematical background and some computing experience, this book can be your pathway to teaching in a data science program and considering research in the field.
But the book does not exist solely for the benefit of its mathematician readers. Students of data science, as they learn its techniques and best practices, inevitably ask why those techniques work and how they became best practices. Mathematics is one of the disciplines best suited to answering that type of question, in data science or any other quantitative context. We are in the habit of demanding the highest standards of evidence and are not content to know just that a technique works or is widely accepted. Bringing that mindset to data science will give students those important “why” answers and make your teaching of data science more robust. If this book helps you shift or expand your career, it will not be for your benefit only, but for that of our students as well.
1.2 What is data science?
In 2001, William Cleveland published a paper in International Statistical Review [98] whose title named a new field: “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” As the title suggests, he may not have been intending to name a new field, since he saw his proposal as an expansion of statistics, with roughly 15% of that expanded scope falling under the heading of computer science. Whether data science is a new field is a matter of some debate, as we’ll see in Section 1.3, though I will sometimes refer to it as a field for the sake of convenience.
In Doing Data Science [427], Cathy O’Neil and Rachel Schutt say that the term “data scientist” wasn't coined until seven years after Cleveland’s article, in 2008. The first people to use it were employees of Facebook and LinkedIn, tech companies where many of the newly christened data scientists were employed.
To explain this new field, Drew Conway created perhaps the most famous and reused diagram in data science, a Venn diagram relating mathematics, statistics, computer science, and domain expertise [107]. Something very close to his original appears in Figure 1.1, but you’re likely to encounter many variations on it, because each writer seems to create one to reflect their own preferences.
Figure 1.1: Rendering of Drew Conway’s “data science Venn diagram” [107].
You can think of the three main circles of the diagram as three academic departments, computer science on the top left, math on the top right, and some other (usually quantitative) discipline on the bottom, one that wants to use mathematics and computing to answer some questions. Conway’s top-left circle uses the word “hacking” instead of computer science, because only a small subset of data science work requires formal software engineering skills. In fact, reading data and computing answers from it sometimes involves unexpected and clever repurposing of data or tools, which the word “hacking” describes very well. And mathematicians, in particular, can take heart from Conway’s labeling of the top-right circle not as statistics, but mathematics and statistics, and for good reason. Though Cleveland argued for classifying data science as part of statistics, we will see in Section 1.4 that many areas of mathematics proper are deeply involved in today’s data science work.
The lower-left intersection in the diagram is good news for readers of this text: it claims that data science done without knowledge of mathematics and statistics is a walk into danger. The premise of this text is that mathematicians have less to learn and can thus progress more quickly.
The top intersection is a bit harder to explain, and we will defer a full explanation until Chapter 8, on machine learning. But the gist is that machine learning differs from traditional mathematical modeling in that the analyst does not impose as much structure when using machine learning as he or she would when doing mathematical modeling, thus requiring less domain knowledge. Instead, the machine infers more of the structure on its own.
But Figure 1.1 merely outlines which disciplines come into play. The practice of data science proceeds something like the following.
1. A question arises that could be answered with data.
This may come from the data scientist’s employer or client, who needs the answer to make a strategic decision, or from the data scientist’s own curiosity about the world, perhaps in science, politics, business, or some other area.
2. The data scientist prepares to do an analysis.
This includes find...

Table of contents

  Cover
  Half Title
  Series Page
  Title Page
  Copyright Page
  Contents
  Foreword
  1. Introduction
  2. Programming with Data
  3. Linear Algebra
  4. Basic Statistics
  5. Clustering
  6. Operations Research
  7. Dimensionality Reduction
  8. Machine Learning
  9. Deep Learning
  10. Topological Data Analysis
  Bibliography
  Index