Machine Learning

a Concise Introduction

  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android

About This Book

AN INTRODUCTION TO MACHINE LEARNING THAT INCLUDES THE FUNDAMENTAL TECHNIQUES, METHODS, AND APPLICATIONS

PROSE Award Finalist 2019, Association of American Publishers Award for Professional and Scholarly Excellence

Machine Learning: a Concise Introduction offers a comprehensive introduction to the core concepts, approaches, and applications of machine learning. The author, an expert in the field, presents fundamental ideas, terminology, and techniques for solving applied problems in classification, regression, clustering, density estimation, and dimension reduction. The design principles behind the techniques are emphasized, including the bias-variance trade-off and its influence on the design of ensemble methods. Understanding these principles leads to more flexible and successful applications. Machine Learning: a Concise Introduction also includes methods for optimization, risk estimation, and model selection, which are essential elements of most applied projects. This important resource:

  • Illustrates many classification methods with a single, running example, highlighting similarities and differences between methods
  • Presents R source code which shows how to apply and interpret many of the techniques covered
  • Includes many thoughtful exercises as an integral part of the text, with an appendix of selected solutions
  • Contains useful information for effectively communicating with clients

A volume in the popular Wiley Series in Probability and Statistics, Machine Learning: a Concise Introduction offers the practical information needed for an understanding of the methods and application of machine learning.

STEVEN W. KNOX holds a Ph.D. in Mathematics from the University of Illinois and an M.S. in Statistics from Carnegie Mellon University. He has over twenty years' experience in using Machine Learning, Statistics, and Mathematics to solve real-world problems. He currently serves as Technical Director of Mathematics Research and Senior Advocate for Data Science at the National Security Agency.

Information

Publisher: Wiley
Year: 2018
ISBN: 9781119438984
Edition: 1

1
Introduction—Examples from Real Life

To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.
—R. A. Fisher, Presidential Address, 1938
The following examples will be used to illustrate the ideas of the next chapter.
Problem 1 (“Shuttle”). The space shuttle is set to launch. For every previous launch, the air temperature is known and the number of O-rings on the solid rocket boosters which were damaged is known (there are six O-rings, and O-ring damage is a potentially catastrophic event). Based on the current air temperature, estimate the probability that at least one O-ring on a solid rocket booster will be damaged if the shuttle launches now.
This is a regression problem. Poor analysis, and poor communication of some good analysis (Tufte, 2001), resulted in the loss of the shuttle Challenger and its crew on January 28, 1986.
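One way to picture the regression in Problem 1 is a logistic model of the probability of damage as a function of temperature. The sketch below is only an illustration under stated assumptions: the temperatures and damage indicators are made-up values (not the actual launch record), the response is reduced to "at least one O-ring damaged" per launch, and the names `fit_logistic`, `temps_f`, and `any_damage` are hypothetical. The book's own examples are written in R; Python is used here only for the sketch.

```python
import math

def fit_logistic(temps, damaged, steps=5000, lr=0.1):
    """Fit P(damage) = sigmoid(a + b * z) by gradient ascent on the
    mean log-likelihood, where z is the standardized temperature."""
    n = len(temps)
    mean = sum(temps) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in temps) / n)
    z = [(t - mean) / sd for t in temps]
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for zi, yi in zip(z, damaged):
            p = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            grad_a += yi - p          # derivative with respect to intercept
            grad_b += (yi - p) * zi   # derivative with respect to slope
        a += lr * grad_a / n
        b += lr * grad_b / n

    def predict(temp):
        """Estimated probability of damage at the given temperature."""
        zt = (temp - mean) / sd
        return 1.0 / (1.0 + math.exp(-(a + b * zt)))

    return predict

# Hypothetical data: air temperature at launch and whether any O-ring
# was damaged (1) or not (0). These values are invented for illustration.
temps_f = [50, 55, 58, 60, 63, 65, 68, 70, 72, 75, 78, 80]
any_damage = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

predict = fit_logistic(temps_f, any_damage)
p_cold = predict(40.0)  # estimate for a cold launch, outside the data range
p_warm = predict(85.0)  # estimate for a warm launch
```

With data like these, where damage is concentrated at low temperatures, the fitted probability decreases as temperature rises. Note that `predict(40.0)` extrapolates below every observed temperature, so such an estimate should be communicated with that caveat.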
Problem 2 (“Ballot”). Immediately after the 2000 US presidential election, some voters in Palm Beach County, Florida, claimed that a confusing ballot form caused them to vote for Pat Buchanan, the Reform Party candidate, when they thought they were voting for Al Gore, the Democratic Party candidate. Based on county-by-county demographic information (number of registered members of each political party, number of people with annual income in a certain range, number of people with a certain level of education, etc.) and county-by-county vote counts from the 1996 presidential election, estimate how many people in Palm Beach County voted for Buchanan but thought they were voting for Gore.
This regression problem was studied a great deal in 2000 and 2001, as the outcome of the vote in Palm Beach County could have decided the election.
Problem 3 (“Heart”). A patient who is suffering from acute chest pain has entered a hospital, where several numerical variables (for example, systolic blood pressure, age) and several binary variables (for example, whether tachycardia is present or not) are measured. Identify the patient as “high risk” (probably will die within 30 days) or “low risk” (probably will live 30 days).
This is a classification problem.
Problem 4 (“Postal Code”). An optical scanner has scanned a hand-written ZIP code on a piece of mail. It has approximately separated the digits, and each digit is represented as an 8 × 8 array of pixels, each of which has one of 256 gray-scale values, 0 (white), ..., 255 (black). Identify each pixel array as one of the digits 0 through 9.
This is a classification problem which affects all of us (though not so much now as formerly).
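The data representation in Problem 4 can be made concrete with a small sketch: each digit is an 8 × 8 array of gray-scale values, which can be flattened into a vector of 64 numbers and compared with labeled examples. The nearest-neighbor rule below is just one simple classifier chosen for illustration, not necessarily the method the book recommends for this problem. The toy images and the names `flatten`, `classify_nearest`, and `training` are hypothetical.

```python
def flatten(image):
    """Turn an 8 x 8 array of gray-scale values (0..255) into a 64-vector."""
    assert len(image) == 8 and all(len(row) == 8 for row in image)
    return [pixel for row in image for pixel in row]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify_nearest(image, labeled_images):
    """Return the label of the training image closest to `image`."""
    x = flatten(image)
    best_label, _ = min(
        ((label, squared_distance(x, flatten(train)))
         for train, label in labeled_images),
        key=lambda pair: pair[1],
    )
    return best_label

# Invented toy images: a vertical stroke for "1" and a ring for "0".
image_one = [[255 if c in (3, 4) else 0 for c in range(8)] for r in range(8)]
image_zero = [
    [255 if ((r in (1, 6) and 2 <= c <= 5) or (c in (2, 5) and 1 <= r <= 6)) else 0
     for c in range(8)]
    for r in range(8)
]
training = [(image_one, 1), (image_zero, 0)]

# A slightly corrupted copy of the stroke: one pixel lost, one smudge added.
query = [row[:] for row in image_one]
query[0][3] = 0
query[2][5] = 100

predicted_digit = classify_nearest(query, training)
```

A real system would use many labeled examples per digit; two templates are used here only to keep the sketch short.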
Problem 5 (“Spam”). Identify email as “spam” or “not spam,” based only on the subject line. Or based on the full header. Or based on the content of the email.
This is probably the best known and most studied classification problem of all, solutions to which are applied many billions of times per day (see note 1).
Problem 6 (“Vault”). Some neolithic tribes built dome-shaped stone burial vaults. Given the location and several internal measurements of some burial vaults, estimate how many distinct vault-building cultures there have been, say which vaults were built by which culture and, for each culture, give the dimensions of a vault which represents that culture’s ideal vault shape (or name the actual vault which best realizes each culture’s ideal).
This is a clustering problem.
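As a rough illustration of the clustering task in Problem 6, the sketch below runs a plain k-means procedure on invented vault measurements. It then reports, for each cluster, both the center (a stand-in for the "ideal" vault) and the actual vault closest to that center. Several things here are assumptions for the sake of the example: the number of cultures `k` is taken as known (the problem asks for it to be estimated), only two measurements per vault are used, the data are made up, and the initial centers are simply the first `k` points.

```python
def dist2(u, v):
    """Squared Euclidean distance between two measurement vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (the first k points)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a center unchanged if its cluster is empty
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, assign(points, centers)

def medoid_indices(points, centers, labels):
    """For each cluster, the index of the actual point nearest its center.
    Assumes every cluster has at least one member."""
    result = []
    for j, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == j]
        result.append(min(members, key=lambda i: dist2(points[i], center)))
    return result

# Invented measurements (for example, diameter and height) of six vaults.
vaults = [(4.0, 2.1), (8.1, 5.0), (3.8, 1.9), (4.2, 2.0), (7.9, 5.2), (8.0, 4.8)]

centers, labels = kmeans(vaults, k=2)
medoids = medoid_indices(vaults, centers, labels)
```

Here `labels` says which vault belongs to which cluster, `centers` gives each cluster's average dimensions, and `medoids` names the actual vault that best matches each average.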

Notes

1 In 2013, approximately 182.9 billion emails were sent per day, on average, worldwide (Radicati and Levenstein, 2013).

2
The Problem of Learning

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
—John Tukey, The Future of Data Analysis, 1962
This book treats The Problem of Learning, which can be stated generally and succinctly as follows.
The Problem of Learning. There are a known set $\mathcal{X}$ and an unknown function $f$ on $\mathcal{X}$. Given data, construct a good approximation $\hat{f}$ of $f$. This is called learning $f$.
The problem of learning has been studied in many guises and in different fields, such as statistics, computer science, mathematics, and the natural and social...

Table of contents

  1. Cover
  2. Title page
  3. Copyright
  4. Preface
  5. Organization—How to Use This Book
  6. Acknowledgments
  7. About the Companion Website
  8. Chapter 1: Introduction—Examples from Real Life
  9. Chapter 2: The Problem of Learning
  10. Chapter 3: Regression
  11. Chapter 4: Survey of Classification Techniques
  12. Chapter 5: Bias–Variance Trade-off
  13. Chapter 6: Combining Classifiers
  14. Chapter 7: Risk Estimation and Model Selection
  15. Chapter 8: Consistency
  16. Chapter 9: Clustering
  17. Chapter 10: Optimization
  18. Chapter 11: High-Dimensional Data
  19. Chapter 12: Communication with Clients
  20. Chapter 13: Current Challenges in Machine Learning
  21. Chapter 14: R Source Code
  22. Appendix A: List of Symbols
  23. Appendix B: Solutions to Selected Exercises
  24. Appendix C: Converting Between Normal Parameters and Level-Curve Ellipsoids
  25. Appendix D: Training Data and Fitted Parameters
  26. References
  27. Index
  28. EULA