Introduction to High-Dimensional Statistics
eBook - ePub

  1. 346 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Praise for the first edition:

"[This book] succeeds singularly at providing a structured introduction to this active field of research. … it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. … recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research."
— Journal of the American Statistical Association

Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding inessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition:



  • Offers revised chapters from the previous edition, with much additional material on important topics, including compressed sensing, estimation with convex constraints, the slope estimator, simultaneously low-rank and row-sparse linear regression, and aggregation of a continuous set of estimators.
  • Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds.
  • Provides enhanced appendices, mainly with the addition of the Davis-Kahan perturbation bound and of two simple versions of the Hanson-Wright concentration inequality.
  • Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory.
  • Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site.
  • Illustrates concepts with simple but clear practical examples.

Introduction to High-Dimensional Statistics by Christophe Giraud is available in PDF and/or ePUB format.

Information

Year: 2021
ISBN: 9781000408355
Edition: 2

Chapter 1

Introduction

DOI: 10.1201/9781003158745-1

1.1 High-Dimensional Data

The sustained development of technologies, data storage resources, and computing resources gives rise to the production, storage, and processing of an exponentially growing volume of data. Data are ubiquitous and have a dramatic impact on almost every branch of human activity, including science, medicine, business, finance and administration. For example, large-scale data make it possible to better understand the regulation mechanisms of living organisms, to create new therapies, to monitor climate and biodiversity changes, to optimize resources in industry and in administrations, to personalize marketing for each individual consumer, etc.
A major characteristic of modern data is that they often simultaneously record thousands to millions of features on each object or individual. Such data are said to be high-dimensional. Let us illustrate this characteristic with a few examples. These examples are relevant at the time of writing and may become outdated in a few years, yet we emphasize that the mathematical ideas conveyed in this book are independent of these examples and will remain relevant.
  • Biotech data: Recent biotechnologies make it possible to acquire high-dimensional data on single individuals. For example, DNA microarrays measure the transcription level¹ of tens of thousands of genes simultaneously; see Figure 1.1. Next generation sequencing (NGS) devices improve on these microarrays by making it possible to measure the “transcription level” of virtually any part of the genome. Similarly, in proteomics some technologies can gauge the abundance of thousands of proteins simultaneously. These data are crucial for investigating biological regulation mechanisms and creating new drugs. In such biotech data, the number p of “variables” that are measured is in the thousands and is most of the time much larger than the number n of “individuals” involved in the experiment (number of repetitions, rarely exceeding a few hundred).
¹ The transcription level of a gene in a cell at a given time corresponds to the quantity of mRNA associated with this gene present at this time in the cell.
Figure 1.1 Whole human genome microarray covering more than 41,000 human genes and transcripts on a standard 1″ × 3″ glass slide format. © Agilent Technologies, Inc. 2004. Reproduced with permission, courtesy of Agilent Technologies, Inc.
  • Images (and v...
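
The regime described in the biotech example, where the number p of variables far exceeds the number n of individuals, can be made concrete with a small synthetic sketch. The sizes n = 50 and p = 5000, the Gaussian entries, and the variable names are illustrative assumptions, not values or notation taken from the book beyond n and p.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical sizes: 50 individuals, 5000 measured variables (p >> n).
n, p = 50, 5000

# Synthetic stand-in for a data matrix: one row per individual,
# one column per variable. Real data would replace this.
X = rng.standard_normal((n, p))

# The rank of an n x p matrix can never exceed min(n, p).
rank = np.linalg.matrix_rank(X)

print(f"data matrix shape: {X.shape}")
print(f"variables per individual (p / n): {p / n:.0f}")
print(f"rank of X: {rank}, upper bound min(n, p): {min(n, p)}")
```

Since the rank is at most n, the p columns of such a matrix cannot be linearly independent when p is larger than n. This is one way to see why data with many more variables than observations call for dedicated statistical tools.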

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. Preface, second edition
  7. Preface
  8. Acknowledgments
  9. 1 Introduction
  10. 2 Model Selection
  11. 3 Minimax Lower Bounds
  12. 4 Aggregation of Estimators
  13. 5 Convex Criteria
  14. 6 Iterative Algorithms
  15. 7 Estimator Selection
  16. 8 Multivariate Regression
  17. 9 Graphical Models
  18. 10 Multiple Testing
  19. 11 Supervised Classification
  20. 12 Clustering
  21. Appendix A Gaussian Distribution
  22. Appendix B Probabilistic Inequalities
  23. Appendix C Linear Algebra
  24. Appendix D Subdifferentials of Convex Functions
  25. Appendix E Reproducing Kernel Hilbert Spaces
  26. Notations
  27. References
  28. Index