Applied Univariate, Bivariate, and Multivariate Statistics Using Python
A Beginner's Guide to Advanced Data Analysis
About This Book
A practical, "how-to" reference for anyone performing essential statistical analyses and data management tasks in Python
Applied Univariate, Bivariate, and Multivariate Statistics Using Python delivers a comprehensive introduction to a wide range of statistical methods performed using Python in a single, one-stop reference. The book contains user-friendly guidance and instructions on using Python to run a variety of statistical procedures without getting bogged down in unnecessary theory. Throughout, the author emphasizes a set of computational tools used in the discovery of empirical patterns, as well as several popular statistical analyses and data management tasks that can be immediately applied.
Most of the datasets used in the book are small enough to be easily entered into Python manually, though they can also be downloaded for free from www.datapsyc.com. Only minimal knowledge of statistics is assumed, making the book perfect for those seeking an easily accessible toolkit for statistical analysis with Python. Applied Univariate, Bivariate, and Multivariate Statistics Using Python represents the fastest way to learn how to analyze data with Python.
Readers will also benefit from the inclusion of:
- A review of essential statistical principles, including types of data, measurement, significance tests, significance levels, and type I and type II errors
- An introduction to Python, exploring how to communicate with Python
- A treatment of exploratory data analysis, basic statistics, and visual displays, including frequencies and descriptives, q-q plots, box-and-whisker plots, and data management
- An introduction to topics such as ANOVA, MANOVA and discriminant analysis, regression, principal components analysis, factor analysis, and cluster analysis, among others, exploring what these techniques can and cannot do on a methodological level
Perfect for undergraduate and graduate students in the social, behavioral, and natural sciences, Applied Univariate, Bivariate, and Multivariate Statistics Using Python will also earn a place in the libraries of researchers and data analysts seeking a quick go-to resource for univariate, bivariate, and multivariate analysis in Python.
1 A Brief Introduction and Overview of Applied Statistics
CHAPTER OBJECTIVES
- How probability is the basis of statistical and scientific thinking.
- Examples of statistical inference and thinking in the COVID-19 pandemic.
- Overview of how null hypothesis significance testing (NHST) works.
- The relationship between statistical inference and decision-making.
- Error rates in statistical thinking and how to minimize them.
- The difference between a point estimator and an interval estimator.
- The difference between a continuous and a discrete variable.
- Appreciating a few of the more salient philosophical underpinnings of applied statistics and science.
- Understanding scales of measurement: nominal, ordinal, interval, and ratio.
- Data analysis, data science, and “big data” distinctions.
- What is the probability of contracting the virus, and does this probability vary as a function of factors such as pre-existing conditions or age? In this latter case, we might be interested in the conditional probability of contracting COVID-19 given a pre-existing condition or advanced age. For example, if someone suffers from heart disease, is that person at greater risk of acquiring the infection? That is, what is the probability of COVID-19 infection conditional on someone already suffering from heart disease or other ailments?
- What proportion of the general population has the virus? Ideally, researchers wanted to know how many people world-wide had contracted the virus. This constituted a case of parameter estimation, where the parameter of interest was the proportion of cases world-wide having the virus. Since this number was unknown, it was typically estimated based on sample data by computing a statistic (i.e. in this case, a proportion) and using that number to infer the true population proportion. It is important to understand that the statistic in this case was a proportion, but it could have also been a different function of the data. For example, a percentage increase or decrease in COVID-19 cases was also a parameter of interest to be estimated via sample data across a particular period of time. In all such cases, we wish to estimate a parameter based on a statistic.
- What proportion of those who contracted the virus will die of it? That is, what is the estimated total death count from the pandemic, from beginning to end? Statistics such as these involved projections of death counts over a specific period of time and relied on already established model curves from similar pandemics. Scientists who study infectious diseases have historically documented the likely (read: “probabilistic”) trajectories of death rates over a period of time, which incorporate estimates of how quickly and easily the virus spreads from one individual to the next. These estimates were all statistical in nature. Estimates often included confidence limits and bands around projected trajectories as a means of expressing the degree of uncertainty in the prediction. Hence, projected estimates were, in the opinion of many media types, “wrong,” but this was usually due to not understanding or appreciating the limits of uncertainty provided in the original estimates. Of course, uncertainty limits were sometimes quite wide, because predicting death rates was very difficult to begin with. When one models relatively wide margins of error, one is protected, in a sense, from getting the projection truly wrong. But of course, one needs to understand what these limits represent; otherwise they can be easily misunderstood. Were the point estimates wrong? Of course they were! We knew long before the data came in that the point projections would be off. Virtually all point predictions will be wrong. The issue is whether the data fell in line with the prediction bands that were modeled (e.g. see Figure 1.1). If a modeler sets them too wide, then the model is essentially useless. For instance, had we said the projected number of deaths would be between 1,000 and 5,000,000 in the USA, that would not really tell us much more than we could have guessed on our own without using data at all!
Be wary of “sophisticated models” that tell you about the same thing (or even less!) than you could have guessed on your own (e.g. a weather model that predicts cold temperatures in Montana in December, how insightful!).
- Measurement issues were also at the heart of the pandemic (though rarely addressed by the media). What exactly constituted a COVID-19 case? Differentiating between individuals who died “of” COVID-19 vs. died “with” COVID-19 was paramount, yet was often ignored in early reports. However, the question was central to everything! “Another individual died of COVID-19” does not mean anything if we do not know the mechanism or etiology of the death. Quite possibly, COVID-19 was a correlate of death in many cases, not a cause. That is, within a typical COVID-19 death could lie a virtually infinite number of possibilities that “contributed,” in a sense, to the death. Perhaps one person died primarily from the virus, whereas another person died because they already suffered from severe heart disease, and the addition of the virus simply complicated the overall health issue and overwhelmed them, which essentially caused the death.
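The first two questions in the list above (a conditional probability, and a population proportion estimated from a sample) can be made concrete with a small sketch. The counts below are invented purely for illustration and are not real COVID-19 data. The helper names `conditional_probability` and `proportion_interval` are hypothetical, and the interval uses the simple normal-approximation (Wald) formula, which is only one of several ways to put limits around a proportion.

```python
import math

# Hypothetical counts, invented for illustration only (not real COVID-19 data).
# Keys are (condition, outcome) pairs; values are numbers of people in a sample.
counts = {
    ("heart_disease", "infected"): 30,
    ("heart_disease", "not_infected"): 170,
    ("no_heart_disease", "infected"): 40,
    ("no_heart_disease", "not_infected"): 760,
}


def conditional_probability(counts, condition, outcome):
    """Estimate P(outcome | condition) = n(condition and outcome) / n(condition)."""
    n_condition = sum(n for (cond, _), n in counts.items() if cond == condition)
    return counts[(condition, outcome)] / n_condition


def proportion_interval(successes, n, z=1.96):
    """Sample proportion with a normal-approximation (Wald) interval.

    z = 1.96 corresponds to roughly 95% confidence; a larger z gives a wider interval.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Conditional probabilities: does infection risk differ by pre-existing condition?
p_given_hd = conditional_probability(counts, "heart_disease", "infected")
p_given_no_hd = conditional_probability(counts, "no_heart_disease", "infected")
print(f"P(infected | heart disease)    = {p_given_hd:.2f}")
print(f"P(infected | no heart disease) = {p_given_no_hd:.2f}")

# Parameter estimation: the sample proportion is the statistic used to infer
# the unknown population proportion.
total = sum(counts.values())
infected = sum(n for (_, out), n in counts.items() if out == "infected")
p_hat, lower, upper = proportion_interval(infected, total)
print(f"Sample proportion = {p_hat:.3f}, approx. 95% interval = ({lower:.3f}, {upper:.3f})")

# Demanding more confidence (z = 2.576, roughly 99%) widens the interval.
_, lower99, upper99 = proportion_interval(infected, total, z=2.576)
print(f"Approx. 99% interval = ({lower99:.3f}, {upper99:.3f})")
```

With these made-up counts, the estimated probability of infection is 0.15 among those with heart disease and 0.05 among those without, and the overall sample proportion is 0.07. The second interval is wider than the first, which mirrors the point made above: wider limits are safer but less informative.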
1.1 How Statistical Inference Works
Table of contents
- Cover
- Title page
- Copyright
- Table of Contents
- Preface
- Chapter 1: A Brief Introduction and Overview of Applied Statistics
- Chapter 2: Introduction to Python and the Field of Computational Statistics
- Chapter 3: Visualization in Python: Introduction to Graphs and Plots
- Chapter 4: Simple Statistical Techniques for Univariate and Bivariate Analyses
- Chapter 5: Power, Effect Size, P-Values, and Estimating Required Sample Size Using Python
- Chapter 6: Analysis of Variance
- Chapter 7: Simple and Multiple Linear Regression
- Chapter 8: Logistic Regression and the Generalized Linear Model
- Chapter 9: Multivariate Analysis of Variance (MANOVA) and Discriminant Analysis
- Chapter 10: Principal Components Analysis
- Chapter 11: Exploratory Factor Analysis
- Chapter 12: Cluster Analysis
- References
- Index
- End User License Agreement