- 298 pages
- English
- ePUB (mobile friendly)
Linear Models with Python
About This Book
Praise for Linear Models with R:
This book is a must-have tool for anyone interested in understanding and applying linear models. The logical ordering of the chapters is well thought out and portrays Faraway's wealth of experience in teaching and using linear models. … It lays down the material in a logical and intricate manner and makes linear modeling appealing to researchers from virtually all fields of study. -Biometrical Journal
Throughout, it gives plenty of insight … with comments that even the seasoned practitioner will appreciate. Interspersed with R code and the output that it produces one can find many little gems of what I think is sound statistical advice, well epitomized with the examples chosen… I read it with delight and think that the same will be true with anyone who is engaged in the use or teaching of linear models. -Journal of the Royal Statistical Society
Like its widely praised, best-selling companion, Linear Models with R, this book gives a coherent exposition of the practice of linear modeling, but it uses Python in place of R. Linear Models with Python offers up-to-date insight on essential data analysis topics, from estimation, inference and prediction to missing data, factorial models and block designs. Numerous examples illustrate how to apply the different methods using Python.
Features:
- Python is a powerful, open source programming language increasingly used in data science, machine learning and computer science. Python and R are similar, but R was designed for statistics, while Python is multi-talented.
- This version replaces R with Python to make it accessible to a greater number of users outside of statistics, including those from machine learning.
- A reader coming to this book from an ML background will learn new statistical perspectives on learning from data.
- Topics include Model Selection, Shrinkage, Experiments with Blocks and Missing Data.
- Includes an Appendix on Python for beginners.
Linear Models with Python explains how to use linear models in physical science, engineering, social science and business applications. It is ideal as a textbook for linear models or linear regression courses.
Chapter 1
Introduction
1.1 Before You Start
- Understand the physical background. Statisticians often work in collaboration with others and need to understand something about the subject area. Regard this as an opportunity to learn something new rather than a chore.
- Understand the objective. Again, often you will be working with a collaborator who may not be clear about what the objectives are. Beware of “fishing expeditions”: if you look hard enough, you will almost always find something, but that something may just be a coincidence.
- Make sure you know what the client wants. You can often do quite different analyses on the same dataset. Sometimes statisticians perform an analysis far more complicated than the client really needed. You may find that simple descriptive statistics are all that are needed.
- Put the problem into statistical terms. This is a challenging step and where irreparable errors are sometimes made. Once the problem is translated into the language of statistics, the solution is often routine. This is where human intelligence is decidedly superior to artificial intelligence. Defining the problem is hard to program. That a statistical method can read in and process the data is not enough. The results of an inapt analysis may be meaningless.
- Are the data observational or experimental? Are the data a sample of convenience or were they obtained via a designed sample survey? How the data were collected has a crucial impact on what conclusions can be made.
- Is there nonresponse? The data you do not see may be just as important as the data you do see.
- Are there missing values? This is a common problem that is troublesome and time consuming to handle.
- How are the data coded? In particular, how are the categorical variables represented?
- What are the units of measurement?
- Beware of data entry errors and other corruption of the data. This problem is all too common, almost a certainty in any real dataset of at least moderate size. Perform some data sanity checks.
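Several of these checks (missing values, how variables are coded, implausible values) can be partly automated. The sketch below uses a hypothetical helper, `sanity_report`, which is not part of pandas or the `faraway` package; it tabulates, for each column, the storage type, the number of explicit missing values, the number of zeros, the number of distinct values and the range. So that it runs on its own, it uses the five rows of the Pima data printed by `pima.head()` in Section 1.2, typed in by hand.

```python
import pandas as pd


def sanity_report(df: pd.DataFrame) -> pd.DataFrame:
    """Tabulate simple per-column checks for spotting coding and entry problems.

    Hypothetical helper for illustration; not part of pandas or faraway.
    Assumes every column of df is numeric.
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # how each column is stored
        "n_missing": df.isna().sum(),     # explicit NaN values
        "n_zero": (df == 0).sum(),        # zeros can be disguised missing values
        "n_unique": df.nunique(),         # few distinct values hints at a categorical code
        "min": df.min(),                  # implausible extremes suggest entry errors
        "max": df.max(),
    })


# The five rows printed by pima.head() in Section 1.2, typed in by hand
# so this sketch does not need the faraway package.
pima_head = pd.DataFrame({
    "pregnant": [6, 1, 8, 1, 0],
    "glucose": [148, 85, 183, 89, 137],
    "diastolic": [72, 66, 64, 66, 40],
    "triceps": [35, 29, 0, 23, 35],
    "insulin": [0, 0, 0, 94, 168],
    "bmi": [33.6, 26.6, 23.3, 28.1, 43.1],
    "diabetes": [0.627, 0.351, 0.672, 0.167, 2.288],
    "age": [50, 31, 32, 21, 33],
    "test": [1, 0, 1, 0, 1],
})

report = sanity_report(pima_head)
print(report)
```

Even on these five rows, `insulin` has three zeros and `test` takes only two distinct values, both points worth raising with whoever collected the data before any modeling starts.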
1.2 Initial Data Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import seaborn as sns
import statsmodels.formula.api as smf
import faraway.datasets.pima
pima = faraway.datasets.pima.load()
pima.head()
   pregnant  glucose  diastolic  triceps  insulin   bmi  diabetes  age  test
0         6      148         72       35        0  33.6     0.627   50     1
1         1       85         66       29        0  26.6     0.351   31     0
2         8      183         64        0        0  23.3     0.672   32     1
3         1       89         66       23       94  28.1     0.167   21     0
4         0      137         40       35      168  43.1     2.288   33     1
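The `head()` output already shows zeros for `triceps` and `insulin`, values that are hard to believe for a living person and more likely mark measurements that were not recorded. The following is a minimal sketch of recoding them before summarizing the data. It assumes (this should be confirmed against the data's documentation) that a zero in `glucose`, `diastolic`, `triceps`, `insulin` or `bmi` means "missing", and that `test` codes 0 as a negative and 1 as a positive diabetes test. It re-enters the five rows shown above so it runs without the `faraway` package; with the package installed, the same recoding lines apply to the full `pima` DataFrame.

```python
import numpy as np
import pandas as pd

# First five rows of the Pima data, copied from the pima.head() output above.
pima = pd.DataFrame({
    "pregnant": [6, 1, 8, 1, 0],
    "glucose": [148, 85, 183, 89, 137],
    "diastolic": [72, 66, 64, 66, 40],
    "triceps": [35, 29, 0, 23, 35],
    "insulin": [0, 0, 0, 94, 168],
    "bmi": [33.6, 26.6, 23.3, 28.1, 43.1],
    "diabetes": [0.627, 0.351, 0.672, 0.167, 2.288],
    "age": [50, 31, 32, 21, 33],
    "test": [1, 0, 1, 0, 1],
})

# Assumption: in these columns a zero stands for a missing measurement.
# "pregnant" is left alone because zero pregnancies is a legitimate value.
zero_means_missing = ["glucose", "diastolic", "triceps", "insulin", "bmi"]
pima[zero_means_missing] = pima[zero_means_missing].replace(0, np.nan)

# Assumption: test is coded 0 = negative, 1 = positive.
# Storing it as a labeled categorical keeps it from being treated as a number.
pima["test"] = pima["test"].map({0: "negative", 1: "positive"}).astype("category")

# describe() now ignores the NaNs and skips the categorical column.
print(pima.describe())
```

Recoding to `NaN` means that summaries such as the mean of `insulin` are no longer dragged toward zero by values that were never really observed.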
Table of contents
- Cover
- Half Title
- Series Page
- Title Page
- Copyright Page
- Contents
- Preface
- 1 Introduction
- 2 Estimation
- 3 Inference
- 4 Prediction
- 5 Explanation
- 6 Diagnostics
- 7 Problems with the Predictors
- 8 Problems with the Error
- 9 Transformation
- 10 Model Selection
- 11 Shrinkage Methods
- 12 Insurance Redlining - A Complete Example
- 13 Missing Data
- 14 Categorical Predictors
- 15 One-Factor Models
- 16 Models with Several Factors
- 17 Experiments with Blocks
- A About Python
- Bibliography
- Index