A Concise Introduction to Statistical Inference

  1. 212 pages
  2. English

About This Book

This short book introduces the main ideas of statistical inference in a way that is both user friendly and mathematically sound. Particular emphasis is placed on the common foundation of many models used in practice. In addition, the book focuses on the formulation of appropriate statistical models to study problems in business, economics, and the social sciences, as well as on how to interpret the results from statistical analyses.

The book will be useful to students who are interested in rigorous applications of statistics to problems in business, economics and the social sciences, as well as students who have studied statistics in the past, but need a more solid grounding in statistical techniques to further their careers.

Jacco Thijssen is professor of finance at the University of York, UK. He holds a PhD in mathematical economics from Tilburg University, Netherlands. His main research interests are in applications of optimal stopping theory, stochastic calculus, and game theory to problems in economics and finance. Professor Thijssen has earned several awards for his statistics teaching.

Information

Year
2016
ISBN
9781498755801
Edition
1

Chapter 1

Statistical Inference

1.1 What statistical inference is all about

The decisions of politicians, businesses, engineers, not-for-profit organizations, etc. typically have an influence on many people. Changes to child benefits by a government, for example, influence the financial position of many households. The government is interested (hopefully) in the effect of such measures on individual households. Of course, the government can’t investigate the effect on every individual household. That would simply take too much time and make it almost impossible to design general policies. Therefore, the government could restrict itself by focussing on the average effect on households in, say, the lowest income quartile. Even finding out this number is typically too difficult to do exactly. So, the government relies on information obtained from a small subset of these households. From this subset the government will then try to infer the effect on the entire population.
The above example gives, in a nutshell, the goal of statistics. Statistics is the study of collecting and describing data and drawing inferences from these data. Politicians worry about the impact of budgetary measures on the average citizen, a marketeer is concerned with median sales over the year, an economist worries about the variation in employment figures over a 5-year period, a social worker is concerned about the correlation between criminality and drug use, etc. Where do all these professionals get that information from? Usually from data about their object/subject of interest. However, a long list of numbers does not really help these professionals in analysing their subject and in making appropriate decisions accordingly. Therefore, the “raw data” (the list of responses you get if, for example, you survey 500 people) are condensed into manageable figures, tables, and numerical measures. How to construct these is the aim of descriptive statistics. How to use them as evidence to be fed into the decision making process is the aim of inferential statistics and the subject of this book. This chapter introduces in an informal way some of the statistical jargon that you will encounter throughout the book.
Inferential statistics is the art and science of interpreting evidence in the face of uncertainty.
Example 1.1. Suppose that you want to know the average income of all university students in the country (for example, to develop a financial product for students). Then, obviously, you could simply go around the country and ask every student about their income. This would, however, be a very difficult thing to do. First of all, it would be extremely costly. Secondly, you may miss a few students who are not in the country at present. Thirdly, you have to make sure you don’t count anyone twice.
Alternatively, you could only collect data on a subgroup of students and compute their average income as an approximation of the true average income of all students in the country. But now you have to be careful. Because you do not observe all incomes, the average that you compute is an estimate. You will need to have some idea about the accuracy of your estimate. This is where inferential statistics comes in.
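The idea in Example 1.1 can be sketched with a small simulation. This is only an illustration: the population of 100,000 student incomes, its normal shape, the sample size of 500, and all variable names are assumptions made up for the sketch, not figures from the book.

```python
import random
from statistics import mean

# Hypothetical population: 100,000 student incomes (in dollars per year).
# The normal shape and its parameters are illustrative assumptions only.
rng = random.Random(42)
population = [max(0.0, rng.gauss(6_000, 1_500)) for _ in range(100_000)]

# The parameter of interest: the true average income, normally unknown.
true_mean = mean(population)

# Observe only a random sample of 500 students and average their incomes.
sample = rng.sample(population, 500)
estimate = mean(sample)

print(f"population mean (parameter): {true_mean:,.0f}")
print(f"sample mean (estimate):      {estimate:,.0f}")
print(f"estimation error:            {estimate - true_mean:+,.0f}")
```

The sample mean is typically close to, but not equal to, the population mean. Quantifying how far off it is likely to be is the job of inferential statistics.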
Let’s rephrase the above example in more general terms: you wish to obtain information about a summary (called a parameter) of a measurement of a characteristic (called a variable) of a certain group of people/objects/procedures/… (called a population) based on observations from only a subset of the population (called a sample), taking into account the distortions that occur by using a sample rather than the population. All four notions (parameter, variable, population, and sample) will be made precise in this book. For now it suffices to have an intuitive idea.
The goal of inferential statistics is to develop methods that we can use to infer properties of a population based on sample information.

1.2 Why statistical inference is difficult

There is a great need for methods to gather data and draw appropriate conclusions from the evidence that they provide. The costs of making erroneous decisions can be very high indeed. Often people make judgements based on anecdotal evidence. That is, we tend to look at one or two cases and then juxtapose these experiences onto our world view. But anecdotal evidence is not evidence.
At its most extreme, an inference based on anecdotal evidence would be to play the lottery because you heard that a friend of a friend’s grandmother once won it. A collection of anecdotes never forms an appropriate basis from which general conclusions can be drawn.
Example 1.2 (Gardner, 2008). On November 6, 2006, the Globe and Mail ran a story about a little girl, who, when she was 22 months old, developed an aggressive form of cancer. The story recounted her and her parents’ protracted battle against the disease. She died when she was 3 years old. The article came complete with photographs of the toddler showing her patchy hair due to chemotherapy. The paper used this case as the start for a series of articles about cancer and made the little girl, effectively, the face of cancer.
No matter how dreadful this story may be from a human perspective, it is not a good basis for designing a national health policy. The girl’s disease is extremely rare: she was a one-in-a-million case. Cancer is predominantly a disease of the elderly. Of course you could say: “any child dying of cancer is one too many,” but since we only have finite resources, how many grandparents should not be treated to fund treatment for one little girl? The only way to try and come up with a semblance of an answer to such questions is to quantify the effects of our policies. But in order to do that we need to have some idea about effectiveness of treatment in the population as a whole, not just one isolated case.
The human tendency to create a narrative based on anecdotal evidence is very well documented and hard-wired into our brains. Our intuition drives us to make inferences from anecdotal evidence. That does not make those inferences any good. In fact, a case can be made that societies waste colossal amounts of money because of policies that are based on anecdotal evidence, propelled to the political stage via mob rule or media frenzy.
In order to control for this tendency, we need to override our intuition and use a formal framework to draw inferences from data. The framework that has been developed over the past century or so to do this is the subject of this book. The concepts, tools, and techniques that you will encounter are distilled from the efforts of many scientists, mathematicians, and statisticians over decades. Together they represent the core of what is generally considered to be the consensus view of how to deal with evidence in the face of uncertainty.

1.3 What kind of parameters are we interested in?

As stated above, statistics starts with the idea that you want to say something about a parameter, based on information pertaining to only a subgroup of the entire population. Keep in mind the example of average income (parameter) of all university students (population). Of course not every student has the same income (which is the variable that is measured). Instead there is a spread of income levels over the population. We call such a spread a distribution. The distribution tells you, for example, what percentage of students has an income between $5,000 and $6,000 per year. Or what percentage of students has an income above or below $7,000. The parameter of interest in a statistical study is usually a particular feature of this distribution. For example, if you want to have some idea about the center of the distribution, you may want to focus on the mean (or average) of the distribution. Because the mean is so often used as a parameter, we give it a specific symbol, typically the Greek letter μ.
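To make the idea of a distribution concrete, here is a minimal sketch that computes the kinds of percentages described above, along with the mean. The ten incomes, the helper `share`, and the variable names are hypothetical and chosen purely for illustration.

```python
# Hypothetical annual incomes (in dollars) of ten students; made-up numbers.
incomes = [4_200, 5_100, 5_500, 5_900, 6_400, 7_000, 7_600, 8_100, 5_200, 6_800]


def share(values, condition):
    """Return the percentage of values for which condition(value) is true."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)


# Features of the distribution: how incomes are spread over the group.
between_5k_and_6k = share(incomes, lambda x: 5_000 <= x <= 6_000)
above_7k = share(incomes, lambda x: x > 7_000)
below_7k = share(incomes, lambda x: x < 7_000)

# The mean summarizes the center of the distribution in one number.
mu = sum(incomes) / len(incomes)

print(f"between $5,000 and $6,000: {between_5k_and_6k:.0f}%")
print(f"above $7,000: {above_7k:.0f}%, below $7,000: {below_7k:.0f}%")
print(f"mean income: ${mu:,.0f}")
```

In a real study these quantities would describe the whole population and would have to be estimated from a sample.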

1.3.1 Center of a distribution

The mean, or average, of a population gives an idea about the center of the distribution. It is the most commonly used summary of a population. The average is often interpreted as describing the “typical” case. However, if you collapse an entire population into just one number, there is always the risk that you get results that are distorted. The first question that should be answered in any statistical analysis is: “Is the parameter I use appropriate for my purpose?”
In this book I don’t have much to say about this: we often deal with certain parameters simply because the theory is best developed for them. A few quick examples, though, should convince you that the question of which parameter to study is not always easy to answer. Imagine that you are sitting in your local bar with eight friends and suppose that each of you earns $40,000 per year. The average income of the group is thus $40,000. Now suppose that the local millionaire, who has an income of $1,500,000 per year, walks in. The average income of your group now is $186,000. I’m sure you’ll agree that average income in this case is not an accurate summary of the population.
This point illustrates that the mean is highly sensitive to outliers: extreme observations, either large or small. In the income case it might be better to look at the median. This is the income level such that half the population earns more and half the population earns less. In the bar example, no matter whether the local millionaire is present, the median income is $40,000. The difference between mean and median can be subtle and lead to very different interpretations of, say, the consequences of policy. For example, during the George W. Bush administration, it was at one point claimed that new tax cuts meant that 92 million Americans would, on average, get a tax reduction of over $1,000. Technically, this statement was correct: over 92 million Americans received a tax cut and the average value was $1,083. However, the median tax cut was under $100. In other words, a lot of people got a small tax cut, whereas a few Americans got a very large tax cut.
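The bar example can be checked in a few lines. The sketch below uses Python's standard `statistics` module. The variable names are illustrative, and the incomes are the ones given in the text.

```python
from statistics import mean, median

# You plus eight friends: nine people, each earning $40,000 per year.
bar_incomes = [40_000] * 9

# The local millionaire, earning $1,500,000 per year, walks in.
with_millionaire = bar_incomes + [1_500_000]

mean_before, median_before = mean(bar_incomes), median(bar_incomes)
mean_after, median_after = mean(with_millionaire), median(with_millionaire)

print(f"before: mean = ${mean_before:,.0f}, median = ${median_before:,.0f}")
print(f"after:  mean = ${mean_after:,.0f}, median = ${median_after:,.0f}")
```

One extreme observation moves the mean from $40,000 to $186,000, while the median stays at $40,000.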
Not that the median is always a good measure to describe the “typical” case in a population either. For example, if a doctor tells you after recovery from a life-saving operation that the median life expectancy is 6 months, you may not be very pleased. If, however, you knew that the average life expectancy is 15 years, the picture looks a lot better. What is happening here is that a lot of patients (50%) die very soon (within 6 months) after the operation. Those who survive the first 6 months can look forward to much longer lives.
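A small made-up data set shows how a median of 6 months and a mean of 15 years can coexist. The ten survival times below are hypothetical, constructed only so that the two summaries match the numbers in the text. They are not real clinical data.

```python
from statistics import mean, median

# Hypothetical survival times after the operation, in months (made-up data).
# Six patients die within 6 months; four live for several decades.
survival_months = [1, 2, 3, 4, 6, 6, 360, 432, 480, 506]

median_months = median(survival_months)
mean_years = mean(survival_months) / 12

print(f"median survival: {median_months:.0f} months")
print(f"mean survival:   {mean_years:.0f} years")
```

Here the median reflects the many early deaths, while the mean is pulled up by the few long-term survivors.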
Both examples...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. List of Figures
  8. Preface
  9. Acknowledgments
  10. 1 Statistical Inference
  11. 2 Theory and Calculus of Probability
  12. 3 From Probability to Statistics
  13. 4 Statistical Inference for the Mean based on a Large Sample
  14. 5 Statistical Models and Sampling Distributions
  15. 6 Estimation of Parameters
  16. 7 Confidence Intervals
  17. 8 Hypothesis Testing
  18. 9 Linear Regression
  19. 10 Bayesian Inference
  20. Appendices
  21. References
  22. Index