1 Introduction: how to get started with statistics
What’s in this chapter?
- The misuse of statistics
- Computers and statistics
- How to use this text
- Which chapter do you want?
INTRODUCTION
This chapter explains why we need statistics and why you should love them. It explains why it is important to understand statistics, which is principally so that we don’t get fooled by numbers. It also provides a guide to how this book is best used. We realise that most readers will go to the chapter that best suits their immediate needs and are only reading this if it is the last book in their bag and their train has been indefinitely delayed. If you are in this position then we hope it gets better soon.
STUDYING STATISTICS IS GREAT
This heading is not an indication of madness on the part of the authors. Statistics really is great and it is a remarkable observation that when students finish their statistics courses after much pain and gnashing of teeth they often come to this conclusion as well. It is the most useful thing you will learn on your degree. Give us a minute (or a couple of paragraphs) and we will attempt to convince you that this statement is not as deranged as it may seem.
Tip: Statistics and statistics
Rather confusingly, the word ‘statistics’ means two things. Originally, ‘statistics’ were numbers. The mean of a sample, for example, is a statistic. However, the study of those statistics gave rise to an academic subject, also called ‘statistics’. Hence we can say: ‘Statistics are great, I love them’ and ‘Statistics is great, I love it’. Both sentences are grammatically correct, but have different meanings. The first is talking about numbers, the second is talking about the subject.
We learn about statistics because we want to find stuff out. We want to find out two sorts of things. First, we want to find out what our results tell us, and we can do this by using statistics to analyse data. When we analyse our data, and see what they are telling us, we find stuff out. Sometimes we shed light on a problem, sometimes we don’t. Whichever we do, we make a contribution to knowledge, even if that knowledge is only ‘don’t try to do it this way, it won’t work’. If we don’t do statistical analysis on our data, we will not be able to draw appropriate conclusions. In short, if we don’t do statistics, we won’t know what works. This text is aimed at illuminating how statistics work and what they tell us.
Tip: Data
‘Data’ is the plural of the singular term ‘datum’. You should write ‘data are analysed’ and ‘data have been entered into the computer’, not ‘data is …’ or ‘data has been …’. Be sure to point out when your lecturers make this mistake. Lecturers enjoy it when students point out this sort of simple error.
Second, we want to know about the statistics we get from other people. This is most important because we are bombarded with statistical data every day and they are often used to confuse rather than to clarify. There is a famous quote attributed to Andrew Lang: ‘He uses statistics like a drunk uses a lamppost – more for support than for illumination.’ We need to know when people are trying to illuminate what they have found, and when they are trying to simply support their preformed opinions.
Consider the following extract:
The number of automatic plant shutdowns (scrams) remained at a median of zero for the second year running, with 61% of plants experiencing no scrams.
(Nuclear Europe Worldscan, July/August 1999)
Do you know anything more about nuclear plants after reading that? It is likely that whoever wrote this was using statistics for support rather than for illumination. (Many more examples can be found in Chance News, at http://www.dartmouth.edu/~chance/chance_news/news.html.)
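One reason the extract tells us so little is that its two figures are not independent pieces of information: once more than half of the plants report no scrams, the median has to be zero, however the remaining plants behaved. The short Python sketch below illustrates this with invented counts. The function name `scram_summary` and both data sets are our own assumptions for the example; they are not the Worldscan data.

```python
from statistics import median

def scram_summary(counts):
    """Return (share of plants with no scrams, median scrams per plant)."""
    share_with_none = counts.count(0) / len(counts)
    return share_with_none, median(counts)

# Hypothetical data for 100 plants, invented for illustration only.
# In both years 61 plants have no scrams; only the other 39 differ.
quiet_year = [0] * 61 + [1] * 39    # the other plants shut down once each
bad_year = [0] * 61 + [50] * 39     # the other plants shut down 50 times each

print(scram_summary(quiet_year))
print(scram_summary(bad_year))
```

Both years give exactly the same summary, 61% of plants with no scrams and a median of zero, even though the second year would be far more worrying. The median on its own cannot tell the two apart.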
THE MISUSE OF STATISTICS
Perhaps the most famous quote about statistics is commonly attributed to British Prime Minister, Benjamin Disraeli,1 who is reported to have said:
There are three kinds of lies: lies, damned lies and statistics.
Less well known is the comment attributed to another British Prime Minister, Winston Churchill, who said:
When I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. That is a political statistic.
It is a popular view that statistics can be made to say anything you want and therefore they are all worthless. While it is clearly true that people will selectively present data to misrepresent what is actually happening, it is not true that statistics are therefore worthless. If we have a better understanding of where data come from and how they are being presented then we will not be fooled by the politicians, advertisers, journalists, homeopaths and assorted other charlatans who try to confuse and fool us.
Tip
One of the reasons why statistics sometimes appear difficult is that they are often counter-intuitive. Think about your friends, for example: half of them are below average. Or, in a wider context, if you have the view that the average car driver is ignorant and thoughtless, then by definition half of them are even more ignorant and thoughtless than that. Then there was the man who drowned crossing a stream with an average depth of 6 inches (attributed to W.I.E. Gates).
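The stream example can be made concrete with a few lines of Python. This is only a sketch: the depth measurements are invented so that they average 6 inches, and the variable names are ours. It also shows that ‘half are below average’ is strictly true of the median rather than the mean.

```python
from statistics import mean, median

# Hypothetical depths (in inches) taken at ten points across a stream:
# mostly shallow, with one deep channel. Invented for illustration.
depths = [2, 2, 3, 3, 4, 4, 5, 5, 6, 26]

average_depth = mean(depths)      # the 'average depth' in the anecdote
middle_depth = median(depths)     # half the points are no deeper than this
deepest_point = max(depths)       # the part that matters to the man crossing

# Share of measurement points that are shallower than the mean
share_below_mean = sum(d < average_depth for d in depths) / len(depths)

print(average_depth, middle_depth, deepest_point, share_below_mean)
```

With these made-up numbers the mean depth is 6 inches, yet one point is 26 inches deep and eight of the ten points are shallower than the mean. A single average hides both facts.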
IS STATISTICS HARD AND BORING?
When students find out that they have to learn about statistics as part of their course, they are often somewhat dismayed. They think that statistics is likely to be hard, and is also likely to be boring. In this text we will try to make it not quite so hard and not quite so boring, but you have to be the judge of how successful we are.
We have made this text as clear as we can and as straightforward as we can, but we have not simplified it so much that we skip over important bits. Albert Einstein wrote, ‘Everything should be made as simple as possible, but not simpler’, and we have tried to follow this principle.
One way to make statistics less hard is to provide a set of clear and explicit instructions, much like a cookbook. For example, if you want to make mashed potatoes, you can follow a set of instructions like this:
- Wash and peel potatoes.
- Cut larger potatoes in half.
- Put potatoes in saucepan of hot water and boil for 20 minutes.
- Drain the potatoes.
- Add milk, salt, butter to saucepan.
- Mash, with a potato masher, using an up-and-down motion.
This isn’t hard. It just involves following a set of rules, and doing what they say. It isn’t very interesting, and there is no room for creativity or flexibility. We don’t expect you to understand anything about why you do what you do. We do not try to explain to you anything about the potatoes, or the cooking process, we just expect you to follow the rules. If you had to follow instructions like this every time you made a meal you would find it very dull, however, and would probably just send out for a kebab.
A bigger problem would be that if something went wrong with the cooking, you would be in no state to fix it because you don’t know what is happening and why. The cookbook approach to statistics might get you to the right answer but you will only have a limited understanding of how you got there. The problem with this is that it is difficult to discuss the quality of your data and the strength of your conclusions. The cookbook approach is not so hard to do, but it doesn’t help your understanding.
The approach in this text is to give you the cookbook recipe but also to tell you why it is done this way and what to do in a wide range of circumstances. We hope this allows you to still get to the right result fairly quickly but also to understand how you got there. Staying with the cooking analogy, we will tell you a bit about potatoes and the general process of cooking. ‘Too much detail!’, you might cry, but you’ll thank us for it later.
Tip
Statistics can be off-putting because of the terms and equations that appear all over the pages like a rash. Don’t be put off. The equations are much more straightforward than they look, and if you can do multiplication and subtraction you should be fine. For example, the mean score is commonly written as x̄ (an x with a bar over it), and once you get used to this and some of the other shorthand then it will become clearer. Imagine you are in a foreign country with a language you can’t speak. You don’t need to know the whole language, just a few key phrases like ‘two beers, please’ and ‘where’s the toilet?’. It is the same with statistics, so just get comfortable with a few key terms and the Land of Statistics will be there for you to explore.
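As a small illustration of how little the shorthand hides, here is the mean written out in full. We are assuming the usual convention, in which n is the number of scores and x_i is the i-th score; the three scores in the worked line are made up for the example.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{x_1 + x_2 + \dots + x_n}{n}
\qquad\text{e.g.}\qquad
\bar{x} = \frac{2 + 4 + 6}{3} = 4
```

In words: add up all the scores and divide by how many there are.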
There is another way to deal with statistics, and that is the way that we commonly deal with technology. We open the box, connect everything up and puzzle our way through the various controls. We will only look at the instructions at the point where it either refuses to work or we have broken it. Let’s face it, instructions are for wimps! We anticipate that many readers will have adopted this strategy and will be reading this book because their analysis has just gone horribly wrong. It clearly does not help to suggest that this was probably not the best strategy, but all is not lost and the last chapter, with its checklist of important points, will hopefully diagnose your problem and tell you where to go in the text to find the answer.
COMPUTERS AND STATISTICS
Computers have made statistics much harder.
Well, they haven’t really, but they have made learning about statistics much harder. And they have done this by making it easier to do hard things.
OK, we know that this is a statistics book, which you were expecting to be a bit tricky, at least in places. And you are reading nonsense like this before you have even got to the statistics, so let us explain. When we were students (and computers were the size of an Eddie Stobart truck), learning about statistics primarily involved learning about lots of different formulae. We were presented with formulae and we had to apply them and use them. The majority of the time that people spent doing statistics was spent working through the formulae that were given in books. This wasn’t difficult, except that it was ...