PART ONE
Pre-test considerations
CHAPTER 1
Introduction
WHAT THIS BOOK DOES
After an introduction which should be invaluable to beginners and those returning to statistical testing after a break, this book introduces statistical tests in a well-organised manner, providing worked examples using both parametric and non-parametric tests.
Whether you are a beginner or an intermediate level test user, you should be able to use this book to analyse different types of data in applied settings. It should also give you the confidence to use other statistical software and to extend your expertise to more specific scientific settings as required.
This book assumes that many applied researchers, scientific or otherwise, will not want to use statistical equations or to learn about a range of arcane statistical concepts. Instead, it is a very practical, easy and speedy introduction to data analysis in the round, offering examples from a range of scenarios from applied science, handling both continuous and rough-hewn data sets.
Examples will be found from agriculture, arboriculture, audiology, biology, computer science, ecology, engineering, epidemiology, farming and farm management, hydrology, medicine, ophthalmology, pharmacology, physiotherapy, spectroscopy and sports science. These disciplines have not been covered in depth, as this book is intended to provide a general approach to solving problems using statistical tests.
The output, with permission from IBM, comes from SPSS (PASW) Student Version 18, for the purpose of the widest usability, and the Advanced Module of SPSS 20. It is completely compatible with SPSS versions 17 to 20 (including those packages with the title PASW) and will generally be usable with earlier editions. As SPSS tends not to change much over the years, this book is likely to be relevant for quite some time. SPSS features are used selectively here for the sake of clarity. Various manuals and handbooks are available on the internet and in print for those eager to know every possible detail of its use.
Similarly, as the book is essentially about statistical testing, research design is generally only touched on for the purposes of clarity. Again, there are a lot of sources of information out there, especially relating to different specialisms.
In contrast to many books on statistics, I favour coherence over conceptual comprehensiveness, although as will be seen, this book offers some tests not usually found in other introductory books.
THE ORGANISATION OF CONTENT
Although many core concepts are presented in the first part of the book, which should definitely be read by newcomers to statistical testing, other ideas appear where they logically arise. Although mathematics is barely touched upon, statistical jargon is introduced, as you will meet it in SPSS and other software as well as in research papers which you may read or even find yourself writing. Descriptive statistics are introduced, as they are important in the preliminary analysis of data, but are dealt with sparingly: inferential statistics are at the heart of statistical testing. The first part of the book also offers a quick and basic guide to using SPSS.
The second part of the book comprises the tests. Each test is accompanied by at least one worked example. Where possible, non-parametric equivalents are provided in addition to parametric tests; we recognise that data sets in the real world are not always as blandly measurable as we would wish them to be.
The chapter on experiments and quasi-experiments (essentially, the analysis of differences) is fairly conventional, apart from equal consideration being given to non-parametric tests as useful tools in applied settings. Factorial analysis of variance (e.g. two-way ANOVA) is also covered, although a discussion about the analysis of covariance (ANCOVA) is deferred until the brief chapter on advanced techniques.
The chapter on the frequency of observation, also known as qualitative (or categorical) analysis, offers a broader set of practical usages than in most introductory texts.
Survival analysis is also new to general introductory texts, but given its wide applicability outside the world of medicine, I prefer to call it the analysis of the time until events. Although this is also qualitative in nature, it is so different in function as to be worthy of a separate chapter.
The next chapter starts with correlations, but goes beyond some contemporary texts in introducing multiple regression, which is increasingly used in applied settings. It also provides a stripped-down account of factor analysis, which will meet the needs of people on master's and doctoral projects (and others) who find themselves needing to use this technique in a hurry. Many so-called simple introductions are nothing of the sort. The core coverage provided here meets immediate needs, but will also make it easier to absorb more in-depth texts when necessary.
The third part of the book includes a short set of exercises. Problems in the real world are not usually accompanied by signposts saying 'this problem involves correlations', so I have avoided the common practice of putting a quiz at the end of each chapter. I think it makes most sense to tackle exercises once you have an overall grasp of what you have read and the experience of having worked through the preceding worked examples.
The chapter on reporting is intended for organisations with practical concerns; academic writers will need to use works of reference specific to their disciplines or universities. The book concludes with a brief summary of a few advanced statistical techniques.
DATA SETS AND ADDITIONAL INFORMATION
The data sets are small, to avoid lengthy data entry or the need for internet downloads. Following the same logic, some data sets are built upon as each chapter progresses. While the worked examples should be of interest to various practitioners, it should be noted that the data sets are for learning purposes only and are fictional unless there is a clear statement to the contrary.
The book contains various 'discussion points', which draw the reader's attention to statistical topics that are philosophically interesting or controversial.
On the subject of controversy, I may add that independent researchers will find SPSS to be rather an expensive piece of software. A cheaper option is StatsDirect. I wrote a book to accompany this package (Davis 2010), but do note that the data sets and texts are similar in both books. I do not recommend buying both. If a choice has to be made, then this book is more comprehensive in its range of tests and concepts.
HOW TO USE THIS BOOK
If you do not have the time to read the whole book, it is still a good idea to read the introductory part before homing in on the chapter of interest. If time dictates dipping into a single chapter, then try to read the whole chapter and follow the worked examples.
References to statistical theory may be skipped by first-time readers, although in time they may improve your understanding of the issues. When you have a full grasp of this book, you should be able to use other software and more advanced tests.
ACKNOWLEDGEMENTS
I would particularly like to thank Dr George Clegg, a scientist with experience in academic research and the defence industry, who asked some hard questions about what I intended to write. Thanks are also due to Nick Jones for his encouragement during the development of this book, and Ofra Reuven, statistician and data analyst, for her speedy and reliable help creating images and checking through my data.
Permission was granted by IBM to use screenshots from the IBM statistical testing package.
I would also like to thank the Orwell Estate for their goodwill over the dedication of this book. George Orwell's essays and books have given me food for thought and themes for debate over the decades. His integrity stands as a beacon.
The responsibility for any shortcomings remains my own.
DISCUSSION POINT
Statistical testing is like driving a car. You need to know where you are going and what to do when you get there, but the workings of the engine need not necessarily bother you. It is my contention that formulae are of little relevance to effective data analysis.
CHAPTER 2
Descriptive and inferential statistics introduced
DESCRIPTIVE STATISTICS
This book is primarily about inferential statistics, generalising from limited data, but some knowledge of descriptive statistics is essential. When we have all the data, the entire population rather than a sample, descriptive statistics may tell us all we need to know. When looking at samples, the descriptive data helps us to decide which statistical tests to use and indeed if any tests should be used. The statistical concepts discussed (lightly) here underlie what the tests try to achieve.
A statistic is a number which represents or summarises data. Descriptive statistics reveal how much data is involved and its shape.
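For readers who happen to have Python to hand rather than SPSS, the idea of summarising 'how much data' and 'its shape' can be sketched roughly as follows. This is only an illustration, not the method used in this book: the function name describe and the seedling heights are invented for the example, and only Python's standard statistics module is assumed.

```python
import statistics


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "n": len(values),                    # how much data is involved
        "mean": statistics.mean(values),     # a typical (central) value
        "median": statistics.median(values),
        "sd": statistics.stdev(values),      # sample standard deviation (spread)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical data: heights in cm of eight seedlings (made up for illustration).
heights = [12, 13, 11, 14, 12, 13, 13, 16]

summary = describe(heights)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The count says how much data there is; the mean, median, standard deviation and range together give a first impression of its shape.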
There are times when an absolute number gives us what we want. We can ha...