Applying Regression and Correlation

A Guide for Students and Researchers

  1. 272 pages
  2. English

About This Book

This book takes a fresh look at applying regression analysis in the behavioural sciences by introducing the reader to regression analysis through a simple model-building approach.

The authors start with the basics, re-visiting the mean and the standard deviation, with which most readers will already be familiar, and show that they can be thought of as a least squares model. The book then shows that this least squares model is actually a special case of a regression analysis and can be extended to deal with first one, and then more than one, independent variable.

Extending the model from the mean to a regression analysis provides a powerful, but simple, way of thinking about what students believe are the more complex aspects of regression analysis.

The authors gradually extend the model to include aspects of regression analysis such as non-linear regression, logistic regression, and moderator and mediator analysis. These approaches are often presented in terms that are too mathematical for non-statistically inclined students to deal with.

Throughout, the book maintains a conceptual, non-mathematical focus. Most equations are placed in an appendix, where a detailed explanation is given, to avoid disrupting the flow of the main text.

This book will be indispensable for anyone using regression and correlation, from undergraduates doing projects to postgraduates and researchers.

Authors: Jeremy Miles and Mark Shevlin. Subject areas: Social Sciences; Social Science Research & Methodology. Formats: PDF and/or ePUB.

Information

PART I
I NEED TO DO REGRESSION ANALYSIS TOMORROW
1 Building models with regression and correlation
1.1 What are models?
How tall do you (the reader) think we (the authors) are? Decide your answer before you read on.
Your answer may have been, ‘What a stupid question’ or ‘How should I know?’ If you think about it, the question is not very stupid, and you should have some idea. You can see (from our names on the front cover) that we are both male. You can safely assume that we are not so young that we are not fully grown, and we will tell you that we did not start smoking until we had almost reached our full adult heights, thus stunting our growth only a little. Given this, you know that we are not 4 inches tall, nor are we 100 feet tall. A reasonable guess, given the information that you have, would be that we are of about average height for males – approximately 5 feet and 10 inches (1.77 m). We are in fact both slightly over 6 feet (1.83 m) tall, so if you had guessed the average height, your guess would have been a couple of inches out – not bad really.
What you did was to build a model. Your model was ‘The authors are 5 feet 10 inches tall.’ This is a model of our heights – it is a very simple model, but a model nonetheless. A model is a representation of the world, but it is rarely a perfect representation. There will always be some differences between the model and the world, that is, some error. If we could build a perfect copy, it would not be a model, it would be a duplicate.
What you did when trying to guess our height was to pick a value that had as little error in it as possible. The value with the least chance of error would be the average height. There is more chance of us being average than being unusual – that’s what averages are. Your model was as close as possible to the data (our heights), but there was some error remaining. A different way of saying this is that:
DATA = MODEL + ERROR
This is a very important statement, which we shall refer to a lot throughout this chapter, so let’s look at a practical example.
Imagine that we have some data on the number of books on research methods and statistics that a small group of psychology students has read during their studies. Table 1.1 shows these data.
TABLE 1.1
Name      Number of books read
Anne      2
Bob       4
Carol     1
David     0
Esther    3
If we want to model those data, we could do it by repeating the numbers. We could say 2, 4, 1, 0, 3. The numbers that make up the model are known as parameters, and this model has five of them.
In terms of:
DATA = MODEL + ERROR
DATA are equal to MODEL, and so ERROR is zero; there is no difference between the model and the data. This model is a perfect representation of the data. However, there is a problem with this model. There were five numbers in the data, and there are five numbers (or parameters) in the model, so the model has not summarised anything – it is not really a model, but a duplicate. A model should be a simple, or parsimonious, representation of a phenomenon. In this case the model is the data, and we are back where we started (so in fact we could argue that this is not a model at all). That the model and the data are exactly the same is not much of a problem when we are dealing with five numbers, but if we are trying to summarise 500 numbers, this approach will not work, so we need a different approach.
If we want to model the data with one parameter, we could use the mean. This is what people commonly call the average (although it is better to avoid this term as it has more than one definition). To find the mean score, we add together the five numbers (sum the set of numbers), and divide this sum or total by the number of people:
2 + 4 + 1 + 0 + 3 = 10
10/5 = 2
We have calculated that the mean number of books on statistics read by these psychology students is two. We have used the mean as a simple model, which has one parameter, and describes the data.
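The calculation above can be sketched in a few lines of Python. This is only an illustration using the values from Table 1.1; the variable names (`books_read`, `total`, `mean_books`) are our own and do not come from the book.

```python
# Number of books read by each student (values from Table 1.1).
books_read = {"Anne": 2, "Bob": 4, "Carol": 1, "David": 0, "Esther": 3}

# Sum the scores, then divide the total by the number of people.
total = sum(books_read.values())
n_students = len(books_read)
mean_books = total / n_students

print(f"Total = {total}, n = {n_students}, mean = {mean_books}")
# prints: Total = 10, n = 5, mean = 2.0
```

Five numbers go in and one parameter comes out, which is what makes the mean a summary rather than a duplicate of the data.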
1.2 Least squares models
1.2.1 A very simple model
We saw in the previous section that we used a model that contained one parameter, the mean, to represent a set of five numbers. We used the mean for a good reason: it summarises the data. With just one number, it gives us a general idea about a whole set of numbers.
The mean is a special type of model; it is a least squares model. In this section, we shall see what we mean by a least squares model, and find out why the fact that the mean is a least squares model is important. Remember that (we said we would keep coming back to this):
DATA = MODEL + ERROR
DATA was the number of books read by each student; the numbers 2, 4, 1, 0, 3. Our model is the mean, the number 2. The difference between the model and the data is ERROR. If:
DATA = MODEL + ERROR
it is also true that:
ERROR = DATA – MODEL
Table 1.2 shows the number of books that each student has read along with the differences between the number for each student and the model (the mean). The difference between the model and the number of books read by a particular student is what we call the error for that student.
TABLE 1.2
Name      Number of books read   Model (mean)   Error
Anne      2                      2               0
Bob       4                      2               2
Carol     1                      2              -1
David     0                      2              -2
Esther    3                      2               1
In statistics, the errors (or differences) such as those shown in the table are sometimes called residuals. The residuals are what are left over after the model (mean) has been taken away from each student’s score. They are the difference between the score predicted by the model and the score that each individual actually has.
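As a minimal sketch, the residuals can be computed directly from ERROR = DATA – MODEL. The example below assumes the five scores from Table 1.1 and uses the mean as the model; the names `model`, `residuals` and `reconstructed` are illustrative only.

```python
# DATA: number of books read by each student (Table 1.1).
books_read = {"Anne": 2, "Bob": 4, "Carol": 1, "David": 0, "Esther": 3}

# MODEL: a single parameter, the mean of the scores.
model = sum(books_read.values()) / len(books_read)

# ERROR = DATA - MODEL, worked out separately for each student.
residuals = {name: score - model for name, score in books_read.items()}

for name, error in residuals.items():
    print(f"{name}: data = {books_read[name]}, model = {model}, error = {error:+.0f}")

# DATA = MODEL + ERROR: adding each residual back to the model recovers the data.
reconstructed = {name: model + error for name, error in residuals.items()}
```

The residuals come out as 0, 2, -1, -2 and 1, and adding each one back to the mean of 2 returns the original score for that student.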
We said earlier that the model we were going to select was the model that gave the least error. You picked the mean height for males as your model for the heights of the authors of this book because that was your best guess. The score for each person is the mean plus (or minus) some error.
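One way to make ‘least error’ concrete is to square each residual and add the squares together, which is the sense of ‘least squares’ assumed in this sketch. The code compares a handful of candidate one-parameter models for the books data; the helper `sum_of_squared_errors` and the list of candidates are hypothetical choices made for illustration, not taken from the book.

```python
# DATA: number of books read by each student (Table 1.1).
books_read = [2, 4, 1, 0, 3]


def sum_of_squared_errors(data, model):
    """Sum of squared differences between each score and a one-number model."""
    return sum((score - model) ** 2 for score in data)


mean_books = sum(books_read) / len(books_read)

# Try several candidate values for the single parameter, including the mean.
candidates = [0, 1, 1.5, mean_books, 2.5, 3, 4]
for candidate in candidates:
    sse = sum_of_squared_errors(books_read, candidate)
    print(f"model = {candidate}: sum of squared errors = {sse}")

# Pick the candidate with the smallest sum of squared errors.
best = min(candidates, key=lambda c: sum_of_squared_errors(books_read, c))
print(f"best candidate = {best}")
```

Of the candidates tried, the mean (2) gives the smallest total, 10; moving the model to 1 or 3 raises it to 15, and moving it to 0 or 4 raises it to 30.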
Often in statistics, we want to refer to a whole set of numbers, rather than just one individual number. For example, where we have a set of numbers that refer to the number of books read, we would call this x. We will often call this a variable, as it is something that can vary between people. You may also see it referred to as a ‘vector’ in more mathematically inclined texts. We can refer to the first number in the variable (or vector) as x₁, the second as x₂, etc. We can write Table 1.2 as:
x₁ = x̄ + e₁
x₂ = x̄ + e₂
x₃ = x̄ + e₃
x₄ = x̄ + e₄
x₅ = x̄ + e₅
The above list is the statistical way of saying that the score for the first person is equal to the mean plus the residual for the first person, the score for the second person is equal to the mean plus the residual for the second person, and so on. In statistics, we can use the subscript i for the individuals. Instead of writing out the full set of equations, as we did above, we can write:
xᵢ = x̄ + eᵢ
This means ‘take this equation and repeat it for every person’. Sometimes statisticia...

Table of contents

  1. Cover Page
  2. Title
  3. Copyright
  4. Contents
  5. Preface
  6. Part I: I need to do Regression Analysis Tomorrow
  7. Part II: I need to do Regression Analysis Next Week
  8. Part III: I need to know more of The Things that Regression Can do
  9. Appendix 1 Equations
  10. Appendix 2 Doing regression with SPSS
  11. Appendix 3 Statistical tables
  12. References
  13. Name index
  14. Subject index