eBook - ePub

Regression Analysis and its Application

Name: Regression Analysis and its Application
ISBN: 9781351419291

A Data-Oriented Approach

Richard F. Gunst, Robert L. Mason

424 pages
English

About this book

Regression Analysis and Its Application: A Data-Oriented Approach answers the needs of researchers and students who would like a better understanding of classical regression analysis. Useful either as a textbook or as a reference source, this book bridges the gap between the purely theoretical coverage of regression analysis and its practical application.

The book presents regression analysis in the general context of data analysis. Using a teach-by-example format, it contains ten major data sets along with several smaller ones to illustrate the common characteristics of regression data and properties of statistics that are employed in regression analysis. The book covers model misspecification, residual analysis, multicollinearity, and biased regression estimators. It also focuses on data collection, model assumptions, and the interpretation of parameter estimates.

Complete with an extensive bibliography, Regression Analysis and Its Application is suitable for statisticians, graduate and upper-level undergraduate students, and research scientists in biometry, business, ecology, economics, education, engineering, mathematics, physical sciences, psychology, and sociology. In addition, data collection agencies in the government and private sector will benefit from the book.

CHAPTER 1

INTRODUCTION

Data analysis of any kind, including a regression analysis, has the potential for far-reaching consequences. Conclusions drawn from small laboratory experiments or extensive sample surveys might only influence one’s colleagues and associates or they could form the basis for policy decisions by governmental agencies which could conceivably affect millions of people. Data analysts must, therefore, have an adequate knowledge of and a healthy respect for the procedures they utilize.

Consider, as an illustration of the potential for far-reaching effects of a data analysis, one of the most massive research projects ever undertaken: the Salk polio vaccine trials (Meier, 1972). The conclusions drawn from the results of this study culminated in a nationwide polio immunization program and the virtual elimination of this tragic disease in the United States. The foresight and competence of the principal investigators of the study prevented ambiguity of the results and possible criticism of the conclusions. The handling of this experiment provides valuable lessons in the overall role of data analysis and the care with which it must be approached.

Polio in the early 1950s was a mysterious disease. No one could predict where or when it would strike. It did not affect a large segment of any community, but those it did strike, mostly children, were often left paralyzed. Its crippling effect on young children and the sporadic nature of its occurrence led to demands for a major effort to eradicate the disease. Salk’s vaccine was one of the most promising ones available, but it had not been sufficiently tested.

Since the occurrence of polio in any specific community could not be predicted and only a small portion of the population actually contracted the disease in any year, a large-scale experiment including many communities was necessary. In the end, over one million children participated in the study, some receiving the vaccine and others just a placebo.

In allowing their children to participate, many parents insisted on knowing whether their child received the vaccine or the placebo. These children constituted the “observed-placebo” group (Meier, 1972). The planners of the experiment, realizing potential difficulties in the interpretation of the results, insisted that there be a large number of communities for which neither child, parent, nor diagnosing physician knew whether the child received the vaccine or the placebo. This group of children made up the “placebo-control” group.

For both groups of children the incidence of polio was lower for those vaccinated than for those who were not vaccinated. The conclusion was unequivocal: the Salk vaccine proved effective in preventing polio. This conclusion would have been compromised, however, had the planners of the study not insisted that the placebo-control group be included. Doubts that the observed-placebo group could reliably indicate the effectiveness of the vaccine were raised both before and after the experiment. The indicators of polio are so similar to those of some other diseases that the diagnosing physician might tend to diagnose polio if he knew the child had not been vaccinated and diagnose one of the other diseases if he knew the child had been vaccinated. After the experiment was conducted, analysis of the data for the observed-placebo group indicated that the vaccine was effective, but the differences were not large enough to prevent charges of (unintentional) physician bias. Differences in the incidence of polio between vaccinated and nonvaccinated children in the placebo-control group were larger than those in the observed-placebo group, and the analysis of these data provided the definitive conclusion. Thus, owing to the careful planning and execution of this study, including the data collection and analysis, the immunization program that was later implemented has resulted in almost complete eradication of polio in the United States.

1.1 DATA COLLECTION

Data can be compiled in a variety of ways. For specific types of information, the U.S. Bureau of the Census can rely on nearly complete enumerations of the U.S. population or on data collected using sophisticated sample survey designs. The Bureau of the Census can ensure that all segments of the population are represented in most of the analyses that it desires to perform. Many research endeavors, however, are conducted on a relatively small scale and are limited by time, manpower, or economics. Characteristic of these studies is a data base that is restricted by the data-collection techniques.

So important is the data base to a regression analysis that we begin our development of multiple linear regression with the data-collection phase. The emphasis of this section is on an understanding of the benefits associated with a good data collection effort and the influence on the interpretation of fitted models when the data base is restricted. While it may not always be possible to build a data base as large or as representative as one might desire, knowledge of the limitations of a data base can prevent many incorrect applications of regression methodology.

1.1.1 Data-Base Limitations

Regression analysis provides information on relationships between a response variable and one or more predictor variables but only to the degree that such information is contained in the data base. Whether the data are compiled from a complete enumeration of a population, an appropriate sample survey, a haphazard tabulation, or by simply inventing data, regression coefficients can be estimated and conclusions can be drawn from the fitted model. The quality of the fit and accuracy of conclusions, however, depend on the data used: data that are not representative or not properly compiled can result in poor fits and erroneous conclusions.
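
As a rough illustration of this point, the following minimal Python sketch fits a least squares line to a hypothetical population and to a restricted subset of it. The data, the quadratic relationship, and the helper names `fit_line` and `predict` are assumptions made purely for illustration; they are not taken from any data set analyzed in this text.

```python
def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(fit, x):
    """Predicted response from an (intercept, slope) pair."""
    intercept, slope = fit
    return intercept + slope * x


# Hypothetical population: the response grows with the square of the predictor.
population_x = list(range(0, 11))
population_y = [x ** 2 for x in population_x]

# Hypothetical restricted data base: only the low end of the range was sampled.
restricted_x = [x for x in population_x if x <= 3]
restricted_y = [x ** 2 for x in restricted_x]

full_fit = fit_line(population_x, population_y)
restricted_fit = fit_line(restricted_x, restricted_y)

print("full data:       intercept, slope =", full_fit)
print("restricted data: intercept, slope =", restricted_fit)
print("restricted-fit prediction at x = 10:", predict(restricted_fit, 10))
print("population value at x = 10:         ", 10 ** 2)
```

Both fits are computed without complaint. The line fitted to the restricted data, however, has a slope of 3 rather than 10 and predicts 29 at x = 10, where the hypothetical population value is 100. Nothing in the arithmetic warns the analyst that the data base was restricted.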

One of many studies that illustrate the problems that arise when one is forced to draw inferences from a potentially nonrepresentative sample is found in Crane (1965). In her attempt to assess the influence of graduate school prestige and current academic affiliation on productivity and peer recognition of university professors, she surveyed faculty members in three disciplines from three universities on the east coast of the United States. The responses were voluntary and presumably not all professors in these disciplines participated in the study. Although Crane’s study did not call for a regression analysis, the interpretation problems that occur as a result of her data-collection effort are applicable regardless of the type of analysis performed.

Questions naturally arise concerning any conclusions that would be drawn from a study with the data-base limitations of this one. Do these three disciplines truly represent all academic disciplines? Can these three universities be said to be typical of all universities in the United States? If some professors chose not to participate in the study, are the responses thereby biased? These questions cannot be answered from Crane’s data. Only if additional studies provide results similar to hers for other disciplines and other schools can global conclusions be drawn concerning the influence of graduate school and current academic affiliation on recognition and productivity of university professors. No amount of statistical analysis can compensate for these data-base limitations.

Criticisms of limited data bases and disagreements with conclusions drawn from the analysis of them are common. Nevertheless, the choice is often between conducting no investigations at all or analyzing restricted sets of data. We do not advocate the former position; however, it is the obligation of the data analyst to investigate the data-collection process, discover any limitations in the data collected, and restrict conclusions accordingly. Another example will stress these points and the consequences of underrating their importance.

A well-publicized study on male sexuality (Kinsey et al., 1948) evoked widespread criticism both because of its controversial subject matter and because of its data-collection procedures. Responses were solicited from males belonging to a large number of groups in order to make the sampling more feasible. About 5,300 males were interviewed in prisons, mental institutions, rooming houses, etc. By interviewing volunteers from groups such as these, a large sample of responses could be obtained without exhaustive effort and expense. The convenience of selecting responses in this fashion is the primary factor contributing to the debate over the results of the study.

Among the criticisms raised about the Kinsey report, most centered on the data-collection process. Some groups (such as college men) were overrepresented while others (such as Catholics) were underrepresented and still others (such as Blacks) were completely excluded. The subjects were all volunteers and this fact led to further charges of unrepresentativeness. Additional criticisms centered on the interview technique which relied solely on an individual’s ability to recall events in his past.
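
The effect of over- and underrepresentation on even a simple summary can be sketched numerically. In the minimal Python example below, the two groups, their population shares, their response rates, and the function name `overall_rate` are all hypothetical values chosen for illustration; none of them is taken from the Kinsey report.

```python
# Hypothetical shares of two groups in the population (illustrative only).
population_share = {"group_a": 0.2, "group_b": 0.8}

# Hypothetical shares of the same groups in a convenience sample
# that overrepresents group_a and underrepresents group_b.
sample_share = {"group_a": 0.7, "group_b": 0.3}

# Hypothetical proportion of each group giving a particular response.
group_rate = {"group_a": 0.6, "group_b": 0.3}


def overall_rate(shares, rates):
    """Overall proportion implied by mixing the groups in the given shares."""
    return sum(shares[group] * rates[group] for group in shares)


population_rate = overall_rate(population_share, group_rate)
sample_rate = overall_rate(sample_share, group_rate)

print("population proportion:", round(population_rate, 2))
print("sample proportion:    ", round(sample_rate, 2))
```

Under these assumed numbers the sample proportion is 0.51 while the population proportion is 0.36, even though every individual response is recorded correctly. The discrepancy comes entirely from the makeup of the sample.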

The statistical methodology used in the Kinsey report was highly praised although it was descriptive and relatively simple (Cochran, Mosteller, and Tukey, 1954). In response to the criticisms of the Kinsey report, moreover, the investigators argued that this study was just a pilot study for a much larger sexual attitude survey. Nevertheless, in numerous instances the conclusions drawn from the study went beyond bounds that could be substantiated by the data. Actually, the conclusions are quite limited in generality.

The two examples just discussed demonstrate the problems that can arise from the absence of an adequately representative data base. Regardless of the sophistication of statistical analyses of the data, deficiencies in the data base can preclude valid conclusions. In particular, interpreting fitted regression models and comparing estimated model parameters in a regression analysis can lead to erroneous inferences if problems with the data go undetected or are ignored.

1.1.2 Data-Conditioned Inferences

Of particular relevance to a discussion of data-collection problems is the nature of the inferences that can be drawn once the data are collected. Data bases are generally compiled to be representative of a wide range of conditions but they can fail to be as representative as intended even when good data-collection techniques are employed. One can be led to believe that broad generalizations from the data are possible because of a good data-collection effort when a closer inspection of the data might reveal that deficiencies exist in the data base.

Equality o...

1. INTRODUCTION
2. INITIAL DATA EXPLORATION
3. SINGLE-VARIABLE LEAST SQUARES
4. MULTIPLE-VARIABLE PRELIMINARIES
5. MULTIPLE-VARIABLE LEAST SQUARES
6. INFERENCE
7. RESIDUAL ANALYSIS
8. VARIABLE SELECTION TECHNIQUES
9. MULTICOLLINEARITY EFFECTS
10. BIASED REGRESSION ESTIMATORS
APPENDIX A. DATA SETS ANALYZED IN THIS TEXT
APPENDIX B. DATA SETS FOR FURTHER STUDY
APPENDIX C. STATISTICAL TABLES
BIBLIOGRAPHY
INDEX
