Regression Analysis and its Application
eBook - ePub

A Data-Oriented Approach

  1. 424 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Regression Analysis and Its Application: A Data-Oriented Approach answers the need for researchers and students who would like a better understanding of classical regression analysis. Useful either as a textbook or as a reference source, this book bridges the gap between the purely theoretical coverage of regression analysis and its practical application.

The book presents regression analysis in the general context of data analysis. Using a teach-by-example format, it contains ten major data sets along with several smaller ones to illustrate the common characteristics of regression data and properties of statistics that are employed in regression analysis. The book covers model misspecification, residual analysis, multicollinearity, and biased regression estimators. It also focuses on data collection, model assumptions, and the interpretation of parameter estimates.

Complete with an extensive bibliography, Regression Analysis and Its Application is suitable for statisticians, graduate and upper-level undergraduate students, and research scientists in biometry, business, ecology, economics, education, engineering, mathematics, physical sciences, psychology, and sociology. In addition, data collection agencies in the government and private sector will benefit from the book.

Regression Analysis and its Application by Richard F. Gunst and Robert L. Mason is available in PDF and/or ePUB format.

Information

Publisher: CRC Press
Year: 2018
ISBN: 9781351419291
Edition: 1
CHAPTER 1
INTRODUCTION
Data analysis of any kind, including a regression analysis, has the potential for far-reaching consequences. Conclusions drawn from small laboratory experiments or extensive sample surveys might only influence one's colleagues and associates or they could form the basis for policy decisions by governmental agencies which could conceivably affect millions of people. Data analysts must, therefore, have an adequate knowledge of and a healthy respect for the procedures they utilize.
Consider as an illustration of the potential for far-reaching effects of a data analysis one of the most massive research projects ever undertaken, the Salk polio vaccine trials (Meier, 1972). The conclusions drawn from the results of this study ultimately culminated in a nationwide polio immunization program and virtual elimination of this tragic disease in the United States. The foresight and competence of the principal investigators of the study prevented ambiguity of the results and possible criticism of the conclusions. The handling of this experiment provides valuable lessons in the overall role of data analysis and the care with which it must be approached.
Polio in the early 1950's was a mysterious disease. No one could predict where or when it would strike. It did not affect a large segment of any community but those it did strike, mostly children, were often left paralyzed. Its crippling effect on young children and the sporadic nature of its occurrence led to demands for a major effort in eradicating the disease. Salk's vaccine was one of the most promising ones available, but it had not been sufficiently tested.
Since the occurrence of polio in any specific community could not be predicted and only a small portion of the population actually contracted the disease in any year, a large-scale experiment including many communities was necessary. In the end over one million children participated in the study, some receiving the vaccine and others just a placebo.
In allowing their children to participate, many parents insisted on knowing whether their child received the vaccine or the placebo. These children constituted the "observed-control" group (Meier, 1972). The planners of the experiment, realizing potential difficulties in the interpretation of the results, insisted that there be a large number of communities for which neither child, parent, nor diagnosing physician knew whether the child received the vaccine or the placebo. This group of children made up the "placebo-control" group.
For both groups of children the incidence of polio was lower for those vaccinated than for those who were not vaccinated. The conclusion was unequivocal: the Salk vaccine proved effective in preventing polio. This conclusion would have been compromised, however, had the planners of the study not insisted that the placebo-control group be included. Doubts that the observed-control group could reliably indicate the effectiveness of the vaccine were raised both before and after the experiment. The indicators of polio are so similar to those of some other diseases that the diagnosing physician might tend to diagnose polio if he knew the child had not been vaccinated and diagnose one of the other diseases if he knew the child had been vaccinated. After the experiment was conducted, analysis of the data for the observed-control group indicated that the vaccine was effective but the differences were not large enough to prevent charges of (unintentional) physician bias. Differences in the incidence of polio between vaccinated and nonvaccinated children in the placebo-control group were larger than those in the observed-control group and the analysis of these data provided the definitive conclusion. Thus, owing to the careful planning and execution of this study, including the data collection and analysis, the immunization program that was later implemented has resulted in almost complete eradication of polio in the United States.
1.1 DATA COLLECTION
Data can be compiled in a variety of ways. For specific types of information, the U.S. Bureau of the Census can rely on nearly complete enumerations of the U.S. population or on data collected using sophisticated sample survey designs. The Bureau of the Census can ensure that all segments of the population are represented in most of the analyses that it desires to perform. Many research endeavors, however, are conducted on a relatively small scale and are limited by time, manpower, or economics. Characteristic of these studies is a data base that is restricted by the data-collection techniques.
So important is the data base to a regression analysis that we begin our development of multiple linear regression with the data-collection phase. The emphasis of this section is on an understanding of the benefits associated with a good data collection effort and the influence on the interpretation of fitted models when the data base is restricted. While it may not always be possible to build a data base as large or as representative as one might desire, knowledge of the limitations of a data base can prevent many incorrect applications of regression methodology.
1.1.1 Data-Base Limitations
Regression analysis provides information on relationships between a response variable and one or more predictor variables but only to the degree that such information is contained in the data base. Whether the data are compiled from a complete enumeration of a population, an appropriate sample survey, a haphazard tabulation, or by simply inventing data, regression coefficients can be estimated and conclusions can be drawn from the fitted model. The quality of the fit and accuracy of conclusions, however, depend on the data used: data that are not representative or not properly compiled can result in poor fits and erroneous conclusions.
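The point that coefficients can be estimated from any data at all can be made concrete with a minimal sketch. The function name `fit_line` and the numbers below are invented for illustration and do not come from the text; the sketch only shows that the least squares arithmetic goes through regardless of how the data were compiled.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Invented numbers with no real-world meaning: the fit still "succeeds".
invented_x = [1.0, 2.0, 3.0, 4.0, 5.0]
invented_y = [3.0, 1.0, 4.0, 1.0, 5.0]
intercept, slope = fit_line(invented_x, invented_y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Nothing in the computation checks where the numbers came from, which is why the responsibility for judging the data base rests with the analyst.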
One of many studies that illustrate the problems that arise when one is forced to draw inferences from a potentially nonrepresentative sample is found in Crane (1965). In her attempt to assess the influence of graduate school prestige and current academic affiliation on productivity and peer recognition of university professors, she surveyed faculty members in three disciplines from three universities on the east coast of the United States. The responses were voluntary and presumably not all professors in these disciplines participated in the study. Although Crane's study did not call for a regression analysis, the interpretation problems that occur as a result of her data-collection effort are applicable regardless of the type of analysis performed.
Questions naturally arise concerning any conclusions that would be drawn from a study with the data-base limitations of this one. Do these three disciplines truly represent all academic disciplines? Can these three universities be said to be typical of all universities in the United States? If some professors chose not to participate in the study, are the responses thereby biased? These questions cannot be answered from Crane's data. Only if additional studies provide results similar to hers for other disciplines and other schools can global conclusions be drawn concerning the influence of graduate school and current academic affiliation on recognition and productivity of university professors. No amount of statistical analysis can compensate for these data-base limitations.
Criticisms of limited data bases and disagreements with conclusions drawn from the analysis of them are common. Nevertheless, the choice is often between conducting no investigations at all or analyzing restricted sets of data. We do not advocate the former position; however, it is the obligation of the data analyst to investigate the data-collection process, discover any limitations in the data collected, and restrict conclusions accordingly. Another example will stress these points and the consequences of underrating their importance.
A well-publicized study on male sexuality (Kinsey et al., 1948) evoked widespread criticism both because of its controversial subject matter and because of its data-collection procedures. Responses were solicited from males belonging to a large number of groups in order to make the sampling more feasible. About 5,300 males were interviewed in prisons, mental institutions, rooming houses, etc. By interviewing volunteers from groups such as these, a large sample of responses could be obtained without exhaustive effort and expense. The convenience of selecting responses in this fashion is the primary factor contributing to the debate over the results of the study.
Among the criticisms raised about the Kinsey report, most centered on the data-collection process. Some groups (such as college men) were overrepresented while others (such as Catholics) were underrepresented and still others (such as Blacks) were completely excluded. The subjects were all volunteers and this fact led to further charges of unrepresentativeness. Additional criticisms centered on the interview technique which relied solely on an individual's ability to recall events in his past.
The statistical methodology used in the Kinsey report was highly praised although it was descriptive and relatively simple (Cochran, Mosteller, and Tukey, 1954). In response to the criticisms of the Kinsey report, moreover, the investigators argued that this study was just a pilot study for a much larger sexual attitude survey. Nevertheless, in numerous instances the conclusions drawn from the study went beyond bounds that could be substantiated by the data. Actually, the conclusions are quite limited in generality.
The two examples just discussed demonstrate the problems that can arise from the absence of an adequately representative data base. Regardless of the sophistication of statistical analyses of the data, deficiencies in the data base can preclude valid conclusions. In particular, interpreting fitted regression models and comparing estimated model parameters in a regression analysis can lead to erroneous inferences if problems with the data go undetected or are ignored.
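The effect of a nonrepresentative data base on estimated regression parameters can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the text: the population model (response = 2 + 0.5 × predictor + noise), the selection rule (only units with a high response volunteer), and the helper `fit_line` are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: response = 2 + 0.5 * predictor + noise.
population = []
for _ in range(5000):
    x = rng.uniform(0.0, 10.0)
    y = 2.0 + 0.5 * x + rng.gauss(0.0, 2.0)
    population.append((x, y))

# Hypothetical selection rule: only units with a high response "volunteer".
volunteers = [(x, y) for x, y in population if y > 4.5]

_, full_slope = fit_line(*zip(*population))
_, volunteer_slope = fit_line(*zip(*volunteers))
print(f"slope, full population: {full_slope:.2f}")
print(f"slope, volunteers only: {volunteer_slope:.2f}")
```

Under these assumptions the slope fitted to the volunteers understates the population slope, even though the same fitting procedure is used on both sets of data. Real selection mechanisms are rarely this simple or this visible, which is why the data-collection process itself must be examined.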
1.1.2 Data-Conditioned Inferences
Of particular relevance to a discussion of data-collection problems is the nature of the inferences that can be drawn once the data are collected. Data bases are generally compiled to be representative of a wide range of conditions but they can fail to be as representative as intended even when good data-collection techniques are employed. One can be led to believe that broad generalizations from the data are possible because of a good data-collection effort when a closer inspection of the data might reveal that deficiencies exist in the data base.
Equality o...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. 1. INTRODUCTION
  8. 2. INITIAL DATA EXPLORATION
  9. 3. SINGLE-VARIABLE LEAST SQUARES
  10. 4. MULTIPLE-VARIABLE PRELIMINARIES
  11. 5. MULTIPLE-VARIABLE LEAST SQUARES
  12. 6. INFERENCE
  13. 7. RESIDUAL ANALYSIS
  14. 8. VARIABLE SELECTION TECHNIQUES
  15. 9. MULTICOLLINEARITY EFFECTS
  16. 10. BIASED REGRESSION ESTIMATORS
  17. APPENDIX A. DATA SETS ANALYZED IN THIS TEXT
  18. APPENDIX B. DATA SETS FOR FURTHER STUDY
  19. APPENDIX C. STATISTICAL TABLES
  20. BIBLIOGRAPHY
  21. INDEX