A Primer of Multivariate Statistics

About This Book

Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why one should consider diving into more detailed treatments of computer-modeling and latent-variable techniques, such as non-recursive path analysis, confirmatory factor analysis, and hierarchical linear modeling.


Information

Year: 2001
ISBN: 9781135555436
Edition: 3

1 The Forest before the Trees

1.0 Why Statistics?

This text and its author subscribe to the importance of sensitivity to data and of the wedding of humanitarian impulse to scientific rigor. Therefore, it seems appropriate to discuss my conception of the role of statistics in the overall research process. This section assumes familiarity with the general principles of research methodology. It also assumes some acquaintance with the use of statistics, especially significance tests, in research. If this latter is a poor assumption, the reader is urged to delay reading this section until after reading Section 1.2.

1.0.1 Statistics as a Form of Social Control

Statistics is a form of social control over the professional behavior of researchers. The ultimate justification for any statistical procedure lies in the kinds of research behavior it encourages or discourages. In their descriptive applications, statistical procedures provide a set of tools for efficiently summarizing the researcher's empirical findings in a form that is more readily assimilated by the intended audience than would be a simple listing of the raw data. The availability and apparent utility of these procedures generate pressure on researchers to employ them in reporting their results, rather than relying on a more discursive approach. On the other hand, most statistics summarize only certain aspects of the data; consequently, automatic (e.g., computerized) computation of standard (cookbook?) statistics without the intermediate step of "living with" the data in all of its concrete detail may lead to overlooking important features of these data. A number of authors (see especially Anscombe, 1973, and Tukey, 1977) offered suggestions for preliminary screening of the data so as to ensure that the summary statistics finally selected are truly relevant to the data at hand.
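
Anscombe's (1973) famous data sets, cited above, make the point. The following minimal sketch (my illustration in Python; the data are Anscombe's published values) computes the usual summaries for his first two data sets, which are numerically almost identical even though one relation is roughly linear and the other is a perfect parabola:

import numpy as np

# Anscombe's (1973) first two data sets: identical x values, very different shapes.
x  = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for y in (y1, y2):
    print(round(y.mean(), 2), round(y.var(ddof=1), 2),
          round(np.corrcoef(x, y)[0, 1], 3))

Both lines print essentially the same mean (about 7.5), variance (about 4.13), and correlation with x (about .816); only "living with" the data, for example by plotting them, reveals the difference.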
The inferential applications of statistics provide protection against the universal tendency to confuse aspects of the data that are unique to the particular sample of subjects, stimuli, and conditions involved in a study with the general properties of the population from which these subjects, stimuli, and conditions were sampled. For instance, it often proves difficult to convince a subject who has just been through a binary prediction experiment involving, say, predicting which of two lights will be turned on in each of several trials that the experimenter used a random-number table in selecting the sequence of events. Among researchers, this tendency expresses itself as a proneness to generate complex post hoc explanations of their results that must be constantly revised because they are based in part on aspects of the data that are highly unstable from one replication of the study to the next. Social control is obtained over this tendency, and the "garbage rate" for published studies is reduced, by requiring that experimenters first demonstrate that their results cannot be plausibly explained by the null hypothesis of no true relationship in the population between their independent and dependent variables. Only after this has been established are experimenters permitted to foist on their colleagues more complex explanations. The scientific community generally accepts this control over their behavior because
  1. Bitter experience with reliance on investigators' informal assessment of the generalizability of their results has shown that some formal system of "screening" data is needed.
  2. The particular procedure just (crudely) described, which we may label the null hypothesis significance testing (NHST) procedure, has the backing of a highly developed mathematical model. If certain plausible assumptions are met, this model provides rather good quantitative estimates of the relative frequency with which we will falsely reject (Type I error) or mistakenly fail to reject (Type II error) the null hypothesis. Assuming again that the assumptions have been met, this model also provides clear rules concerning how to adjust both our criteria for rejection and the conditions of our experiment (such as number of subjects) so as to set these two "error rates" at prespecified levels. (A minimal simulation sketch illustrating these two error rates follows this list.)
  3. The null hypothesis significance testing procedure is usually not a particularly irksome one, thanks to the ready availability of formulae, tables, and computer programs to aid in carrying out the testing procedure for a broad class of research situations.
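
To make point 2 concrete, here is a minimal simulation sketch in Python (my illustration, not the Primer's; the per-group sample size of 30, the 0.5-standard-deviation true effect, and the .05 α level are assumptions chosen purely for the example). It estimates the two error rates of a two-sample t test by Monte Carlo:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, n_sims = 30, 0.05, 10_000   # per-group n, significance level, simulated experiments

def rejection_rate(true_effect):
    # Proportion of simulated experiments in which H0 is rejected at level alpha.
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_effect, 1.0, n)
        _, p = stats.ttest_ind(control, treated)
        rejections += p < alpha
    return rejections / n_sims

print(rejection_rate(0.0))      # Type I error rate: should be close to alpha
print(1 - rejection_rate(0.5))  # Type II error rate for a true effect of 0.5 sd

Raising n or relaxing alpha lowers the Type II error rate, which is exactly the kind of adjustment rule the model provides.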

1.0.2 Objections to Null Hypothesis Significance Testing

However, acceptance is not uniform. Bayesian statisticians, for instance, point out that the mathematical model underlying the null hypothesis significance testing procedure fits the behavior and beliefs of researchers quite poorly. No one, for example, seriously entertains the null hypothesis, because almost any treatment or background variable will have some systematic (although possibly minuscule) effect. Similarly, no scientist accepts or rejects a conceptual hypothesis on the basis of a single study. Instead, the scientist withholds final judgment until a given phenomenon has been replicated in a variety of studies. Bayesian approaches to statistics thus picture the researcher as beginning each study with some degree of confidence in a particular hypothesis and then revising this confidence in (the subjective probability of) the hypothesis up or down, depending on the outcome of the study. This is almost certainly a more realistic description of research behavior than that provided by the null hypothesis testing model. However, the superiority of the Bayesian approach as a descriptive theory of research behavior does not necessarily make it a better prescriptive (normative) theory than the null hypothesis testing model. Bayesian approaches are not nearly as well developed as are null hypothesis testing procedures, and they demand more from the user in terms of mathematical sophistication. They also demand more in terms of ability to specify the nature of the researcher's subjective beliefs concerning the hypotheses about which the study is designed to provide evidence. Further, this dependence of the results of Bayesian analyses on the investigator's subjective beliefs means that Bayesian "conclusions" may vary among different investigators examining precisely the same data. Consequently, the mathematical and computational effort expended by the researcher in performing a Bayesian analysis may be relatively useless to those of his or her readers who hold different prior subjective beliefs about the phenomenon. (The "mays" in the preceding sentence derive from the fact that many Bayesian procedures are robust across a wide range of prior beliefs.) For these reasons, Bayesian approaches are not employed in this Primer; the interested reader is referred to Press (1972), who incorporated Bayesian approaches wherever possible.
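
The updating process just described is easy to make concrete. The following minimal sketch (my illustration, not the book's; the power of .8 and the α of .05 used as likelihoods, and the starting confidence of .30, are assumptions for the example) applies Bayes' rule to a researcher's subjective probability of a hypothesis H after each of three statistically significant replications:

def updated_confidence(prior, power=0.8, alpha=0.05):
    # P(H | significant result), taking P(significant | H true) = power
    # and P(significant | H false) = alpha.
    evidence = power * prior + alpha * (1 - prior)
    return power * prior / evidence

confidence = 0.30                # initial subjective probability of H
for study in range(1, 4):        # three successive significant results
    confidence = updated_confidence(confidence)
    print(study, round(confidence, 3))

Because the final figure depends on the starting value of .30, two readers with different priors would reach different "conclusions" from the same data, which is precisely the objection raised above.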
An increasingly "popular" objection to null hypothesis testing centers around the contention that these procedures have become too readily available, thereby seducing researchers and journal editors into allowing the tail (the inferential aspect of statistics) to wag the dog (the research process considered as a whole). Many statisticians have appealed for one or more of the following reforms in the null hypothesis testing procedure:
  1. Heavier emphasis should be placed on the descriptive aspects of statistics, including, as a minimum, the careful examination of the individual data points before, after, during, or possibly instead of applying "cookbook" statistical procedures to them.
  2. The research question should dictate the appropriate statistical analysis, rather than letting the ready availability of a statistical technique generate a search for research paradigms that fit the assumptions of that technique.
  3. Statistical procedures that are less dependent on distributional and sampling assumptions, such as randomization tests (which compute the probability that a completely random reassignment of observations to groups would produce as large an apparent discrepancy from the null hypothesis as would sorting scores on the basis of the treatment or classification actually received by the subject) or jackknifing tests (which are based on the stability of the results under random deletion of portions of the data), should be developed. These procedures have only recently become viable as high-speed computers have become readily available. (A minimal randomization-test sketch follows this list.)
  4. Our training of behavioral scientists (and our own practice) should place more emphasis on the hypothesis-generating phase of research, including the use of post hoc examination of the data gathered while testing one hypothesis as a stimulus to theory revision or origination. Kendall (1968), Mosteller and Tukey (1968), Anscombe (1973), and McGuire (1973) can serve to introduce the reader to this "protest literature."
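
To make the randomization-test idea in point 3 concrete, here is a minimal Python sketch (my illustration, with made-up scores). It computes the probability that a completely random reassignment of the pooled observations to the two groups yields a mean difference at least as large as the one actually observed; a jackknife analysis would instead recompute the statistic repeatedly with portions of the data deleted:

import numpy as np

rng = np.random.default_rng(1)

def randomization_test(a, b, n_perms=10_000):
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)                       # random reassignment to groups
        diff = abs(pooled[:a.size].mean() - pooled[a.size:].mean())
        extreme += diff >= observed
    return extreme / n_perms                      # randomization p value

a = np.array([4.1, 5.0, 6.2, 5.5, 4.8])          # hypothetical treatment scores
b = np.array([5.9, 6.4, 7.1, 6.8, 7.5])          # hypothetical control scores
print(randomization_test(a, b))

Note that no normality or random-sampling assumption enters anywhere; the reference distribution is generated from the data themselves, which is why such tests only became practical once high-speed computers were widely available.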

1.0.3 Should Significance Tests Be Banned?

Concern about abuses of null hypothesis significance testing reached a peak in the late 1990s with a proposal to the American Psychological Association (APA) that null hypothesis significance tests (NHSTs) be banned from APA journals. A committee was in fact appointed to address this issue, but its deliberations and subsequent report were quickly broadened to a set of general recommendations for data analysis, framed as specific suggestions for revisions of the data-analysis sections of the APA publication manual—not including a ban on NHSTs (Wilkinson and APA Task Force on Statistical Inference, 1999).
Most of the objections to NHST that emerged in this debate were actually objections to researchers' misuse and misinterpretation of the results of NHSTs—most notably, treating a nonsignificant result as establishing that the population effect size is exactly zero and treating rejection of H0 as establishing the substantive importance of the effect. These are matters of education, not of flawed logic. Both of these mistakes are much less likely (or at least are made obvious to the researcher's readers, if not to the researcher) if the significance test is accompanied by a confidence interval (CI) around the observed estimate of the population effect—and indeed a number of authors have pointed out that the absence or presence of the null-hypothesized value in the confidence interval matches perfectly (at least when a two-tailed significance test at level α is paired with a traditional, symmetric (1-α)-level CI) the statistical significance or nonsignificance of the NHST (a correspondence verified numerically in the sketch following the list below). This has led to the suggestion that NHSTs simply be replaced by CIs. My recommendation is that CIs be used to supplement, rather than to replace, NHSTs, because
  1. The p-value provides two pieces of information not provided by the corresponding CI, namely an upper bound on the probability of declaring statistical significance in the wrong direction (which is at most half of our p value; Harris, 1997a, 1997b) and an indication of the likelihood of a successful exact replication (Greenwald, Gonzalez, Harris, & Guthrie, 1996).
  2. Multiple-df overall tests, such as the traditional F for the between-groups effect in one-way analysis of variance (Anova), are a much more efficient way of determining whether there are any statistically significant patterns of differences among the means or (in multiple regression) statistically reliable combinations o...
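
The NHST-CI correspondence mentioned above is easy to verify numerically. The following minimal sketch (my illustration, using simulated data) runs a two-tailed one-sample t test of H0: μ = 0 at level α and builds the matching symmetric (1-α)-level CI; the test rejects exactly when the interval excludes zero:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.4, 1.0, 25)     # simulated sample; H0: population mean = 0
alpha = 0.05

_, p = stats.ttest_1samp(x, popmean=0.0)
lo, hi = stats.t.interval(1 - alpha, df=x.size - 1,
                          loc=x.mean(), scale=stats.sem(x))

print(p, (lo, hi))
print(p < alpha, not (lo <= 0.0 <= hi))   # the two verdicts always agree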

Table of contents

  1. Cover
  2. Half Title
  3. Title
  4. Copyright
  5. Dedication
  6. Contents
  7. 1 The Forest before the Trees
  8. 2 Multiple Regression: Predicting One Variable from Many
  9. 3 Hotelling's T²: Tests on One or Two Mean Vectors
  10. 4 Multivariate Analysis of Variance: Differences Among Several Groups on Several Measures
  11. 5 Canonical Correlation: Relationships Between Two Sets of Variables
  12. 6 Principal Component Analysis: Relationships Within a Single Set of Variables
  13. 7 Factor Analysis: The Search for Structure
  14. 8 The Forest Revisited
  15. Digression 1 Finding Maxima and Minima of Polynomials
  16. Digression 2 Matrix Algebra
  17. Digression 3 Solution of Cubic Equations
  18. Appendix A Statistical Tables
  19. Appendix B Computer Programs Available from the Author
  20. Appendix C Derivations
  21. References
  22. Index