1
The Basics
This chapter discusses some fundamental statistical issues dealing with variation, statistical models, calculations of probability, and the connection between hypothesis testing and estimation. These are basic topics that need to be understood by statistical consultants and those who use statistical methods. The selection of these topics reflects the author's experience and practice.
There would be no need for statistical methods if there were no variation or variety. Variety is more than the spice of life; it is the bread and butter of statisticians and their expertise. Assessing, describing and sorting variation is a key statistical activity. But not all variation is the domain of statistical practice; statistics is restricted to variation that has an element of randomness to it.
Definitions of the field of statistics abound; see a sampling in van Belle et al. (2004). For purposes of this book the following characterization, based on a description by R.A. Fisher (1935), will be used: statistics is the study of populations, variation, and methods of data reduction. Fisher points out that "the same types of problems arise in every case." For example, a population implies variation, and since a population cannot be wholly ascertained, descriptions of the population depend on sampling. The samples must then be reduced to summarize information about the population, and this is a problem in data reduction.
1.1 FOUR BASIC QUESTIONS
Introduction
R.A. Fisher's definition provides a formal basis for statistics, but it presupposes a great deal that needs to be made explicit. For the researcher and the statistical colleague there is a broader program that puts the Fisher material in context.
Rule of Thumb
Any statistical treatment must address the following questions:
1. What is the question?
2. Can it be measured?
3. Where, when, and how will you get the data?
4. What do you think the data are telling you?
Illustration
Consider the question, "Does air pollution cause ill health?" This is a very broad question that was qualitatively answered with the London smog episodes of the 1940s and 1950s. Lave and Seskin (1970), among others, tried to assess the quantitative effect, and this question is still with us today. That raises the non-trivial questions of whether "air pollution" and "ill health" can be measured. Lave and Seskin review measures of the former, such as sulfur in the air and suspended particulates. In the latter category they list morbidity and mortality. The third question, data collection, was addressed by considering data from 114 Standard Metropolitan Statistical Areas in the U.S., which contained health information, together with government sources for pollution information. The fourth question was answered by running multiple regressions controlling for a variety of factors that might confound the effect, for example age and socioeconomic status.
A host of questions can be raised, but in the end this was a landmark study that anticipated and still guides research efforts today.
Basis of the Rule
The rule essentially mimics the scientific method with particular emphasis on the role of data collection and analysis.
Discussion and Extensions
The first question usually deals with broad scientific issues, which often have policy and regulatory implications. Another example is global warming and its cause(s). But not all questions are measurable; for example, how do we measure human happiness or wisdom? In fact, most of the important questions of life are not measurable (a reason for humility). "Measurability" implies that there are "endpoints" that address the basic question. Frequently we need to take a short-cut; for example, income serves as a summary of socio-economic status. Given measurable values of the question, we can then test whether one set of values differs from another. So testability implies measurability.
This raises the question of whether a difference in the endpoints reflects an important difference in the first question. An example of this kind of question is the difference between statistical significance and clinical significance. (It may be better to say clinical relevance; statistical significance may point to a very important mechanistic framework.) In this context there also needs to be careful consideration of measurements that are not taken. This issue will be addressed in more detail in the chapter on observational studies.
If it is agreed that the question is measurable, the issue of data selection or data creation comes up. The three subquestions focus the discussion: they locate the data selection in space, time, and context. The data can range from administrative databases to experimental data; they can be retrospective or prospective. The "how" subquestion deals with the process that will actually be used. If sampling is involved, the sampling mechanism must be carefully described. In studies involving animals and humans this especially requires careful attention to ethics (but it is not restricted to these, of course). Broadly speaking, there are two approaches to getting the data: observational studies and designed experiments.
The next step is analysis and interpretation of the data, which, it is hoped, answer questions 1 and 2. Questions 1-3 focus on design, ranging from collecting anecdotes to doing a survey sample to conducting a randomized experiment. Question 4 focuses on analysis, in which statisticians have developed particular expertise (and sometimes ignore questions 1-3 by saying, "Let X be a random variable..."). But it is clear that the answers to the questions are inextricably interrelated. Other issues implied by the questions include the statistical model that is used, the robustness of the model, missing data, and an assessment of the many sources of variability.
The ordering reflects the process of science. Data miners who address only question 4 do so at their own risk.
1.2 OBSERVATION IS SELECTION
Introduction
The title of this rule is from Whitehead (1925), so the idea is not new. This is perhaps the most obvious of rules, and yet it is not taken into account the majority of the time.
Rule of Thumb
Observation is selection.
Illustration
The observation may be straightforward but the selection process may not be. An example that should be better known (selection?) is the vulnerability analysis of planes returning from bombing missions during World War II. Aircraft returning from missions had been hit in various places. The challenge was to determine which parts of the plane to reinforce to decrease their vulnerability. The naive approach started with figuring out where the hits had occurred. A second, improved approach was to adjust the number of hits by the area of the plane. The third approach was recommended by the statistician Abraham Wald: reinforce the planes where they had not been hit! His point was that the observations were correct, but not the selection process. What was of primary interest were the planes that did not return. Using an insightful statistical model he showed that the engine area (showing the fewest hits in returning planes) was the most vulnerable. This is one of those aha! situations where we immediately grasp the key role of the selection process. See Mangel and Samaniego (1984) for the technical description and references.
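Wald's reasoning can be illustrated with a small simulation. The section names and survival probabilities below are invented for illustration, not figures from the original analysis; the point is only that the section where a hit is most often fatal shows the fewest hits among the planes that return.

```python
import random

random.seed(1)

# Hypothetical plane sections and the probability that a hit there
# downs the plane. These numbers are illustrative assumptions.
sections = {"engine": 0.8, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1}

all_hits = {s: 0 for s in sections}       # every sortie (never observed in full)
returned_hits = {s: 0 for s in sections}  # only planes that made it back

for _ in range(10_000):
    section = random.choice(list(sections))  # each sortie takes one hit
    all_hits[section] += 1
    if random.random() > sections[section]:  # the plane survives the hit
        returned_hits[section] += 1

# Every section is hit about equally often, yet among the RETURNING
# planes the engine shows the fewest hits: observation is selection.
print(all_hits)
print(returned_hits)
```

The naive analyst, seeing few engine hits in `returned_hits`, would reinforce elsewhere; the missing planes carry the real signal.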
Basis of the Rule
To observe one thing implies that another is not observed, hence there is selection. This implies that the observation is taken from a larger collective, the statistical "population."
Discussion and Extensions
Often the observation is of interest only insofar as it is representative of the population we are interested in. For example, in the vulnerability analysis, a plane that provided the information about hits might not have been used again but instead scrapped for parts.
Selection can be subconscious as when we notice Volvo cars everywhere after having bought one. Thus it is important to be able to recognize the selection process.
The selection process in humans is very complicated, as evidenced by contradictory evidence by witnesses of the same accident. Nisbett and Ross (1980) and Kahneman, Slovic, and Tversky (1982) describe in detail some of the heuristics we use in selecting information. The bottom line is "know your sample."
1.3 REPLICATE TO CHARACTERIZE VARIABILITY
Introduction
A fundamental challenge of statistics is to characterize the variation that we observe. We can distinguish between systematic variation and non-systematic variation, which sometimes can be characterized as random variation. An example of systematic variation is the mile-markers on a highway or the kilometer-markers on the autobahn. This kind of variation is predictable. Random variation cannot be described in this way. In this section we are concerned with random variation.
Rule of Thumb
Replicate to characterize random variation.
Illustration
Repeated sampling under constant conditions tends to produce replicate observations. For example, the planes in the previous illustration have the potential of being considered replicate observations. The reason for the careful wording is that many assumptions need to be made, such as that the planes have not been altered in some way that affects their vulnerability, the enemy has not changed strategy, and so on.
At a more mundane level, the baby aspirin tablets we take are about as close to replicates as we can imagine. But even here, there are storage requirements and expiration dates that may make the replications invalid.
Basis of the Rule
Characterizing variability requires repeated observation, since variability is not a property inherent in a single observation.
Discussion and Extensions
The concept of replication is intuitive but difficult to define precisely. The idea of constant conditions is technically impossible to achieve since time marches on. Marriott (1999) defines replication as "execution of an experiment or survey more than once so as to increase precision and to obtain a closer estimation of sampling error." He also makes a distinction between replication and repetition, reserving the former for repetition at the same time and place.
In agricultural research the basic replicate is called a plot. Treatments can be compared by assigning several plots to each treatment, so that the variability within a treatment is replicate variability.
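As a minimal sketch of how replicate variability is used, consider the hypothetical plot yields below (the numbers are invented). The within-treatment variances are the replicate variances, and a pooled estimate combines them, weighting by degrees of freedom.

```python
import statistics

# Hypothetical yields from replicate plots under two treatments
treatment_a = [4.1, 3.8, 4.4, 4.0, 3.9]
treatment_b = [5.0, 4.6, 5.3, 4.8, 5.1]

# Replicate (within-treatment) sample variances characterize random variation
var_a = statistics.variance(treatment_a)
var_b = statistics.variance(treatment_b)

# Pooled within-treatment variance, weighted by degrees of freedom
pooled = ((len(treatment_a) - 1) * var_a + (len(treatment_b) - 1) * var_b) / (
    len(treatment_a) + len(treatment_b) - 2
)
print(var_a, var_b, pooled)
```

This pooled replicate variance is the yardstick against which the difference between treatment means would be judged.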
There is one method that ensures replication: randomization of observational units to two or more treatments. More will be said about this in the chapter on design.
1.4 VARIABILITY OCCURS AT MULTIPLE LEVELS
Introduction
As soon as the concept of variability is grasped it becomes clear that there are many sources of variability. Again, here the sources may be systematic or random. The emphasis here is, again, on random variability.
Rule of Thumb
Variability occurs at multiple levels.
Example
In education there is clearly variation in talents from student to student, from classroom to classroom, from school to school, from district to district, and from country to country. In this example there is a hierarchy, with students nested within schools, and so on.
Basis of the Rule
The basis of the rule is the recognition that there are levels of variation.
Discussion and Extensions
Each level of an observational hierarchy has its own units and its own variation. Suppose that the variable is expenditure per student. This could be expanded to expenditure per classroom, school, or district. In order to standardize, expenditure per student could be used, but for other purposes it may be useful to compare expenditure at the district level. However, if districts are compared, then the number of students served must usually be considered. The number of students would be a confounder in a comparison of districts. More will be said about confounders in Chapter 3.
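The multiple levels of variation can be made concrete with a small simulation of the expenditure example. The dollar figures and standard deviations below are invented; the sketch simply generates district means that vary between districts and students that vary within each district, then estimates each variance component.

```python
import random
import statistics

random.seed(2)

# Hypothetical per-student expenditures: district means differ
# (between-district variation, sd 1000) and students differ around
# their district mean (within-district variation, sd 500).
districts = []
for _ in range(50):
    district_mean = random.gauss(10_000, 1_000)
    students = [random.gauss(district_mean, 500) for _ in range(30)]
    districts.append(students)

# Average within-district variance: roughly 500**2 = 250,000
within = statistics.mean(statistics.variance(d) for d in districts)

# Variance of district means: roughly 1000**2 plus a small within-term
between = statistics.variance(statistics.mean(d) for d in districts)

print(round(within), round(between))
```

The two estimates answer different questions, which is why an analysis must be explicit about the level at which comparisons are made.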
1.5 INVALID SELECTION IS THE PRIMARY THREAT TO VALID INFERENCE
Introduction
The challenge is to be able to describe the selection process, a fundamental problem for applied statisticians. Selection bias occurs when the sample is not representative of the population of interest; this usually occurs when the sampling is not random. For example, a telephone survey of voters excludes those without telephones. This becomes important when the survey deals with political affiliation, which may also be associated with owning a telephone (as a proxy for socio-economic status).
The selection process need not be simple random sampling; all that is required is that, in the end, the probability of selection of the units of interest can be described. Survey sampling is a good example of a field where very clever selection processes are used in order to minimize sampling effort and cost and yet have estimable probabilities of selection.
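To illustrate why estimable selection probabilities are enough, the sketch below uses inverse-probability weighting in the style of the Horvitz-Thompson estimator: each sampled unit is weighted by one over its known selection probability, which makes the estimator of the population total unbiased over repeated sampling. All values and probabilities are invented.

```python
# Hypothetical population of unit values and their known
# (unequal) selection probabilities.
values = [10.0, 20.0, 30.0, 40.0]
incl_prob = [0.5, 0.25, 0.5, 0.25]
true_total = sum(values)  # 100.0

# Suppose this realized sample selected units 2 and 4.
sample = [(20.0, 0.25), (40.0, 0.25)]

# Weight each sampled value by 1/probability to estimate the total.
ht_estimate = sum(y / p for y, p in sample)
print(ht_estimate)  # 240.0 for this particular sample

# Unbiasedness check: averaging over all units with their selection
# probabilities, sum of p_i * (y_i / p_i) recovers the true total.
expected = sum(p * (y / p) for y, p in zip(values, incl_prob))
print(expected)  # 100.0
```

Any single sample may miss the total, as here, but the selection process is fully described, so valid inference is possible.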
1.6 THERE IS VARIATION IN STRENGTH OF INFERENCE
Introduction
Everyone agrees that there are degrees of quality of information, but when asked to define the criteria there is a great deal of disagreement. The simple statistical rule that the inverse of the variance of a statistic is a measure of the information contained in the statistic provides a useful criterion for a point estimate, but it is clearly inadequate for comparing much bigger chunks of information such as a study. In the field of history, primary sources are deemed more informative than secondary sources. These, and other, considerations point to the need to scale the quality and robustness of information.
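The inverse-variance rule can be made concrete for the simplest statistic, the sample mean: with n independent observations of common variance sigma squared, the variance of the mean is sigma squared over n, so the information is n over sigma squared and grows linearly with sample size. The numbers below are arbitrary.

```python
# Information in a statistic = 1 / variance of the statistic.
# For the mean of n iid observations with variance sigma^2:
#   Var(mean) = sigma^2 / n,  so information = n / sigma^2.
sigma2 = 4.0
for n in (10, 40, 160):
    var_mean = sigma2 / n
    information = 1.0 / var_mean
    print(n, var_mean, information)
# Quadrupling n quadruples the information (halves the standard error twice).
```

This works well for comparing two point estimates, but as the text notes, it says nothing about the quality of whole studies.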
Rule of Thumb
Compared with experimental studies, observational studies provide less robust information.
Illustration
The Women's Health Initiative (WHI) (see Chapter 7), a large study involving both randomized clinical trials and parallel observational studies, uses the randomized clinical trial to evaluate the validity of the evidence from the observational component. In fact, the goal of the analysis of the observational arm is to come as close as possible to the results of the randomized trials.
Basis of the Rule
The primary reason for less robust inference from an observational study is that the probability framework linking the study to the population of inference is unknown.
Discussion and Extensions
A great deal more will be said about the strength of inference in the chapter on evidence-based medicine. Also, the Hill guidelines, to be discussed below, provide at least a rough guide for determining the strength of evidence.
Each field of research has its own criteria for strength of evidence. In genetics the strength of evidence is measured by the lod score, defined as the log (base 10) of the ratio of the probability of occurrence of the event given a hypothesized linkage to the probability assuming no linkage (log base 10 of the odds ratio). Lod scores of three or more are considered confirmatory; a lod score of -2 is taken as disproving a claimed association. These are stringent criteria. A lod score of 3 means that the probability of the event is 1000 times greater under the alternative hypothesis (e.g., linkage) than under the null hypothesis (no linkage); this implies a p-value of approximately 0.001.
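The lod arithmetic can be checked directly. The likelihood values below are made-up numbers used only to exercise the definition; the function name is ours, not standard genetics software.

```python
import math

def lod(likelihood_linkage: float, likelihood_no_linkage: float) -> float:
    """Lod score: log10 of the likelihood ratio for linkage vs. no linkage."""
    return math.log10(likelihood_linkage / likelihood_no_linkage)

# A likelihood ratio of 1000 gives a lod score of 3,
# the conventional threshold for confirming linkage.
print(round(lod(1000.0, 1.0), 6))

# Converting back: a lod score of k corresponds to a ratio of 10**k.
print(10 ** 3)   # lod 3: data 1000 times more likely under linkage
print(10 ** 2)   # lod -2: data 100 times more likely under NO linkage
```

Note that the lod scale is deliberately stringent: moving the threshold from 2 to 3 demands a tenfold larger likelihood ratio.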
In epidemiology, odds ratios of two or greater are looked at with more interest than smaller odds ratios. In a very different area, historiography, primary sources are considered more reliable than secondary sources. In each case there are qualifications, usually by experts who know the field thoroughly.
There is no explicit random mechanism in many observational studies. For example, there may not be randomization in linkage analysis. However, there will be an assumption of statistical independence which, together with independent observations, produces a situation equivalent to randomization. These underlying assumptions then need to be tested or evaluated. In such cases evaluation of the data is often done vi...