Handbook of Employee Selection
eBook - ePub

  1. 1,005 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

This second edition of the Handbook of Employee Selection has been revised and updated throughout to reflect current thinking on the state of science and practice in employee selection. In this volume, a diverse group of recognized scholars inside and outside the United States balance theory, research, and practice, often taking a global perspective.

Divided into eight parts, chapters cover issues associated with measurement, such as validity and reliability, as well as practical concerns around the development of appropriate selection procedures and implementation of selection programs. Several chapters discuss the measurement of various constructs commonly used as predictors, and other chapters confront criterion measures that are used in test validation. Additional sections include chapters that focus on ethical and legal concerns and testing for certain types of jobs (e.g., blue collar jobs). The second edition features a new section on technology and employee selection.

The Handbook of Employee Selection, Second Edition provides an indispensable reference for scholars, researchers, graduate students, and professionals in industrial and organizational psychology, human resource management, and related fields.

Handbook of Employee Selection, edited by James L. Farr and Nancy T. Tippins, is available in PDF and ePUB formats and is catalogued under Business & Human Resource Management.

Information

Publisher: Routledge
Year: 2017
ISBN: 9781317426370
Edition: 2

Part I

Foundations of Psychological Measurement and Evaluation Applied to Employee Selection

Benjamin Schneider, Section Editor

1

Reliability

Dan J. Putka
Reliability and validity are concepts that provide the scientific foundation upon which we construct and evaluate predictor and criterion measures of interest in personnel selection. They offer a common technical language for discussing and evaluating (a) the generalizability of scores resulting from our measures (to a population of like measures), as well as (b) the accuracy of inferences we desire to make based on those scores (e.g., high scores on our predictor measure are associated with high levels of job performance; high scores on our criterion measure reflect high levels of job performance).1 Furthermore, the literature surrounding these concepts provides a framework for scientifically sound measure development that can, a priori, increase the likelihood that scores resulting from our measures will be generalizable and that the inferences we desire to make based upon them will be supported.
Like personnel selection itself, the science and practice surrounding the concepts of reliability and validity continue to evolve. The evolution of reliability has centered on its evaluation and framing of "measurement error," as its operational definition over the past century has remained focused on notions of consistency of scores across replications of a measurement procedure (Haertel, 2006; Spearman, 1904; Thorndike, 1951). The evolution of validity has been more diverse, with changes affecting not only its evaluation but also its very definition, as evidenced by comparing editions of the Standards for Educational and Psychological Testing produced over the past half century by the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) (AERA, APA, & NCME, 2014). Relative to the evolution of reliability, the evolution of validity has been well covered in the personnel selection literature (e.g., Binning & Barrett, 1989; McPhail, 2007; Schmitt & Landy, 1993; Society for Industrial and Organizational Psychology, Inc., 2003) and will continue to be well covered in this Handbook. For this reason, this chapter will be devoted to providing an integrated, modern perspective on reliability.
In reviewing literature in preparation for this chapter, I was struck by the paucity of organizational research literature that has attempted to juxtapose and integrate perspectives on reliability from the last 50 years with perspectives from the first half of the 20th century. Indeed, Borsboom (2006) lamented that to this day many treatments of reliability are explicitly framed or implicitly laden with assumptions based on measurement models from the early 1900s. While classical test theory (CTT) certainly has its place in treatments of reliability, framing entire treatments around it serves to "trap" us within the CTT paradigm (Kuhn, 1962). This makes it difficult for students of the field to compare and contrast, on conceptual and empirical grounds, perspectives offered by other measurement theories and approaches to reliability estimation. This state of affairs is highly unfortunate because perspectives on reliability and methods for its estimation have evolved greatly since Gulliksen's codification of CTT in 1950, yet these advances have been slow to disseminate into personnel selection research and practice. Indeed, my review of the literature reveals what appears to be a widening gap between perspectives on reliability offered in the organizational research literature and those of the broader psychometric community (e.g., Borsboom, 2006; Raykov & Marcoulides, 2011). Couple this trend with (a) the recognized decline in graduate instruction in statistics and measurement over the past 30 years in psychology departments (Aiken, West, & Millsap, 2008; Merenda, 2007) and (b) the growing availability of statistical software and estimation methods since the mid-1980s, and we have a situation in which the psychometric knowledge base of new researchers and practitioners can be dated before they even exit graduate training. Perhaps more disturbing is that the lack of dissemination of modern perspectives on reliability can easily give students of the field the impression that the area has not seen many scientifically or practically useful developments since the early 1950s.
In light of the issues raised above, my aim in the first part of this chapter is to parsimoniously reframe and integrate developments in the reliability literature over the past century in a way that reflects, to the best of my knowledge, our modern capabilities. In laying out this discussion, I use examples from personnel selection research and practice to relate key points to situations readers may confront in their own work. Given this focus, note that several topics commonly discussed in textbook or chapter-length treatments of reliability are missing from this chapter. For example, topics such as standard errors of measurement, factors affecting the magnitude of reliability coefficients (e.g., sample heterogeneity), and applications of reliability-related data (e.g., corrections for attenuation, measure refinement) receive little or no attention here. The omission of these topics is not meant to downplay their importance to our field; rather, it reflects the fact that fine treatments of them already exist in several places in the literature (e.g., Feldt & Brennan, 1989; Haertel, 2006; Nunnally, 1978). My emphasis is on complementing the existing literature, not repeating it. In place of these important topics, I focus on integrating and drawing connections among historically disparate perspectives on reliability. As noted below, such integration is essential because the literature on reliability has become extremely fragmented.
For example, although originally introduced as a "liberalization" of CTT more than 40 years ago, generalizability theory is still not well integrated into textbook treatments of reliability in the organizational literature. It tends to be relegated to secondary sections that appear after the primary, largely CTT-based treatment of reliability, not mentioned at all, or treated as if it had value in only a limited number of measurement situations faced in research and practice. Although such a statement may appear to be a wholesale endorsement of generalizability theory and its associated methodology, it is not. As an example, the educational measurement literature has generally held up generalizability theory as a centerpiece of modern perspectives on reliability, but arguably this has come at the expense of shortchanging confirmatory factor analytic (CFA)-based perspectives on reliability and how such perspectives relate to and can complement generalizability theory. Ironically, this lack of integration goes both ways: CFA-based treatments of reliability rarely, if ever, acknowledge how generalizability theory can enrich the CFA perspective (e.g., DeShon, 1998) and instead link their discussions of reliability to CTT. In short, investigators seeking to understand modern perspectives on reliability confront a fragmented, complex literature.

Overview

This chapter's treatment of reliability is organized into three main sections. The first section offers a conceptual, "model-free" definition of measurement error. In essence, starting out with such a model-free definition of error is required to help clarify some confusion that tends to crop up when one begins to frame error from the perspective of a given measurement theory and the assumptions such theories make regarding the substantive nature of error. Next I overlay this conceptual treatment of error with perspectives offered by various measurement models. Measurement models are important because they offer a set of hypotheses regarding the composition of observed scores, which, if supported, can allow us to accurately estimate reliability from a sample of data and apply those estimates to various problems (e.g., corrections for attenuation, construction of score bands). Lastly, I compare and contrast three traditions that have emerged for estimating reliability: (a) a classical tradition that arose out of work by Spearman (1904) and Brown (1910), (b) a random-effects model tradition that arose out of Fisher's work with analysis of variance (ANOVA), and (c) a CFA tradition that arose out of Jöreskog's work on congeneric test models.
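
To orient readers, the following is a brief, illustrative sketch (not part of the chapter's own notation) of an estimator commonly associated with each of these three traditions: the Spearman-Brown stepped-up split-half correlation, a single-facet generalizability coefficient built from variance components, and coefficient omega from a congeneric (CFA) model.

```latex
% Classical tradition: Spearman-Brown correction applied to a split-half correlation r_{12}
\rho_{XX'} \approx \frac{2 r_{12}}{1 + r_{12}}

% Random-effects (generalizability) tradition: person variance relative to person-plus-error
% variance for scores averaged over k single-facet replicates (e.g., items)
E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{pi,e}/k}

% CFA tradition: coefficient omega for k congeneric indicators with loadings \lambda_i
% and unique variances \theta_i (uncorrelated errors assumed)
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\theta_i}
```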

Reliability

A specification for error is central to the concept of reliability, regardless of one's theoretical perspective, but to this day the meaning of the term "error" is a source of debate and confusion (Borsboom & Mellenbergh, 2002; Murphy & DeShon, 2000; Schmidt, Viswesvaran, & Ones, 2000). The sources of variance in scores that are designated as sources of error can differ as a function of (a) the inferences or assertions an investigator wishes to make regarding the scores, (b) how an investigator intends to use the scores (e.g., for relative comparisons among applicants or for absolute comparisons of their scores to some set standard), (c) characteristics of the measurement procedure that produced them, and (d) the nature of the construct one is attempting to measure. Consequently, what is called error, even for scores produced by the same measurement procedure, may legitimately reflect different things under different circumstances. As such, there is no such thing as the reliability of scores (just as there is no such thing as the validity of scores), and many different reliability estimates can be calculated for the same scores depending on how the investigator defines error. Just as we qualify statements of validity with statements such as "validity for purpose X" or "evidence of validity for supporting inference X," so too must care be taken when discussing reliability, with statements such as "scores are reliable with respect to consistency across Y," where Y might refer to items, raters, tasks, or testing occasions, or combinations of them (Putka & Hoffman, 2013, 2015). As we'll see later, different reliability estimates calculated on the same data tell us very different things about the quality of our scores and the degree to which various inferences regarding their consistency are warranted.
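
To make this concrete, here is a minimal, hypothetical sketch (the data and all numbers are invented for illustration, not drawn from the chapter): the same applicant-by-rater rating matrix yields different reliability coefficients depending on whether rater leniency differences are treated as error (as they should be for absolute comparisons to a cut score) or not (as for purely relative comparisons among applicants).

```python
import numpy as np

# Hypothetical ratings: 6 applicants (rows) rated by 3 raters (columns) on the same exercise.
X = np.array([
    [4, 5, 6],
    [3, 4, 5],
    [5, 5, 7],
    [2, 3, 4],
    [4, 4, 6],
    [3, 5, 5],
], dtype=float)

n_p, n_r = X.shape

# Two-way (person x rater) random-effects decomposition via expected mean squares.
grand = X.mean()
p_means = X.mean(axis=1)            # applicant (person) means
r_means = X.mean(axis=0)            # rater means

ss_p = n_r * np.sum((p_means - grand) ** 2)
ss_r = n_p * np.sum((r_means - grand) ** 2)
ss_tot = np.sum((X - grand) ** 2)
ss_pr = ss_tot - ss_p - ss_r        # person x rater interaction + residual error

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

var_p = max((ms_p - ms_pr) / n_r, 0.0)   # person (true-score-like) variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater main-effect (leniency) variance
var_pr = ms_pr                           # interaction/error variance

k = n_r  # number of raters whose ratings are averaged

# Error defined for RELATIVE comparisons among applicants:
# rater leniency differences do not count as error, only rank-order inconsistency does.
rel_reliability = var_p / (var_p + var_pr / k)

# Error defined for ABSOLUTE comparisons to a set standard:
# rater leniency differences now count as error too.
abs_reliability = var_p / (var_p + var_r / k + var_pr / k)

print(f"Relative-error coefficient: {rel_reliability:.3f}")
print(f"Absolute-error coefficient: {abs_reliability:.3f}")
```

In this sketch the absolute-error coefficient can only be lower than or equal to the relative-error coefficient, because it counts rater main-effect variance as error in addition to the person-by-rater residual.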
A convenient way to start to address these points is to examine how error has come to be operationally defined in the context of estimating reliability. All measurement theories seem to agree that reliability estimation attempts to quantify the expected degree of consistency in scores over replications of a measurement procedure (Brennan, 2001a; Haertel, 2006). Consequently, from the perspective of reliability estimation, error reflects the expected degree of inconsistency between scores produced by a measurement procedure and replications of it. Several elements of these operational definitions warrant further explanation, beginning with the notion of replication. Clarifying these elements will provide an important foundation for the remainder of this chapter.
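
The chapter develops specific measurement models later; purely as a reference point, the familiar classical formalization of this idea (an assumption of CTT, not the model-free definition above) expresses an observed score as true score plus error and defines reliability as the share of observed-score variance that is consistent across replications, which equals the expected correlation between scores from two parallel replications:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma^{2}_{T}}{\sigma^{2}_{T} + \sigma^{2}_{E}} = \frac{\sigma^{2}_{T}}{\sigma^{2}_{X}}
```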

Replication

From a measurement perspective, replication refers to the repetition or reproduction of a measurement procedure such that the scores produced by each "replicate" are believed to assess the same construct.2 There are many ways of replicating a measurement procedure. Perhaps the most straightforward way would be to administer the same measurement procedure on more than one occasion, which would provide insight into how consistent scores are for a given person across occasions. However, we are frequently interested in more than whether our measurement procedure would produce comparable scores on different occasions. For example, would we achieve consistency over replicates if we had used an alternative, yet similar, set of items to those that comprise our measure? Answering the latter question is a bit more difficult in that we are rarely in a position to replicate an entire measurement procedure (e.g., construct two or more 20-item measures of conscientiousness and compare scores on each). Consequently, in practice, "parts" or "elements" of our measurement procedure (e.g., items) are often viewed as replicates of each other. The observed consistency of scores across these individual elements is then used to make inferences about the level of consistency we would expect if our entire measurement procedure were replicated; that is, how consistent we would expect scores to be for a given person across alternative sets of items we might use to assess the construct of interest. The forms of replication described above dominated measurement theory for nearly the first five decades of the 20th century (Cronbach, 1947; Gulliksen, 1950).
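
As an illustration of treating items as replicates and stepping up from part-level consistency to the consistency expected of the whole procedure, here is a minimal sketch with invented data (the scores, the odd/even split, and the sample sizes are hypothetical, not from the chapter):

```python
import numpy as np

# Hypothetical item-level data: 8 examinees x 6 items intended to measure one construct.
scores = np.array([
    [4, 3, 4, 5, 4, 3],
    [2, 2, 3, 2, 3, 2],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 3],
    [4, 5, 4, 4, 5, 4],
    [1, 2, 2, 1, 2, 1],
    [3, 4, 3, 4, 3, 4],
    [5, 5, 4, 5, 5, 4],
], dtype=float)

# Split the items into two half-tests (odd vs. even items) and correlate the half scores.
half1 = scores[:, 0::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(half1, half2)[0, 1]

# Spearman-Brown prophecy: step up the half-test correlation to estimate
# the consistency expected for the full-length measure.
sb_full = 2 * r_half / (1 + r_half)

# Coefficient alpha treats every item as a single-facet replicate of the others.
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Split-half r = {r_half:.3f}, Spearman-Brown = {sb_full:.3f}, alpha = {alpha:.3f}")
```

The split-half correlation describes consistency between two half-length replicates; the Spearman-Brown step-up and coefficient alpha both attempt to infer from such part-level consistency how consistent scores from the full-length procedure would be across comparable sets of items.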
Modern perspectives on reliability have liberalized the notion of replicates in terms of (a) the forms that they take and (b) how the measurement facets (i.e., items, raters, tasks, occasions) that define them are manifested in a data collection design (i.e., a measurement design). For example, consider a measurement procedure that involves having two raters provide ratings for individuals with regard to their performance on three tasks designed to assess the same construct. In this case, replicates take the form of the six rater-task pairs that comprise the measurement procedure and, as such, are multifaceted (i.e., each replicate is defined in terms of a specific rater and a specific task). Prior to the 1960s, measurement theory primarily focused on replicates that were defined along a single facet (e.g., replicates represented different items, different split-halves of a test, or the same test administered on different occasions).3 Early measurement models were not concerned with replicates that were multifaceted in nature (Brown, 1910; Gulliksen, 1950; Spearman, 1910). Modern perspectives on reliability also recognize that measurement facets can manifest themselves differently in any given data collection design. For example, (a) the same raters might provide ratings for each ratee; (b) a unique, nonoverlapping set of raters might provide ratings for each ratee; or (c) the sets of raters that rate each ratee may vary in their degree of overlap. As noted later, the data collection design underlying one's measurement procedure has important implications for reliability estimation, which, prior to the 1960s, was not integrated into measurement models. It was simply not the focus of early measurement theory (Cronbach & Shavelson, 2004).
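
For the two-rater, three-task example, generalizability theory (taken up later in the chapter) would treat raters and tasks as facets and express the expected consistency of scores averaged over both facets in terms of variance components. The following is a sketch of the relative-error (rank-order) coefficient, assuming for illustration only a fully crossed person x rater x task design:

```latex
E\rho^{2} =
\frac{\sigma^{2}_{p}}
     {\sigma^{2}_{p}
      + \dfrac{\sigma^{2}_{pr}}{n_r}
      + \dfrac{\sigma^{2}_{pt}}{n_t}
      + \dfrac{\sigma^{2}_{prt,e}}{n_r\, n_t}}
\qquad \text{with } n_r = 2 \text{ raters and } n_t = 3 \text{ tasks.}
```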

Expectation

A second key element of the operational definition of reliability offered above is the notion of expectation. The purpose of estimating reliability is not to quantify the level of consistency in scores among the sample of replicates that comprise oneā€™s measurement procedure for a given study (e.g., items, raters, tasks, or combinations thereof). Rather, the purpose is to use such information to make inferences regarding (a) the consistency of scores resulting from our measurement procedure as a whole with the population from which replicates comprising our measurement procedure were drawn (e.g., the population of items, raters, tasks, or combinations thereof believed to measure the construct ...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Contents
  5. Preface
  6. About the Editors
  7. Contributors
  8. Part I Foundations of Psychological Measurement and Evaluation Applied to Employee Selection
  9. Part II Implementation and Management of Employee Selection Systems in Work Organizations
  10. Part III Categories of Individual Difference Constructs for Employee Selection
  11. Part IV Decisions in Developing, Selecting, Using, and Evaluating Predictors
  12. Part V Criterion Constructs in Employee Selection
  13. Part VI Legal and Ethical Issues in Employee Selection
  14. Part VII Employee Selection in Specific Organizational Contexts
  15. Part VIII Technology and Employee Selection
  16. Index