Handbook of Item Response Theory Modeling

Applications to Typical Performance Assessment

Steven P. Reise, Dennis A. Revicki


About This Book

Item response theory (IRT) has moved beyond the confines of educational measurement into assessment domains such as personality, psychopathology, and patient-reported outcomes. This new volume reviews classic and emerging IRT methods and applications that are revolutionizing psychological measurement, particularly for health assessments used to demonstrate treatment effectiveness. World-renowned contributors present the latest research and methodologies on these models along with their applications and related challenges. Examples using real data, some from NIH-PROMIS, show how to apply these models in actual research situations. Chapters review fundamental issues of IRT, modern estimation methods, testing assumptions, evaluating fit, item banking, scoring in multidimensional models, and advanced IRT methods. New multidimensional models are presented, along with guidance for choosing among the family of available IRT models. Each chapter provides an introduction, describes state-of-the-art research methods, demonstrates an application, and provides a summary. The book addresses the most critical IRT conceptual and statistical issues confronting researchers and advanced students in psychology, education, and medicine today. Although the chapters highlight health outcomes data, the issues addressed are relevant to any content domain.

The book addresses:

IRT models applied to noneducational data, especially patient-reported outcomes.

Differences between cognitive and noncognitive constructs and the challenges these bring to modeling.

The application of multidimensional IRT models designed to capture typical performance data.

Cutting-edge methods for deriving a single latent dimension from multidimensional data.

A new model designed for the measurement of constructs that are defined at one end of a continuum, such as substance abuse.

Scoring individuals under different multidimensional IRT models, and item banking for patient-reported health outcomes.

How to evaluate measurement invariance, diagnose problems with response categories, and assess growth and change.

Part 1 reviews fundamental topics such as assumption testing, parameter estimation, and the assessment of model and person fit. Part 2 examines new, emerging, and classic IRT models, including models for multidimensional data and the use of new IRT models in typical performance measurement contexts. Part 3 reviews the major applications of IRT models, such as scoring, item banking for patient-reported health outcomes, evaluating measurement invariance, linking scales to a common metric, and measuring growth and change. The book concludes with a look at future IRT applications in health outcomes measurement. Throughout, the book summarizes the latest advances and critiques foundational topics such as multidimensionality, assessment of fit, and handling non-normality, as well as applied topics such as differential item functioning and multidimensional linking.

Intended for researchers, advanced students, and practitioners in psychology, education, and medicine interested in applying IRT methods, this book also serves as a text in advanced graduate courses on IRT or measurement. Familiarity with factor analysis, latent variables, IRT, and basic measurement theory is assumed.


Information

Publisher
Routledge
Year
2014
ISBN
9781317565697
Pages
466
Language
English
Part I
Fundamental Issues in Item Response Theory

1
Introduction

Age-Old Problems and Modern Solutions
Steven P. Reise and Dennis A. Revicki
The statistical foundation of item response theory (IRT) is often traced back to the seminal work of Lord, Novick, and Birnbaum (1968). The subsequent development, research, and application of IRT models and related methods link directly to the need of large-scale testing companies, such as the Educational Testing Service, to solve statistical as well as practical problems in educational assessment (i.e., the measurement of aptitude, achievement, and ability constructs). Daunting problems in this domain include the challenge of administering different test items to demographically diverse individuals across multiple years, while maintaining scores that are comparable on the same scale. This test-score comparability problem traditionally has been addressed with “test-score equating” methods, but now IRT-based “linking” strategies are more routinely used (see Chapter 19).
The application of IRT models and methods in educational assessment is now commonplace (e.g., see almost any recent issue of the Journal of Educational Measurement), especially among large-scale testing firms that employ dozens of world-class psychometricians, content experts, and item writers on their research staffs. In contrast, the application of IRT models and related statistical methods in the fields of personality, psychopathology, patient-reported outcomes (PRO), and health-related quality-of-life (HRQOL) measurement has only recently begun to proliferate in research journals. In these noneducational or “typical performance” domains, the application of IRT has gained popularity for much the same reasons as in large-scale educational assessment; that is, to solve practical and technical problems in measurement.
The National Institutes of Health (NIH) Patient Reported Outcome Measurement Information System (PROMISÂź), for example, has developed multiple item banks for measuring various physical, mental, and social health domains (Cella et al., 2007; Cella et al., 2010). Similarly, the Quality of Life in Neurological Disorders (www.neuroqol.org) and NIH Toolbox (www.nihtoolbox.org) have also employed IRT methods of scale development and item analysis. One of the chief motivations underlying the application of IRT methods in these projects was to solve a long-standing and well-recognized problem in health outcomes research; namely, that for any important construct, there are typically half a dozen or so competing measures of unknown quality and questionable validity. This chaotic measurement situation, with dozens of researchers studying the same phenomena using different measurement tools, fails to promote good research and inhibits the cumulative aggregation of research results.
Large-scale IRT application projects, such as PROMISÂź, have not only raised awareness of the technical and practical challenges of applying IRT models to psychological or PRO data in general, but have also uncovered the many and varied special problems and concerns that arise in applying IRT outside of educational assessment (see also Reise & Waller, 2009). We will highlight several of these critical challenges later in this chapter to set a context for the present volume. Before doing so, however, we note that thus far, standard IRT models and methods have been imported into noneducational measurement contexts essentially without modification. In other words, there has been little in the way of “new models” or “new statistical methods” uniquely appropriate for PRO or any other type of noneducational data (but see Chapter 13).
This egalitarian stance—that the same IRT models and methods should be used for all constructs, educational or PRO—was perhaps critical in the early stages of IRT exploration and application in new domains. Inevitably, we believe, further progress will require new IRT-based psychometric approaches particularly tailored to meet measurement challenges in noneducational assessment. We will expand on this in the final chapter. For now, before previewing the chapters in this edited volume, we briefly discuss in the following section some critical differences between educational and noneducational constructs, data, and assessment contexts as they relate to the application of IRT models. We argue that although there are fundamental technical issues in applying IRT to any domain (e.g., dimensionality issues, assessing model-to-data fit), unique challenges arise when applying IRT to noneducational data due to the nature of the constructs (e.g., limited conceptual breadth, questionable applicability across the entire population) and of the item response data (e.g., non-normal latent trait distributions).

Educational Versus Noneducational Measurement

It is well recognized that psychological constructs, both cognitive and noncognitive, can be conceptualized as hierarchically arranged, from very general, to middle-level, to conceptually narrow, to specific behaviors (Clark & Watson, 1995).1 Since Loevinger (1957), it has also been well recognized (although not necessarily realized in practice by scale developers) that the position of a construct in this hierarchy has profound implications for all aspects of scale development, psychometric analyses, and ultimately the validation of test score inferences.
Almost by definition, measures of broad bandwidth constructs (intelligence, verbal ability, negative affectivity, general distress, overall life satisfaction, or QOL) must have heterogeneous item content to capture the diversity of trait manifestations.2 In turn, item intercorrelations, item-test correlations, and factor-loadings/IRT slopes are expected to be modest in magnitude, with low communality. Moreover, resulting factor structures may (must?) be multidimensional to some degree, perhaps with a strong general factor and several so-called group or specific factors corresponding to more content-homogeneous domains (see Chapter 2).
On the other hand, just the opposite psychometric properties would be expected for measures of conceptually narrow constructs (mathematics self-efficacy, primary narcissism, fatigue, pain interference, germ phobia). That is, in this latter context, the content diversity of trait manifestation is very limited (by definition of the construct), and as a consequence, item content is homogeneous, with the conceptual distance between the item content and the latent trait being slim. In turn, this can result in very high item intercorrelations, item-test correlations, and factor-loadings/IRT slopes. In factor analyses, essential unidimensionality would be the expectation, as would high item communalities. Finally, in contrast to broadband measures, where local independence violations are typically caused by clusters of content-similar items, in narrowband measures, local independence violations are typically caused by having the same item content repeated over and over with slight variation (e.g., “I have problems concentrating,” “I find it hard to concentrate,” “I lose my concentration while driving,” “It is sometimes hard for me to concentrate at work”).
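To make the local independence idea concrete, the following minimal sketch (in Python, using simulated data; the sample size, loadings, and item pair are illustrative assumptions, not values from any study discussed here) fits a one-factor model and inspects the residual correlations left after the common factor is removed. A pair of near-duplicate items appears as a large positive residual correlation:

```python
# Minimal sketch (simulated data, illustrative values only): after fitting a
# one-factor model, residual correlations flag item pairs that share variance
# beyond the common trait -- the signature of local dependence.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 6
theta = rng.normal(size=n_persons)    # common latent trait
doublet = rng.normal(size=n_persons)  # nuisance factor shared by two items

X = np.empty((n_persons, n_items))
for j in range(n_items):
    X[:, j] = 0.8 * theta + 0.6 * rng.normal(size=n_persons)
X[:, 4] += 0.5 * doublet  # two near-duplicate "concentration" items:
X[:, 5] += 0.5 * doublet  # their residuals correlate beyond the trait

fa = FactorAnalysis(n_components=1).fit(X)

# Model-implied covariance = loadings @ loadings.T + diag(unique variances)
implied = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
sd = np.sqrt(np.diag(implied))
implied_corr = implied / np.outer(sd, sd)
observed_corr = np.corrcoef(X, rowvar=False)

residual_corr = observed_corr - implied_corr
print(np.round(residual_corr, 2))  # the entry for indices [4, 5] stands out
```

In this sketch the flagged pair plays the role of the repeated-wording items listed above; in real analyses, modification indices (Lagrange multiplier tests) serve the same diagnostic purpose at scale.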
In our judgment, applications of IRT in educational measurement have tended toward the more broadband constructs, such as verbal and quantitative aptitude, or comprehensive licensure testing contexts (which also involve competencies across a heterogeneous skill domain). In contrast, we argue that with few exceptions, applications of IRT in noneducational measurement have primarily been with constructs that are relatively conceptually narrow. As a consequence, IRT applications in noneducational measurement contexts present some unique challenges, and the results of such applications can be markedly different from a typical IRT application in education.
For illustration, Embretson and Reise (in preparation) report on an analysis of the PROMISÂź anger item set (see Pilkonis et al., 2010), a set of 29 items rated on a 1-to-5 response scale. Anger is arguably conceptually narrow because there simply are not that many ways of being angry (especially when rated within the past seven days); that is, the potential pool of item content is very limited, unlike a construct such as, say, spelling or reading comprehension, where the pool of items is virtually inexhaustible. Accordingly, coefficient alpha was 0.96, and the ratio of the first to second eigenvalue was around 15 to 1, suggesting unidimensionality, or at least a strong common factor. Fitting a unidimensional confirmatory factor analysis model resulted in an “acceptable” fit by conventional standards. However, univariate and multivariate Lagrange multiplier tests indicated that 407 and 157 correlated residuals, respectively, needed to be estimated (set free). This unambiguous evidence against the data meeting the unidimensionality/local independence assumption was due not to the anger data being in any real sense of the term “multidimensional,” with substantively interpretable distinct factors, but rather to the data having many sizeable correlated residuals (violations...
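For readers who want to run the two descriptive checks mentioned above on their own data, here is a minimal sketch of coefficient alpha and the first-to-second eigenvalue ratio; the function names are our own illustrative choices, and the sketch does not reproduce the reported PROMISÂź anger values:

```python
# Minimal sketch: coefficient alpha and the first-to-second eigenvalue ratio
# for a respondents-by-items score matrix X. Function names are illustrative.
import numpy as np

def cronbach_alpha(X: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

def first_to_second_eigenvalue_ratio(X: np.ndarray) -> float:
    """Eigenvalue ratio of the inter-item correlation matrix (largest first)."""
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return eigenvalues[0] / eigenvalues[1]

# Example usage with the simulated matrix X from the previous sketch:
# print(cronbach_alpha(X), first_to_second_eigenvalue_ratio(X))
```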
