Considering Comparability and Change over Time: Measurement Invariance as an Essential Consideration in Theory Development and Testing
Kaylee Litson and David Feldon
Abstract
There is currently a great deal of attention in psychometric and statistical methods on ensuring measurement invariance when examining measures across time or populations. When measurement invariance is established, changes in scores over time or across groups can be attributed to changes in the construct rather than changes in reaction to or interpretation of the measurement instrument. When measurement is not invariant, it is possible that measured differences are due to the measurement instrument itself rather than to the underlying phenomenon of interest. This chapter discusses the importance of establishing measurement invariance specifically in postsecondary settings, where individuals' perspectives are expected to change over time as a function of their higher education experiences. Using examples from several measures commonly used in higher education research, the concepts and processes underlying tests of measurement invariance are explained, and analyses are interpreted using data from a US-based longitudinal study of bioscience PhD students. These measures include sense of belonging over time and across groups, mental well-being over time, and perceived mentorship quality over time. The chapter ends with a discussion of the implications of longitudinal and group measurement invariance as an important conceptual property for advancing equitable, reproducible, and generalizable quantitative research in higher education. Invariance methods may also be relevant for addressing criticisms, raised by critical theorists engaging with quantitative research strategies, that quantitative analyses are biased toward majority populations.
Keywords: Measurement invariance; longitudinal methods; group differences; quantcrit; sense of belonging; mental well-being; mentorship quality
Introduction
As students advance through higher education programs, their knowledge, skills, and perspectives should change in response to their engagement with the academic community (Denovan, Dagnall, Macaskill, & Papageorgiou, 2020; Soria & Johnson, 2017). Ultimately, many theories explaining student development and success conceptualize postsecondary outcomes as the result of personal and organizational processes that unfold over time (e.g., Baker & Lattuca, 2010; McAlpine, Amundsen, & Turner, 2014; Weidman, Twale, & Stein, 2001; Wenger, 1999). Typically, quantitative efforts to measure these processes make the tacit assumption that changes in student behaviors are accurately reflected by changes in scores on measurement instruments. Yet, such an assumption may be incorrect in practice. Measurement instruments themselves can function differently over time if changes in student perspective lead to shifting interpretations of question or response option meanings. If survey questions are perceived by students differently over time, then changes in obtained scores may reflect these differences in perception rather than actual changes in the construct targeted by the measure. Unfortunately, many instruments used in higher education do not consider how the measurement instrument functions over time when measuring changes in outcomes. Rather, developmental changes in student perspectives and meaning-making are not typically considered to affect the structure and validity of the constructs being measured.
When using a survey instrument, for example, the presentation of the same questions and response options is typically unchanged from deployment to deployment. This practice is customary to maximize the comparability of data collected at one time point with the data collected at another. For example, the Higher Education Research Institute (http://heri.ucla.edu) conducts an annual study of first-year undergraduates in the United States to determine their background characteristics, reasons for choosing their institution, academic behaviors, and a host of other constructs (Stolzenberg et al., 2020). Each year, participants respond to established questions that have previously been validated with a known factor structure, characterizing the ways in which individual questions covary to reflect more general factors (e.g., academic self-concept, civic engagement, etc.). Similarly, the National Academics Panel Study (NACAPS; Briedis et al., 2020), conducted by the German Centre for Higher Education Research and Science Studies, administers biannual surveys to young researchers longitudinally. NACAPS assesses training experiences and environments, including mentoring, career aspirations and attainment, and individual characteristics as researchers move through doctoral and postdoctoral training. However, the responses elicited from participants are not typically considered in the context of how their perspectives might influence their responses to static questions at different points in time. Instead, changes in responses over time are typically attributed to changes in participant development and not to the way that participants perceive and make meaning of the survey questions.
Similarly, the constructed sensemaking of members of specific subpopulations (e.g., demographic groups, programmatic areas) may differ, leading to observed score differences, which may or may not be appropriately attributed to differences in the level of the target construct. As with responses collected over time, data collected across groups may reflect true differences in experiences targeted by a measure, or they may be due to interpretive differences. This aspect of quantitative inquiry is particularly important as student populations become increasingly diverse along a variety of dimensions. Because students are likely to hold perspectives that differ substantially from those who develop research instruments, it is especially important that the ways these instruments function to encode student experiences and perspectives into numbers be both consistent and equitable across sets of experiences over time and across groups. Failure to do so risks misinterpreting or misattributing differences in ways that exclude minoritized voices and perspectives from efforts to understand impacts of postsecondary experiences and opportunities.
These issues of whether measurement instruments function equivalently across time and groups are collectively known as factorial measurement invariance. In this chapter, we discuss different types of measurement invariance that can be tested, present how to statistically evaluate measurement invariance, and then report findings regarding longitudinal and group-level factorial invariance of common measures used in higher education research.
What Is Measurement Invariance?
Factorial measurement invariance, sometimes called measurement equivalence, refers to the assumption that a measurement instrument measures the same construct, behavior, or attitude consistently across multiple administrations (Jöreskog, 1971; Meredith, 1993; van de Schoot, Schmidt, De Beuckelaer, Lek, & Zondervan-Zwijnenburg, 2015; Widaman, Ferrer, & Conger, 2010). That is, the relationship between the items and the construct they measure should not vary across time or group membership. Without measurement invariance, change over time or across groups may be attributed either to changes in the way the measurement instrument functions, potentially due to critical developmental changes in the underlying characteristics of respondents over time, or to actual changes in the construct, with no clear way to determine which source of change is correct. Yet most measures used in higher education – even those used repeatedly over time – do not test for or acknowledge measurement invariance (Feldon, 2020), with few exceptions (e.g., Coertjens, Donche, De Maeyer, Vanthournout, & Van Petegem, 2012). Despite being necessary for the appropriate interpretation of statistical findings, measurement invariance is a largely forgotten or ignored aspect of higher education research.
There is currently a great deal of attention in psychometric and statistical methods on ensuring factorial measurement invariance when examining measures across time or populations (Millsap, 2011). Demonstrating measurement invariance typically involves running a series of increasingly constrained confirmatory factor analysis models.
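Before turning to those tests, a minimal simulation can make the stakes concrete. The Python sketch below is purely illustrative and is not an analysis from this chapter's data: the construct name, loadings, intercepts, and sample size are all arbitrary assumptions. It generates item responses at two time points where the latent construct does not change at all, but one item's intercept drifts at time 2 (scalar non-invariance). The composite mean nevertheless appears to increase, exactly the kind of artifactual "change" that invariance testing is designed to catch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # simulated respondents

# Latent construct (e.g., sense of belonging) at two time points:
# identical distribution, so the true change is exactly zero.
eta_t1 = rng.normal(0.0, 1.0, n)
eta_t2 = rng.normal(0.0, 1.0, n)

# Three items measure the construct. Loadings and intercepts are
# illustrative values, not estimates from any real scale.
loadings = np.array([0.8, 0.8, 0.8])
intercepts_t1 = np.array([3.0, 3.0, 3.0])
# At time 2, item 3's intercept drifts upward: respondents endorse it
# more strongly at the same level of the construct (scalar non-invariance).
intercepts_t2 = np.array([3.0, 3.0, 3.5])

items_t1 = intercepts_t1 + np.outer(eta_t1, loadings) + rng.normal(0.0, 0.5, (n, 3))
items_t2 = intercepts_t2 + np.outer(eta_t2, loadings) + rng.normal(0.0, 0.5, (n, 3))

mean_t1 = items_t1.mean()
mean_t2 = items_t2.mean()
print(f"composite mean, time 1: {mean_t1:.3f}")
print(f"composite mean, time 2: {mean_t2:.3f}")
# The composite mean rises by roughly 0.17 points even though the latent
# construct is unchanged: the difference is a pure measurement artifact.
```

A scalar invariance test, which constrains item intercepts to be equal across time points, is the model comparison that would flag this pattern before any latent means are compared.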