A Comprehensive Critique of Student Evaluation of Teaching

Critical Perspectives on Validity, Reliability, and Impartiality

About This Book

This thought-provoking volume offers a comprehensive analysis of contemporary research and literature on student evaluation of teaching (SET) in Higher Education.

In evaluating data from fields including education, psychology, engineering, science, and business, this volume critically engages with the assumption that SET is a reliable and valid measure of effective teaching. Clayson navigates a range of cultural, social, and era-related factors including gender, grades, personality, student honesty, and halo effects to consider how these may impact on the accuracy and impartiality of student evaluations. Ultimately, he posits a "popularity hypothesis", asserting that above all, SET measures instructor likability. While controversial, the hypothesis powerfully and persuasively draws on extensive and divergent literature to offer new and salient insights regarding the growing and potentially misleading phenomenon of SET.

This topical and transdisciplinary book will be of great interest to researchers, faculty, and administrators in the fields of higher education management, administration, teaching and learning.

Information

Author: Dennis E. Clayson
Publisher: Routledge
Year: 2020
ISBN: 9781000281927
Pages: 152
Language: English

1 Issues and Debates Surrounding Student Evaluations of Teaching

What Are the Issues?

Louis thought of himself as a good teacher. He also loved research. These two passions were what led him to become a professor. Even as an undergraduate, he found he could explain things to his classmates so they readily understood them. Numerous times, someone he helped would say, “Why didn’t the professor explain it like that?” Upon gaining his doctorate, Louis went to work for a private college, where he enjoyed his interaction with students and felt pride in their achievements. Yet, in his second decade of teaching, Louis saw his student evaluations fall from the 90th percentile to the 20th percentile. In other words, in the eyes of his students, he went from one of the best instructors to one of the worst.
What could have caused such a dramatic reversal? The only change in Louis’s life was a change in schools. He left the private school where he had taught for almost ten years for a larger public university, which increased his salary and gave him more opportunities for research. Prior to his move, he was twice nominated by graduating seniors for the college’s faculty award, the highest honor given by the college to faculty. In his last year, he was nominated by his peers to give the faculty lecture, another honor, this time from his colleagues. After his first year at the new university, he was shocked to find his student evaluations were lower than he had ever experienced. Years later, his scores remained below average.
Louis became convinced there was something wrong with the evaluation process. His approach in the classroom had not changed. His personality and his ability to explain complicated material had not been suddenly modified. He knew his students were different, but were they that much different? They were all undergraduate students. Nothing he had done had created this change, so what was the evaluation actually measuring? Was he a good teacher or was he not? Where once Louis couldn’t wait to get into the classroom, he now dreaded each new class.
What can we tell Louis? Were the evaluations his students completed almost every term assessing his actual teaching ability? If not, then what were they measuring? Perhaps they were a measure of Louis himself, independent of his teaching abilities. Yet what he was doing in the classroom had not changed, and he had not changed either. On the other hand, the students were different. On average, the students in the private college had higher standardized test scores and came from a higher socioeconomic background, but was that enough to create such a dramatic shift in his evaluations? Were the evaluations just a measure of compatibility? If a good teacher could reach one group of students and not another, did that imply the evaluations were a measure of the students themselves, and only a secondary indicator of the instructor? Wouldn’t accepting his evaluations at face value suggest that “good” or “effective” teaching was whatever the students said it was? Then again, shouldn’t a teacher adapt to his or her students?
Louis, in an interview with his dean, suggested his classes were too rigorous for the students at his present school, but the dean assured him this was one of the primary reasons he was hired. “Our students need to be challenged,” she said. Later, Louis smiled at the irony, mentally noting that the connection between rigor and learning had simply been assumed. No one had suggested his students be independently tested to discover if they were actually learning. Further, with her statement, the dean was admitting that she didn’t believe the evaluations and learning were necessarily related. All Louis really knew was his students at the new school did not like him.

Introduction

There have been divisive issues among academicians in the past, but few have been as well researched and long-lasting as the debate over the student evaluation of teaching (SET). Every aspect of the process has been investigated. Even the title of the evaluations became a matter of dispute. The issue revolved around what students were doing, and what they were qualified to do, when they responded to the forms. Some suggested students were not qualified to evaluate instruction, but that they could “rate” their experience in class using a ranking scale. In this view, the “evaluation” is not actually performed by students, but by professionals utilizing student input. Others insisted students were qualified to evaluate teaching because they are the ones actually present when the instruction takes place (Berk, 2013; Hativa, 2014).
In this historical debate, it is enlightening to see what some of the oldest rating scales were called. Early instruments at the University of Washington were titled a “Survey of Student Opinion of Teaching” (Langen, 1966), while those at Michigan State were simply called the “Teacher Evaluation Sheet” (Dressel, 1961). At Purdue, it was a “Rating Scale for Instruction” (Page, 1974), and at the University of Minnesota, the form was a “Student Opinion Survey” (Doyle, 1975). Note that the titles not only reflect a wide range of views but also imply there was no consensus about what the process was designed to measure. Is the purpose to create a scale that supposedly measures a wide expanse, summarized by the word “instruction,” or simply to survey “opinions”? Further, are the students attempting to measure an instructor, or what the instructor does?
Consequently, those who don’t quibble about who is evaluating whom use the terms Student Rating of Instruction (SRI), Student Evaluation of Instruction (SEI), or the more common term, Student Evaluation of Teaching (SET) (see Baldwin, Ching, & Hsu, 2018). There are also a number of widely used evaluation instruments created by researchers and consultants, including the Individual Development & Educational Assessment (IDEA), created by a research and consulting group of the same name at Kansas State University, and a form created by Herbert Marsh called the Student Evaluation of Educational Quality (SEEQ). As these titles show, there is little agreement about what students are actually doing, and about what is supposedly being measured.
In this book, we will simply refer to the process as the Student Evaluation of Teaching (SET), realizing, as we proceed, that the title may not reflect all the nuances of the research.

History

Even with this lack of consensus, and even as a vigorous debate questioning the validity of the process was under way, the use of the evaluations became, for all practical purposes, universal. Initially, much of the research was positive and justified the popularity of the procedure. However, a dramatic increase in the use and impact of the evaluations occurred in the last several decades, ironically just as the research on SET was becoming increasingly negative.
Although investigations of SET date from the 1920s (Gump, 2007; Langen, 1966; Wachtel, 1998), one of the first readily accessible research papers on SET was a report of a survey, and the subsequent development of an evaluation procedure, at the University of Washington in 1944 (Guthrie, 1949). From that survey, procedures for faculty promotion and evaluation were developed. Some of the findings from the developmental process would sound familiar to someone studying the procedure almost 80 years later: no correlation was found between teaching effectiveness and research contribution, and full professors did not rate better than assistant professors.
During the 1960s, interest in students evaluating faculty increased, but many schools did not embrace SET, even though there was a growing consensus that the instruments offered “systematic and tangible kinds of evidence for evaluation of teaching performance” (Centra, 1977, p. 19 of 26). By 1973, it was reported that 28% of campuses used some sort of student evaluation of instructors. By 1984, the number had increased to 68%, and by the early 1990s, 86% of universities used SET for important faculty decisions (Seldin, 1993). Business schools appeared to be ahead of the curve; by 1994, about 95% of the deans of accredited business schools used the evaluations as a source of teaching information (Crumbley, 1995). Shortly thereafter, a study by the Carnegie Foundation found 98% of universities were using some form of student evaluations (Magner, 1997). At about the same time, another study reported 99% of business schools utilized evaluations by students, and deans placed a higher importance on these evaluations than on either administrative or peer assessments (Comm & Mathaisel, 1998). A more recent American Association of University Professors (AAUP) poll (Vasey & Carroll, 2016) found that only 4% of instructors reported the student evaluations were not required; yet, even for this small group, the evaluations were still recommended. Currently, it would be difficult to find a university that does not utilize some form of the student evaluation of teaching.
Not only was the use of the instruments becoming normative; on many campuses SET became the most important and, in many cases, the only measure of teaching ability (Wilson, 1998). The instruments were also being used to make important non-instructional decisions. In one survey, almost 90% of accounting professors reported SET instruments were used to determine tenure decisions, and 70% said the evaluations were used to determine merit pay (Crumbley & Reichelt, 2009). Seldin (1999) quotes a California dean as saying, “If I trust one source of data on teaching performance, I trust the students” (p. 15).
As would be expected, the universal adoption of an assessment that could establish reputations, merit pay, promotion, and tenure has been extensively researched. As early as 1990, it was reported that at least 2,000 citations to SET existed (Feldman, 1997; Centra, 2003). One source stated that close to 3,000 articles were published on SET in just the 15 years between 1990 and 2005 (Al-Issa & Sulieman, 2007). Reports on the topic were so voluminous that many researchers began to utilize meta-analysis, in which a case was not a student or class average but an entire published article (see Clayson, 2009; Cohen, 1980, 1981; Feldman, 1989; Spooren, Brockx, & Mortelmans, 2013; Stephen, Wright, & Jenkins-Guarnieri, 2011; Uttl, White, & Gonzalez, 2017 as examples).
Nevertheless, little agreement had been reached on key points. The defenders of the system were typically found in colleges of education, and among those who consulted in the area. Some defended the evaluations almost as if they were religious tenets, and even referred to sources who reported contrary findings in strong and uncharacteristically negative terms (Aleamoni, 1999; Marsh, 1984; Marsh & Hattie, 2002; Marsh & Roche, 2000). These advocates typically had an advantage in the publication process, since pedagogic research is the essential academic work of their profession. Other disciplines generally look upon research on instruction as less prestigious, and those opposed to the evaluation process are more dispersed among academic disciplines and more isolated in their publication outlets. They were, however, equally emphatic. In such an environment, it became relatively easy to select research findings that reinforced a point of view.
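To make the unit-of-analysis shift concrete, the following is a minimal sketch of one common meta-analytic scheme, assuming (hypothetically) that each published study reports a single correlation (say, between SET scores and some measured outcome) together with its sample size, and that the correlations are pooled with fixed-effect, inverse-variance weighting on Fisher z-transformed values. The study values and names below (studies, fisher_z) are illustrative placeholders, not data or methods taken from the articles cited above.

```python
import math

# Illustrative placeholder studies: (correlation r, sample size n).
# Each tuple stands in for one published article -- the "case" in the
# meta-analysis described in the text.
studies = [(0.43, 120), (0.18, 300), (0.30, 75), (-0.05, 210)]

def fisher_z(r):
    # Fisher z-transform; stabilizes the sampling variance of r.
    return 0.5 * math.log((1 + r) / (1 - r))

# Fixed-effect inverse-variance weights: Var(z_i) is approximately 1 / (n_i - 3),
# so larger studies get proportionally more weight.
weights = [n - 3 for _, n in studies]
pooled_z = sum(w * fisher_z(r) for (r, _), w in zip(studies, weights)) / sum(weights)

# Back-transform the pooled z to a correlation (tanh inverts fisher_z).
pooled_r = math.tanh(pooled_z)
print(f"pooled r across {len(studies)} studies: {pooled_r:.3f}")
```

The point of the sketch is only that each article contributes one weighted observation; whether individual meta-analyses in this literature used fixed- or random-effects models varies by study.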
The following summary of the evaluation process is not free of these problems, but it does attempt to present information from a wider assortment of venues than is found in much of the traditional educational discipline outlets.

Question of Era

There are era and cultural matters related to SET. This is an issue that should influence our understanding of the evaluation process, but one that is rarely addressed. As previously indicated, there has been a change in the consensus about the validity of SET. Much of the current literature is negative, but, by the mid-1980s, the existing research on the evaluations was positive enough that negative attitudes toward them were referred to as “myths,” “half-truths,” and “witch hunts” (Aleamoni, 1987, 1999; Feldman, 1997; Marsh & Hattie, 2002; Theall & Franklin, 2001), a response that has been perpetuated by some compilers who have attempted to summarize the data (Gravestock & Gregor-Greenleaf, 2008; Hativa, 2014).
There were several reasons for this pre-millennial optimism.
First, a large amount of research had occurred. As a prominent scholar at the time noted, “Probably, students’ evaluations of teaching effectiveness are the most thoroughly studied of all forms of personnel evaluation, and one of the best in terms of being supported by empirical research” (Marsh, 1987, p. 369). As previously noted, at least 2,000 reports of SET existed before 1990 (Feldman, 1997). Much of this research, especially the research published in the top journals, was positive.
Second, most of the research was conducted and published fr...

Table of contents

  1. Cover
  2. Half Title
  3. Series
  4. Title
  5. Copyright
  6. Dedication
  7. Contents
  8. 1 Issues and Debates Surrounding Student Evaluations of Teaching
  9. 2 Potential Impacts of Gender Bias on Student Evaluations
  10. 3 The Influence of Personality Traits on Student Evaluations
  11. 4 Halo Effects Impacting on Student Evaluations
  12. 5 Are Students Truthful?
  13. 6 Rigor, Grades, and How They Impact Student Evaluations
  14. 7 The Association Between Student Learning and Student Evaluations
  15. 8 Student Evaluations and the Improvement of Instruction
  16. 9 Challenging the Statistical Reliability of Student Evaluations
  17. 10 Traditional Validity and SET
  18. 11 Identifying Valid Applications of SET
  19. 12 Validity and the Impacts of Subjectivity
  20. 13 Introducing a Likability Hypothesis
  21. 14 Justifications of the Likability Hypothesis
  22. 15 Conclusion and Recommendations – the Future of SET
  23. Index