My friend’s question about whether she is a good teacher sent me off on a research binge. Her question arose from negative teaching evaluations, something I think even the best teacher has to put up with.

Earlier in the semester, another faculty member mentioned a study finding that students’ evaluations of teaching don’t correspond to their learning. The study shows that professor quality affects student learning, as measured by grades in subsequent courses, while student evaluations of teachers are heavily influenced by the grade the student expects to receive in the current (contemporaneous) course.

Inevitably, you will get worse teaching evaluations in a freshman weed-out course than in a small senior seminar.

This paper also shows that professors who do well at helping students with contemporaneous learning on average harm those students in subsequent (more advanced) classes. Whoops! Student evaluations are positively correlated with contemporaneous achievement and negatively correlated with follow-on achievement. Academic rank, teaching experience, and whether or not you hold a terminal degree are negatively correlated with student performance in the current class and positively correlated with performance in subsequent classes.

Perhaps my friend should embrace the painful teaching evaluations?

On the other hand, this worries me; my teaching evaluations are fairly good. Am I teaching to the test, or am I promoting deep learning? I hope I am promoting deep learning, although I also think that exams should not come as a surprise to students who are doing what they are supposed to do. I will have to reflect on this in future semesters. I know that having taught broadly within the curriculum has influenced my teaching: when I know the expectations in later courses, I make sure the necessary material gets covered in earlier courses to support that later learning.

I’ve taught differential equations three times, and I have consciously tried to make the course more difficult each time I’ve taught it.

Another complicating factor in evaluating student evaluations is that they aren’t gender-blind. Male students rate female professors more negatively than male professors. Female students tend to rate female professors more highly, but this doesn’t help if you are a woman teaching in a predominantly male discipline (like upper-level math).

This got me thinking about my mathematical modeling course. The best semester I had with it was Spring 2012. That class produced great work and awesome outcomes; I had a lot of fun with them, and they with me. The Fall 2012 class was a real letdown; there were a lot of weak students in the class, and I was too lenient early in the semester. This semester is going well and students are performing well, but there’s not that extra oomph I recall from Spring 2012.

I wonder what I’m doing differently, or not as well.

After reading the article, I started wondering whether it is me. In Spring 2012, I had 8 women out of 18 students (about 44%). In Fall 2012, I had 5 women out of 17 students (about 29%). This semester I have 4 women out of 34 students (about 12%), two in each of two 17-student sections. I think that is one of the major influences on the dynamic of the class: you need a certain critical mass of women in the class for the culture to change. I had it in Spring 2012, and I don’t have it now.

[After confessing that I haven’t read the second article yet]

How do you set the stage for teaching evaluations? In some cases, simply informing students about potential biases in how they evaluate can counteract those biases.

It sounds like your experiences with the cultures of different groups of students argue for continuing to push women to take risks and take those advanced math classes! Easier said than done, of course, but it’s important for people to realize that diverse classes are often better for everyone involved.

The whole time I was at the university where I earned my Ph.D., I wished so strongly that faculty would sit down and talk about connections between the lower-level and upper-level courses. I kept getting students who wanted to do research but acted as if they’d never written a scientific paper in their lives, even though they had all learned at least the basics in introductory courses (if only this had been emphasized in the upper-level courses, too). I also had to sit through some painful Genetics lectures in which about half of the semester was devoted to material already covered in the introductory courses. It made me wonder: why did we even bother in the introductory courses? While some overlap is beneficial, I think this much overlap caused some students to tune out and miss the new or more in-depth material.


The article you cited raises the question: which course are you teaching, your contemporaneous course or the next course they might take? Clearly, you have course objectives for your contemporaneous course. If students meet those objectives, they ought to get better grades and you ought to get better ratings. After all, many faculty believe (correctly, I think) that the best measure of teaching is learning. The question is how students do in the follow-on course. If the contemporaneous course objectives are adequate preparation for the follow-on course, students who do well in the contemporaneous course should do well in the follow-on. If not, there is probably a mismatch between the objectives. That is a problem for the curriculum committee in your department.

You mention that student evaluations are related to expected grades. Most faculty seem to think the relation is stronger than it is. The literature on student evaluations shows that the correlation between ratings and expected grades ranges from .1 to .3. In other words, between 1% and 9% of the variance in ratings can be accounted for by expected grades. In 25 years of doing those calculations for every department and college, I haven’t seen the correlations fall below .18 or rise above .28.
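The step from correlations to "percent of variance" is just squaring: the share of variance in one variable linearly accounted for by the other is r². A minimal Python sketch that applies this to the correlations quoted above (the function name is hypothetical, chosen for illustration):

```python
def variance_explained(r: float) -> float:
    """Return r squared: the proportion of variance in ratings that a
    linear relation with expected grades accounts for."""
    return r ** 2


# Correlations quoted above: the literature range (.1 to .3)
# and the locally observed range (.18 to .28).
for r in (0.10, 0.18, 0.28, 0.30):
    print(f"r = {r:.2f} -> {variance_explained(r):.1%} of variance")
```

Running it prints 1.0%, 3.2%, 7.8%, and 9.0%, so even the locally observed correlations correspond to only about 3% to 8% of the variance in ratings.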

That raises another question: how strong should the relation be? If evaluations were a perfect measure of teaching, grades were a perfect measure of learning, and the best measure of teaching is learning, then the correlation between evaluations and grades should approach 1.0. Of course, neither is a perfect measure, but evaluations have been shown to have high reliability and high criterion validity, as measured against peer reviews and common exams. Grades are more problematic, as they often reflect factors other than learning, such as prior knowledge and class attendance.
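To the extent that the imperfection is random measurement error, classical test theory makes the point concrete with Spearman's attenuation formula: if the underlying correlation is ρ and the two measures have reliabilities r_xx and r_yy, the expected observed correlation is ρ·sqrt(r_xx·r_yy). A minimal sketch; the reliability values below are hypothetical placeholders, not figures from this discussion:

```python
import math


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation when the underlying (true-score)
    correlation is true_r and the two measures have reliabilities
    rel_x and rel_y (Spearman's attenuation formula)."""
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities, for illustration only.
ratings_reliability = 0.90
grades_reliability = 0.60

# Even a perfect underlying relation (true_r = 1.0) is capped by measurement error.
ceiling = attenuated_correlation(1.0, ratings_reliability, grades_reliability)
print(f"Observed correlation ceiling: r = {ceiling:.2f}")
```

With these assumed reliabilities the ceiling is about .73, so an observed correlation well below 1.0 does not by itself show that either measure is worthless. Note that this formula covers only random error; systematic problems, such as grades reflecting prior knowledge, are a separate validity issue.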

Evaluations are positively related to class level. Evaluations in freshman classes are lower than those in sophomore classes, which are lower than those in junior classes, which are lower than those in senior classes, which are lower than those in graduate classes. No surprise there. Student motivation to take the course has a big influence on how much students learn, which, we’ve already established, is related to evaluations. Faculty, and probably department heads, make a big mistake in thinking that freshman course ratings can be compared directly with higher-level course ratings.

Your article on gender bias in ratings is outdated. It may have been true in the nineties, but recent research does not find such biases. I examined ratings at A&M in 2009 and found that, overall, female instructors received higher ratings than male instructors. However, when I controlled for department and course level, the gender bias disappeared.
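"Controlling for" department and course level can be illustrated by stratification: compare female and male instructors only within the same stratum, rather than across the pooled data. A minimal sketch with entirely made-up records (not the A&M data), constructed so that women are concentrated in a higher-rated stratum:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (stratum, instructor gender, mean rating).
# Within each stratum both genders receive identical ratings.
records = [
    ("lower-level", "M", 3.0), ("lower-level", "M", 3.0),
    ("lower-level", "M", 3.0), ("lower-level", "M", 3.0),
    ("lower-level", "F", 3.0),
    ("upper-level", "M", 4.0),
    ("upper-level", "F", 4.0), ("upper-level", "F", 4.0),
    ("upper-level", "F", 4.0), ("upper-level", "F", 4.0),
]


def pooled_gap(rows):
    """Mean rating of female minus male instructors, ignoring strata."""
    female = [rating for _, g, rating in rows if g == "F"]
    male = [rating for _, g, rating in rows if g == "M"]
    return mean(female) - mean(male)


def stratified_gaps(rows):
    """Female-minus-male gap computed separately within each stratum."""
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[0]].append(row)
    return {stratum: pooled_gap(rs) for stratum, rs in by_stratum.items()}


print(f"pooled gap: {pooled_gap(records):+.2f}")
for stratum, gap in stratified_gaps(records).items():
    print(f"{stratum} gap: {gap:+.2f}")
```

The pooled comparison shows women rated 0.60 points higher, yet the gap is zero within each stratum: the overall difference comes entirely from where instructors teach, which is the pattern described above.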

The real problem with student evaluations lies not in the evaluations themselves but in how they are used. Too often they are the only measure of teaching when, in fact, they are one measure of many. Too often context, such as course level, class size, and student motivation, is not considered in interpreting the ratings.