Importance of Assessment and Evaluation

Steve Ehrmann
Vice President, Teaching, Learning and Technology Group

Hi, I’m Steve Ehrmann. I direct the Flashlight Program for the improvement of educational uses of technology at the Teaching, Learning and Technology Group.  The TLT Group is a non-profit that supports educational institutions. We work with over 180 institutions and projects, now including the DEID project.  We’re a spin-off of Annenberg CPB and the American Association for Higher Education.  I also want to thank Washington State University, St. Edward’s University, and Notre Dame, with which we work particularly closely, and the founding sponsors who helped get us started six years ago: Blackboard, Compaq, Microsoft, SCT and WebCT.

Today, I would like to talk with you about assessment. One of the things you are going to be up against right away is dealing with people who have some pretty typical attitudes about assessment.  For example, they’ve never seen an assessment that was worth doing.  They literally couldn’t tell you about even one example of an assessment whose findings justified the money that was put into it. So they tend to be reluctant to put money or time into new assessments.  At the same time, some of these same people may be saying, “But we’ve got to do it anyway because they say we must.  Somebody outside says we’ve got to, so let’s get it over with quickly and do it as simply as possible.”  I think this is an attitude that leads to unproductive assessments, and it’s better to think about why you really should do assessments.

When I was working with Annenberg CPB myself back in the ’80s, Annenberg had been distributing its video on public television stations, so if you were taking a telecourse, you’d watch a public television program.  In the mid ’80s, Annenberg began to distribute video on cassette as well.  Peter Durn, our deputy director, did a study of how the video tapes were being used, and he found they were being used pretty much the same way the public television programs were.  People would set aside an hour, start the VCR, and an hour later, they would stop it. Six years later he replicated the study, but the results were entirely different, despite the fact that the materials hadn’t changed at all.  Now people were starting the video on the cassette player, they’d watch it for ten minutes, get to a point they didn’t understand, and replay that point perhaps two or three times until they got it. Then they’d play some more, or perhaps they’d need to go off to work or do something in the kitchen, so they’d turn off the VCR, come back later and turn it back on again.

The lesson of this is about the outcomes of technology, and by outcomes I mean who can learn, what they learn and how much it costs them to learn.  The outcomes of the technology are determined by how the technology is actually used. That’s not what the designer wants to be done; it’s what the users actually end up doing.  Now the more empowering the technology, the more choices it offers users, the greater the uncertainty about the outcomes, and for that matter, about the costs. Moore’s law adds to that uncertainty, because it means the technology itself keeps changing.  All those things add up to this: you can’t depend on what happened some place else, or even at your own institution a couple of years ago, to tell you what the outcomes are now. That’s why you need to study where you are now.

How do you study where you are now?
I think it involves thinking something like a detective.  I’m Joe Metaphor, so think about a baseball player.  How would a coach investigate what a baseball player is doing in order to help the player hit the ball farther?  Now one model is that the coach would stand in the outfield and, every time the player hit the ball, tell him how far it had gone.  “That shot went 80 feet… that one went 110 feet… you must be doing something right, do more of that.”  Oops, the next one went 20 feet: “Whatever it is you’re doing wrong, stop that!”

Now obviously looking just at outcomes is not going to be enough, especially if your purpose is improving either a hitter’s swing or the outcomes of a program.  Imagine a math course. I’m going to stick with this math course now, and throughout the rest of my remarks.  In my imagination, this is a math course that’s being taught at a distance.  It’s intended to help students do better in math, so you could just rest with how their math scores are after it’s over and what their retention rates are. Perhaps you could also look at how students did in later math courses to see whether this math course did a good job of preparing them.  But if that is all you do, what does that tell you about how to make that math course better?  Or even how to respond to a skeptic who says, “Oh yes, those scores are high in your experimental course, but that’s just because you are getting smart students”?

I think ideally you need four key kinds of data.  Outcomes data is important. It’s important to get input data too: for example, the math skills of the incoming students and their initial motivation.  But you’d also like to get process data.  For example, if the students are meant to learn in part because they are working together online, how much are they working together online?  And finally, a fourth kind of data is the factors that are likely to affect that process.  For example, how do the students feel about collaboration, are they having ISP problems, and so on.  All kinds of things can affect how much they collaborate.
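For readers who keep such records in a script or spreadsheet, here is a minimal sketch of how the four kinds of data might sit side by side for the imagined math course. Everything in it is an assumption made for illustration: the field names, the 1-to-5 motivation scale, the helper functions and the three sample students are invented and do not come from any real study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StudentRecord:
    """One student's data for the imagined distance math course (hypothetical fields)."""
    # Input data: what the student brings to the course.
    incoming_math_score: float
    initial_motivation: int  # assumed scale: 1 (low) to 5 (high)
    # Process data: what the student actually does.
    hours_collaborating_online: float
    # Factors likely to affect the process.
    had_isp_problems: bool
    values_collaboration: bool
    # Outcome data.
    final_math_score: float


def mean_gain(records):
    """Average change from incoming to final score.

    Comparing gains rather than raw final scores is one simple way to
    respond to the skeptic who says the course just attracted strong students.
    """
    return mean(r.final_math_score - r.incoming_math_score for r in records)


def mean_collaboration_hours(records, isp_problems):
    """Average online collaboration hours for students with or without ISP problems."""
    hours = [
        r.hours_collaborating_online
        for r in records
        if r.had_isp_problems == isp_problems
    ]
    return mean(hours) if hours else None


# Made-up sample data, for illustration only.
records = [
    StudentRecord(60.0, 4, 5.0, False, True, 78.0),
    StudentRecord(72.0, 3, 1.0, True, False, 74.0),
    StudentRecord(55.0, 5, 4.0, False, True, 70.0),
]

print(mean_gain(records))
print(mean_collaboration_hours(records, isp_problems=True))
print(mean_collaboration_hours(records, isp_problems=False))
```

The point of keeping all four kinds of data in one record is that outcome questions (did scores improve?) and process questions (who is collaborating, and what gets in the way?) can be asked of the same students.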

Program Improvement

Once you’ve got this kind of information from your four kinds of data, I think there are, and this is pure coincidence, four ways of using that data.  One is to help establish a long-term focus on improving certain elements of program quality.  For example, you’d be looking at how well students do, year after year, in succeeding in later courses.  You also might be looking, course after course and year after year, at how well collaboration is going, which helps to keep people’s attention on achievement and collaboration. In academia our attention tends to wander, but periodic evaluation can keep bringing us back to what’s important.

Testing Theories

The second use of evaluation is to test your theory. Your findings might indicate, for example, that there’s a continuing relationship between the kinds of collaboration that your faculty and students are doing and the outcomes in that math course.
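One hedged way to check for such a relationship is a simple correlation between a process measure and an outcome measure. The sketch below computes a Pearson correlation coefficient by hand; the function name and the sample numbers are invented for illustration, and a correlation on its own does not show that collaboration causes better scores.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up data: weekly hours of online collaboration and final math scores.
collab_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
final_scores = [62.0, 65.0, 71.0, 70.0, 80.0]

r = pearson_r(collab_hours, final_scores)
print(f"correlation between collaboration and scores: {r:.2f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. Either way, it is a prompt for the detective work described earlier, not a verdict.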

Diagnostics

The third use of assessment data is diagnosis in order to increase success rates.  For example, there are a lot of things that might prevent at least a couple of students from participating effectively online.  Maybe they’re having problems with their Internet service providers; maybe a couple of others have doubts about whether the collaboration is worthwhile, so they’re not trying very hard. Maybe another couple are having trouble with the instructions for the process, and still others don’t know how to cope when their partner doesn’t carry out what they’re supposed to be doing. There are lots of reasons why there may be problems, and the more you know about them, the easier it is to fix them, and having fixed them, you can go ahead and improve performance.

Controlling Costs and Stress

The fourth use for assessment is controlling cost and stress.  The whole process of supporting collaboration can be difficult.  What does it take to get students together in groups?  How about grading them and so on?  If you study these processes and look at them routinely, you may be able to invent ways of doing them that are more effective and yet less stressful on the people involved, and perhaps even less stressful on the budgets that are involved. So those are our four uses of assessment.

In other material that is covered in this module, you’re going to hear about some success stories. I hope they are useful. I hope that when you design your own studies, you’ll have successes too, and you’ll tell us about them. Good luck and thanks.