In this lecture, we're going to talk about automated regression testing. And before we can talk about how to automate it, we have to talk about what it is. Regression testing is a mechanism for determining whether software continues to behave the same way after we make updates to it. Our goal is to ensure that when we make changes to the software, such as refactoring, we have not introduced new faults. With refactoring, the idea is that the functional behavior should stay the same: we change the code structure to make it more readable, more modifiable, better in some way. Regression testing and refactoring are an integral part of agile methods, and what we're really trying to do is allow refactoring with confidence, so we can make changes to the code and easily run tests to determine whether we've broken anything.

There are several issues with regression testing if you want to do it well. The first is: do we have enough tests? This is called the coverage identification problem: have we adequately covered the code structures we now have? The second is: which tests should we run? When we make a change, we're usually not changing the whole piece of software, just a small part of it. This is called the test selection problem: determining which tests need to be rerun. And finally, we have the issue of changing and new requirements, which is the test suite maintenance problem. If we actually change the functional behavior of the code, then some of the tests are going to fail. That's expected, but cleaning out the failing tests can be a fairly difficult problem.

The good news is that the automated techniques we've been looking at actually work quite well for regression testing. When we can automatically generate thousands of tests in a relatively short time, the problem of having enough tests, to some degree, goes away, because we have a mechanism for cheap generation. And not only that, we have a strong oracle we can use to check correctness, as I'll show you here on the next slide. Determining which tests we should run is actually a different topic, called automated regression analysis; that's outside the scope of this course, but I'll provide some resources if you want to look into it further. And the problem of test suite maintenance is also, to some degree, mitigated by using automated tests: if we can generate the tests cheaply, then it doesn't cost us much to throw them away and generate new ones when faced with new requirements.

So I mentioned a strong oracle for this kind of testing. What we have is the program under test and a test case, and we can use the previous version of the program under test and simply check whether the two versions do the same thing for every input. We can throw lots and lots of inputs at the program, and we have a very strong oracle for checking equivalence: the previous version of the program under test. This oracle is called a consistency oracle.
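Just to make that concrete, here is a minimal sketch of what a consistency-oracle check might look like as a JUnit test. ProgramV1, ProgramV2, and compute() are hypothetical stand-ins for the previous version, the updated version, and some method whose behavior we want to preserve; the important design point is that the recorded outcome includes exceptions, since "behaves the same" covers error behavior too.

```java
import static org.junit.Assert.assertEquals;

import java.util.function.IntUnaryOperator;

import org.junit.Test;

// A minimal consistency-oracle sketch: the previous version of the code defines the
// expected behavior of the new one. ProgramV1, ProgramV2, and compute() are hypothetical.
public class ConsistencyOracleTest {

    // Capture the observable outcome of one call: either the result or the exception type.
    private static String outcome(IntUnaryOperator f, int input) {
        try {
            return "value=" + f.applyAsInt(input);
        } catch (RuntimeException e) {
            return "threw=" + e.getClass().getSimpleName();
        }
    }

    @Test
    public void newVersionMatchesOldVersionOnManyInputs() {
        ProgramV1 oldVersion = new ProgramV1();
        ProgramV2 newVersion = new ProgramV2();
        // Throw lots of inputs at both versions; any divergence is a regression.
        for (int input = -1000; input <= 1000; input++) {
            assertEquals("input " + input,
                    outcome(oldVersion::compute, input),
                    outcome(newVersion::compute, input));
        }
    }
}
```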
Now, another oracle we could use is a property-based oracle, and in the demo I'm going to give you in a couple of slides, that's actually what we're going to use. We're going to use a tool called EvoSuite, which generates coverage-based tests that capture what the behavior of the software currently is. Then, when we make changes to the software, we can rerun those EvoSuite tests and determine whether the new software does the same thing. So in this case, what we're going to do is write a property monitor, or rather have EvoSuite write the property monitor for us, and then check whether the program under test behaves as expected against that monitor.

Using EvoSuite for refactoring is not a bad idea. It's essentially free to run, and the tests it generates yield high structural coverage of the code. And the oracles EvoSuite builds in, those property-based oracles, are designed to find mutation errors. Mutation errors are things like changing a plus to a minus somewhere in your code, and in a lot of cases those are exactly the kinds of mistakes programmers make while refactoring. So the good news is that the EvoSuite oracles are, to some degree, designed to catch the kinds of faults we introduce during refactoring, and we don't have to keep the old code around and execute it in parallel with the new code, as we would with a consistency oracle. The bad news is that this is a strictly weaker oracle than a consistency oracle. One thing that can happen with the EvoSuite tests is that they look at the internal structure of the code, and if the internal types change while you're refactoring, the tests may no longer be executable. Instead they error out, and then the EvoSuite tests no longer tell you much about the behavior of the system. So when you're refactoring, it's best to work in small stages so that you preserve as much of the code structure as you can.

Let me show you what this looks like. Here we have our microwave system with the DisplayController, the ModeController, and the Microwave, and down here we have some tests generated by EvoSuite: a set of tests for the DisplayController, for the Microwave, for the ModeController, and for the presets. Let me walk through what those tests look like. Let's look at the DisplayController, because it has some of the interesting behavior. Each EvoSuite test is a sequence of method calls plus an expectation about what's going to happen. One of the things we can do in the DisplayController is set the timeToCook, and if we set it to 6000, EvoSuite's expectation is that this will fail, because it's expecting an IllegalArgumentException here. If it catches an IllegalArgumentException thrown by the DisplayController, it marks the test as passing. Then there are a whole bunch of other tests: tests that should succeed with the DisplayController, tests that should fail with the DisplayController, a number of different things. What these are designed to do is exercise the structure of the DisplayController fairly completely, and look for behavior that differs from the behavior the DisplayController displayed when the tests were generated.

So we can run these tests. I'll just run them with JUnit here. You can see this runs, there are 40 tests, and they all pass, which is not surprising. But now let's suppose I wanted to change setTimeToCook(). Let's say I was refactoring it and, for some reason, got rid of the precondition on time here, the check that time is greater than 0. So I make that change to the code, and I rerun the tests.
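To make that change concrete, the guard inside setTimeToCook() looks roughly like the sketch below. This is a reconstruction, not the actual demo code: the 6000 bound is inferred from what the generated tests expect, and the field name timeToCook is an assumption.

```java
// A rough reconstruction of the relevant part of DisplayController before the edit.
public class DisplayController {
    private int timeToCook;

    public void setTimeToCook(int time) {
        // Original precondition: time must be positive and under the assumed 6000 limit.
        if (time <= 0 || time >= 6000) {
            throw new IllegalArgumentException("time out of range: " + time);
        }
        this.timeToCook = time;
    }

    // The refactoring in the demo drops the "time > 0" half of that check, leaving only
    // the upper-bound test, so non-positive times are now silently accepted.
}
```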
And what's interesting is that, in this case, the tests didn't catch it. So the EvoSuite tests aren't guaranteed to catch every error we introduce. But if I change the other part of the check, so that the bound on time is 7000 rather than 6000, then when I run the tests we do see a failure. If I look at that failing test, there is an assignment of 6007, which would fail on the old system but passes on the new one. So these are the kinds of things that happen when I make changes to the DisplayController. These tests and the DisplayController and Microwave code will be downloadable so you can play with this yourself. The idea is that although EvoSuite isn't perfect, for many of the changes you're going to make to the code, you automatically get a regression failure if the original code and the new code are not functionally equivalent.

As we saw with the first change, though, there is some contravening evidence. There was a recent paper, in 2017, that looked at Extract Method refactorings, and what they found is that about half the time the EvoSuite-generated tests would catch the error, and the other half of the time they would fail to catch it. Sometimes the failure was due to the tests erroring out because the structure of the types had changed significantly; other times it was like we saw in the first case, where we removed a condition and the tests were not strong enough to catch the error.

So to recap: regression testing tries to check that the system still behaves the same way after updates as it did before. With automation, we can get rigorous testing essentially for free, using a previous version of the program as a strong oracle. Or we can use a weaker oracle, something like the EvoSuite assertions, which doesn't require having the old program around. And although EvoSuite is able to catch a good number of faults, at least in my experience it's not perfect, and sometimes it fails to catch errors you've introduced during refactoring. So it's more of a sanity check than a replacement for having other tests, and it's still an open area of research.
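Finally, to leave you with something concrete from the demo: the boundary tests that EvoSuite generates boil down to roughly the shape below. This is a hand-written approximation, since the code EvoSuite actually emits includes its own runtime scaffolding, but the structure, a call followed by an expected IllegalArgumentException, is the same, and the 6007 value is the one from the failing test we just looked at.

```java
import static org.junit.Assert.fail;

import org.junit.Test;

// A hand-written approximation of the generated boundary tests from the demo; the real
// EvoSuite output looks different in detail but pins the same behavior.
public class DisplayControllerRegressionSketch {

    @Test
    public void rejectsSixThousand() {
        DisplayController controller = new DisplayController();
        try {
            controller.setTimeToCook(6000);
            fail("Expecting IllegalArgumentException");
        } catch (IllegalArgumentException e) {
            // Expected: the original code rejected 6000, so the refactored code must too.
        }
    }

    @Test
    public void rejectsValueAboveTheOriginalBound() {
        DisplayController controller = new DisplayController();
        try {
            // 6007 threw on the version the tests were generated against; once the bound is
            // moved to 7000, this call returns normally and the test fails, flagging the change.
            controller.setTimeToCook(6007);
            fail("Expecting IllegalArgumentException");
        } catch (IllegalArgumentException e) {
            // Expected under the original behavior.
        }
    }
}
```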