Monday, June 19, 2006

To obtain good code, writing tests and code is faster than code alone

A few weeks ago on the TestDrivenDevelopment mailing list, Ron Jeffries, one of the XP gurus, stated that "in order to obtain good code, writing tests and code is faster than just code". To find out whether this is true, let's run a small experiment.

The mini TDD experiment

We assume that we are programmers and we need to code a function that divides two positive numbers. For this experiment we will compare the traditional and the TDD approaches.

Approach #1. Code and fix

As programmers, for a simple division we will write the following “pseudo” code:

Function Divide(No1, No2)

Return No1/No2

For this very simple method, let’s assume we needed 5 seconds to write it. Now let’s test if it works. First we try 6 and 2, expecting 3. It works. Let’s try another combination: 1 and 2, expecting 0.5. It works. Now let’s try 8 and 0. An error just occurred. This means we need to modify the program to display a message to the user that the second number cannot be 0:

Function Divide(No1,No2)

If No2 = 0 then display message “Division by 0 cannot be performed”

Else Return No1/No2

Now let’s test our function again: 6 and 2 give 3, good; 1 and 2 give 0.5, as expected; 8 and 0 display the message “Division by 0 cannot be performed”, as expected. Now our program works fine.
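For readers who prefer real code, here is a minimal Python sketch of the corrected function. It assumes that "display message" simply means printing to the console and returning nothing; the name divide and the use of None are illustrative choices, not part of the original pseudo code.

```python
def divide(no1, no2):
    """Divide no1 by no2, reporting division by zero instead of crashing."""
    if no2 == 0:
        # Assumption: "display message" is modelled as a console print.
        print("Division by 0 cannot be performed")
        return None
    return no1 / no2
```

The manual session described above then amounts to calling divide(6, 2), divide(1, 2) and divide(8, 0) by hand and checking each result by eye.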

Assuming that manual testing is slow and each combination of numbers takes about 10 seconds, a testing session takes 30 seconds. The total time to develop the code was: 5 seconds to write the function, 30 seconds to test it and see that it has problems with division by 0, about another 5 to correct the function and 30 seconds to test it again and make sure it works. Total: 5+30+5+30=70 seconds, or a minute and 10 seconds.

Approach #2. Test driven development

In test driven development, a piece of code is written in a series of steps, starting with an automated test written first and ending by making that test succeed, by writing the code that it tests. Let’s see how it goes:

Function TestNormalDivision()

Expect 3 as a result of Divide(6,2)

The code above compares the expected value with the value returned by our (yet unwritten) code, and fails if they do not match.

One very important step now is to make sure our test really tests something and does not pass every time, no matter what the code under test does. For this we need to make sure that when it needs to fail, it fails. So we write the following function:

Function Divide(No1, No2)

Return 0
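As a rough Python sketch of this "make it fail first" step: the expect helper below is a hypothetical stand-in for the "Expect ... as a result of ..." line of the pseudo code, and the other names are illustrative too.

```python
def expect(expected, actual):
    """Fail loudly when the actual value differs from the expected one."""
    if expected != actual:
        raise AssertionError(f"expected {expected} but the result was {actual}")


def divide(no1, no2):
    # Deliberately wrong stub: it exists only so that the test can fail.
    return 0


def check_normal_division():
    # Mirrors: Expect 3 as a result of Divide(6,2)
    expect(3, divide(6, 2))
```

Calling check_normal_division() at this point raises an AssertionError, which is exactly what we want to see before writing the real code.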

Now we run the test, and it fails, saying: expected 3 but the result was 0. So now we modify the function to pass the test.

Function Divide(No1, No2)

Return 3

Now we run the test again: 1 test succeeded. Excellent. Now let’s see if it works for 1 and 2, so we update the test:

Function TestNormalDivision()

Expect 3 as a result of Divide(6,2)

Expect 0.5 as a result of Divide(1,2)

We run the test. Failure. Oooh, we realize the mistake we made (the code always returns 3) and modify the Divide function:

Function Divide(No1, No2)

Return No1/No2

Running the test now, the code passes all our expectations. But now we think: what would happen if we used 8 and 0? Let’s add a new test to the test suite (now we have two) and make sure that if there is a division by 0, the user is notified:

Function TestDivisionByZero()

Expect message “Division by 0 cannot be performed” displayed as a result of Divide(8,0)

We run the test. It fails. Now we modify our function to make it work:

Function Divide(No1,No2)

If No2 = 0 then display message “Division by 0 cannot be performed”

Else Return No1/No2

Running all our tests, we discover that they all succeed.
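The finished state could look roughly like this with Python's standard unittest module. This is only a sketch: it assumes the message is "displayed" by printing to standard output, and the names divide, TestDivide and run_suite are illustrative.

```python
import io
import unittest
from contextlib import redirect_stdout

MESSAGE = "Division by 0 cannot be performed"


def divide(no1, no2):
    if no2 == 0:
        # Assumption: "display message" is modelled as a print to stdout.
        print(MESSAGE)
        return None
    return no1 / no2


class TestDivide(unittest.TestCase):
    def test_normal_division(self):
        self.assertEqual(3, divide(6, 2))
        self.assertEqual(0.5, divide(1, 2))

    def test_division_by_zero(self):
        # Capture whatever divide() prints so the test can inspect it.
        shown = io.StringIO()
        with redirect_stdout(shown):
            divide(8, 0)
        self.assertIn(MESSAGE, shown.getvalue())


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDivide)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite() runs both tests in a fraction of a second, which is the whole point of letting the computer do the checking.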

How much time did we need to write this code? We needed 5 seconds to write the first test, 5 seconds to make sure it fails, 1 second to run the test (testing is now done by the computer, so we assume it is at least 10 times faster than manual testing), 5 seconds to modify the code to make the test pass, 1 second to run the test, another 5 seconds to extend the test to verify the 1,2 combination, 1 second to see that the test fails, 5 seconds to modify the function and 1 second to see it working, another 5 seconds to write the second test and 2 seconds to see the first test pass but the second fail, and 5 seconds to complete the code and another 2 to run the 2 tests and make sure everything works. Wow, a long way: 5+5+1+5+1+5+1+5+1+5+2+5+2 = 43 seconds.

Using both approaches, we ended up with the same code. The amount of code written in the second approach is bigger than in the first, since it includes both the code and the tests. The time needed for the second approach was arguably smaller than for the first, which leads us to Ron Jeffries’s conclusion: to obtain good code, writing tests and code is faster than code alone. The main advantage is that we use computer power rather than human power to do the testing, so we are much faster. We can then run the automated tests over and over again, and it will take 2 seconds to see if they work; manually it would take 30 seconds to do the same thing.
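To double-check the arithmetic, the step durations from the two walkthroughs can be added up in a few lines of Python. The lists simply transcribe the assumed timings listed above; nothing here was actually measured.

```python
# Assumed step durations in seconds, in the order they appear in the text.
code_and_fix = [5, 30, 5, 30]
tdd = [5, 5, 1, 5, 1, 5, 1, 5, 1, 5, 2, 5, 2]

print(sum(code_and_fix))  # 70
print(sum(tdd))           # 43
```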

Let’s go further with our experiment, assuming that we now need to extend the program so that it can also do additions, subtractions and multiplications.

Approach #1. Code and fix

Since none of these operations is affected by 0 (though we test that anyway), the code written first will work. It would take about 5 seconds to write each method, and testing each with 3 combinations of numbers would take about 30 seconds. The time needed would be 5+30+5+30+5+30 = 105 seconds, or 1 minute and 45 seconds. Testing the whole program (the 3 new methods and the division method) would take us 4*30 = 120 seconds, which is 2 minutes.

Approach #2. TDD

Operations like the ones above will each need only one test, checking 3 combinations. Let’s say it takes 10 seconds to write a test method like this:

Function TestMultiplication()

Expect 0 as a result of Multiply(6,0)

Expect 3 as a result of Multiply(3,1)

Expect -9 as a result of Multiply(3,-3)

Then we’d have to make sure it fails: 5 seconds, plus 1 second to run the test. Then we’d write the code to make it work: 5 seconds, plus 1 second to make sure it works. So it takes about 10+5+1+5+1 = 22 seconds for each new function, resulting in 3*22 = 66 seconds, or 1 minute and 6 seconds, to write the new functions. Testing all the code would mean running 5 test methods (2 for division and 1 for each of the other three), which would run in 5 seconds.
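In Python, the three new test methods might be sketched as follows. The multiplication cases come from the pseudo code above; the addition and subtraction cases, and all the names, are assumptions made for illustration.

```python
import unittest


def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


class TestArithmetic(unittest.TestCase):
    def check(self, func, cases):
        # Each case is ((a, b), expected); subTest reports every failing pair.
        for (a, b), expected in cases:
            with self.subTest(func=func.__name__, a=a, b=b):
                self.assertEqual(expected, func(a, b))

    def test_multiplication(self):
        self.check(multiply, [((6, 0), 0), ((3, 1), 3), ((3, -3), -9)])

    def test_addition(self):
        self.check(add, [((6, 0), 6), ((3, 1), 4), ((3, -3), 0)])

    def test_subtraction(self):
        self.check(subtract, [((6, 0), 6), ((3, 1), 2), ((3, -3), 6)])


def run_arithmetic_suite():
    """Run the three test methods and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestArithmetic)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method checks 3 combinations, matching the "one test, three combinations" budget used in the timing estimate.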

Tests and code, faster than just code

Comparing the times needed to test our incredibly simple system, 2 minutes vs. 5 seconds, shows us that not only is the code written faster (70+105=175 seconds vs. 43+66=109 seconds), but making sure it works requires far less time with the TDD approach. And a second big advantage: it can be done by a computer.

Using a continuous integration machine that downloads the program sources and runs the test suite, then sends us an email telling us what happened, means 5 seconds for the machine and 0 seconds on my side to test the whole system. With the first approach, it would take me 2 minutes to make sure the whole system works. I could delegate the responsibility of testing the whole system to the testing team, but then the feedback time, telling me whether the system works as a whole or not, quickly grows to days and weeks, and by that time I should be doing something else.


In the 3rd phase of our little experiment, we analyze what would happen if our system had 400 functions instead of 4. With the first approach a full test would need about 12000 seconds (that is, over 3 hours), while with TDD and an automated test suite it would need about 500 seconds or, better said, less than 10 minutes. This simple example shows how each approach scales. The testing team could work in parallel to some extent, but then again, I could also set my integration machine to split the tests and run them in parallel.
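The scaling estimate can be reproduced with a small Python sketch. It assumes, as in the text, 3 manual checks of 10 seconds per function, and that the ratio of 5 automated tests per 4 functions at 1 second each carries over to bigger systems; the function names are illustrative.

```python
def manual_seconds(functions, checks_per_function=3, seconds_per_check=10):
    """Time for one full manual test pass over the system."""
    return functions * checks_per_function * seconds_per_check


def automated_seconds(functions, seconds_per_test=1):
    """Time for one full automated run, assuming 5 tests per 4 functions."""
    tests = functions * 5 // 4
    return tests * seconds_per_test


for size in (4, 400):
    print(size, manual_seconds(size), automated_seconds(size))
```

With these assumptions, 4 functions give 120 vs. 5 seconds, and 400 functions give 12000 vs. 500 seconds.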

After this very small experiment, we have shown that test driven development is, compared with just coding:

o faster to develop
o faster to test the whole system and give feedback
o scalable

Tests as documentation

Another advantage of the method described above is that the automated tests can act as very good documentation of the code written. In traditional approaches, separately documenting things that can easily be deduced from the automated tests, like how a function works, would increase the development time even more. After all, just reading:

Function TestDivisionByZero()

Expect message “Division by 0 cannot be performed” displayed as a result of Divide(8,0)

tells me, or someone new to the project, that if you try to perform a division by 0, the system will display an error message on the screen.

Embrace change: how?

Having a system with 4, 400, 1000… 100,000 methods doesn’t really comfort me when it comes to making a change in it. If I change one tiny piece of code somewhere, could I break something in another part of the system? And if I do, how could I find out fast enough to be able to either correct it or revert my changes?

To get feedback from the code telling me if and where I’ve broken some existing functionality, I would normally need to retest the whole system. For a system with 4 methods, that would take 2 minutes, but for a more realistic system it might take hours, days or even weeks. So the courage to change decreases as the system gets bigger, which shortens the life of the system. When a system is too rigid and can no longer adapt to changes in the market, it is bound to die.

Having a full automated regression test suite that runs very fast and can be run very often means fast feedback. Fast feedback means changes are less risky and can be made more easily and quickly, thus extending the life of a system.

Design advantages

Another advantage of writing automated tests for the code is that the code written tends to be very loosely coupled, thus better designed. Test driven development also tends to eliminate “partially completed code”, encouraging less code to be written, as the programmer is more focused on what is really needed, thus decreasing the amount of code and its complexity.

At a macro level, using TDD combined with aggressive refactoring makes changes, even in the architecture, much easier to perform, which allows the programmers to continuously upgrade the design and update the architecture. Since changes are easy to make, evolutionary design is encouraged, and there is much less need to build a flexible architecture upfront, following the YAGNI principle from XP.


Stuart said...

While I'm a big fan of TDD, I'm not sure your measurements of time for this function are entirely scientific!

For example, why would it take me 30 seconds to test such a simple function? Plus I'd need to test it using a program (e.g. an interpreter) to call the method... hence this is running on the computer too... so why are the unit tests 'at least 10 times faster'? And when did 30 seconds divided by 10 equal 1 second?

I agree with what you're saying, but I think you should watch out for trying to 'prove' something using made up data. This will always be thrown back in your face.

Dan Bunea said...

Hi Stuart,

I do agree that my experiment isn't scientific, and that my measurements are more or less ok. Since I keep it only as a blog entry, and not as a scientific experiment, I do want the community, both those familiar and those not familiar with TDD, to tell me what needs to be adapted, redone etc., and for that I have to thank you for your observations, which are very good.

I took 30 seconds for the simple function because I assumed several pairs of values would need to be entered manually and the results checked (6 and 2, 1 and 2, and 8 and 0, which fails, etc.). For this particular function the time depends, for instance, on whether the program needs to be restarted to check a new pair of values.

You are right that such things can backfire. What do you think the right times would be?