Tuesday, May 03, 2005

Agile databases - embrace change

Ever changing databases

In a world where we have to adopt new processes to develop software in order to adapt to ever changing environements, what happens to our databases. Do our databases, keep up the rhytm with the code in changing conditions?

Although we made big progresses in 'becoming agile' we are still at the beginning although instant results were seen ever since we started. But although there are lots of materials about agile processes, about agile techniques that are focused on making agile processes work like unit testing and TDD, refactoring, continuous integration, frequent delivery, evolutionaty design etc there are few materials over the web that deal with databases in agile processes.

I would like to say that mostly we are confronting ourselves with simple databases (<=50 tables).

The questions I have are:

1. How do databases deal with changes?
2. How to develop tests that are easely maintenable in these conditions?

1. How do databases deal with changes?

After adopting a two week iteration technique, we were confrunted with many versions of a database, one every two weeks. The database was evolving and often changed. The problem is bigger if the system has gone live. You must make an update to the live database with every new version. In the first iterations of a project, usually until a 'stable' database schema is adopted, it changes a lot, but if the system doesn't go live this is a minor problem as old databases can be just dropped and rewritten. After the stable version is reached, the updates from one version to another can be done more easely with alter scripts.

From time to time the client told us they were having difficulties with the new version we just sent, as it just crashes immediately after start. We noticed that this is usually because the database scripts haven't been run and the program is the new version, but the database is old and it the application just crashes at some time. This has given us a lot of sweats and afterall,clients that came to see the new version and see it crashing remember that your software is buggy not that this has something nothing to do with you.

Versions table and update scripts

A way to deal with this was the versions table and update scripts. Each modification in the database from one version to another version is made with a script and not directly. The database also contains a Versions table the logs all these updates. This way we can know what is the current database version, when was it created , what modifications were made (a comments field) who did the script and when it was actually run. How do we know this? By simply putting an insert line into our update script that puts in this data into the versions table.

After applying this techinque we thought about putting in the application installer, a part of code that runs the update. And we did it, but new problems arrived. A client managed somehow to override a live database, although we put in a lot of messageboxes asking to confirm whatever he did. So we thought of adding a line in the configuration file of the application which says which version should the database have in order to start the program and unless the database has the right version, the program won't start. Now noone forgets to make the updates.

Automating the update script creation

This is a simple solution that seems to work well. However a problem with it is that manually creating the update scripts is taking a lot of time to do and to test. This problem brought to our attention a product, build by RedGate, that knows to generate the update script after comparing two databases. This greatly reduces the amount of work, needed to make the update scripts, we can now make them manually on a test database then with the tool generate the update script.

2. How to develop tests that are easely maintenable in these conditions?

Everyone teaches us to make our database related tests, data independent. The most used techniques are made either by having a separate database used for testing , creating the test database with data on setup and dropping it on teardown and using a rollback technique (database server rollback or using Entreprise Services). Each of these techniques have their own problems, although seem to solve the problem of making the database tests independent from the data in the database. But are they easely maintenable if the database stucture changes often?

ORMs

It seems that using ORMs makes database access code more testable and easely maintenable, as errors after a change was made is detected fast. Some ORMs even know how to generate the create, and most importantly the update scripts themselves. If you have the unit tests related to the ORM's persistent objects the errors in tests can be detected very fast, first just by compiling then by running them so that query problems are revealed. ORMs seem in my opinion more agile then other approaches as using ADO.NET connected and disconnected database access. Is this the solution for all problems? No, however they seem to embrace change more easely.

Regression query integrity tests

Whether you are using ORMs or not, when making a structure change to a database you need instant feedback to see if you broke anything. It is very important to know that at least,that all sql queries (queries/stored procedures etc) work. Not that they work ok, but that they work. It is very important to know this immediately after making the changes. So maybe one technique is to make a test suite that just verifies validity of sql queries and not functionality. I call these regression query integrity tests.

My idea is to create a list of very simple tests, that just run the queries, to see if they break or not. This can be easily run each time, a structural change in the database is done, to see if something was broken or not. Some of the code for this kind of tests can be generated. Code generation, is in my opinion the great missing piece from any agile metholodology's list of practices, as by using code generation, productivity can dramatically improve.

Conclusion

Although we develop projects that are quite simple, from a DBAs point of view, we must keep in mind that it is very important to have techniques that enable us to change the database code, just as easy as the code itself, otherwise this can rapidly become the big bottleneck of our project, that 'embrace chage'.

1 comment:

Dan Bunea said...

:) Yes, Scott W. Amber, in my opinion is the absolute reference in the domain . I am for some time a member of Yahoo Group, where he posts: http://groups.yahoo.com/group/agileDatabases/

I also encourage reading his articles about database refactorings and the latest SD Times issue: http://68.236.189.240/article/opinion-20050501-02.html


Also Martin Fowler has a great article, about this, called Evolutionary Database Design at: http://martinfowler.com/articles/evodb.html