Sunday, October 21, 2007

Lucene indexes as agile databases

Download Source code

Introduction

In product development, as opposed to bespoke project development where each client gets its own software, you deliver the same software to every client. This has an obvious advantage: a single development stream supports multiple sales. In reality, though, each client will want to customize the product. Each will categorize its products differently and will need different fields, and if your product is not flexible enough to accommodate this, they will reject it.

Relational databases are not agile enough

In agile development, you plan for change. Extreme Programming holds that a piece of code has good internal quality if it can be easily maintained, changed and debugged. In product development, the data store changes from one version to the next, and this is often a big burden. Relational databases simply were not designed to have their structure changed easily. Once an ALTER drops a column, any statement that references that column can fail, so you have to check every SELECT, UPDATE and INSERT to make sure it doesn't crash. Migration scripts are hard to write and very error-prone. There are many workarounds, but none of them is trivial.
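The fragility described above can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module; the `product` table and its columns are made up for illustration. Because not every SQLite version can drop a column directly, the sketch emulates the drop by rebuilding the table without it.

```python
import sqlite3


def old_query_survives_schema_change() -> bool:
    """Return True if a query written for the old schema still runs after a column is removed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
    )
    conn.execute("INSERT INTO product (name, category) VALUES ('Laptop', 'Hardware')")

    # A statement written against the original schema.
    old_query = "SELECT name, category FROM product"
    assert conn.execute(old_query).fetchall() == [("Laptop", "Hardware")]

    # Emulate dropping 'category' by rebuilding the table without that column.
    conn.executescript(
        """
        CREATE TABLE product_new (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO product_new (id, name) SELECT id, name FROM product;
        DROP TABLE product;
        ALTER TABLE product_new RENAME TO product;
        """
    )

    try:
        conn.execute(old_query)  # still refers to the removed column
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


survived = old_query_survives_schema_change()
```

The old query is rejected after the schema change, and the same would be true of every other statement that mentions the removed column.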

ActiveDocument: Indexes as truly dynamic databases

Indexes were designed to search large repositories extremely fast. Look at Google: it indexes over a billion websites and it is FAST.

Could an index be more structured than just a large quantity of text documents? Could it have types and properties and relations between entities, pretty much what a common relational database has, and beyond that a truly flexible infrastructure where types can be created, modified and destroyed on the fly, and where their properties can be added, removed or renamed without fear? Something that in C# would look like this:


As can easily be seen, we create a type (Product) on the fly, with no need to define a table or any other kind of structure, and then we add properties to the instances of this type. These properties can differ from one instance to another. Structure is defined and changed dynamically at runtime. The two product instances above both have a Name, but only one has a Category, and the query still works.
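To make the idea concrete without depending on ActiveDocument's real API, the behaviour described above can be sketched with a toy in-memory store, written here in Python rather than C#. Everything in it is an assumption made for illustration: `DocumentStore`, `create` and `query` are hypothetical names, and nothing is actually indexed with Lucene.

```python
class DocumentStore:
    """Toy in-memory stand-in for a schemaless index (hypothetical, not ActiveDocument)."""

    def __init__(self):
        # type name -> list of documents; a type exists as soon as something is stored under it
        self._docs = {}

    def create(self, type_name, **properties):
        """Store a document; each instance may carry its own set of properties."""
        doc = dict(properties)
        self._docs.setdefault(type_name, []).append(doc)
        return doc

    def query(self, type_name, **criteria):
        """Return the documents of a type whose properties match all criteria.

        A document that lacks a queried property is skipped rather than causing an error.
        """
        return [
            doc
            for doc in self._docs.get(type_name, [])
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


store = DocumentStore()
store.create("Product", Name="Laptop", Category="Hardware")
store.create("Product", Name="Support contract")  # this instance has no Category

hardware = store.query("Product", Category="Hardware")
everything = store.query("Product")
```

Querying by Category returns only the product that has one, and the product without a Category does not make the query fail. No schema was declared anywhere, so adding a new property to a later instance requires no migration.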

Ladies and gentlemen, this technology is possible. It is called ActiveDocument, and the sources can be downloaded from here.

Besides storing typed data and retrieving it quickly and easily, we can also relate data:
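As a rough picture of what relating documents can mean, here is a small in-memory sketch in Python. It is not ActiveDocument's API: `RelatedStore`, `relate` and `related` are hypothetical names, and the sketch simply assumes that each document gets an id and that a relation is a named link between two ids.

```python
import itertools


class RelatedStore:
    """Toy in-memory store whose documents can point at each other (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._docs = {}   # id -> (type name, properties)
        self._links = []  # (source id, relation name, target id)

    def create(self, type_name, **properties):
        """Store a document and return its generated id."""
        doc_id = next(self._ids)
        self._docs[doc_id] = (type_name, dict(properties))
        return doc_id

    def relate(self, source_id, relation, target_id):
        """Record a named link between two existing documents."""
        if source_id not in self._docs or target_id not in self._docs:
            raise KeyError("both documents must exist before they can be related")
        self._links.append((source_id, relation, target_id))

    def related(self, source_id, relation):
        """Return the properties of every document linked from source_id by relation."""
        return [
            self._docs[target][1]
            for source, name, target in self._links
            if source == source_id and name == relation
        ]


store = RelatedStore()
customer = store.create("Customer", Name="Acme")
laptop = store.create("Product", Name="Laptop", Category="Hardware")
contract = store.create("Product", Name="Support contract")

store.relate(customer, "Bought", laptop)
store.relate(customer, "Bought", contract)

bought_names = [product["Name"] for product in store.related(customer, "Bought")]
```

Relations here are plain data, so a new relation name can be introduced at runtime just like a new type or property. Unlike a foreign key, nothing stops a linked document from being removed later, which is one of the trade-offs listed below.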



Conclusion

Is it a breakthrough? Will this kill relational databases? Is this 100% safe? I have no idea. I am discovering the advantages and disadvantages as I go, and I am sure others will want to help me on this journey.

Pros and cons of using indexes instead of databases

Pros

-fast
-scalable
-they don't break when the structure changes
-...

Cons

-no joins
-no transactions
-no foreign keys
-...

Please feel free to add to these lists :)