Becoming Agile: Lucene indexes as agile databases

Sunday, October 21, 2007

Lucene indexes as agile databases

Download Source code

Introduction

In product development, as opposed to bespoke project development where each client gets its own software, you deliver the same software to every client. This has some advantages, such as a single development stream with multiple sales, but in reality each client will want to customize the product. Each will divide its products differently and will have different fields, and if your product is not flexible enough to allow this, they will reject it.

Relational databases are not agile enough

When you develop software in an agile manner, you plan for change. In Extreme Programming, they say a piece of code has good internal quality if it can be easily maintained, changed and debugged. In product development, the data store changes from one version to another, and this is often a big burden. Relational databases simply weren't designed to have their structure changed easily. Once an ALTER drops a column, everything built on the database becomes unstable: you have to check each SELECT, UPDATE and INSERT, every statement, to make sure it doesn't crash. Migration scripts are hard to write and very error-prone. There are many workarounds, but none of them is trivial work.

ActiveDocument: Indexes as truly dynamic databases

Indexes were designed to search large repositories extremely fast. Look at Google: it indexes over a billion websites and it is FAST.

Could an index be better structured than just holding large quantities of text documents? It could, for instance, have types and properties and relations between entities, pretty much what a common relational database has, and beyond that a truly flexible infrastructure, where types can be created, modified and destroyed on the fly, and where their properties can be added, removed or renamed without fear. Something that in C# would be like:

As can easily be seen, we create a type (Product) on the fly, with no need to define a table or any other kind of structure, and then we add properties to the instances of this type, and these can differ from one instance to the next. Structure is defined and changed dynamically at runtime. The two product instances above both have a Name, but only one has a Category, and the query still works.
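The behaviour described above can be sketched without ActiveDocument at all. What follows is only an illustration, written in Java (Lucene's original language) and backed by a plain in-memory list rather than a real index. The names DynamicStoreSketch, Doc, save and find are made up for this example and are not the ActiveDocument or Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; NOT the ActiveDocument or Lucene API.
class DynamicStoreSketch {

    // A document is a type name plus a free-form bag of properties.
    static final class Doc {
        final String type;
        final Map<String, String> properties = new LinkedHashMap<>();

        Doc(String type) {
            this.type = type;
        }

        // Adds or overwrites a property; no schema has to exist beforehand.
        Doc set(String name, String value) {
            properties.put(name, value);
            return this;
        }

        // Returns null when this document does not have the property.
        String get(String name) {
            return properties.get(name);
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void save(Doc doc) {
        docs.add(doc);
    }

    // Returns documents of the given type whose property equals the value.
    // Documents that lack the property are skipped instead of causing an error.
    List<Doc> find(String type, String property, String value) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.type.equals(type) && value.equals(d.get(property))) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DynamicStoreSketch store = new DynamicStoreSketch();
        // Two instances of the same type with different sets of properties.
        store.save(new Doc("Product").set("Name", "Laptop").set("Category", "Computers"));
        store.save(new Doc("Product").set("Name", "Coffee mug"));

        System.out.println(store.find("Product", "Name", "Coffee mug").size());     // prints 1
        System.out.println(store.find("Product", "Category", "Computers").size());  // prints 1
    }
}
```

The point of the sketch is that the second product never declares a Category, yet querying by Category neither fails nor needs a migration: the document is simply not a match.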

Ladies and gentlemen, this technology is possible. It is called ActiveDocument, and the sources can be downloaded from here

Besides being able to store typed data and retrieve it quickly and easily, we can also relate data:
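One way such relations could work, sketched here in Java under my own assumptions, is to give every document an id and store the ids of related documents under a relation name. The class and method names (RelationSketch, create, relate, related) are hypothetical and do not come from ActiveDocument. Because an index has no joins, following a relation in this sketch is just a second lookup done in application code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of relating schemaless documents by id; NOT the ActiveDocument API.
class RelationSketch {

    static final class Doc {
        final int id;
        final String type;
        final Map<String, String> properties = new HashMap<>();
        // Relation name -> ids of the related documents.
        final Map<String, List<Integer>> relations = new HashMap<>();

        Doc(int id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    private final Map<Integer, Doc> docsById = new HashMap<>();
    private int nextId = 1;

    // Creates and stores a document of any type with a Name property.
    Doc create(String type, String name) {
        Doc doc = new Doc(nextId++, type);
        doc.properties.put("Name", name);
        docsById.put(doc.id, doc);
        return doc;
    }

    // Records a named, one-directional link from one document to another.
    void relate(Doc from, String relation, Doc to) {
        from.relations.computeIfAbsent(relation, k -> new ArrayList<>()).add(to.id);
    }

    // Follows a relation by looking the stored ids up again (no join involved).
    List<Doc> related(Doc from, String relation) {
        List<Integer> ids = from.relations.getOrDefault(relation, Collections.emptyList());
        List<Doc> result = new ArrayList<>();
        for (int id : ids) {
            result.add(docsById.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        RelationSketch store = new RelationSketch();
        Doc category = store.create("Category", "Computers");
        Doc laptop = store.create("Product", "Laptop");
        store.relate(laptop, "Category", category);

        for (Doc d : store.related(laptop, "Category")) {
            System.out.println(d.properties.get("Name")); // prints Computers
        }
    }
}
```

Nothing here enforces that the related document still exists, which matches the "no foreign keys" entry in the list of cons: keeping relations consistent is left to the application.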

Conclusion

Is it a breakthrough? Will this kill relational databases? Is this 100% safe? I have no idea. I am discovering the advantages and disadvantages as I go, and I am sure others will want to help me on this journey.

Pros and cons of using indexes instead of databases

Pros

-fast
-scalable
-they don't break
-...

Cons

-no joins
-no transactions
-no foreign keys
-...

Please feel free to add :)

15 comments:

Unknown said...: I worked for a well-known RSS company that eschewed databases for Lucene. It worked really well for document-centric persistence. It was an obstruction once we started doing relational data apps.

It points to a need to consider the right persistence model for the job rather than just assuming that relational databases are the best tool for any and every job.; 12:14 AM
Darius Damalakas said...: @sbellware,
What was the obstruction that you dealt with? Could you elaborate on this?

Experience is badly needed to make sound decisions ;); 7:26 AM
Unknown said...: Hi,

I've just had a quick look at ActiveDocument and I don't clearly understand how you intend to retrieve data from the database and then use it with ActiveDocument.

Regards,; 1:39 PM
Andrei Ignat said...: Does it permit IntelliSense?; 4:14 PM
Dan Bunea said...: @João Paulo Marques

It only uses the index to store/retrieve data. There is no database.

Please, look at the tests to see how it's used.

@andrei ignat
Not really, since everything is dynamic

Thanks,
Dan; 9:37 AM
Tati said...: I quote you: "they don't break". I think that the missing transactions put your data in "they will break" territory, Murphy's way.; 2:47 PM
Dan Bunea said...: There are scenarios in which transactions are useless, or not as useful as one might think. Querying is where they don't break, and updating the index can be made to work pretty much as in a transaction if it is wrapped in a try/catch block.

I wouldn't suggest lucene for data write intensive scenarios, but rather for query intensive scenarios.; 4:24 PM
Bart said...: Hi Dan,

Just stumbled on your post on Lucene again. Seems like a cool way to provide extensibility on your domain model. I was just wondering how you are dealing with UI binding?; 6:44 PM
Dan Bunea said...: Hi Bart,

We haven't used AD with desktop apps or with ASP.NET so I cannot tell you how it would work. We've used it with Castle MonoRail and velocity view engine and there the bindings work ok, although different from ASP.NET binding style.

Thanks,
Dan; 11:22 AM
Bart said...: I wanted to test it out with Monorail as well. So, if you can give me a clue how you did that, that would be great.; 11:40 AM
Dan Bunea said...: So, if you have in code:

ad["Name"] to get the value of the Name property, then in MR it is:

$ad.Name

Thanks,
Dan; 11:46 AM
Bart said...: I should have been more clear in my question. I know how I can display something like ad["Name"]. The problem with customized fields is of course that you don't know what is going to be in there. It could be ad["Name"], but it could also be ad["Category"] for another user, and that is something you don't know up front. That's where I am confused.; 12:43 PM
Dan Bunea said...: You can use ad.Properties to determine the properties.; 1:14 PM
Bart said...: Thanks a lot. Will give that a try over the weekend.; 1:16 PM
Dan Bunea said...: :) The weekend should be for other things, not work :); 1:32 PM