Friday, February 06, 2015

Fighting complexity through functional composition, part 1: how to implement functional composition


The absolute enemy in software (and in other things as well) is complexity. Complexity, the opposite of simplicity, makes our systems hard to understand, hard to debug and hard to extend or adapt. The great Rich Hickey gave a very good talk about this, called Simplicity Matters. The video: https://www.youtube.com/watch?v=rI8tNMsozo0. So let’s see how we can fight complexity in a practical example by using functional composition.

The problem 



Considering that these days most of the integration is done through web services and JSON, we'll try to illustrate the complexity problem using an example from this area.

Requirement: in the JSON, we need a key “measurement” that is mandatory, cannot be null, must be a string and cannot be an empty string.

We’ll do a little bit of TDD here, starting with a test:

Python:
 import unittest

 class TestValidations(unittest.TestCase):
   
   def validate_pair(self,json, is_valid, number_of_errors):  
     errors = validate_simplest_json(json)  
     print "",is_valid, errors, json  
     self.assertEquals(len(errors)==0,is_valid)  
     self.assertEquals(len(errors),number_of_errors)  
   
   def test_json_validation(self):  
     self.validate_pair({},False,1)  
   
   
 def validate_simplest_json(json):  
   errors = []  
   return errors  



The test fails, which is great. Now let’s write the code to check that the key is there:

 def validate_simplest_json(json):  
   errors = []  
   if not json.has_key("measurement"):  
     errors.append("measurement cannot be missing")
   return errors    

Pass. Now what about null? The test extends:

   def test_json_validation(self):  
     self.validate_pair({},False,1)  
     self.validate_pair({"measurement":None},False,1)  

The code to make it pass:

 def validate_simplest_json(json):  
   errors = []  
   if not json.has_key("measurement"):  
     errors.append("measurement cannot be missing")  
   else:  
     if json["measurement"]==None:  
       errors.append("measurement cannot be null")
   return errors  

Pass. Now let’s check if it is a string (or unicode):

   def test_json_validation(self):  
     self.validate_pair({},False,1)  
     self.validate_pair({"measurement":None},False,1)  
     self.validate_pair({"measurement":-1},False,1)  
     self.validate_pair({"measurement":{}},False,1)  
     self.validate_pair({"measurement":False},False,1)  
     self.validate_pair({"measurement":"abc"},True,0)  
     self.validate_pair({"measurement":u"Citroën"},True,0)  
   
 def validate_simplest_json(json):  
   errors = []  
   if not json.has_key("measurement"):  
     errors.append("measurement cannot be missing")  
   else:  
     if json["measurement"]==None:  
       errors.append("measurement cannot be null")  
     else:  
       if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):  
         errors.append("measurement needs to be a string or unicode")
   return errors  

Now we also need to check that it is not an empty string:

   def test_json_validation(self):  
     self.validate_pair({},False,1)  
     self.validate_pair({"measurement":None},False,1)  
     self.validate_pair({"measurement":-1},False,1)  
     self.validate_pair({"measurement":{}},False,1)  
     self.validate_pair({"measurement":False},False,1)  
     self.validate_pair({"measurement":"abc"},True,0)  
     self.validate_pair({"measurement":u"Citroën"},True,0)  
     self.validate_pair({"measurement":""},False,1)  
   
 def validate_simplest_json(json):  
   errors = []  
   if not json.has_key("measurement"):  
     errors.append("measurement cannot be missing")  
   else:  
     if json["measurement"]==None:  
       errors.append("measurement cannot be null")  
     else:  
       if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):  
         errors.append("measurement needs to be a string or unicode")
       else:  
         if len(json["measurement"].strip())==0:  
           errors.append("measurement cannot be an empty string")  
   return errors  

All of a sudden, we hear that we also need to make sure the length of the string is between 3 and 10 characters, and that it cannot be a reserved word like “password” or “archived”.

Ok, so let’s code, expanding our tests:

   def test_json_validation(self):  
     self.validate_pair({},False,1)  
     self.validate_pair({"measurement":None},False,1)  
     self.validate_pair({"measurement":-1},False,1)  
     self.validate_pair({"measurement":{}},False,1)  
     self.validate_pair({"measurement":False},False,1)  
     self.validate_pair({"measurement":"abc"},True,0)  
     self.validate_pair({"measurement":u"Citroën"},True,0)  
     self.validate_pair({"measurement":""},False,1)  
     self.validate_pair({"measurement":"a"},False,1)  
     self.validate_pair({"measurement":"abcdefghijklmnefghij"},False,1)  
     self.validate_pair({"measurement":"password"},False,1)  
     self.validate_pair({"measurement":"archived"},False,1)  
     self.validate_pair({"measurement":"arCHived"},False,1)  

Then we gradually code the validation, arriving at:

 def validate_simplest_json(json):  
   errors = []  
   if not json.has_key("measurement"):  
     errors.append("measurement cannot be missing")  
   else:  
     if json["measurement"]==None:  
       errors.append("measurement cannot be null")  
     else:  
       if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):  
         errors.append("measurement needs to be a string or unicode")
       else:  
         lenm=len(json["measurement"].strip())  
         if lenm==0:  
           errors.append("measurement cannot be an empty string")  
         else:  
           if lenm<3:
             errors.append("measurement needs at least 3 characters")
           elif lenm>10:
             errors.append("measurement needs at most 10 characters")  
           elif json["measurement"].strip().lower() in ["archived","password"]:  
             errors.append("measurement has a value which is not allowed")  
   return errors  

As requirements are added, complexity grows. Of course this code could be refactored, but eliminating the essential problem of complexity is very hard. Just imagine what happens if, in version 1.2, the customer changes the API and only allows values that are a measurement with a unit, like “0.12mm” or “13.2mg”. The code will grow again and become more complex. Not pretty!

And having JSON with only one key is kind of rare… Usually the number of keys is a lot higher and of course the code is a lot bigger. Bigger and more complex = disaster. In terms of code quality, it will be hard to extend and hard to debug.


The solution: implementing functional composition



Removing complexity can mean more linear code, so let’s refactor it to be more linear:

 def validate_simplest_json_imperative_linear(json):  
   errors = []  
   should_exit=False  
   key = "measurement"  
   if not key_exists(json,key):  
    errors.append("{0} cannot be missing".format(key))  
    should_exit=True  
   
   if not should_exit:  
     if value_null(json, key):  
       errors.append("{0} cannot be null".format(key))  
       should_exit=True  
   
   if not should_exit:  
     if not is_string_or_unicode(json, key):  
       errors.append("{0} needs to be a string or unicode".format(key))
       should_exit=True  
   
   if not should_exit:  
     if is_empty_string(json, key):  
       errors.append("{0} cannot be an empty string".format(key))  
       should_exit=True  
   
   
   if not should_exit:  
     lenm=len(json[key].strip())  
     if lenm<3:
       errors.append("{0} needs at least 3 characters".format(key))
       should_exit=True
     elif lenm>10:
       errors.append("{0} needs at most 10 characters".format(key))  
       should_exit=True  
   
   if not should_exit:  
     if json[key].strip().lower() in ["archived","password"]:  
       errors.append("{0} has a value which is not allowed".format(key))  
       should_exit=True  
   
   return errors  
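
The linear version relies on four small helper predicates (key_exists, value_null, is_string_or_unicode, is_empty_string) that are not listed in this article. A minimal sketch of what they might look like, inferred from the nested version above; the bodies are assumptions rather than the original helpers, and the string_types shim is only there so that the sketch runs on both Python 2 and Python 3:

```python
# Hypothetical helper predicates: the names come from the listing above,
# the bodies are assumptions inferred from the nested version.
try:
    string_types = (str, unicode)  # Python 2
except NameError:
    string_types = (str,)          # Python 3: unicode no longer exists


def key_exists(json, key):
    # True if the key is present, whatever its value is
    return key in json


def value_null(json, key):
    # True if the value is None (JSON null)
    return json[key] is None


def is_string_or_unicode(json, key):
    # True for str and, on Python 2, unicode values
    return isinstance(json[key], string_types)


def is_empty_string(json, key):
    # True if the string is empty or contains only whitespace
    return len(json[key].strip()) == 0
```

Each predicate only answers a yes/no question and leaves the error message to the caller, which is what allows the checks to be lined up one after the other.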

And yes, all the tests still pass. But we’re far from done, although we do see a pattern in which each check is executed after the previous one… Hmm, now I’ll move all the variables (json, key, errors and exit) into a single object (a named tuple) so that we don’t pass 4 parameters back and forth:

 from collections import namedtuple
 ValidationState = namedtuple("ValidationState","json key errors exit")

Then I will extract the actual validations into simple functions, so that the main function becomes:

 def validate_simplest_json_imperative_linear_with_state(json):  
   initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)  
   
   state = validate_key_exists(initial_state)  
   
   if not state.exit:  
     state = validate_not_null(state)  
   
   if not state.exit:  
     state = validate_string_or_unicode(state)  
   
   if not state.exit:  
     state = validate_not_empty_string(state)  
   
   if not state.exit:  
     state = validate_length(state, 3,10)  
   
   if not state.exit:  
     state = validate_not_in(state, ["archived","password"])  
   
   return state.errors  

And the functions:

 def validate_key_exists(state):  
   print validate_key_exists.__name__,state  
   if not key_exists(state.json,state.key):  
     return state._replace(errors = state.errors+["{0} cannot be missing".format(state.key)])._replace(exit=True)  
   return state  
   
 def validate_not_null(state):  
   print validate_not_null.__name__,state  
   if value_null(state.json,state.key):  
     return state._replace(errors = state.errors+["{0} cannot be null".format(state.key)])._replace(exit=True)  
   return state  
   
 def validate_string_or_unicode(state):  
   print validate_string_or_unicode.__name__,state  
   if not is_string_or_unicode(state.json,state.key):  
     return state._replace(errors = state.errors+["{0} needs to be a string or unicode".format(state.key)])._replace(exit=True)
   return state  
   
 def validate_not_empty_string(state):  
   print validate_not_empty_string.__name__,state  
   if is_empty_string(state.json,state.key):  
     return state._replace(errors = state.errors+["{0} cannot be an empty string".format(state.key)])._replace(exit=True)  
   return state  
   
 def validate_length(state, min, max):  
   print validate_length.__name__,state  
   lenm=len(state.json[state.key].strip())  
   if lenm<min:
     return state._replace(errors = state.errors+["{0} needs at least {1} characters".format(state.key, min)])._replace(exit=True)
   elif lenm>max:
     return state._replace(errors = state.errors+["{0} needs at most {1} characters".format(state.key, max)])._replace(exit=True)
   return state  
   
 def validate_not_in(state,vals):  
   print validate_not_in.__name__,state  
   if state.json[state.key].strip().lower() in vals:  
     return state._replace(errors = state.errors+["{0} has a value which is not allowed".format(state.key)])._replace(exit=True)  
   return state  
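
The validators above never modify the state they receive: a named tuple is immutable, and _replace returns a new tuple with the given fields changed. A small sketch of that behaviour, with made-up sample values; it also shows that _replace accepts several fields in one call, which could shorten the chained calls:

```python
from collections import namedtuple

ValidationState = namedtuple("ValidationState", "json key errors exit")

# Sample values, made up for illustration
before = ValidationState(json={}, key="measurement", errors=[], exit=False)

# _replace builds a new tuple; several fields can be changed in one call
after = before._replace(errors=before.errors + ["measurement cannot be missing"],
                        exit=True)

print(before)  # the original state is untouched
print(after)   # the new state carries the error and the exit flag
```

Because every validator returns a fresh state, the order of the calls is the only thing that connects them.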
   

The code is now a series of functions, each of which runs on the result of the previous one as long as the exit flag is not set to True. So basically, two functions f and g are composed like this:

initial_state = …
state = f(initial_state)
if not state.exit:
    state = g(state)

And putting this in a function:

 def compose2(f, g):  
   def run(x):  
     result_f = f(x)  
     if not result_f.exit:  
       return g(result_f)  
     else:  
       return result_f  
   return run  

Using this we can now compose 2 functions into one:

 def validate_simplest_json_imperative_linear_with_state(json):  
   initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)  
   
   # state = validate_key_exists(initial_state)  
   #  
   # if not state.exit:  
   #   state = validate_not_null(state)  
   
   composed_function = compose2(validate_key_exists, validate_not_null)  
   state = composed_function(initial_state)  


But we don’t have only 2 functions, we have more, so we use reduce:

 #compose n functions  
 def compose(*functions):  
   return reduce(compose2, functions)  
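
To see what reduce builds here: compose(f, g, h) is compose2(compose2(f, g), h), so each step only runs if no earlier step has set exit. A self-contained sketch with toy steps that record when they run; the State tuple and the step helper are made up for this illustration, and the functools import is only required on Python 3, where reduce is no longer a builtin:

```python
from collections import namedtuple
from functools import reduce  # builtin in Python 2, must be imported in Python 3

# Toy state for the illustration: a call log plus the exit flag
State = namedtuple("State", "log exit")


def compose2(f, g):
    def run(x):
        result_f = f(x)
        if not result_f.exit:
            return g(result_f)
        return result_f
    return run


def compose(*functions):
    return reduce(compose2, functions)


def step(name, stop=False):
    # Builds a step that logs its name and optionally sets the exit flag
    def run(state):
        return state._replace(log=state.log + [name], exit=stop)
    return run


all_run = compose(step("a"), step("b"), step("c"))
stops_at_b = compose(step("a"), step("b", stop=True), step("c"))

print(all_run(State(log=[], exit=False)).log)     # ['a', 'b', 'c']
print(stops_at_b(State(log=[], exit=False)).log)  # ['a', 'b']
```

In the second chain, step "c" never runs, which is the same short-circuit that the "if not state.exit" checks expressed by hand.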

And our function becomes:

 def validate_simplest_json_imperative_linear_with_state(json):  
   initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)  
   
   composed_function = compose(validate_key_exists, validate_not_null,validate_string_or_unicode,validate_not_empty_string)  
   state = composed_function(initial_state)  
   
   if not state.exit:  
     state = validate_length(state, 3,10)  
   
   if not state.exit:  
     state = validate_not_in(state, ["archived","password"])  
   
   return state.errors  

But we just hit a problem: we have some functions that take more parameters, and we need to pass them. We’ll use closures:

 def create_validate_length(min, max):  
   def validate_length(state):  
     print validate_length.__name__,state  
     lenm=len(state.json[state.key].strip())  
     if lenm<min:
       return state._replace(errors = state.errors+["{0} needs at least {1} characters".format(state.key, min)])._replace(exit=True)
     elif lenm>max:
       return state._replace(errors = state.errors+["{0} needs at most {1} characters".format(state.key, max)])._replace(exit=True)
     return state  
   return validate_length  
   
 def create_validate_not_in(vals):  
   def validate_not_in(state):  
     print validate_not_in.__name__,state  
     if state.json[state.key].strip().lower() in vals:  
       return state._replace(errors = state.errors+["{0} has a value which is not allowed".format(state.key)])._replace(exit=True)  
     return state  
   return validate_not_in  
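
A closure is not the only way to fix the extra parameters. As an alternative sketch (not what this article uses), functools.partial from the standard library can bind them on a validator that takes the state first; the parameter names min_len and max_len and the name validate_length_3_10 are chosen for this example:

```python
from collections import namedtuple
from functools import partial

ValidationState = namedtuple("ValidationState", "json key errors exit")


def validate_length(state, min_len, max_len):
    # Same check as above, with the limits passed as ordinary parameters
    lenm = len(state.json[state.key].strip())
    if lenm < min_len:
        message = "{0} needs at least {1} characters".format(state.key, min_len)
        return state._replace(errors=state.errors + [message], exit=True)
    elif lenm > max_len:
        message = "{0} needs at most {1} characters".format(state.key, max_len)
        return state._replace(errors=state.errors + [message], exit=True)
    return state


# partial fixes min_len and max_len, leaving a function of the state alone
validate_length_3_10 = partial(validate_length, min_len=3, max_len=10)

state = ValidationState(json={"measurement": "a"}, key="measurement",
                        errors=[], exit=False)
print(validate_length_3_10(state).errors)  # ['measurement needs at least 3 characters']
```

Either way, the result is a function of one argument, the state, which is the shape that compose expects.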

And now the final validation code:

 def validate_simplest_functional_composition(json):  
   initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)  
   
   composed_function = compose(validate_key_exists, validate_not_null,validate_string_or_unicode,validate_not_empty_string, create_validate_length(3, 10), create_validate_not_in(["archived","password"]))  
   final_state = composed_function(initial_state)  
   
   return final_state.errors  

It is much better. It basically says: given an initial state of the system, run all these functions (validators) and at the end get a final state.
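
As an illustration of how the composed version could absorb the hypothetical version 1.2 requirement mentioned earlier (values such as “0.12mm” or “13.2mg”), one more validator can be created and added to the chain. This is only a sketch: create_validate_matches, errors_for, the error message and the regular expression are assumptions, and the null and type checks are left out to keep it short:

```python
import re
from collections import namedtuple
from functools import reduce

ValidationState = namedtuple("ValidationState", "json key errors exit")


def compose2(f, g):
    def run(x):
        result_f = f(x)
        return result_f if result_f.exit else g(result_f)
    return run


def compose(*functions):
    return reduce(compose2, functions)


def validate_key_exists(state):
    if state.key not in state.json:
        message = "{0} cannot be missing".format(state.key)
        return state._replace(errors=state.errors + [message], exit=True)
    return state


def create_validate_matches(pattern):
    # Hypothetical validator factory: the stripped value must match the pattern
    compiled = re.compile(pattern)

    def validate_matches(state):
        if not compiled.match(state.json[state.key].strip()):
            message = "{0} is not a valid measurement".format(state.key)
            return state._replace(errors=state.errors + [message], exit=True)
        return state
    return validate_matches


# Assumed format: digits, an optional decimal part, then mm or mg
validate_unit = create_validate_matches(r"^\d+(\.\d+)?(mm|mg)$")
validate = compose(validate_key_exists, validate_unit)


def errors_for(json):
    initial_state = ValidationState(json=json, key="measurement",
                                    errors=[], exit=False)
    return validate(initial_state).errors


print(errors_for({"measurement": "0.12mm"}))  # []
print(errors_for({"measurement": "abc"}))     # ['measurement is not a valid measurement']
print(errors_for({}))                         # ['measurement cannot be missing']
```

The existing validators stay untouched; the new rule is one more function in the list passed to compose.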


A JavaScript version: http://jsfiddle.net/danbunea1/gz87dt5a/





Conclusion: Why is this better?



Now you might think: how can a solution with ... lines of code be better than one with just 21? In part 2 of the article, called "Why is functional composition better", I will illustrate why, and how functional composition makes our code simpler and easier to understand, debug and change.

