The absolute enemy in software (and in other things as well) is complexity. Complexity, the opposite of simplicity, makes our systems hard to understand, hard to debug and hard to extend or adapt. The great Rich Hickey gave a very good talk about this called Simplicity Matters. The video: https://www.youtube.com/watch?v=rI8tNMsozo0. So let's see how we can fight complexity in a practical example by using functional composition.
The problem
Considering that these days most of the integration is done through web services and JSON, we'll try to illustrate the complexity problem using an example from this area.
Requirement: in the JSON we need a key “measurement” that is mandatory, cannot be null, must be a string and cannot be an empty string.
We’ll do a little bit of TDD here, starting with a test:
Python:
import unittest

class TestValidations(unittest.TestCase):
    def validate_pair(self,json, is_valid, number_of_errors):
        errors = validate_simplest_json(json)
        print "",is_valid, errors, json
        self.assertEqual(len(errors)==0,is_valid)
        self.assertEqual(len(errors),number_of_errors)

    def test_json_validation(self):
        self.validate_pair({},False,1)

def validate_simplest_json(json):
    errors = []
    return errors
All fail, which is great. Now let’s write the code, to check if the key is there:
def validate_simplest_json(json):
    errors = []
    if not json.has_key("measurement"):
        errors.append("measurement cannot be missing")
    return errors
Pass. Now what about null? The test extends:
    def test_json_validation(self):
        self.validate_pair({},False,1)
        self.validate_pair({"measurement":None},False,1)
The code to pass:
def validate_simplest_json(json):
    errors = []
    if not json.has_key("measurement"):
        errors.append("measurement cannot be missing")
    else:
        if json["measurement"]==None:
            errors.append("measurement cannot be null")
    return errors
Pass. Now let’s check if it is a string (or unicode):
    def test_json_validation(self):
        self.validate_pair({},False,1)
        self.validate_pair({"measurement":None},False,1)
        self.validate_pair({"measurement":-1},False,1)
        self.validate_pair({"measurement":{}},False,1)
        self.validate_pair({"measurement":False},False,1)
        self.validate_pair({"measurement":"abc"},True,0)
        self.validate_pair({"measurement":u"Citroën"},True,0)

def validate_simplest_json(json):
    errors = []
    if not json.has_key("measurement"):
        errors.append("measurement cannot be missing")
    else:
        if json["measurement"]==None:
            errors.append("measurement cannot be null")
        else:
            if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):
                errors.append("measurement needs to be a string or unicode")
    return errors
Now we also need to check that it is not an empty string:
    def test_json_validation(self):
        self.validate_pair({},False,1)
        self.validate_pair({"measurement":None},False,1)
        self.validate_pair({"measurement":-1},False,1)
        self.validate_pair({"measurement":{}},False,1)
        self.validate_pair({"measurement":False},False,1)
        self.validate_pair({"measurement":"abc"},True,0)
        self.validate_pair({"measurement":u"Citroën"},True,0)
        self.validate_pair({"measurement":""},False,1)

def validate_simplest_json(json):
    errors = []
    if not json.has_key("measurement"):
        errors.append("measurement cannot be missing")
    else:
        if json["measurement"]==None:
            errors.append("measurement cannot be null")
        else:
            if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):
                errors.append("measurement needs to be a string or unicode")
            else:
                if len(json["measurement"].strip())==0:
                    errors.append("measurement cannot be an empty string")
    return errors
All of a sudden, we hear that we also need to make sure the length of the string is between 3 and 10 characters, and that it cannot be a reserved word like “password” or “archived”.
Ok, so let’s code, expanding our tests:
    def test_json_validation(self):
        self.validate_pair({},False,1)
        self.validate_pair({"measurement":None},False,1)
        self.validate_pair({"measurement":-1},False,1)
        self.validate_pair({"measurement":{}},False,1)
        self.validate_pair({"measurement":False},False,1)
        self.validate_pair({"measurement":"abc"},True,0)
        self.validate_pair({"measurement":u"Citroën"},True,0)
        self.validate_pair({"measurement":""},False,1)
        self.validate_pair({"measurement":"a"},False,1)
        self.validate_pair({"measurement":"abcdefghijklmnefghij"},False,1)
        self.validate_pair({"measurement":"password"},False,1)
        self.validate_pair({"measurement":"archived"},False,1)
        self.validate_pair({"measurement":"arCHived"},False,1)
Then gradually we start coding the validation, arriving at:
def validate_simplest_json(json):
    errors = []
    if not json.has_key("measurement"):
        errors.append("measurement cannot be missing")
    else:
        if json["measurement"]==None:
            errors.append("measurement cannot be null")
        else:
            if not isinstance(json["measurement"], str) and not isinstance(json["measurement"], unicode):
                errors.append("measurement needs to be a string or unicode")
            else:
                lenm=len(json["measurement"].strip())
                if lenm==0:
                    errors.append("measurement cannot be an empty string")
                else:
                    if lenm<3:
                        errors.append("measurement needs at least 3 characters")
                    elif lenm>10:
                        errors.append("measurement needs at most 10 characters")
                    elif json["measurement"].strip().lower() in ["archived","password"]:
                        errors.append("measurement has a value which is not allowed")
    return errors
As requirements are added, complexity grows. Of course this code could be refactored, but eliminating the essential problem of complexity is very hard. Just imagine what happens if in version 1.2 the customer changes the API and only allows values that are a measurement with a unit, like “0.12mm” or “13.2mg”. The code will grow again and become more complex. Not pretty!
And having JSON with only one key is kind of rare… Usually the number of keys is a lot higher and of course the code is a lot bigger. Bigger and more complex = disaster. In terms of code quality, it will be hard to extend and hard to debug.
The solution: implementing functional composition
Removing complexity can mean more linear code, so let’s refactor it to be more linear:
def validate_simplest_json_imperative_linear(json):
    errors = []
    should_exit=False
    key = "measurement"
    if not key_exists(json,key):
        errors.append("{0} cannot be missing".format(key))
        should_exit=True
    if not should_exit:
        if value_null(json, key):
            errors.append("{0} cannot be null".format(key))
            should_exit=True
    if not should_exit:
        if not is_string_or_unicode(json, key):
            errors.append("{0} needs to be a string or unicode".format(key))
            should_exit=True
    if not should_exit:
        if is_empty_string(json, key):
            errors.append("{0} cannot be an empty string".format(key))
            should_exit=True
    if not should_exit:
        lenm=len(json[key].strip())
        if lenm<3:
            errors.append("{0} needs at least 3 characters".format(key))
            should_exit=True
        elif lenm>10:
            errors.append("{0} needs at most 10 characters".format(key))
            should_exit=True
    if not should_exit:
        if json[key].strip().lower() in ["archived","password"]:
            errors.append("{0} has a value which is not allowed".format(key))
            should_exit=True
    return errors
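The helpers key_exists, value_null, is_string_or_unicode and is_empty_string are called above but their bodies are not shown. A minimal sketch of what they might look like, with the bodies assumed from the earlier nested version. It is written for Python 3 (an assumption; the article's code targets Python 2, where the string check would also accept unicode):

```python
# Hypothetical helper predicates. The names match the calls in the article;
# the bodies are assumptions derived from the nested validation code.

def key_exists(json, key):
    """True when the key is present, whatever its value."""
    return key in json

def value_null(json, key):
    """True when the key maps to None (JSON null)."""
    return json[key] is None

def is_string_or_unicode(json, key):
    """True when the value is text; in Python 3, str already covers unicode."""
    return isinstance(json[key], str)

def is_empty_string(json, key):
    """True when the value is empty once surrounding whitespace is stripped."""
    return len(json[key].strip()) == 0
```

Each predicate answers one yes/no question and never touches the error list, which is what lets the calling code stay flat.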
And yes, all the tests still pass. But we’re far from done, although we do see a pattern in which each check runs after the previous one… Hmm, now I’ll move all the variables (json, key, errors and exit) into a single object (a namedtuple) so that we don’t pass 4 parameters back and forth:
from collections import namedtuple
ValidationState = namedtuple("ValidationState","json key errors exit")
Then I extract the actual validations into simple functions, so the main function becomes:
def validate_simplest_json_imperative_linear_with_state(json):
    initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)
    state = validate_key_exists(initial_state)
    if not state.exit:
        state = validate_not_null(state)
    if not state.exit:
        state = validate_string_or_unicode(state)
    if not state.exit:
        state = validate_not_empty_string(state)
    if not state.exit:
        state = validate_length(state, 3,10)
    if not state.exit:
        state = validate_not_in(state, ["archived","password"])
    return state.errors
And the functions:
def validate_key_exists(state):
    print validate_key_exists.__name__,state
    if not key_exists(state.json,state.key):
        return state._replace(errors = state.errors+["{0} cannot be missing".format(state.key)])._replace(exit=True)
    return state

def validate_not_null(state):
    print validate_not_null.__name__,state
    if value_null(state.json,state.key):
        return state._replace(errors = state.errors+["{0} cannot be null".format(state.key)])._replace(exit=True)
    return state

def validate_string_or_unicode(state):
    print validate_string_or_unicode.__name__,state
    if not is_string_or_unicode(state.json,state.key):
        return state._replace(errors = state.errors+["{0} needs to be a string or unicode".format(state.key)])._replace(exit=True)
    return state

def validate_not_empty_string(state):
    print validate_not_empty_string.__name__,state
    if is_empty_string(state.json,state.key):
        return state._replace(errors = state.errors+["{0} cannot be an empty string".format(state.key)])._replace(exit=True)
    return state

def validate_length(state, min, max):
    print validate_length.__name__,state
    lenm=len(state.json[state.key].strip())
    if lenm<min:
        # use the min/max arguments in the message instead of hard-coded numbers
        return state._replace(errors = state.errors+["{0} needs at least {1} characters".format(state.key, min)])._replace(exit=True)
    elif lenm>max:
        return state._replace(errors = state.errors+["{0} needs at most {1} characters".format(state.key, max)])._replace(exit=True)
    return state

def validate_not_in(state,vals):
    print validate_not_in.__name__,state
    if state.json[state.key].strip().lower() in vals:
        return state._replace(errors = state.errors+["{0} has a value which is not allowed".format(state.key)])._replace(exit=True)
    return state
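Every validator returns a new state instead of changing the one it received. A small sketch of that behaviour, written for Python 3 (an assumption, the article itself uses Python 2). It also shows that namedtuple's _replace accepts several fields in one call, so the chained form used above could be shortened:

```python
from collections import namedtuple

ValidationState = namedtuple("ValidationState", "json key errors exit")

before = ValidationState(json={}, key="measurement", errors=[], exit=False)

# _replace builds a copy with the given fields swapped; "before" is untouched.
# List concatenation (+) also creates a new list rather than mutating the old one.
after = before._replace(
    errors=before.errors + ["measurement cannot be missing"],
    exit=True,
)

print(before.errors, before.exit)
print(after.errors, after.exit)
```

Because the old state is never modified, any intermediate state can be printed or compared later without surprises.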
The code is now a series of functions, each of which runs on the result of the previous function as long as the exit field is not set to True. So basically, two functions f and g are composed like this:
initial_state = …
state = f(initial_state)
if not state.exit:
    return g(state)
And putting this in a function:
def compose2(f, g):
    def run(x):
        result_f = f(x)
        if not result_f.exit:
            return g(result_f)
        else:
            return result_f
    return run
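To see the short-circuit in isolation, here is a small sketch for Python 3 (an assumption, the article uses Python 2). State, always_ok, always_fail and must_not_run are hypothetical stand-ins invented for this illustration, not part of the article's code:

```python
from collections import namedtuple

State = namedtuple("State", "errors exit")  # hypothetical stand-in for ValidationState

def compose2(f, g):
    def run(x):
        result_f = f(x)
        # g only runs when f did not ask to stop
        return result_f if result_f.exit else g(result_f)
    return run

def always_ok(state):
    return state

def always_fail(state):
    return state._replace(errors=state.errors + ["failed"], exit=True)

def must_not_run(state):
    raise AssertionError("g ran although f asked to exit")

start = State(errors=[], exit=False)
ok_then_fail = compose2(always_ok, always_fail)(start)
fail_then_skip = compose2(always_fail, must_not_run)(start)
```

In the second composition must_not_run is never called, because always_fail sets exit to True first.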
Using this we can now compose 2 functions into one:
def validate_simplest_json_imperative_linear_with_state(json):
    initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)
    # state = validate_key_exists(initial_state)
    #
    # if not state.exit:
    #     state = validate_not_null(state)
    composed_function = compose2(validate_key_exists, validate_not_null)
    state = composed_function(initial_state)
    return state.errors
But we don’t have only 2 functions, we have more, so we write a reduce:
#compose n functions
def compose(*functions):
    return reduce(compose2, functions)
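In Python 2 reduce is a builtin; in Python 3 it has to be imported from functools. A sketch for Python 3 that traces the order in which the composed functions run. State and the step factory are hypothetical helpers made up for this illustration:

```python
from collections import namedtuple
from functools import reduce  # builtin in Python 2, in functools in Python 3

State = namedtuple("State", "trace exit")  # hypothetical stand-in state

def compose2(f, g):
    def run(x):
        result_f = f(x)
        if not result_f.exit:
            return g(result_f)
        return result_f
    return run

def compose(*functions):
    # reduce folds from the left: compose2(compose2(f1, f2), f3), and so on
    return reduce(compose2, functions)

def step(name, stop=False):
    # hypothetical validator factory that records which steps were called
    def run(state):
        return state._replace(trace=state.trace + [name], exit=stop)
    return run

pipeline = compose(step("a"), step("b", stop=True), step("c"))
result = pipeline(State(trace=[], exit=False))
```

Here result.trace holds "a" and "b" only, since step "b" asks to exit before "c" can run. Note that reduce without an initial value raises a TypeError on an empty sequence, so compose needs at least one function.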
and our function becomes:
def validate_simplest_json_imperative_linear_with_state(json):
    initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)
    composed_function = compose(validate_key_exists, validate_not_null,validate_string_or_unicode,validate_not_empty_string)
    state = composed_function(initial_state)
    if not state.exit:
        state = validate_length(state, 3,10)
    if not state.exit:
        state = validate_not_in(state, ["archived","password"])
    return state.errors
But we just hit a problem: we have some functions that take more parameters, and we need to pass them. We’ll use closures:
def create_validate_length(min, max):
    def validate_length(state):
        print validate_length.__name__,state
        lenm=len(state.json[state.key].strip())
        if lenm<min:
            # use the captured min/max in the message instead of hard-coded numbers
            return state._replace(errors = state.errors+["{0} needs at least {1} characters".format(state.key, min)])._replace(exit=True)
        elif lenm>max:
            return state._replace(errors = state.errors+["{0} needs at most {1} characters".format(state.key, max)])._replace(exit=True)
        return state
    return validate_length

def create_validate_not_in(vals):
    def validate_not_in(state):
        print validate_not_in.__name__,state
        if state.json[state.key].strip().lower() in vals:
            return state._replace(errors = state.errors+["{0} has a value which is not allowed".format(state.key)])._replace(exit=True)
        return state
    return validate_not_in
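The point of the closure is that each call to the factory produces an independent one-argument validator with its own captured bounds. A sketch for Python 3 (an assumption, the article uses Python 2); the parameter names min_len and max_len and the second rule are hypothetical, and the messages here are built from the captured bounds:

```python
from collections import namedtuple

ValidationState = namedtuple("ValidationState", "json key errors exit")

def create_validate_length(min_len, max_len):
    # min_len and max_len are captured by the inner function (a closure)
    def validate_length(state):
        length = len(state.json[state.key].strip())
        if length < min_len:
            message = "{0} needs at least {1} characters".format(state.key, min_len)
        elif length > max_len:
            message = "{0} needs at most {1} characters".format(state.key, max_len)
        else:
            return state
        return state._replace(errors=state.errors + [message], exit=True)
    return validate_length

short_names = create_validate_length(3, 10)
long_names = create_validate_length(5, 40)  # hypothetical second rule

state = ValidationState(json={"measurement": "abcd"}, key="measurement",
                        errors=[], exit=False)
print(short_names(state).errors)
print(long_names(state).errors)
```

The same four-character value passes the first validator and fails the second, even though both come from the same factory and take only the state as an argument.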
And now the final validation code:
def validate_simplest_functional_composition(json):
    initial_state = ValidationState(json=json, key="measurement",errors=[], exit=False)
    composed_function = compose(
        validate_key_exists, validate_not_null,
        validate_string_or_unicode, validate_not_empty_string,
        create_validate_length(3, 10),
        create_validate_not_in(["archived","password"]))
    final_state = composed_function(initial_state)
    return final_state.errors
It is much better. It basically says: given an initial state of the system, run all these functions (validators) and at the end get a final state.
Now the entire code is available online and can also be run at:
http://runnable.com/VNMhoTKLSn9Tm0GI/fighting-complexity-through-functional-composition-for-python
Conclusion: Why is this better?
Now you might wonder how a solution with ... lines of code can be better than one with just 21. In part 2 of the article, called "Why is functional composition better", I will illustrate why and how functional composition makes our code simpler and easier to understand, debug and change.