The ‘scientific method’ – or is it?
I like the work of David Anderson on Kanban, which I think deserves the attention it is now getting in the software development industry. His latest post describes the basic principles well. I noticed that he and others in the field describe the use of the ‘scientific method’ as an approach to improving one’s process. In this post I would like to examine the use of this term, because I think there is a lot of misunderstanding about scientific methods and how they can be applied to software projects.
In David’s post it is described as follows:
The use of models allows a team to make a prediction about the effect of a change (or intervention). After the change is implemented the outcome can be observed by measuring the flow and examining the data. The outcome can be compared to the prediction expected from the model and the change can be assessed as an improvement, or not. This process of evaluating an empirical observation with a model, suggesting an intervention and predicting the outcome based on the model, then observing what really happens and comparing with the prediction, is use of the scientific method in its fundamental sense. This scientific approach is, I believe, more likely to lead to learning at both the individual and organizational level. Hence the use of the scientific approach in Kanban will lead directly to the emergence of learning organizations.
It makes a lot of sense: change a variable, then measure the difference it makes. I do believe, however, that some scientific methods are overlooked in this type of description. I’ll try to explain.
Suppose you’ve run a project and you have gathered all kinds of metrics on the performance of your team. The data shows that ever since one bald developer joined an otherwise hairy team, velocity has increased. This leads you to the hypothesis that having at least one bald developer on the team improves team performance. To test this hypothesis, you introduce a bald programmer to another team, and again collect empirical evidence to see whether your hypothesis is confirmed. Again, the numbers show an increase in productivity. You’ve followed ‘the scientific method’, so you can now state firmly that
bald programmers lead to better team productivity.
And you have numbers to back it up.
The example is deliberately silly, and you will immediately think of alternative explanations. Quite often, bald developers are of advanced age and therefore, on average, more senior and experienced. There are all sorts of alternative explanations that would, if true, show the correlation between performance and the amount of hair to be spurious. That is one problem that is not easily solved: there can always be an underlying correlation that is the true predictor of the result.
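To see how such a hidden variable can manufacture a correlation out of nothing, here is a minimal simulation sketch. Everything in it is invented for illustration: experience drives productivity, and baldness merely becomes more likely with experience, yet the bald group still comes out ahead on average.

```python
import random

random.seed(42)

# Hypothetical simulation: experience drives productivity, and baldness
# merely correlates with experience. No number here comes from a real study.
developers = []
for _ in range(1000):
    experience = random.uniform(0, 20)         # years of experience
    bald = random.random() < experience / 25   # baldness more likely with age
    productivity = 10 + 2 * experience + random.gauss(0, 5)
    developers.append((bald, productivity))

bald_avg = sum(p for b, p in developers if b) / sum(1 for b, _ in developers if b)
hairy_avg = sum(p for b, p in developers if not b) / sum(1 for b, _ in developers if not b)

print(f"average productivity, bald:  {bald_avg:.1f}")
print(f"average productivity, hairy: {hairy_avg:.1f}")
# The bald group scores higher on average, yet baldness has no causal
# effect at all -- experience is the hidden variable driving both.
```

The point of the sketch is only that a naive before/after comparison cannot distinguish this situation from a real causal effect.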
However, the outcome can also be influenced by other factors, and these can be controlled for by using methods that are ‘more scientific’.
Let’s talk about two types of controls that are common in social-science research: the use of control groups and sample distribution.
In the social sciences, data to test a hypothesis is gathered in controlled experiments. Usually, to test a hypothesis a researcher uses two different (sets of) subjects simultaneously: a treatment group, for which the effect of a manipulation is observed, and a control group, which is kept under the same conditions as the first with the exception of the actual manipulation. The effect of the manipulation can then be measured by comparing the results of both groups. If you do not use a control group, the data that you so carefully collect might be influenced by factors outside of your manipulation. The increase in performance might in fact be due to the team working on easier parts of the system during that period, or the bout of flu that kept important team members from contributing might simply have passed. Using a control group eliminates factors tied to the timing of your experiment from influencing the outcome of your manipulation. Well-established misinterpretations such as the Hawthorne effect can thus be avoided.
Secondly, you’re looking for findings that apply not just to the team in question but somewhat more broadly. But can you say that findings for one team are expected to apply to all teams in your organization? To do so you need a sample (the team) that is representative of the population (all teams) that you want to target.
In science, the sample size and the distribution of relevant characteristics within the sample are used to estimate the probability that the effect of a manipulation is in fact valid for the population. If you want to measure the popularity of the Ajax football club in the whole of Holland, for example, you would be wise to include people of different ages and sexes, from different parts of the country, with different incomes, and so on. Obviously, the more closely the sample resembles the population, the higher the probability that the findings will apply to the population as a whole. We know teams can vary greatly. If you add the bald developer to a team of Java developers, the effect might be totally different than adding him to a team of PHP developers. Adding a baldy to a new team might increase productivity; adding him to a well-established team could be counter-productive.
There are all sorts of team attributes that might matter to the effect of a manipulation. One should look carefully at the attributes of a team in order to decide whether the sample is representative.
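One way to make this concern concrete is proportional stratified sampling: draw teams from each stratum in proportion to that stratum’s share of the population, so the sample mirrors the population mix. In the sketch below, the attribute names, the team mix, and the `proportional_sample` helper are all invented for illustration.

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical population of teams; attributes and proportions are
# invented for illustration only.
population = (
    [{"stack": "java", "maturity": "new"}] * 30
    + [{"stack": "java", "maturity": "established"}] * 30
    + [{"stack": "php", "maturity": "new"}] * 20
    + [{"stack": "php", "maturity": "established"}] * 20
)

def proportional_sample(teams, key, fraction):
    """Sample each stratum of `key` in proportion to its size."""
    sample = []
    for value in sorted({t[key] for t in teams}):
        stratum = [t for t in teams if t[key] == value]
        sample.extend(random.sample(stratum, round(len(stratum) * fraction)))
    return sample

sample = proportional_sample(population, "stack", fraction=0.1)
print(Counter(t["stack"] for t in sample))
# The Java/PHP mix in the sample mirrors the 60/40 mix in the population.
```

The same idea extends to any attribute you suspect matters (team maturity, domain, size): stratify on it, or at least report how your sample is distributed over it.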
I don’t think there is such a thing as one ‘scientific method’. There are different methods, some more rigorous than others, that can lead to a measure of the probability that a hypothesis is true or false. And sure, in practice it will not always be feasible to improve our process in a rigorous scientific manner. I do think that for best results we should at least understand and communicate these factors and account for them where possible.