Let's talk about something else for a second. Let's talk about medicine. Medicine is the great bastion of empirical study in the 20th century, and the use of experimental science has gone a long way towards treating disease. If you go to a hospital and your disease is relatively well understood, they will tell you how they will treat it, roughly how long that treatment will take to take effect, or, in the case of surgery, how long you're likely to take to recover. To anybody in software, let's be honest, that's magic. The refinement of medicine as a field in this regard is entirely down to the state of research in the field. I'd highly recommend the book Bad Science if you're at all interested in the consequences of ignoring research in a medical setting, and I leave it as an exercise to the reader to consider how the same situation might apply to developers.
So naturally this is a good time to look at the state of research in software engineering, starting at the high level. The replication study is an important part of experimental research: it is the thing that says "yes, somebody who didn't think of the original hypothesis is capable of getting those results too." It verifies that the results weren't just a fluke, the result of selection bias, or some similar phenomenon. It has been shown that the state of replication studies in software engineering is improving, "but the absolute number of replications is still small" (da Silva, 2014). In a lot of regards, this conclusion makes sense: replication studies take time to complete, and software engineering is still a relatively new field. However, it's worth noting that this highlights the immaturity of the field.
Let's look at a more typical piece of research, one that tries to draw a meaningful conclusion about a particular software development methodology. This particular study, entitled Realizing quality improvement through test driven development, was conducted by Microsoft Research and published in the journal Empirical Software Engineering. The conclusions drawn by the article are stark and seem to be highly in favour of the Test Driven Development methodology: it reduced defect rates by between 40% and 90% in the projects studied, while only increasing development time by 15-35%. Sounds like a good trade-off, right?
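For readers unfamiliar with the methodology itself, here is a toy sketch of the TDD cycle the study is measuring. This example is entirely my own invention, not drawn from the Microsoft study: you write a failing test first ("red"), then just enough code to make it pass ("green"), then refactor with the test as a safety net.

```python
# Toy illustration of the TDD cycle (a made-up example, not from the
# study being discussed).

# Step 1 ("red"): the test is written before any implementation
# exists, so running it at this point fails with a NameError.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim  me  ") == "trim-me"

# Step 2 ("green"): the minimal implementation that makes it pass.
def slugify(title):
    return "-".join(title.lower().split())

# Step 3: the test now passes; refactoring can proceed safely.
test_slugify()
```

The claimed benefit is that each piece of behaviour is pinned down by a test before it exists, which is where the study's defect-rate figures come from.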
So let's briefly talk about Agile. Agile seems to be the poster child of modern development methodology in industry, and it represents the antithesis of everything that the waterfall model stood for. It is definitely sold on reason and language more than on evidence (in the sense of the research-based evidence I am talking about in this post). If you look at the Agile Manifesto, this is highly self-evident: it's marketed more than it is argued for. I'm not necessarily saying that Agile or its child methodologies are the wrong way to go, and this post isn't about the specifics of the Agile methodology, but rather about what we really know about it. Somewhat more sinister is Scrum, an Agile-based development framework, which (successfully) sells people training courses. Its website makes some vague gestures towards empiricism, although at a cursory glance it does not link to any articles to that effect, instead pointing at a few sciencey-looking graphs (with no citations to speak of). Again, this is marketing above research.
But let's take off the tinfoil hat for a moment and go back to talking about research. If we have some research, but realistically cannot study large samples in one go, what can we do to start drawing generalised conclusions? That's right: the meta-analysis, the research tool used to aggregate many individual studies, so that many small studies can be treated as one large one, with more weight placed on the studies that were conducted most rigorously. It turns out that there are some meta-analyses on the subject of software engineering, which is good. I managed to find one on the subject of Test Driven Development, which makes a nice comparison to the Microsoft study above. The conclusion it came to was that the effects of TDD were relatively small, although larger if you only look at industrial studies rather than academic ones (Rafique, 2013).
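To make the "placing weight on studies" idea concrete, here is a minimal sketch of the standard fixed-effect approach, inverse-variance weighting: each study's effect size is weighted by the reciprocal of its variance, so more precise studies count for more. The study numbers below are invented for illustration, not taken from Rafique's analysis.

```python
# Minimal fixed-effect meta-analysis via inverse-variance weighting.
# Each study contributes (effect_size, variance); smaller variance
# means a more precise study and therefore a larger weight.

def pooled_effect(studies):
    """studies: list of (effect_size, variance) pairs.

    Returns the pooled effect size and the variance of that estimate.
    """
    weights = [1.0 / var for _, var in studies]
    total = sum(weights)
    effect = sum(w * e for w, (e, _) in zip(weights, studies)) / total
    variance = 1.0 / total  # variance of the pooled estimate
    return effect, variance

# Three hypothetical TDD studies: modest effects, differing precision.
studies = [(0.30, 0.04), (0.10, 0.01), (0.20, 0.09)]
effect, var = pooled_effect(studies)
print(f"pooled effect = {effect:.3f}, variance = {var:.4f}")
```

Note how the pooled estimate sits closest to the most precise study (the one with variance 0.01), which is exactly the behaviour that lets a meta-analysis discount small, noisy studies.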
What I'm not really trying to do here is draw conclusions about what we should or should not be doing with regard to any particular software engineering technique. I am rather trying to understand whether software engineering, as an industry, really knows what it's doing. I suspect that the answer is "sort of": some subjects have been reasonably well studied, but we do a lot of what we do because management thought it was a good idea or because the marketing material seemed quite good. The social sciences face similar research problems, because it's very hard to reliably study groups of people; an alternative title for this post, in that same vein, would have been "Software engineers are not sociologists, but should they be?" I imagine that all such subjects are rather like medicine was pre-germ theory, which is incredibly exciting when you consider what might be around the corner when we discover the 'next big thing' in software engineering methodology, although worrying if you consider the analogy of what we're doing at the moment.
References to non-hyperlinked papers (I can't link to the full text because I've accessed them through my university, and distributing them would likely be naughty):