Saturday, 21 June 2014

Software Engineering - science or marketing?

The software development methodology: the essential tool of anyone who wants to organise a ragtag band of programmers into a team of professional engineers who deliver a working product on time and meet the requirements of the project. That last point is a perennial source of contention, as what was a requirement is no longer a requirement, and what wasn't a requirement is now completely critical to the success of the project. The common adage is that there are two things you can guarantee about software of any reasonable complexity: it will be late and it will contain bugs. Modern software development methodologies attempt to tackle this, and so I would like to understand exactly how well they do so - if at all. We will talk about software in the context of experimental science rather than mathematical science, because that is by and large how the field of software engineering is practised today outside of some highly specialised fields.

Let's talk about something else for a second. Let's talk about medicine. Medicine is the great bastion of empirical study in the 20th century. The use of experimental science in medicine has gone a long way in terms of treating disease. If you go to a hospital and your disease is relatively well understood, they will tell you how they will treat it and roughly how long that treatment will take to take effect - or, in the case of surgery, how long you're likely to take to recover. To anybody in software, let's be honest, that's magic. The refinement of medicine in this regard is entirely down to the state of research in the field. I'd highly recommend the book Bad Science if you're at all interested in the consequences of ignoring research in a medical setting, and I leave it as an exercise to the reader to consider how that might be analogous to the same situation for developers.

So naturally this is a good time to look at the state of research in software engineering. Let's start at the high level. The replication study is an important part of experimental research - it is the thing that says "yes, somebody who didn't think of the original hypothesis is capable of getting those results too." It verifies that the results weren't just a fluke, the product of selection bias or some similar phenomenon. It has been shown that while the state of replication studies in software engineering is improving, "the absolute number of replications is still small" (da Silva, 2014). In a lot of regards, this conclusion makes sense: replication studies take time to complete, and software engineering is still a relatively new field. However, it's worth noting that this highlights the immaturity of the field.

Let's look at a more typical piece of research that tries to yield some meaningful conclusion about a particular software development methodology. This particular study, entitled Realizing quality improvement through test driven development, was conducted by Microsoft Research and published in the journal Empirical Software Engineering. The conclusions drawn by the article are stark and seem to be highly in favour of the Test Driven Development methodology: it reduced defect rates by between 40% and 90% in the projects studied, while only increasing development time by 15-35%. Sounds like a good trade-off, right?

The paper was based on four case studies. This should ring some alarm bells - a sample of four is nowhere near statistically significant. Case studies are not worthless, but they are not particularly useful for drawing generalised conclusions that imply your project will behave similarly. Not to misrepresent the study: the researchers acknowledge this too, stating that "a family of case studies is likely not to yield statistically significant results, though Harrison (2004) observes that statistical significance has not been shown to impact industrial practice," and later go on to say that they hope the case studies contribute to the pool of empirical software engineering research - that we will eventually be able to draw conclusions from such studies conducted in different contexts. The (cited!) statement about statistical validity not affecting industrial practice is an interesting one, though. Indeed, the article they cite there is entitled "Propaganda and software development". That article talks a lot about how peddlers of methodologies try to get people to "jump on the bandwagon", using emotive language rather than reason to persuade people to adopt a methodology.

So let's briefly talk about Agile. Agile seems to be the poster child of modern development methodology in industry, representing the antithesis of everything the waterfall model stood for. It is definitely sold on rhetoric and language more than it is sold on evidence (in the sense of research-based evidence that I am talking about in this post). If you look at the Agile Manifesto, this is fairly self-evident: it is marketed more than it is argued for. I'm not necessarily saying that Agile or its child methodologies are the wrong way to go, and this post isn't about the specifics of the Agile methodology, but rather about what we really know about it. Somewhat more sinister is Scrum, an Agile-based development framework, which (successfully) sells people training courses. Its website makes some vague gestures towards empiricism, although at a cursory glance it does not link to any articles to that effect - it does point at a few sciencey-looking graphs, with no citations to speak of. Again, this is marketing above research.

But let's take off the tinfoil hat for a moment and go back to talking about research. If we have some research, but realistically cannot study large samples in one go, what can we do to start drawing generalised conclusions? That's right: the meta-analysis - the research tool used to aggregate many individual studies, so that many small studies can be treated as one large one, with the most weight placed on the studies that were conducted most rigorously. It turns out that there are some meta-analyses on the subject of software engineering, which is good. I managed to find a meta-analysis on the subject of Test Driven Development, which makes a nice comparison to the Microsoft study above. The conclusion they came to was that the effects of TDD were relatively small, although larger if you only look at industrial studies rather than academic ones (Rafique, 2013).
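To make the "placing weight on the better studies" idea concrete, here is a minimal sketch of the standard fixed-effect approach, in which each study's effect size is weighted by the inverse of its variance, so precise studies count for more. The effect sizes and variances below are invented for illustration and are not taken from any of the papers discussed here.

```python
# Fixed-effect meta-analysis sketch: pool effect sizes weighted by
# inverse variance. All numbers are made up for illustration.
studies = [
    # (effect size, variance) - e.g. a standardised difference in defect rate
    (0.60, 0.20),   # small, noisy academic study: large effect, low precision
    (0.15, 0.05),   # larger industrial study: modest effect, high precision
    (0.30, 0.10),
]

# Each study's weight is the reciprocal of its variance.
weights = [1.0 / var for _, var in studies]

# Pooled estimate: weighted average of the individual effect sizes.
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)

# Variance of the pooled estimate shrinks as evidence accumulates.
pooled_variance = 1.0 / sum(weights)

print(f"pooled effect: {pooled:.3f} (variance {pooled_variance:.3f})")
```

Note how the noisy study reporting the dramatic 0.60 effect gets a quarter of the weight of the precise industrial study, pulling the pooled estimate well below it - the same mechanism by which a meta-analysis can temper a handful of striking case studies.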

What I'm not trying to do here is draw conclusions about what we should or should not be doing with regard to any particular software engineering technique. I am rather trying to understand whether software engineering, as an industry, really knows what it's doing. I suspect that the answer is "sort of" - there are some subjects which have been reasonably well studied, but we do a lot of what we do because management thought it was a good idea or because the marketing material seemed quite good. Similar research problems are seen in the social sciences, because it's very hard to reliably study groups of people. The other spin I had on this post was to title it "Software engineers are not sociologists, but should they be?" in that same vein. I imagine that all such subjects are rather like medicine was pre-germ theory - which is incredibly exciting when you consider what might be around the corner when we discover the 'next big thing' in software engineering methodology, although worrying if you consider the analogy with what we're doing at the moment.

References to non-hyper-linked papers (can't link to the full text because I've accessed them through my university and distributing them would likely be naughty):

da Silva, Fabio; Suassuna, Marcos; França, A.; Grubb, Alicia; Gouveia, Tatiana; Monteiro, Cleviton; Santos, Igor. Empirical Software Engineering, 2014, Vol. 19(3), pp. 501-55.

Rafique, Yahya and Misic, Vojislav B. IEEE Transactions on Software Engineering, 2013.
