Thursday, March 25, 2010

Code Quality bridge collapse

Henry Petroski popularized an interesting theory that there is a spectacular bridge collapse roughly every 30 years. There are some very convincing data points behind this observation: the Tay Bridge in 1879, the Quebec Bridge in 1907, the Tacoma Narrows Bridge in 1940, and so on. The reasoning is that engineers grow overconfident over time. They stretch the boundaries of a new technology and make increasingly unreasonable trade-offs between cost and safety until, at some point, something gives way. In civil engineering, the consequences of those compromises come back to bite much later.

The general principle seems to apply to software engineering as well, except that the collapse comes much sooner than 30 years! Consider the following classic example from a simple process improvement initiative:

Take a team with a partially effective manual code review process that is starting to use automated static analysis tools. Initial results will be very positive. The automated tool will improve coverage and will become a mechanism for focusing the manual reviews: rather than selecting code at random, reviewers will concentrate on the code blocks that show the most complexity, duplication, best-practice violations and so on. The result will be more effective code reviews and, in turn, improved code quality.
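As a rough illustration of "review the riskiest code first", here is a minimal sketch in Python. It assumes the static analysis tool reports three numbers per code block (complexity, duplicated lines, best-practice violations); the `CodeBlock` fields, the `WEIGHTS` values and the `risk_score`/`select_for_review` helpers are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeBlock:
    """Metrics a static analysis tool might report for one code block (hypothetical fields)."""
    name: str
    complexity: int        # e.g. cyclomatic complexity
    duplicated_lines: int  # lines flagged as duplicated elsewhere
    violations: int        # best-practice rule violations


# Hypothetical weights; a real team would tune these for its own tool and codebase.
WEIGHTS = {"complexity": 1.0, "duplicated_lines": 0.5, "violations": 2.0}


def risk_score(block: CodeBlock) -> float:
    """Combine the metrics into one number; higher means 'review this first'."""
    return (WEIGHTS["complexity"] * block.complexity
            + WEIGHTS["duplicated_lines"] * block.duplicated_lines
            + WEIGHTS["violations"] * block.violations)


def select_for_review(blocks: List[CodeBlock], budget: int) -> List[CodeBlock]:
    """Pick the `budget` riskiest blocks instead of sampling code at random."""
    return sorted(blocks, key=risk_score, reverse=True)[:budget]


if __name__ == "__main__":
    blocks = [
        CodeBlock("parser.parse", complexity=25, duplicated_lines=10, violations=3),
        CodeBlock("utils.format_date", complexity=2, duplicated_lines=0, violations=0),
        CodeBlock("billing.compute_total", complexity=12, duplicated_lines=30, violations=1),
    ]
    for b in select_for_review(blocks, budget=2):
        print(b.name, risk_score(b))
```

With these made-up numbers, the two blocks picked for manual review are `parser.parse` (score 36.0) and `billing.compute_total` (score 29.0), while the trivial `utils.format_date` is skipped.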

Slowly but surely, overconfidence will set in because no code quality failures are observed for a while. Manual code reviews themselves will start to be cut back whenever the automated metrics are within a "reasonable" range, effectively trading quality for lower cost. The ingredients for a code quality bridge collapse are then in place.

The worst-case scenario is that you blame such failures on the automated process or tool and go back to square one. The ideal solution is to combine the best of automated and manual processes to strike the right balance between quality and cost.
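One way to picture the difference between the over-confident policy and a balanced one is the following sketch in Python. It assumes each code block already has a single risk score (as a static analysis tool might produce); the function names, the threshold and the sampling parameters are hypothetical illustrations, not a prescribed process.

```python
import random
from typing import Dict, List, Optional


def metrics_only_gate(scores: Dict[str, float], threshold: float) -> List[str]:
    """Over-confident policy: manually review only blocks whose score exceeds a threshold.
    When every metric looks 'reasonable', nothing gets reviewed at all."""
    return [name for name, s in scores.items() if s > threshold]


def balanced_selection(scores: Dict[str, float], top_n: int, sample_size: int,
                       seed: Optional[int] = None) -> List[str]:
    """Always review the top_n riskiest blocks, plus a random sample of the rest,
    so manual review never drops to zero just because the metrics look good."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    rng = random.Random(seed)  # seeded only to make runs reproducible
    sample = rng.sample(rest, min(sample_size, len(rest)))
    return top + sample


if __name__ == "__main__":
    scores = {"parser.parse": 5.0, "billing.compute_total": 3.0,
              "utils.format_date": 1.0, "auth.login": 0.5}
    print(metrics_only_gate(scores, threshold=10.0))  # every metric "reasonable"
    print(balanced_selection(scores, top_n=1, sample_size=2, seed=42))
```

The random sample is the key design choice here: it keeps human eyes on code the metrics consider safe, which is exactly where the over-confident gate stops looking.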

Yet another important lesson for software engineers from the classic world of civil engineering.
