Thursday, March 25, 2010

Code Quality bridge collapse

Henry Petroski popularized an interesting theory that there is a spectacular bridge collapse roughly every 30 years. There are some very convincing data points behind this observation: the Tay Bridge in 1879, the Quebec Bridge in 1907, the Tacoma Narrows Bridge in 1940, etc. The reasoning is that engineers tend to get overconfident over time, stretching the boundaries of a new technology and making ever more unreasonable trade-offs between cost and safety until something gives way. In the case of civil engineering, the results of those compromises come back to bite much later.

The general principle seems to apply to Software Engineering as well, only the collapse comes much sooner than 30 years! Take the following classic example from a simple process-improvement initiative:

Take a team with a partially effective manual code review process that is starting to leverage automated tools for static analysis. Initial results will be very positive. The automated process/tool will improve coverage and can be used to optimize the manual reviews: rather than selecting code randomly, manual reviews are done on the code blocks that show the most complexity, duplication, best-practice violations, etc. The result is more effective code reviews, leading to improved code quality.
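
To make this concrete, here is a minimal sketch (in Java) of how a team might use static-analysis output to pick manual review targets. The Finding record, its fields, and the scoring weights are all hypothetical stand-ins for whatever your analysis tool actually reports:

import java.util.Comparator;
import java.util.List;

// Minimal sketch: rank files by a combined static-analysis score so manual
// reviews target the riskiest code first, instead of sampling randomly.
public class ReviewPrioritizer {

    // Hypothetical per-file metrics from a static-analysis report.
    record Finding(String file, int complexity, int duplicatedLines, int violations) {
        // Simple weighted score; the weights are illustrative, not canonical.
        int score() {
            return complexity * 3 + duplicatedLines + violations * 2;
        }
    }

    static List<Finding> pickReviewTargets(List<Finding> findings, int topN) {
        return findings.stream()
                .sorted(Comparator.comparingInt(Finding::score).reversed())
                .limit(topN)
                .toList();
    }

    public static void main(String[] args) {
        List<Finding> report = List.of(
                new Finding("OrderService.java", 42, 120, 7),
                new Finding("Invoice.java", 8, 0, 1),
                new Finding("PricingEngine.java", 35, 60, 12));

        // Manually review the two riskiest files first.
        pickReviewTargets(report, 2).forEach(f ->
                System.out.printf("%s (score %d)%n", f.file(), f.score()));
    }
}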

Slowly but surely, overconfidence will set in as no code quality failures are observed over a period of time. The manual code reviews themselves will start getting compromised as long as the automated metrics are within a "reasonable" range - effectively trading quality away to reduce cost. The ingredients are thus in place for a code quality bridge collapse.

The worst-case scenario is that you blame such failures on the automated process or tool and go back to square one. The ideal solution is to combine the best of the automated and manual processes to get the right balance of quality and cost.
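
One way to encode that balance is a quality gate that never lets automated metrics fully replace manual review. A rough sketch, assuming a hypothetical riskScore produced by your static-analysis tool and a manual sign-off flag recorded somewhere in your workflow:

import java.util.List;

// Sketch of a gate where automated checks alone are never sufficient for
// high-risk changes: those must also carry a manual review sign-off.
public class QualityGate {

    // Hypothetical inputs: a per-file risk score plus a review flag.
    record ChangedFile(String path, int riskScore, boolean manuallyReviewed) {}

    // Illustrative threshold; anything riskier requires a human reviewer.
    static final int MANUAL_REVIEW_THRESHOLD = 50;

    static List<String> violations(List<ChangedFile> changes) {
        return changes.stream()
                .filter(f -> f.riskScore() >= MANUAL_REVIEW_THRESHOLD && !f.manuallyReviewed())
                .map(ChangedFile::path)
                .toList();
    }

    public static void main(String[] args) {
        var changes = List.of(
                new ChangedFile("PricingEngine.java", 72, false),
                new ChangedFile("Invoice.java", 12, false));

        var missing = violations(changes);
        if (!missing.isEmpty()) {
            System.out.println("Blocked: manual review required for " + missing);
        } else {
            System.out.println("Gate passed.");
        }
    }
}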

Yet another important lesson for Software Engineers from the classic civil engineering world.

Tuesday, February 9, 2010

Losing the plot on SOA?

Many product development engagements seem to be losing the plot on SOA - concepts of Service Oriented Architecture are abused to create systems that are unnecessarily complex and cumbersome to maintain, not to mention poor-performing. The major issue is service granularity, and the common mistake of equating "logical services" in an SOA context with "web services".

It's important to first be clear about what I mean by a "logical service". A "logical service" is a set of functionality that needs to be managed, or may need to be provisioned, autonomously. By this definition, a "logical service" will most often equate to a "system", since a system is typically an autonomous unit with a cohesive development/release cycle, persistence store, etc. The important exception is around expected integration points: if there is a set of functionality in your system that may optionally be supported through some third-party integration, that is a good separation point, where a single system should be split into multiple logical services even though they currently (in the default state) access the same system.

All linkages between logical services should ONLY be via the service interfaces, and any linkages and joins between these logical services (also called service orchestration logic) should reside above the logical services. Whether you need a full-fledged BPEL orchestration layer for this orchestration logic depends on the complexity of the linkages between your logical services and the flexibility required to change that logic. If you're designing a new system and want to leverage SOA principles to improve the extensibility and integratability of your application, then this may not even be necessary. If you're trying to orchestrate end-to-end business processes across diverse logical services, or you're designing a massive ERP that must sell different pieces separately (with support for integration with other systems), then this makes a lot of sense.
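
As a rough illustration of this layering, here is a minimal Java sketch. InventoryService is carved out as a logical service boundary because inventory might later be provided by a third party, and the orchestration logic sits above both services, touching only their interfaces. All the names here are hypothetical:

// Sketch: logical service boundaries as interfaces, orchestration above them.
public class LogicalServiceSketch {

    // Contract for a logical service: the default implementation and any
    // future third-party integration both sit behind this interface.
    interface InventoryService {
        boolean reserve(String sku, int quantity);
    }

    interface OrderService {
        String placeOrder(String sku, int quantity);
    }

    // Default implementation - today it may hit the same system/database as
    // orders, but it is replaceable without touching the orchestration.
    static class LocalInventoryService implements InventoryService {
        public boolean reserve(String sku, int quantity) {
            return true; // stubbed for the sketch
        }
    }

    static class LocalOrderService implements OrderService {
        public String placeOrder(String sku, int quantity) {
            return "ORD-1"; // stubbed for the sketch
        }
    }

    // Orchestration lives above the logical services and sees only contracts.
    static class OrderFulfillment {
        private final InventoryService inventory;
        private final OrderService orders;

        OrderFulfillment(InventoryService inventory, OrderService orders) {
            this.inventory = inventory;
            this.orders = orders;
        }

        String fulfill(String sku, int quantity) {
            if (!inventory.reserve(sku, quantity)) {
                throw new IllegalStateException("out of stock: " + sku);
            }
            return orders.placeOrder(sku, quantity);
        }
    }

    public static void main(String[] args) {
        var fulfillment = new OrderFulfillment(new LocalInventoryService(), new LocalOrderService());
        System.out.println(fulfillment.fulfill("SKU-42", 2));
    }
}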

So what's the relationship between a "logical service" and a "web service"? A logical service in an SOA context could be organized as a collection of web services to make it easier to manage and access. This does not mean that any dependencies between the functionality of those web services should be handled by making web-service calls! What a total waste that would be. As long as these web services form a single "logical service", their implementations should be free to use a common business logic layer and even a common persistence store, making it possible to write performant web services that use the full power of good old SQL. This also ensures that you expose as web services only those services that cross the logical service boundary - keeping the "logical service" itself coarse-grained and lightweight from a contract perspective. When trying to carve out logical services, think at the level of current or expected system integrations.
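
A minimal sketch of that idea, assuming a hypothetical product catalog exposed via JAX-WS (the javax.jws annotations shipped with JDK 6-10 and are a separate dependency on later JDKs). The two endpoints belong to one logical service and share the business-logic layer in-process, rather than calling each other over the wire:

import javax.jws.WebService; // JAX-WS annotation; separate dependency on JDK 11+

// Shared business logic layer: both endpoints call this in-process.
class CatalogLogic {
    // In a real system this would use a shared DataSource and plain SQL.
    String lookupPrice(String sku) { return "9.99"; }
    String lookupDescription(String sku) { return "Widget " + sku; }
}

@WebService // exposed at the logical service boundary
public class CatalogQueryService {
    private final CatalogLogic logic = new CatalogLogic();

    public String productSummary(String sku) {
        // Direct in-process calls: no web-service hop, no XML marshalling.
        return logic.lookupDescription(sku) + " @ " + logic.lookupPrice(sku);
    }
}

@WebService // a sibling endpoint within the SAME logical service
class CatalogPricingService {
    private final CatalogLogic logic = new CatalogLogic();

    public String price(String sku) {
        return logic.lookupPrice(sku);
    }
}

The design choice worth noting: the web-service boundary exists for external consumers, while internal reuse happens through ordinary method calls against the shared layer.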

Here are some of the indicators that you have lost the plot on SOA:

4) A lot of your business logic is starting to appear in the orchestration layer, or your orchestration layer seems to be getting bulkier.
3) You're confused as to why you need a rules engine or a workflow engine when you have BPEL.
2) You are asking yourself why "WS-Transactions" is not better supported in major SOA frameworks, or writing a lot of compensating transactions for calls that are eventually going into the same system.
1) You are pulling your hair out trying to do what are essentially "joins" in your services/orchestration layer, between services that are underneath going to the same database! (A sketch of the in-database fix follows below.)
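
For indicator 1, the fix is usually to push the join down into the shared persistence layer, which is legitimate precisely because those services sit on the same database. A sketch against a hypothetical orders/customers schema (the JDBC URL and column names are made up, and a driver is assumed on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: replace an orchestration-layer "join" (fetch orders from one web
// service, customers from another, match them in code) with one SQL join
// inside the logical service's shared persistence layer.
public class OrderReport {

    public static void printOpenOrdersWithCustomers(String jdbcUrl) throws SQLException {
        String sql = """
                SELECT o.id, o.total, c.name
                FROM orders o
                JOIN customers c ON c.id = o.customer_id
                WHERE o.status = ?
                """;
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "OPEN");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%d %s %s%n",
                            rs.getLong("id"), rs.getBigDecimal("total"), rs.getString("name"));
                }
            }
        }
    }
}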
