No Silver Bullet

26 January 2015 by Martin Aspeli | Agile, Project Management, Quality Assurance

Test automation is a key technical practice of modern software development, yet it is sometimes viewed as an unnecessary luxury in enterprise software development. In this article, Martin Aspeli looks at why that view is mistaken.

In 1975, Fred Brooks wrote The Mythical Man-Month, arguably the only book you ever need to read about software development. In his later essay “No Silver Bullet” (1986), he wrote: “Software entities are more complex for their size than perhaps any other human construct.” Even a moderately complex piece of software could be engineered in an almost infinite number of different ways, and no two developers will produce identical outputs for a non-trivial set of requirements. The upshot of all this: we are human and we make mistakes. Our software contains defects and bugs.

To remove (or at least reduce) the bugs, we rely on testing. Testing takes two main forms:

  • 1. Validating that the software performs the functions expected by its users. This is what we refer to as external quality – have we built the right thing?
  • 2. Checking the correctness of the implementation of each component of a piece of software. For each possible input, confirm the expected output. We call this internal quality – have we built it right?

The first type of testing is hugely important and cannot be automated. It should be done all the time, with as representative a group of users as possible.

It is the second type of testing that should be automated as much as possible, for a number of reasons:

  • It is a labour-intensive process whose effort grows combinatorially as the complexity of the software under test grows.
  • It is highly repetitive – even boring. Consider a simple function that divides one number by another. To test that this performs as expected, we may want to test conditions where the numerator is negative, positive, zero, or a very large number; ditto the denominator, leading to 16 possible combinations that could be tested. If manual test execution means clicking through a user interface to set up a condition where the code is executed to produce each of those, it could take many minutes. Compound that across the thousands of operations performed in even moderately complex software, and you quickly run out of time.
  • If we want to be Agile – to be responsive to genuinely business-driven change; to allow architecture and technical foundations to evolve as we learn more about a project; to catch problems as early as possible to minimise the cost of remediation – we will want to test not just once at the end of a project, but every week or even every day. Lengthy test execution is profoundly limiting to agility.
  • Lack of appropriate, timely testing inevitably leads to regressions. Regressions seriously undermine business confidence in a software team and the ability to predict project completion. We find seemingly fundamental bugs, and we are terrified of what else may be broken.
  • A complex yet mundane process makes it likely that human error will occur in test execution, reducing the value of the testing.

Test automation is actually a very simple idea: we take this labour-intensive, repetitive and largely predictable activity and we let a computer do as much of it as possible, letting the humans focus on the value-adding activities with greater confidence that the underlying solution is sound.

There are two main types of automated tests: unit tests and acceptance tests.

Unit tests

Automated unit tests assure internal quality. They exercise a single function or unit of a program. A very simple harness calls the unit under test with a range of inputs and expected outputs. Few or no additional components are involved, to make it as easy as possible to trace a failure to its root cause.
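
As a minimal sketch, using Python’s built-in unittest module purely for illustration (the divide function is the toy example from earlier in this article, not code from any real project):

    import unittest


    def divide(numerator, denominator):
        # The unit under test: a deliberately trivial example.
        return numerator / denominator


    class DivideTests(unittest.TestCase):
        # A simple harness: call the unit with known inputs and expected outputs.

        def test_positive_by_positive(self):
            self.assertEqual(divide(10, 2), 5)

        def test_negative_numerator(self):
            self.assertEqual(divide(-10, 2), -5)

        def test_zero_numerator(self):
            self.assertEqual(divide(0, 7), 0)

        def test_zero_denominator_raises(self):
            # An intentional edge case, documented as an executable expectation.
            with self.assertRaises(ZeroDivisionError):
                divide(1, 0)


    if __name__ == "__main__":
        unittest.main()

Because no other components are involved, a failure in any of these tests points directly at the divide function itself.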


Writing a test and writing the code it tests go hand in hand, and should be done by the same person. To borrow an analogy from “Uncle” Bob Martin: automated unit tests are akin to double-entry bookkeeping. Just as you would never trust an accountant to make zero mistakes in a complex set of accounts without the confidence provided by double entry, you should never trust a programmer to write bug-free code based only on his or her own reading of that code.

A unit test is a very simple piece of code (“dumb tests are good tests”, as I like to say) that describes the expected behaviour of a component for a single set of inputs. By describing the required logic in two different ways – one elegant, performant and featureful for production use, the other naïve, repetitive and easy to understand for testing purposes – we radically improve the chances that the code that ends up in production is actually correct.

Unit tests also serve to inspire confidence. If we believe that the individual units of our code are highly likely to be correct, then we can more easily alter the way we assemble those units into larger components and systems. We then test those integrations, again using automation as far as possible, but here out of necessity avoiding the full combinatorial complexity of possible inputs and outputs. This is akin to having the confidence that every component that goes into a car has been individually quality assured before the car is assembled.

Finally, unit tests can be viewed as executable developer documentation. The test suite explains, unambiguously, how the programmer intended a function to work. If we later attempt to change the innards of that function, say to fix an escaped defect or to improve performance, we can run the test suite again and have confidence that we did not accidentally break an edge case. Good programmers write specific tests for intentional behaviour that may not be obvious in the future, for this very reason.

There are three main ways to write unit tests:

  • 1. Test-last: Write some code, then write some tests. This is how most inexperienced unit testers begin. The problem is that it is easy to lose concentration or succumb to procrastination, and not write as many tests, or as useful ones, as we would like.
  • 2. Test-during: Write the code and tests at the same time, and use the test suite as the starting point for debugging and detailed code analysis. This is how I code most of the time.
  • 3. Test-first: Write tests that describe the thing you intend to build. Verify that they fail – you haven’t built it yet! Then write the unit under test, running the tests repeatedly until they pass. This is the most disciplined, rigorous approach to unit testing and is known by its acronym TDD – Test Driven Development. Some people find it difficult to get into this habit, but those who do it well write some of the best code you will ever find. A small sketch of this rhythm follows below.
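
As a minimal, hypothetical sketch of that test-first rhythm in Python (the is_leap_year function is invented for this example; the asserts are in the style of pytest):

    # Step 1 (red): describe the behaviour we intend to build. These tests
    # fail at first, because is_leap_year does not exist yet.
    def test_is_leap_year():
        assert is_leap_year(2016) is True
        assert is_leap_year(2015) is False
        assert is_leap_year(1900) is False   # divisible by 100 but not by 400
        assert is_leap_year(2000) is True    # divisible by 400


    # Step 2 (green): write the simplest unit that makes the tests pass,
    # re-running them until they do.
    def is_leap_year(year):
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)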

Acceptance tests

Automated acceptance tests assure external quality: are we building the right thing? The practice of Acceptance Test Driven Development (ATDD, sometimes called Behaviour Driven Development, or BDD) is based on the idea of writing detailed requirements in the form of acceptance tests. The business warrants that when those tests pass, the associated feature can be deemed complete.

In our projects, we typically write the acceptance test scenarios “just-in-time” as a medium-level requirement (a user story) is about to be scheduled for development. We then ask the client to sign off these acceptance criteria, to remove any ambiguity about what we are building. We have found that this not only increases our shared understanding of requirements, but also leads to radically better requirements and saves us from having to define separate, possibly inconsistent test cases later.

It is not strictly necessary to automate acceptance tests, but it is a very good idea. We created a tool called CoreJet specifically to support our workflow of writing, signing off, automating and reporting against these acceptance tests. It presents a two-dimensional visualisation of the project that shows the amount of scope implemented and covered by passing automated acceptance tests. The more “green”, the closer we are to completion.

Automated acceptance tests do not replace unit tests, and typically test key business scenarios, but not every combination of inputs and outputs. They often involve scripting a web browser or other user interface, and can be significantly slower to run than lightweight unit tests. The two approaches are very much complementary.
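
As a hedged sketch of what a browser-driven acceptance test can look like, here using Selenium’s Python bindings (the URL, element names and credentials are invented for illustration, not taken from any real project):

    import unittest

    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class LoginAcceptanceTest(unittest.TestCase):
        # Scenario: a registered user can log in and see their dashboard.

        def setUp(self):
            self.browser = webdriver.Firefox()

        def tearDown(self):
            self.browser.quit()

        def test_registered_user_can_log_in(self):
            # Given a registered user on the login page (hypothetical URL)
            self.browser.get("http://localhost:8080/login")

            # When they submit valid credentials
            self.browser.find_element(By.NAME, "username").send_keys("alice")
            self.browser.find_element(By.NAME, "password").send_keys("secret")
            self.browser.find_element(By.ID, "login-button").click()

            # Then they see their dashboard
            heading = self.browser.find_element(By.TAG_NAME, "h1")
            self.assertEqual(heading.text, "Dashboard")

A single scenario like this exercises the whole stack, which is exactly why it runs more slowly than a unit test and why such scenarios are kept focused on key business flows.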

Acceptance test automation serves another purpose as well: it makes sure developers actually read the specification. Using a tool like CoreJet, the exact text of the most detailed expression of a requirement is right there in the developer’s line of sight, woven into the code of a test suite. There is no separate document to read and understand and no context switching. We are more likely to build the right thing the first time around.

Continuous integration

Test automation underpins another key agile practice: continuous integration. It works like this: a server monitors the source code repository for changes. As soon as something is changed, the full suite of automated tests is run. If they all pass: great! If not, a warning sign goes on (quite literally, in our office, where we’ve rigged a red light to the CI server), and the exact change that caused a test to regress is highlighted. The developer responsible is made to fix it, buy coffee for the team and optionally wear a funny hat.
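
Conceptually, a CI server is nothing more exotic than a loop that watches the repository and runs the tests. The sketch below is a deliberately naive illustration of that idea in Python (real teams use a proper CI server such as Jenkins rather than anything like this):

    import subprocess
    import time


    def current_revision():
        # Ask the version control system (here git) for the latest revision id.
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True)
        return result.stdout.strip()


    last_seen = None
    while True:
        subprocess.run(["git", "pull", "--quiet"])  # fetch any new changes
        revision = current_revision()
        if revision != last_seen:
            # Something changed: run the full automated test suite.
            outcome = subprocess.run(["python", "-m", "pytest"])
            if outcome.returncode != 0:
                print("Build broken at revision", revision, "- red light on!")
            last_seen = revision
        time.sleep(60)  # poll once a minute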

Continuous integration is all about failing fast. The cost of fixing a defect introduced minutes ago is minuscule compared to the cost of fixing it six months down the line after thousands of code changes. We all make mistakes, but we can redeem ourselves quickly by fixing them immediately, whilst the errant code is still fresh in our minds.

Good testing discipline

Mature development teams adopt a great deal of discipline around their automated testing:

  • Monitor and report on test coverage (there are tools that can report on the percentage of non-test code exercised during a test run). Somewhere north of 75% is considered good. However, this is merely one indicator. It is possible to have high coverage with poor tests and vice versa, so be pragmatic.
  • Review the coverage and correctness of tests as part of manual code reviews.
  • Always run the tests locally before checking in new code to detect regressions. Never “break the build” for others on the team.
  • If you do break the build, don’t go home without fixing it.

For the best developers I know, great automated testing is an obsession, and a source of genuine pride.

But…

Alas, not everyone is sold. Below are some of the excuses we’ve heard for not doing testing, in increasing order of validity.

We have a team of manual testers. They do the testing for us.

As outlined, intelligent quality assurance by humans is hugely important. Manual and automated testing are complementary. However, the “somebody else’s problem” mentality simply doesn’t fly. Each developer should feel accountable for the correctness of the code they write. Writing something half-baked and waiting for someone else to find problems is a recipe for messy code and long delays between introducing a defect and detecting it. This leads to significantly higher overall costs.

We’re on a deadline. We don’t have time to write tests.

Fear of failure can sometimes create perverse incentives. We convince ourselves that it is better to meet a deadline with something that doesn’t work than to miss it by a week with something that does.

The answer to this is simple: do less, to a consistently high level of quality. A looming deadline that looks unachievable usually reflects a systemic failure to give the team enough capacity and realistic goals. Negotiate scope down and don’t attempt to start features that won’t be complete in time. Having regular releases (i.e. short iterations in a Scrum team) and making sure you work on the highest-priority features first makes this significantly easier.

We charge by the hour. The client won’t pay for tests.

Good test automation discipline has the potential to significantly reduce overall effort expended on a software development project.

The constraining factor on a developer’s productivity is certainly not the rate at which he or she can type out code. Measures based on lines of code are largely useless, but even quite productive developers may not produce more than, say, 100 lines of non-generated code, net, in a day (the best developers tend to remove a lot of code by replacing it with more elegant constructs). Typing that out flat-out would easily take less than half an hour.

The challenge, of course, is producing the right 100 lines of code. Good automated unit tests tend to lead to cleaner, better-factored code, because well-factored code is easier to test. The act of writing a suite of tests tends to help the developer explore and understand how a piece of logic should behave to be correct, in a way that simply staring at the screen does not. Simply put, test automation accelerates the problem-solving thought process that is software development.

Finally, there is a “death by a thousand cuts” impact on productivity from manual testing. Writing a simple component may take a few minutes. To check that it works, you start up an application server and click through the user interface to exercise the code. This may take another minute or two. You find a problem. Go back and change the code, restart the server, click through the user interface again. Repeat. The first couple of times, the time to write a decent automated unit test may be longer than the time to click through the user interface. Once you have a test, though, creating variants of it to test boundary conditions is normally just copy-and-paste with minor edits. The test should run in less than a second. Economies of scale kick in.
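
To illustrate the “copy-and-paste with minor edits” point: with a parametrised test (sketched here with pytest, reusing the toy division example from earlier), each extra boundary condition is a single additional row, and the whole set still runs in a fraction of a second:

    import pytest


    def divide(numerator, denominator):
        return numerator / denominator


    # Each extra boundary condition is one more row in this table.
    @pytest.mark.parametrize("numerator,denominator,expected", [
        (10, 2, 5),
        (-10, 2, -5),
        (0, 7, 0),
        (10, -2, -5),
        (10 ** 12, 2, 5 * 10 ** 11),
    ])
    def test_divide(numerator, denominator, expected):
        assert divide(numerator, denominator) == expected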

Now consider a situation where you want to improve or refactor a fundamental, shared component, with potentially open-ended consequences. A fully manual regression test of an application to have confidence that the change hasn’t broken something could take hours. It is simply unsustainable.

I’m a great developer. I don’t need to test my code.

The best developers in the world are acutely aware of the inevitability that they will introduce defects. That is probably why they are so passionate about test automation. That said, if you know of a programmer who writes perfect code, tell them we want to offer them a job.

The tests take too long to run.

As a project grows, it is not uncommon to have thousands of unit tests and hundreds of acceptance tests, if not more. Sometimes these can take a long time to run. It is prudent to monitor test execution times. If it takes longer than a few minutes to run automated unit tests and longer than ten minutes to run automated acceptance tests, it is worth investing in ways to speed up the tests. Alternatively, it may be necessary to take a risk-based approach and partition the tests so that some tests (i.e. those at greater risk of failing) are run more frequently than others. At the very least, however, you should run all the tests once per day.

We don’t know how to write tests.

Test automation is a skill. Just as a developer needs to learn the prevailing languages and frameworks, he or she should learn the test tools used on a project. For people who often work with cutting-edge technology, programmers can be surprisingly averse to learning new things. Have none of it, but invest in training and mentoring where required.

Our solution doesn’t support test automation.

This is probably the only valid reason in this whole list. Some software, especially older software, can be difficult to test automatically, for example due to excessive dependencies between modules or poor abstraction. It may take significant investment to make custom code testable.

However, there are libraries and tools that can reduce this burden. Mocking frameworks can be used to stub out interfaces, even deep into a third party component. Tools like Selenium can be used to automate at the user interface level for web applications. It may not be possible to test perfectly, but even a small number of tests are infinitely better than none.
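
As a minimal sketch of what stubbing out an interface looks like, here using Python’s unittest.mock (the payment gateway and order service are invented for illustration):

    from unittest.mock import Mock


    class OrderService:
        # Unit under test: depends on an external payment gateway.

        def __init__(self, gateway):
            self.gateway = gateway

        def checkout(self, amount):
            # The real gateway would call a third-party system over the network.
            if self.gateway.charge(amount):
                return "paid"
            return "declined"


    def test_checkout_marks_order_paid_when_charge_succeeds():
        # Stub out the third-party dependency so the unit is tested in isolation.
        gateway = Mock()
        gateway.charge.return_value = True

        service = OrderService(gateway)

        assert service.checkout(100) == "paid"
        gateway.charge.assert_called_once_with(100)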

One might also suggest that the ability to support test automation should be a key deciding factor when choosing technologies to adopt.

What shouldn’t we test?

Test automation is not simply a case of “more is more”. There are some things that don’t make sense to test. For example:

  • The layout of a user interface. This is much better judged by a human being than a computer.
  • Tautologies, e.g. checking that a constant equals its value.
  • Excessively repeated tests, e.g. checking that a shared component behaves the same in every single instance where it is used. Instead, test the component as a single unit, and then test the other units that rely on it with the assumption that the shared component works.
  • Externalised configuration that could change between environments.
  • Automatically generated code, e.g. “getters” and “setters” in Java.
  • Things unrelated to your code. It is sometimes easy to get a little carried away and test, for example, that a database knows how to execute a simple query, or that a web browser can render HTML. It’s probably safe to assume those things work.
  • Conditions that depend on specific data in an external system that cannot be controlled as part of the test fixture. The problem with this is that the test may not be easily repeatable across environments.
  • Conditions that depend on dates. I call these New Year’s Regressions, as there are usually a few poorly written, date-dependent tests that fail once the system clock ticks over into another year. One way to keep the clock under control is sketched below.
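
Where date-dependent behaviour genuinely must be tested, a minimal sketch of one way to avoid the New Year’s Regression is to let the test control the clock rather than relying on the real system date (the greeting function here is invented for illustration):

    import datetime


    def greeting(today=None):
        # Unit under test: behaviour that depends on the date.
        today = today or datetime.date.today()
        if today.month == 1 and today.day == 1:
            return "Happy New Year!"
        return "Hello!"


    def test_greeting_on_new_years_day():
        # The test supplies the date explicitly, so it passes on any day of any year.
        assert greeting(datetime.date(2015, 1, 1)) == "Happy New Year!"


    def test_greeting_on_an_ordinary_day():
        assert greeting(datetime.date(2015, 6, 15)) == "Hello!"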

Conclusion

I’m going to end this article with a bold statement: Untested code is broken code. We simply should not accept software solutions that do not have a decent number of appropriate, passing automated tests. It is one of the first things I look for when I review a software project, and one of the best indicators of project quality.