Agile metrics and measurements

01 February, 2015 by Martin Aspeli | Agile, Project Management, Mobile Web, Web

As more and more organisations adopt Agile ways of working, the demand for “objective” measurements to compare teams and gauge efficiency and value for money is increasing. But is this consistent with an Agile mind-set? Or even possible? I was recently asked some questions around these topics in the context of a client project. A slightly edited excerpt of some of the key questions and answers can be found below.

Q: What metrics would we typically see to measure productivity and what would “good” look like? E.g. is there something around the ratio of software engineers in the team compared with other resources, time to produce an epic or a user story etc.?

We certainly have some thoughts on what a “good” team structure looks like in our own teams. Typically, each team would have 4-8 developers, a technical lead, a functional lead, and 1-3 further BAs, who would also do some of the quality assurance.

We would then also often have a team of separate testers for things like SIT (system integration testing) and NFT (non-functional testing), with the aim being that they are kept at arm’s length from the development team to provide independent assurance. That is, our clients often engage a third party to do this. However, the primary point at which quality is assured is “in the factory”, by the developers doing things like pair programming, code reviews, automated unit testing and behaviour-driven development.

Units of requirements like epics, user stories and story points are arbitrary and will vary from project to project. You can’t compare teams on this basis, unless they operate in exactly the same environment with exactly the same input requirements. That almost never happens. However, there are some indicators we can look at (a rough sketch of how a few of them might be computed follows the list):

  • How much WIP (work-in-process, i.e. things started, but not yet fully finished at any one point in time) is in the system? If WIP is out of proportion with the size of the team (e.g. 4 developers are trying to do 10 user stories at the same time), then efficiency suffers significantly for two reasons: 1) task switching causes delays and waste; and 2) end-to-end cycle time of a requirement becomes longer, which delays feedback, which reduces both the efficacy and the efficiency of the development process.
  • What is the change in throughput (“velocity” to a Scrum team) over time? You’d typically see it start low and ramp up. If you plot number of stories completed, cumulatively, week on week, you’d expect an S shape (lower rate of increase at the start, faster towards the middle, and lower again towards the end). A team with no improvement whatsoever is either fully efficient from the beginning (unlikely) or not engaging in any form of continuous improvement activity (a missed opportunity).
  • Failure demand, i.e. (re)work caused by escaped defects or a failure to build the right thing. This causes variation in the flow of work (since developers have to go back to work that is supposedly finished) and makes it difficult to predict true throughput (it’s not actually ‘done’ if it’s defective). You can also measure things like number of defects found at a particular test cycle or average age of defects. Lots of old defects means the team is delivering fake value: they claim to be done, but in truth their work is not fit for purpose until the defects are fixed; this is usually a symptom of developers who think quality is “somebody else’s problem.” The “somebody else” is usually some poor tester, remote from the team.
  • Value delivered. At the end of the day, we build software for a business purpose. Whatever it is, it’s worthwhile figuring out how to measure benefits. A good Agile team will deliver value early, on the principle that some value reaped earlier (and thus for longer) is better than delaying all the value delivery until the mythical “end” of the project. The leaner, faster and more aligned to the business the development team is, the better it will be able to engage in a way where the business can have something built quite quickly, push this out to live, learn how well it worked, and apply that learning to steer the development towards higher value. This requires an effective means of prioritisation and feedback, and – perhaps counter-intuitively – comfort with planning for the short term whilst leaving the long term seemingly less certain (or defined).
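
To make a couple of these indicators concrete, below is a minimal sketch of how they might be computed, assuming work items (stories and defects) can be exported from whatever tracker the team uses; the `WorkItem` structure and field names are illustrative, not any particular tool’s API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    kind: str                        # "story" or "defect" -- illustrative categories
    started: Optional[date] = None   # when work actually began, if it has
    finished: Optional[date] = None  # when it was genuinely "done", if it is
    raised: Optional[date] = None    # for defects: when the defect was found


def wip(items, as_of):
    """Work-in-process: items started but not yet finished as of a given day."""
    return sum(
        1 for i in items
        if i.started is not None and i.started <= as_of
        and (i.finished is None or i.finished > as_of)
    )


def cumulative_throughput(items):
    """Stories completed, cumulatively, week on week -- plotted, this is
    where you would hope to see the S shape described above."""
    per_week = Counter(
        "{0}-W{1:02d}".format(*i.finished.isocalendar()[:2])
        for i in items
        if i.kind == "story" and i.finished is not None
    )
    running, curve = 0, {}
    for week in sorted(per_week):
        running += per_week[week]
        curve[week] = running
    return curve


def average_open_defect_age(items, as_of):
    """Average age, in days, of defects raised but not yet fixed."""
    ages = [
        (as_of - i.raised).days
        for i in items
        if i.kind == "defect" and i.raised is not None and i.finished is None
    ]
    return sum(ages) / len(ages) if ages else 0.0
```

None of these numbers means much in isolation; it is the trends (WIP staying proportionate to team size, the throughput curve bending upwards, defect age staying low) that are worth watching.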

Q: Broadly what proportion of effort should be spent on: new functionality/app; improving the underlying code base/solutions; minor changes and 3rd line support?

This is another “how long is a piece of string” question, but:

The effort spent on new features vs. improving existing functionality should be proportional to the value derived from each. Only the business can ascertain that value, so it’s important to have an appropriate control system to allow this to be done.

The concept of “cost of delay” can be a useful way to express this. It may be difficult to calculate in exact monetary terms, but we can look for patterns. So, if we have a taxonomy where:

  • some items have a slowly rising cost of delay (“every day we don’t have this feature, we forego a little revenue/saving, so having it earlier is good”);
  • some have a fast-rising one (“this is hurting us right now and will continue to hurt us until we have this done”);
  • some have a future-dated, vertical cost of delay (“unless we meet this regulatory requirement by its due date, we are out of business, so the cost is ‘infinite’”); and
  • some have more of a logarithmic scale (“it’s not a big deal now, but at some point it will hurt us a lot”);

then we can try to use this to prioritise our work.
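
Even rough, relative numbers can make this usable for prioritisation. One common approach, sometimes called CD3 (cost of delay divided by duration), is sketched below; the backlog items and figures are invented purely for illustration, and the future-dated “vertical” items are usually better handled by scheduling backwards from their deadline than by a ratio like this.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost_of_delay: float   # rough value lost per week of delay (relative units are fine)
    duration_weeks: float  # rough estimate of the time needed to deliver it

    @property
    def cd3(self):
        """Cost of delay divided by duration: all else being equal,
        do the high-CD3 items first."""
        return self.cost_of_delay / self.duration_weeks


# Invented backlog, purely for illustration.
backlog = [
    Candidate("Self-service password reset", cost_of_delay=5, duration_weeks=2),
    Candidate("New reporting dashboard", cost_of_delay=12, duration_weeks=6),
    Candidate("Platform upgrade (investment)", cost_of_delay=3, duration_weeks=4),
]

for item in sorted(backlog, key=lambda c: c.cd3, reverse=True):
    print(f"{item.name}: CD3 = {item.cd3:.2f}")
```

Note how the “investment” item naturally comes last on this basis, which is exactly the trap described next.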

The kicker is that “investment” items (e.g. improving your underlying infrastructure) are usually of the logarithmic type, and so a naïve business will never prioritise them until they have become a crisis (“we should’ve upgraded our platform four years ago, but now it’s out of support and there’s a critical vulnerability, so we need to drop everything and fix it”). This causes variation in our flow as team members lurch from one crisis to another, and variation is the enemy of efficiency. Hence, we often try to get teams to reserve a small proportion of their capacity (say, 10%) for this type of “investment” work. This also serves to create some slack (i.e. these things are easier to sacrifice than features we have promised to deliver), in case of genuinely “urgent” and unforeseen requests. Clearly, you need to keep an eye on this so that you don’t just end up cutting the investment items all the time.

It’s a similar story with minor changes and third line support: some type of capacity reservation is usually prudent, and if there aren’t any incidents, that capacity can be used for “easily interruptible” tasks, like improving documentation or personal learning, that still deliver some value. However, if this requires a lot of capacity, that speaks to the underlying quality of the solution, and it’s important to learn from that: applying effort late (after go-live!) to fix problems is usually a lot less efficient than applying that effort in the main flow of delivery. Usually, this is a sign that the team have been asked to do too much within the original deadline and should have cut scope, rather than extend the deadline, grow the team or work harder!
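
A lightweight way to make these reservations explicit is to carve them out of each iteration’s plan, so that cutting them is a visible decision rather than something that just happens. The percentages and point figures below are invented defaults for illustration, not a recommendation.

```python
def plan_iteration(capacity_points, investment_reserve=0.10, support_reserve=0.10):
    """Split an iteration's capacity into feature work and reserved buffers.

    The reserve percentages are illustrative; the point is that the buffers
    exist explicitly, so it is obvious when they are being sacrificed.
    """
    investment = capacity_points * investment_reserve
    support = capacity_points * support_reserve
    features = capacity_points - investment - support
    return {"features": features, "investment": investment, "support": support}


# e.g. a team whose recent throughput is around 30 points per iteration
print(plan_iteration(30))  # {'features': 24.0, 'investment': 3.0, 'support': 3.0}
```

If, iteration after iteration, the investment and support buffers end up absorbed by feature work, that is the signal to stop and ask why.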

Q: What organisational design models have we seen before? E.g. grouped by skill set (testers, software engineers, UX, etc.) or by solution component?

Scrum™ is very adamant that you should have cross-functional teams with significant staff liquidity. Other methods are more agnostic. However, there is a lot of waste in hand-offs between different functions: partly the communication overhead, and partly the jagged flow of work and the queues that build up all over the place when resources are managed and tracked more or less independently. So the more we can let a single team (or even a single person) take responsibility for delivering a feature end to end, the less waste we will have, all other things being equal.

In most of our teams, we consider the “factory” (i.e. what we have control over) to be a single unit. There is some specialisation within that (manager vs. engineers vs. BAs), but largely the ownership of turning an idea into a requirement into code into tested code sits with the whole team, and there are few formal hand-offs. There is then usually some independent verification (typically client-owned) performing SIT/UAT/NFT, but that is the second time everything is tested and the number of defects that escape into their environments should be low.

In front-end focused projects, we need to think carefully about how best to align creative designers, UX designers and engineers, but the short of it seems to be: pair them up. It’s much better for an engineer to sit next to a designer and experiment with screen layout in real time than to have the designer draw something, wait, hand it over, wait, build something, wait, go back with queries, wait… you get the idea.

This obviously has implications for resourcing. If you have shared services for things like design and test, then you have to accept some queues and some delays (though this may be more efficient or cost-effective at a departmental or organisational level). In this case, you would treat the shared resource as a supplier and the teams needing its services as customers, with explicit policies about priority and (informal) SLAs around turnaround time, to allow the “customers” to plan their work accordingly.