Nicholas, the Scrum Master, has observed that every sprint seems to end with a big rush to finish the stories that were committed to it. The team usually manages to finish the work they planned, but he has heard frequent complaints from the testers, who mentioned that they often find themselves ready to start testing stories that they know the programmers finished implementing, but those stories are not yet available in the test environment, and they need to wait. When the stories eventually arrive, they come all at the same time, and they all have to scramble and rush to get them tested before the end of the sprint. This situation not only creates overburdening towards the end of the sprint, but also creates dissatisfaction around quality, since team members feel testing can't be as thorough as they would like, and fixes (when required) are also rushed or need to be deferred to the next sprint.
Nicholas looks at the team tracking board and notices that, with only 4 days left in the sprint, most of the work seems to be in progress at the Development stage.
According to the testers, some of that work has been finished but can't move into testing yet. It's not immediately obvious what's holding it in place, however.
Nicholas talks to the programmers, who explain that, indeed, 4 of the stories are finished with implementation, but before they can integrate the code with the build that gets deployed into the test environment, the changes need to go through a code review with Aloisa, the Technical Lead, or Naresh, the most senior programmer in the team. Both Aloisa and Naresh are usually very busy: not only do they have their own stories to work on, but additionally they often attend meetings with other people in the organization.
This starts painting a picture: once implemented, stories spend time "sitting" in the Development column, waiting for a senior team member to conduct a code review. Nicholas has essentially uncovered that the delay the team is experiencing between Dev and Test is the result of a "Non-Instant Availability" (NIA) problem.
NIA occurs when work has a dependency on some specialized "resource" (in the broadest sense of the term) that is not available when the work that requires it arrives. In a sense it is a form of bottleneck, but here the constraint is not necessarily created by a lack of capacity (although that can also be a factor) but by a lack of availability to do the work when it is needed.
What do we do?
Going back to Nicholas' story, the picture is still incomplete, and it doesn't explain why things are done this way, but he decides that a good first step could be to make visible the work queuing up for code reviews, for example as its own column on the team board.
This will not solve the problem but, as with all sources of delay, making the queue visible can help raise awareness about it, and hopefully spark some #letsDoSomethingAboutIt conversations.
It's also important to recognize that most delays are the result of some policy established by the organization (implicitly or explicitly). Therefore, delays can often be reduced or eliminated by changing existing policies or establishing new ones.
In general terms, the solution for NIA problems is, of course, to increase availability. That may be difficult, however, or the way to do it may not be immediately apparent. Since NIA is very similar to a bottleneck, we can take inspiration from strategies for dealing with that kind of problem, in particular the ideas coming from the Theory of Constraints (ToC).
ToC advises us to consider 3 types of strategies to deal with bottlenecks, to be applied roughly in order: protection, subordination, and elevation.
Protection and Subordination
Protection strategies refer to actions intended to protect the constraint from "distractions" so that it can operate at peak performance and its limited capacity is utilized to its fullest. Since the bottleneck is the part of the process that has the least capacity, the last thing we want is to allow things that will cause it to have even less capacity available, so a protection strategy would be geared towards removing those "distractions".
So, in Nicholas' story, a new policy could be established allowing Aloisa and Naresh to decline participation in most meetings that take them away from the team room. They would still be "distracted" working on their own stories, but they would stay with the team most of the time, being able to put other work on hold to focus on code reviews at a moment's notice. Or perhaps they could decide that, for very important meetings, only one of them can attend, making sure at least one of them is available for code reviews.
It could also be decided that Naresh and Aloisa will attend those meetings only if no stories are expected to be finished on that date, or that they will attend but could be asked to return to the team room if they are needed for a code review. Such a policy could be interpreted as a "subordination" strategy.
A subordination strategy refers to actions taken in places other than the bottleneck in order to make these other parts dependent on (or subservient to) decisions made at the bottleneck. Essentially, we choose to prioritize the work at the bottleneck, on the understanding that non-bottleneck parts of the process have additional capacity (or slack) and can take the hit.
Less "aggressive" than all of the above, Nicholas could also take advantage of the newly visualized queue of stories waiting for a code review and agree with Aloisa and Naresh that they will always check the content of that column on the board when they come back from a meeting, and give code reviews priority before going back to any other work. In general terms, establishing a (visible) "buffer" before the constraint can help set policies to better manage the constraint and make sure its capacity is optimized.
Another common strategy to deal with an NIA situation is availability splitting. Imagine the case of a specialist who is available to work with a team only 6 hours per week, and suppose it was decided that this person will join the team on Fridays. This means that any work that requires that specialist before Friday will have to wait until then, creating an NIA situation and potentially a delay. An alternative arrangement could be to ask this specialist to split the 6 hours across two (or more) days. This will not increase availability, but it can reduce total delay for some work items.
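To make the effect of availability splitting more concrete, here is a minimal sketch. It rests on assumptions that are not part of the story above: work is equally likely to arrive on any workday, waiting is counted in whole workdays (weekends are ignored), and the specialist can clear whatever is waiting on each day they are present. The function names and the Tuesday/Friday split are purely illustrative.

```python
# Illustrative model only: days are indices 0..4 (Mon..Fri) and the
# schedule repeats every week. All names and numbers are hypothetical.
WORKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def wait_days(arrival_day: int, available_days: list[int]) -> int:
    """Workdays an item waits until the specialist is next present."""
    return min((day - arrival_day) % len(WORKDAYS) for day in available_days)


def average_wait(available_days: list[int]) -> float:
    """Average wait, assuming one item arrives on each workday."""
    waits = [wait_days(arrival, available_days) for arrival in range(len(WORKDAYS))]
    return sum(waits) / len(waits)


friday_only = average_wait([4])         # all 6 hours on Friday
tuesday_friday = average_wait([1, 4])   # 3 hours Tuesday, 3 hours Friday

print(f"Friday only:        {friday_only} workdays on average")
print(f"Tuesday and Friday: {tuesday_friday} workdays on average")
```

Under these assumptions the total hours stay the same, yet the average wait drops from 2 workdays to 0.8, simply because no item is ever more than 2 workdays away from the specialist's next visit.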
Both buffering and availability splitting are common strategies that are specific to NIA situations, and they could be interpreted as protection or subordination actions, depending on the context.
Elevation
But Nicholas could also inquire about the reason why code reviews were dependent on Aloisa and Naresh. As it turns out, the reason wasn't related to seniority and expertise, but something more "pedestrian": for security reasons, this organization had a general policy of granting some permissions in the source control tool only to full-time employees, and that included access to the functionality to merge code after a code review. In Nicholas' team, all the developers were contractors, except Aloisa and Naresh, who were full-timers; the organization's security policy, once combined with meeting schedules, had the unintended consequence of creating a bottleneck and an NIA problem.
So, a more permanent solution to the problem could be to change the security policy and grant merge permissions to all developers. That way, they could all participate in code reviews, removing the dependency on two very busy and often unavailable people. This kind of strategy could be considered an "elevation" strategy, and it should also be apparent why ToC recommends attempting it only after exploring other options.
An elevation strategy refers to actions intended to solve constraints by generating additional capacity. Examples of this include hiring more people, acquiring additional equipment or tools, automation, or (as in Nicholas' story) changing organizational policy to expand operational capacity. All these actions tend to be expensive, difficult to do, and/or time consuming, and this is the reason why it's usually more practical to consider them last. They often help to really solve the constraint problem, though, so this doesn't mean they aren't worth considering, or even planning for eventually; only that it's usually impractical to start there.
So, what should Nicholas do?
As with any other form of improvement, we need to look at the larger context: what's the systemic impact of the strategy we choose? For example, if Aloisa and Naresh stopped attending those other meetings to focus on team work, would that delay some other activities? What would the impact be on Nicholas' team?
Another consideration has to do with the effort of implementing the improvement strategy. A change in security policies would enable higher cross-functionality and solve the problem at its root, but it may require convincing someone way outside the team's area of influence, which may require significant social capital, not to mention lots of time.
Finally, we know from ToC that removing a bottleneck will reveal a constraint somewhere else in the process. With code reviews happening more fluidly, the testing activity may now experience a higher volume of work and become overburdened. Could this effect be worse for overall flow than the delay that we're seeing today, with stories queuing up for testing?
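The way a constraint "moves" can be sketched with a toy model in which the flow of the whole process is limited by its slowest stage. The stage names follow the story, but the capacities (stories per week) are invented purely for illustration, and the `bottleneck` helper is hypothetical.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the stage with the lowest capacity and that capacity.

    In this simplified model, overall throughput cannot exceed the
    capacity of the slowest stage.
    """
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Hypothetical capacities, in stories per week.
before = {"development": 6, "code review": 2, "testing": 4}

# Elevating the code review constraint (e.g. everyone can now review).
after = {**before, "code review": 6}

print(bottleneck(before))  # code review limits the flow
print(bottleneck(after))   # the constraint has moved to testing
```

In this made-up scenario overall flow still improves (from 2 to 4 stories per week), but testing becomes the new limiting stage, which is the kind of side effect worth anticipating before choosing a strategy.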
As usual, you need to pick your battles. Understanding which sources of delay you're exposed to, and their impact on overall delivery, will allow you to make an informed decision.
This article is part of a series on Sources of Delay. Find the main article, and links to other parts here.