Story Points Don’t Work the Way You Think

If you’ve ever been in a backlog refinement meeting (maybe you call them “grooming” meetings instead—tomato, potato), you’ve probably heard more than one conversation that sounds something like this:

Alice: I gave this a 3, but I was between a 3 and a 5. Maybe it’s a big 3.
Bob: I gave it a 5. Maybe it’s a small 5, but I wanted to be conservative.

Alice and Bob and perhaps the other members of the team then spend several minutes debating whether a “small 5” can rightly be considered a 3, and how big a “3” can get before it becomes a 5. The product nerd in this situation has a few options:

1. Facilitate the conversation until consensus is reached: All team members have settled on either a 3 or a 5 for the story in question. Point the story accordingly and move on.
2. Throw the rules out the window. Call the story a 4. Fibonacci has been dead for centuries, he won’t know.
3. Change the way you think about story points.

The problem with option 1 is that it’s a waste of time. Alice and Bob basically agree on the size of the ticket. Splitting hairs about whether it’s more accurately a 3- or 5-point story is not valuable at this point in the process. Estimates have uncertainty built into them by their very nature. And over time, the over- and underestimations will come out in the wash as long as we’re not consistently doing only one or the other (and there are ways to tease out if that’s happening but that’s not what this post is about).

But most planning poker guides (at least the ones that I’ve been exposed to—Mountain Goat Software, for example) will recommend that a team reach consensus before a ticket can be assigned a point value. So to solve this problem it may feel like we have to “break the rules.” That brings us to option 2.

Alice says it’s between a 3 and a 5, Bob agrees that it’s between a 3 and a 5, so it makes sense to slap a 4 on that bad boy and move on.

This is… fine.

But you lose something by abandoning Fibonacci. Don’t worry, this isn’t mathematically deep. It doesn’t have anything to do with spirals or the golden ratio. The reason we use (or approximate) the Fibonacci sequence is actually simple: as the values get larger, they also get further apart.

What does that do for us? It reflects that as the level of effort increases, so does the degree of uncertainty. We can better understand and predict the difference between 2 and 3 points of effort than between 7 and 8 points of effort, for example. That margin of error is built into this method, and by allowing stories to be given values in between those numbers, we’re lying to ourselves about how certain we are that something is a “4” instead of a 5. The reality is—and here comes the hot take so hold onto your pants—that it doesn’t matter whether something is “actually” a 4. By calling it a “5” we’re not saying it’s going to take exactly some amount of time or be exactly some amount of code. Instead, we’re giving it an approximate level of difficulty, somewhere in the neighborhood of whatever 5 points means to us as a team.

This is option 3.

Rather than thinking of a point value as a discrete point on a number line, we think of it as a bucket that includes ALL the potential values between its two Fibonacci neighbors. So calling something a “3” means it’s anywhere between a 2 and a 5. Calling something a “5” acknowledges that it’s anything between a 3 and an 8, et cetera. I’m not an especially graphically capable person but here’s a diagram I drew with my mouse that hopefully illustrates what I’m talking about:

Big disclaimer: For the sake of this diagram I’m calling 13 points the upper threshold for story size but your team should use whatever makes sense for you.
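If it helps to see the buckets spelled out, here is a minimal Python sketch of the same idea. The scale `[1, 2, 3, 5, 8, 13]` and the name `bucket_ranges` are my own assumptions for illustration, and so is the handling of the two ends of the scale (the smallest bucket starts at 0, the largest stops at its own value), since neither end has a neighbor on that side.

```python
# Assumed scale: Fibonacci-style values capped at 13 points.
SCALE = [1, 2, 3, 5, 8, 13]


def bucket_ranges(scale=SCALE):
    """Map each point value to the (low, high) range spanned by its neighbors."""
    ranges = {}
    for i, value in enumerate(scale):
        # Assumption: with no neighbor below, the smallest bucket starts at 0;
        # with no neighbor above, the largest bucket stops at its own value.
        low = scale[i - 1] if i > 0 else 0
        high = scale[i + 1] if i < len(scale) - 1 else value
        ranges[value] = (low, high)
    return ranges


for label, (low, high) in bucket_ranges().items():
    print(f'A "{label}" covers anything from {low} to {high}')
```

Notice that neighboring ranges overlap: a 4 sits inside both the "3" bucket and the "5" bucket, which is exactly the point.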

In practice, this is a reversal of option 2 from the beginning of this post, where teams use Fibonacci numbers for the “poker” phase but can put any story point value on the backlog item. Instead, our refinement goes like this:

We discuss a story, and it’s time to throw points. Here, team members are not limited to Fibonacci numbers. If someone is “between a 3 and a 5,” they can throw a 4. If someone is between a 2 and a 3, they can throw a 2.5. There are no limits at this stage. Then, when the poker values are revealed, if all the numbers are in the same “bucket” then we can conclude that we agree closely enough on how hard the ticket will be. We give it the point value that matches the “bucket” and move on. No splitting hairs. No “big threes.” No “small fives.” Because “3” and “5” are not mutually exclusive labels but are instead zones that overlap.

Does this mean we’re abandoning the idea of needing to reach a consensus? No, we’re just adjusting the dial on what constitutes “consensus.” No one really knows exactly how hard a piece of work is going to be before they start it, so true consensus is not merely unnecessary, it’s impossible. If two team members both give a backlog item a “5” but one feels it’s a “big” 5 and one feels it’s a “small” 5, is that any more of an agreement than between a “big 3” and a “small 5”? In fact, if your team is using terms like “big” and “small” to add detail to the point values they’re throwing, this reveals that they are already implicitly recognizing those values as ranges, or buckets.

Of course, a team may genuinely be out of alignment on how much effort they think a backlog item will take. This method still allows us to identify when that is the case. If the team’s initial estimates do not all fit into one bucket, that signifies that we’re not aligned enough in our understanding, and we should discuss. But the goal of the discussion is not to get everyone to agree to one value, only to converge enough that everyone is in the same bucket (even if they’re sitting on opposite ends of it). Here are some examples:

Backlog Item 1:
Alice: 3.5
Bob: 5.5
Carol: 5
Dylan: 4
These values are all in the “5” bucket. We give Backlog Item 1 a story point value of “5” and move to the next one.

Backlog Item 2:
Alice: 1.5
Bob: 3.5
Carol: 2
Dylan: 2.5
There is no bucket that contains both 1.5 and 3.5. This item requires further discussion as Alice and Bob are not aligned enough in their understanding. As the group discusses, maybe Alice realizes she hadn’t considered the complexity of the item fully, and when the group throws points again, she gives the item a 2. At this point you have to decide in your heart whether 2 is in the “3” bucket, but personally I’d put a “3” on this one and call it a day.
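The two examples above can be checked with a short Python sketch. Treat it as an illustration under assumptions rather than a tool we actually used: the scale, the function name `shared_buckets`, and the `inclusive` flag are all mine. The flag is there because whether a boundary value like 2 counts as inside the "3" bucket is a judgment call.

```python
# Assumed scale: Fibonacci-style values capped at 13 points.
SCALE = [1, 2, 3, 5, 8, 13]


def shared_buckets(estimates, scale=SCALE, inclusive=True):
    """Return every point value whose bucket contains all of the estimates."""
    estimates = list(estimates)
    lowest, highest = min(estimates), max(estimates)
    matches = []
    for i, label in enumerate(scale):
        # A bucket runs from the previous scale value to the next one.
        # Assumption: the ends of the scale fall back to 0 and the top value.
        low = scale[i - 1] if i > 0 else 0
        high = scale[i + 1] if i < len(scale) - 1 else label
        if inclusive:
            fits = low <= lowest and highest <= high
        else:
            fits = low < lowest and highest < high
        if fits:
            matches.append(label)
    return matches


item_1 = {"Alice": 3.5, "Bob": 5.5, "Carol": 5, "Dylan": 4}
item_2 = {"Alice": 1.5, "Bob": 3.5, "Carol": 2, "Dylan": 2.5}
item_2_retry = {**item_2, "Alice": 2}  # Alice re-throws after discussion

print(shared_buckets(item_1.values()))        # [5]
print(shared_buckets(item_2.values()))        # []  -> keep discussing
print(shared_buckets(item_2_retry.values()))  # [3] -> only if boundaries count
print(shared_buckets(item_2_retry.values(), inclusive=False))  # []
```

An empty list means the team is not in one bucket yet. Because buckets overlap, the list can also hold two values (everyone throwing a 4 fits both "3" and "5"), and the sketch leaves that pick to the team.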

This isn’t just theory. It works. I ran this experiment on a real team last year, and using this method made our refinements significantly more efficient—we spent much less time per backlog item laboring toward an unnecessary level of consensus, and instead reserved those conversations for when the team’s estimations were truly spread out enough to matter. It also increased the team’s comfort with uncertainty, because it gave us a way to acknowledge and communicate about it when it was especially relevant – at the time of estimation.

We waste a lot of time trying to fine-tune our estimations at the beginning of a project. The sliver by which we manage to increase the accuracy of our forecasts (if we increase it at all) is almost never worth the time it takes debating and discussing what are essentially just guesses anyway.

Every Scrum team I’ve ever been on has used the Fibonacci scale (or an approximation) for pointing stories, but until this team and this experiment, we weren’t getting the full value that scale is built to provide. The purpose of refinement is for the team to create a more complete, shared picture of the work involved in each backlog item. By forcing people to pick from discrete values when they throw points, we reduce the resolution of the picture to only a handful of pixels too early, before we even know what it’s supposed to look like. Removing that restriction allows team members to communicate more precisely up front, and converge together on a value that represents the whole team’s shared understanding.

Because in planning poker, showing your hand isn’t how you lose, it’s how you win.
