NPS: The F—/Marry/Kill of Customer Satisfaction
Title censored out of respect for corporate email filters.
A few years back, NPS came up in conversation. (It was a wedding, so, you know, naturally.) The person I was talking to scoffed and said he’d run an experiment that showed NPS was nonsense.
He explained that he had surveyed his customers, asking if they would recommend his platform to friends or family, and then tracked how many went on to use the recommendation tool on his platform. None did.
Clearly NPS fails, he said. It tries to measure likelihood to recommend, but demonstrably doesn’t.
I didn’t know what to say. To me, this was like saying that a pizza fails because it’s very poor floor insulation. That’s not what the thing was supposed to do in the first place.
NPS (Net Promoter Score) is one of those marketing fads that went through a classic cycle:
Someone has a good idea.
It gets praised for being clever.
Everyone starts misusing it.
It gets criticised for being idiotic.
Unlike many of these fads, however, it has persisted in one form or another through to today. Whether we answer or not, most of us are regularly asked that tedious question: “On a scale of 0 to 10, how likely are you to recommend our product/service?”
There is plenty to criticise about NPS, but a lot of criticisms of NPS are actually criticising mistaken ideas about NPS. So let’s get the essentials clear first.
Net Promoter Score consists of a two-question survey, a particular calculation of the results, and a process of continual improvement based on the results.
The Two-Question Survey
The first question in the two-question survey is roughly: “On a scale from 0 to 10, how likely would you be to recommend our product/service to a friend, colleague or family member?”
The second question is: “What is the main reason for your answer?”
The Particular Calculation
The responses are divided into three bins:
People who answered from 0 to 6 are called Detractors (the lowest 60% of the scale)
People who answered 7 or 8 are called Neutral or Passive (the next 20% of the scale)
And the people who answered 9 or 10 are called Promoters (the top 20% of the scale)
NPS is calculated by subtracting the proportion of Detractor responses from the proportion of Promoter responses. That’s why it’s called Net Promoter Score. If you’ve got more Promoters than Detractors, your NPS is positive, and vice versa. Passives have no impact.
(So if you had 25% Promoters, 65% Passives and 10% Detractors, the net score is 15. That’s 25 minus 10.)
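To make the arithmetic concrete, here’s a minimal Python sketch of the calculation. The nps() function and its inputs are just my illustration, not anything prescribed by Reichheld or the NPS methodology itself:

```python
def nps(ratings):
    """Compute a Net Promoter Score from a list of 0-10 ratings."""
    total = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)   # 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # 0 to 6
    # Passives (7 or 8) count towards the total but otherwise have no impact.
    return 100 * (promoters - detractors) / total

# The worked example above: 25% Promoters, 65% Passives, 10% Detractors
ratings = [10] * 25 + [8] * 65 + [3] * 10
print(nps(ratings))  # 15.0
```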
The Process of Continual Improvement
Finally, as with any kind of reporting or tracking, NPS is pointless unless it informs some kind of action. So the idea is you do three things:
Look at your Detractors’ answers to the second question – these point to the big failings in your product/service to fix, turning Detractors into Passives.
Look at your Promoters’ answers to the second question – these point to the big strengths in your product/service to build on, turning Passives into Promoters.
Look at your competitors’ customers’ answers to those questions, both Promoters and Detractors, to help inform your competitive strategy.[1]
Okay, so that’s NPS. Let’s look at two particular innovations behind it: the phrasing of the question and the evaluation of the scores.
First, the phrasing of the question.
NPS was popularised by The Ultimate Question by Bain & Co’s Fred Reichheld. Part of the background of NPS is the particular question format “how likely would you be to recommend…?” in comparison with other customer-satisfaction survey questions.
According to Fred, the various questions were evaluated in terms of correlation with subsequent business success (growth). The most common question at the time was, “On a scale from 1 to 10, how satisfied are you with our product/service?”
What they found was that (1) people responded differently to differently worded questions, and that (2) performing well on the “would you recommend” metric was a more reliable indicator of future business growth than “how satisfied are you?”
Now, Reichheld does spend half the book waxing poetic about how wonderful word-of-mouth is and how much more people trust recommendations from peers over the claims of advertising, etc. And presumably he believes that businesses which perform well on this question are more likely to actually get recommended.
But he does not claim that it’s actually a direct measure of how many customers will actively do the recommending. What he claims is that it’s the question with the closest correlation to future growth. And… not even that. Reichheld also (admittedly briefly) comments that entirely different questions might be better for different kinds of businesses.
The question isn’t magic. The real idea is that you can measure customer experience more accurately with a proxy question than by asking about it directly.
Second, the evaluation of the scores.
And this really is quite clever.
Reichheld observed that people don’t really think linearly about how they rate things out of 10. If they did, you’d expect a rating of around 5 for an average expected customer experience. But one or more of the following seems to be going on:
People are kinder or more polite than that;
They see an “average expected customer experience” as pretty good;
Or perhaps they expect a pretty good experience, and so consider “getting what they expected” to be pretty good.
In any case, when a customer experience is neither good nor bad, nothing to write home about, people tend to give a rating of 7 or 8 out of 10.
In other words, a rating of 5 or 6, while technically in the middle of the scale, marks a Detractor in Reichheld’s scheme. If someone only gives a 6 out of 10, they have probably had a shitty experience in some way and are being polite about it. If someone gives a 9 or 10 out of 10, they probably had a genuinely good experience.
So there are actually two things going on here. The first is recognising that humans don’t rate average experiences with average ratings. The second is the idea of bucketing respondents into discrete groups on the basis of where their ratings fall on the 0-to-10 scale.
Either of these could be criticised, though I do think it’s pretty clever.
One criticism that involves misunderstanding the system is this:
“Oh, a tiny 10% shift downwards of everyone’s ratings from 7 to 6 would change the NPS from 0 to -100.” Or similarly, “A tiny 10% shift downwards of everyone’s ratings from 9 to 8 would change the NPS from 100 to 0.”
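For what it’s worth, the arithmetic in that criticism checks out. Running those hypothetical all-7s and all-6s (or all-9s and all-8s) populations through the illustrative nps() sketch from earlier:

```python
print(nps([7] * 100))   # 0.0    - everyone a Passive
print(nps([6] * 100))   # -100.0 - everyone a Detractor

print(nps([9] * 100))   # 100.0  - everyone a Promoter
print(nps([8] * 100))   # 0.0    - everyone a Passive
```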
The ratings are not a linear evaluation of the experience. There is no “tiny 10% shift”. If someone goes from being someone who gives a 7 or 8, to being someone who gives a 6 or less, then the system assumes they’ve gone from being someone who had an expected experience (Passive) to someone who had a negative experience (Detractor). Even if that’s identified through a “small” shift from 7/10 to 6/10 ratings, it’s a big difference in customer experience.
And, again, being a Promoter does not actually mean that you’ll certainly promote/recommend (though you’re probably more likely to). It’s a measure of customer experience that is, in theory, more accurate than directly asking about customer experience.
An analogy for this struck me a few weeks ago when I was listening to the podcast Hey Randy on CBB Presents. They were playing Fuck/Marry/Kill and they… I mean, it was hilarious, but retelling funny moments on podcasts doesn’t work – believe me, I’ve tried.
If you’re not familiar, Fuck/Marry/Kill is a game, usually played while drinking. It is, upon the slightest reflection, kind of horrifying. Everyone is given three names and each player has to decide who they would sleep with, marry or kill, among the three.
The first reason this is a useful analogy for NPS is that the answer to the F/M/K question is almost never enough by itself. It is always followed by that critically important second question: WHY?!
The second reason it’s a good analogy is that everyone recognises that the F/M/K question is not actually asking about plans to murder, propose or… proposition. No one checks in a month later and says, “Oh wow, Suzie said she would marry Frank and kill John, and John’s still alive and Suzie and Frank aren’t engaged, so her answers were all lies and we learned nothing from them.”
That is, we learn from people’s answers to these questions even though they have no intention of following through. And in fact, we may learn a lot more with that question than we would if we asked, “Rank these three people in order of least liked to most liked.”
Because whoever you like the least, that’s who you’d kill. And then whoever you like the most, that’s who… Well, at that point, perhaps the analogy breaks down a bit. It’s never quite clear in F/M/K whether marriage is the Promoter grand prize or merely the Passive also-ran.
In any case, the point is that the questions are informative without being directly predictive of the behaviours asked about (referrals, murders, etc.) and that the real info comes from asking “why?”
Paul Shale once told me that NPS was “qual disguised as quant”, which I thought was very clever and immediately stole for myself. What wasn’t obvious to me was whether this was praise or condemnation. Depending on how NPS is used or misused, for me it could be either.
It goes wrong when the emphasis is on the disguise: quant. Outside of context, the NPS number alone is dangerously uninformative. As a score, it needs to be comparative – comparing to our own performance in the past and/or comparing to competitors now. It can’t really be compared outside of a category, because people feel generally different about, say, banks compared to how they feel about beers.
The score should also be understood in terms of those contributing factors – Promoters and Detractors. You could get a score of zero by providing an unremarkably good enough experience to 100% of your customers. You could also get a score of zero by having half of your customers delighted and the other half horrified by their experience.
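Those two zero-score scenarios are easy to confirm with the same illustrative nps() sketch from earlier:

```python
# 100% Passives: a good-enough experience for everyone
print(nps([8] * 100))               # 0.0

# Half delighted, half horrified
print(nps([10] * 50 + [0] * 50))    # 0.0
```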
The timing of the question also skews responses. One common practice is following up a customer-service interaction with the questions. Clearly the majority of people who will take the time to answer are those with either very good or very poor experiences, perhaps more likely the latter.
Really, to compare sensibly with the past or with competitors, it should be treated like a brand-health dip – annual or semi-annual, revealing what is most memorably good or bad about each brand in a competitive set rather than what is most recent.
But by far the most common way of fucking up this “qual disguised as quant” is forgetting the qual entirely – either by doing nothing with the responses to that second question, “What was the main reason for your answer?” or not even asking it in the first place.
As I said, there is plenty to criticise about NPS. Its biggest drawcard is probably its simplicity rather than any magical correlation between the score and business growth. And it has become so popular that overfamiliarity with the question itself may be skewing whatever predictive power Reichheld originally asserted. And there are all kinds of other factors which can skew ratings of brands – for example, people tend to rate brands more favourably if they’re perceived as being more popular.
(Some might go even further and suggest that both high NPS and market penetration are caused by straight up brand recognition.)
But you’d be surprised how many businesses that thought they’d implemented it are sitting on goldmines of qualitative responses, all ignored in favour of a monthly “our NPS is up by three points”.
[1] I actually don’t recall whether or not Reichheld recommended this in his book, but to me it’s an obvious one.