You didn’t test the idea. You tested the execution.
These two things are essentially the same:
Disney World Monorail
Chicago L
The Chicago L services nearly 390,000 people per day. The Disney World monorail handles about 150,000 people per day. Both are elevated transportation systems designed to move people in and out of a centralized hub.
No one has ever confused one for the other. The monorail is magical. The L is not.
While not an amusement ride, the monorail is one of the most memorable parts of a Disney World visit. It’s where the magic of the Magic Kingdom starts. Gliding through the Contemporary Resort hotel and catching that first view of the park from across the lake are moments that stay with people for years.
There is little magic to the L. It gets you in and out of downtown Chicago efficiently; I rode it for many years. The trains are often loud, dirty, and overcrowded. It gets the job done, but very few people are sentimental about it.
Two elevated people movers. Same concept. Very different execution.
Product orgs spend a lot of time debating whether to build something: should we build this feature? Will it improve our metrics? Will users want this? They derisk through testing: put a version out, measure against a control, let the result decide. But very little time is spent discussing the execution of the idea.
I’ll share an example. At a previous company we were focused on reducing inbound call volume to our customer support center. Our hypothesis was that if we told users call wait times were high and encouraged them to log in instead, they would self-serve and thus reduce calls. A straightforward concept that seems reasonable. We ran an A/B test where we placed a small banner at the top of the page that conveyed this message: “We are currently experiencing higher-than-usual call volume. For fastest help, log into your account.”
The test failed. I don’t remember the exact percentages, but I’m sure the difference was small, minuscule. The results indicated that the version of the site with the warning banner produced MORE calls than the version without it. Product determined this was a bad idea and chose not to proceed.
But this conclusion doesn’t make sense. A message warning users of a longer wait time drove them to call more? That is not how people behave (and yes, as someone who’s been designing user experiences for decades, I’m aware that people are rarely logical in their actions).
I would argue the problem with our test was the execution of the idea, not the idea itself. The banner sat at the very top of the page, above every other element and easy to scroll past. It had a compact vertical height, the font was smaller than our default text, and its background barely contrasted with the page color, so it didn’t stand out. I posit that we did not deploy an effective test of the idea.
I’d like to see product teams spend more time interrogating a test that fails: “did we execute this idea well enough to actually represent it?” A/B testing is an important tool in product development, but treating results as purely binary can create false confidence. A weak execution of a good idea can kill something that might actually be worth pursuing.
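There’s a related, smaller habit worth building before the pass/fail call even happens: look at whether a tiny difference is signal at all. Here’s a minimal sketch, with entirely hypothetical traffic and call counts (not the numbers from our test), of the kind of two-proportion check that shows how a “more calls” reading can be indistinguishable from noise:

```python
# Minimal sketch of a two-proportion z-test for an A/B call-rate comparison.
# All counts below are hypothetical, for illustration only.
import math

def two_proportion_ztest(calls_a, visitors_a, calls_b, visitors_b):
    """Two-sided z-test for a difference in call rates between two variants."""
    p_a = calls_a / visitors_a
    p_b = calls_b / visitors_b
    pooled = (calls_a + calls_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal approximation
    return p_a, p_b, z, p_value

# Hypothetical traffic: control (no banner) vs. banner variant.
p_a, p_b, z, p = two_proportion_ztest(calls_a=1180, visitors_a=50_000,
                                       calls_b=1215, visitors_b=50_000)
print(f"control {p_a:.2%}, banner {p_b:.2%}, z = {z:.2f}, p = {p:.2f}")
# With counts like these, p is around 0.47: the "banner caused more calls"
# reading is well within the range of random noise.
```

Even when the numbers do clear the bar, the harder question remains the one about execution quality.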
What level of execution are you bringing to the concepts you want to test and launch?