Out at Shine’s various client sites, our teams often meet to discuss the pros and cons of various technical solutions. And in the past, there was one particular Shine manager who, if he was in attendance, would regularly pipe up and ask the question: what’s the problem we’re actually trying to solve?

It was kind of annoying, to be honest. There we were, busy getting into the deep technical details of a solution, and there he was, wanting to know what the actual problem was. I mean, surely everybody understood what the problem was, right?

Well, sometimes it would turn out that, when pressed, we couldn’t really state clearly what the problem was. Or that different people at the meeting had different ideas as to what the problem was. Or that whilst it might be a problem in future, it wasn’t really a problem right now. Or that maybe it wasn’t really a problem after all, and we’d just wasted half an hour of our time.

Flash forward a decade or so, and I’ve found myself becoming the annoying person who asks this question in solution design meetings. And just like before, I’ve noticed that often, when forced out into plain sight, problems can look radically different to what we thought they would look like, or can even disappear completely.

In this post I’m going to talk about why it’s so important to ask this question, why we tend to shy away from asking it, and how you can go about answering it.

Why it’s important to ask the question

An important part of building software is managing complexity. And one key, ongoing activity when managing complexity is separating the necessary complexity in your project from the unnecessary complexity.

Necessary complexity is inherent to the problem you are trying to solve. It can’t be removed, only minimised.

In contrast, unnecessary complexity is not inherent to the problem. It’s stuff that’s been added that isn’t really required. Not only does it not add value, it makes the necessary complexity harder to understand.

Sometimes, unnecessary complexity is accidental, or a natural part of the learning process. In that case, the key part is that, once we realise it’s unnecessary, we go back and remove it before it gets too hard to manage.

However, sometimes unnecessary complexity occurs because we have tried to solve something that wasn’t actually a problem, or at least not the most important problem right now. This is a form of over-engineering that, at best, is not an efficient use of your time, and at worst, creates cruft that gets in the way when you (or some other unfortunate person) finally get around to working on the most important problem.

Why we don’t ask the question

I think it’s important to be honest with ourselves as to why we don’t ask the question as much as we should. I think there are two main reasons.

Firstly, software developers love to solve problems. But I don’t think I’m alone in thinking that, without a problem to solve, we can start to feel a little adrift. Like a minion without a master, life starts to feel incomplete, or, dare I say it, empty.

So desperately do we long to solve problems that sometimes, we tend to invent them, or at least inflate their importance. Furthermore, sometimes we are barely conscious of doing this, or tell ourselves stories to justify what we’re doing.

The second reason we tend to not ask the question is that we like to stick to what we know, and want to demonstrate our mastery in a problem domain. This can lead us to have a bias towards solutions that are easier to implement, rather than the solution that might be most appropriate. This is a natural tendency that all humans share, so much so that there’s even a name for it: The Streetlight Effect. To quote:

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, “this is where the light is”

On top of this cognitive bias, there’s also the fear that we won’t be able to fix the problem, or that it’ll be a waste of our time. And let’s be honest, learning new stuff can be hard. You certainly don’t feel as productive at first, and programmers (or good ones, anyway) love to feel productive.

In our defence, sometimes there might be a better person around to solve the most important problem, or sometimes we might just not be in the best headspace to fix it right away.

For example, if an important (but not urgent) issue comes up on Friday afternoon, but the resident expert has gone home and frankly you’re pretty tired after a big week at work, then maybe it can wait until Monday whilst you spend the remainder of the day working on a less-important problem that you’re more likely to make progress on.

That said, come Monday morning when you’re feeling fresh and ready-to-go, if the expert still isn’t back in the office, it’s time for you to hop out of your comfort zone and have a go at solving the problem yourself.

How to answer the question

So, you might agree that it’s important to ask the question, but be a little unsure how to answer it. Luckily for you, I’ve prepared a simple template for you to use. Ready? Here it is:

The problem is __________________________________

This might seem trite, but it really can be that simple. The key thing is that the answer is usually just the start of a conversation. It might be a conversation with your team, or even just with yourself. The aim of the conversation? To establish context and ensure everyone agrees upon what the actual problem is.

Here’s an example conversation that I’ve seen in real life, starting with the following problem statement:

The problem is that we only have one web server

Well, that’s a start, but it sounds more like an observation than a problem. So let’s take a step back and ask: why is this a problem?

Because our users are experiencing slow page response times

That’s better. We can now rephrase the problem as follows:

The problem is that our users are experiencing slow page response times

In this case it was actually especially important to ask the question before going into solution mode. Why? Because if you know anything about web development, you’ll know that slow response times aren’t necessarily caused by having a single web server. It might turn out that, rather than having to go to the effort of introducing a load-balanced cluster, a simple CDN layer would do the trick. Or perhaps it’ll turn out that the bottleneck is the database, meaning that even a cluster wouldn’t solve the problem.
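To make that concrete, here is a minimal sketch of the kind of check that can settle the argument before anybody builds anything. It assumes you have already measured how long a few slow requests spent in each stage; the stage names and every number below are made up purely for illustration.

```python
from statistics import mean

# Hypothetical per-request timings in milliseconds, split by where the
# time was spent. Stage names and values are invented for this example.
samples = [
    {"web_server": 40, "database": 620, "static_assets": 90},
    {"web_server": 55, "database": 580, "static_assets": 110},
    {"web_server": 35, "database": 700, "static_assets": 95},
]


def average_by_stage(samples):
    """Return the mean time spent in each stage across all samples."""
    stages = samples[0].keys()
    return {stage: mean(s[stage] for s in samples) for stage in stages}


def slowest_stage(samples):
    """Return the name of the stage with the highest mean time."""
    averages = average_by_stage(samples)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    ranked = sorted(average_by_stage(samples).items(), key=lambda kv: -kv[1])
    for stage, avg in ranked:
        print(f"{stage}: {avg:.0f} ms")
    print("Look here first:", slowest_stage(samples))
```

With these invented numbers the database dominates, which is exactly the situation where adding web servers would not help. Real measurements may of course tell a different story, which is the whole point of taking them.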

Here’s another example I’ve seen before:

The problem is that we don’t have any unit tests

This is what I call a “solution masquerading as a problem”. It’s not so much a problem as a call-to-arms. So before we go into battle, let’s ask the question: why do we think not having any unit tests is a problem?

Because the quality of our code is poor

Once again, it’s more of an observation than a problem. It’s also subjective and difficult to quantify. So let’s dig in further and ask: why do we think the quality of our code is poor?

Because we’ve had lots of defects recently

OK, now we’re getting somewhere. But we still need more information. What sort of defects were they? Were they high-severity production showstoppers that made the app unusable to the user, or annoyances that put noise in our dev environment logs but weren’t noticeable to the user?

High-severity defects

So if I was to restate the problem, it would be:

The problem is that we’ve had lots of high-severity defects go into production recently

Now that we’ve spelled it out like that, this indeed sounds like a problem that needs to be dealt with sooner rather than later. However, once again it’s important that we went through this process because, as it turns out, unit tests mightn’t give you the biggest bang for your buck when it comes to picking up this sort of defect. They mightn’t even pick up such a defect at all.

For example, sometimes high-severity defects come from misunderstandings about the interfaces between components, not the components themselves. In that case, you can get far more value from writing a handful of high-level integration tests, rather than hundreds of unit tests. If, on the other hand, the defects are originating from a single component of the system, it might be worth the effort of unit testing that component in isolation rather than building multiple integration tests.
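One rough way to find out which of those two situations you are in is to tally recent high-severity defects by where they originated. The sketch below assumes a hypothetical defect log in which each entry records a severity and an origin, with "interface" standing in for defects that arose between components. The component names, the data, and the suggestion rule are all illustrative assumptions, not a prescription.

```python
from collections import Counter

# Hypothetical defect log: (severity, origin). "interface" means the defect
# came from a misunderstanding between components. All entries are invented.
defects = [
    ("high", "interface"),
    ("high", "interface"),
    ("low", "billing"),
    ("high", "billing"),
    ("high", "interface"),
    ("low", "search"),
]


def high_severity_origins(defects):
    """Count high-severity defects per origin, ignoring everything else."""
    return Counter(origin for severity, origin in defects if severity == "high")


def suggest_focus(defects):
    """Suggest where testing effort might pay off first (a crude heuristic)."""
    counts = high_severity_origins(defects)
    if not counts:
        return "no high-severity defects recorded"
    origin, _ = counts.most_common(1)[0]
    if origin == "interface":
        return "integration tests across component boundaries"
    return f"unit tests for the {origin} component"


if __name__ == "__main__":
    print(high_severity_origins(defects))
    print("Suggested focus:", suggest_focus(defects))
```

The heuristic is deliberately crude. Its only job is to put the evidence on the table so that the team is debating the same problem.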

In short, software developers are great at drilling-in, but not so good at stepping back. Furthermore, to try and short-cut the drilling process, we often make assumptions about causes and effects. But if any of these assumptions are wrong, we’ve effectively drilled in the wrong place, which is a waste of time. So before you put your head down and start drilling, take the time to look around and make sure you (and your team) are actually starting in the right place.

Context Is King

At this point, you might ask: what if everybody in the room already knows, for example, that we’ve had lots of high-severity defects lately, and, for argument’s sake, also knows that those defects have actually been originating from a single component? Wouldn’t it then be sufficient to open a solution design meeting by stating that the problem is that we don’t have any unit tests?

Well, yes, but I believe that this sort of shared understanding happens far less than people think. More often than not, people can’t even state what the problem is, let alone agree upon it. This can result in a situation where people are debating solutions to different problems without realising it. Or even worse, everybody agrees on a solution without anybody actually understanding the problem.

This latter scenario sounds ridiculous, but I’ve seen it happen. A meeting finishes with an agreed course of action, then in the following weeks a funny conversation like this can occur:

  • Kevin: Hey, you remember how a couple of weeks ago I said I’d set up that web server cluster? I just did it and it doesn’t seem to have made any difference to our page load time. I did some performance tests and I think the bottleneck might actually be the database. I thought you and Frank said the problem was that we only had one web server?
  • Jane: We only said that because you said you were going to set up a web server cluster. We figured that if you thought it was important enough to create a cluster, that must be the problem.
  • Kevin: I only said I’d set up the cluster because I heard somebody say the problem was that we only had one web server. I guessed that if that was the problem, the solution was to set up a cluster.

So basically what’s happened here is that nobody at the original meeting really knew what the problem was. Consequently, the solution drove the problem, not the other way around. Even worse, nobody really owns the problem, because it’s still not clear what it actually is.

How do you avoid this occurring? I’d suggest that, before going into solution mode, you have everybody take some ownership of the problem by agreeing on what it actually is. Furthermore, take a step back and try to reword the problem in terms of your customers and end-users. Customers won’t ask for extra web servers, but they might well ask for a website that responds faster. End-users won’t ask for more unit tests, but it’s quite likely they will ask for a system that is less buggy. And if you can’t reword the problem in terms of your end-users on the first go, keep zooming back and re-asking the question until you can.

Let’s wrap this up

What’s the problem that we’re actually trying to solve? is a question that I find myself asking more and more these days. It’s proven to be one of the most powerful questions I can ask. And whether it’s been asked by me or somebody else, it’s a question that’s saved my teams from many dead-ends and wild goose chases.

So I encourage you: be the annoying person in the room who asks the question. Indeed, I think it should be one of the first questions you ask. Some engineers learn to ask it early in their careers, whilst others (including myself) have had to figure it out the hard way. But once you start asking this question, it’s hard to stop, because it brings a focus to your design and decision-making process that you won’t ever want to be without again.

Thanks to Gareth Jones for his invaluable feedback on an early draft of this post

One comment

  1. Great point. Wish I had this advice earlier…

    I was experiencing so much unexpected behavior from an iOS device, from so many different origins, that it was a struggle to keep up. From the device’s OS configuration, to the UI, the server traffic, even web browsing, everything had something go wrong. Cert chain issues, Apple server redirects, evidence of managed device configurations (despite the device being from Userland), constant UI bugs that were utterly absurd, like alert dialogs containing literal errors themselves. I could write a book (my blog will do eventually).

    Call upon call upon call to support, escalated cases, engineers- all failures. Why? Because when I was asked what the actual problem was.. I HAD NO ANSWER.

    Unprepared, speechless, I would end up awkwardly sputtering out the exact opposite of an answer; it was an explanation.

    *Apple’s latest security disclosures regarding the absolute carnage of their internal servers ended up giving me peace of mind. Had I been able to answer the question, I too could have contributed to the security research. Lesson learned.
