11 Oct 2021
Who’s had this problem before?
Long before Shine’s founder and CEO Mark Johnson ascended to the lofty heights of the C-suite (just kidding, he pretty much works at the same desk that he always has), I used to report to him directly at a client site. And whenever our team was sitting around contemplating possible solutions to a thorny technical problem, Mark would often remark:
“Surely somebody must have had this problem before”
It was kind of a half-question, half-statement, with an extra emphasis on the word “surely”. After resisting the initial temptation to reference one of the greatest gags in cinema history, we would spend a few moments reflecting and realise that, in our rush to devise our own potential solutions, we had not actually bothered to check whether anybody else had had the problem before. There would follow a scramble for a keyboard, a quick Google search, and the sheepish realisation that yes, somebody had indeed had this problem before (or at least something a lot like it). What’s more, there was usually at least one solution to the problem already available.
I have written previously about the importance of asking “What’s the problem we’re trying to solve?”, as well as “What’s the simplest thing that is going to work?”. In this post I am going to talk about an intermediate question that is sometimes neglected in the rush from problem to solution: “Who’s had this problem before?”. You can ask this question of your immediate peers, your broader organisation, or even just Google. However you do it, it can save you a bunch of effort, and also considerable embarrassment (especially in front of your boss).
Snowflakes and Google
Contrary to what we might think, most of the defects and breakages that we encounter day-to-day are not unique snowflakes. In fact, given the sheer number of developers in the world, it’s highly likely that somebody has had our problem before, or at least something very similar. That person could be sitting next to us, or on the other side of the planet.
However, it can be easy to forget this. Sometimes this is because we don’t like to admit that we don’t know how to do something. Sometimes it’s because we are uncomfortable talking to other people directly (let’s be honest, many of us didn’t get into the programming game so that we could interact with other people all day).
Sometimes it’s habitual, especially for an older programmer like me. I put this down to me having worked before Google was a thing, when we had to figure it out ourselves. To be clear though, I do not consider that to be a badge of honour.
In fact, looking back on the pre-Google era, I’m not sure how we got anything done at all. Instead, I only have vague memories of being blocked by technical issues for days or weeks at a time. There may have been other people in the world with the same problem as us, but like an alien civilisation millions of light-years away, we were doomed to never find them. Instead, all we could usually do was dig aimlessly through documentation and debug our code from first principles. This process was agonisingly slow.
Then Google came along and all of a sudden we had a way to tap into the collective knowledge and experience of just about every other developer on the planet. It took a while for me to break old habits, but I’m now so dependent on Google that if they decided to shut it down, I’m pretty sure my professional life would be over. It would be like waking up to find that the entire world had shifted to Dvorak keyboards. In fact, you might as well just turn off the electricity completely.
Fortunately, Google isn’t going anywhere, and I still thrill to the fact that I can dump an error message into it verbatim and have a better-than-even chance of solving the problem on the spot. Whilst this practice may be standard operating procedure for younger generations of programmers (for all I know, Google is all that they teach at programming school these days), it still gives me a kick.
To demonstrate, let me explain how I recently solved a problem involving Python, Ansible and the AWS CLI, without me really knowing much about Python, Ansible or the AWS CLI.
I first encountered the issue whilst executing a build on a build box. The weird part was that the last time the build had run, it had executed just fine. But it was “legacy” software in the sense that six months had elapsed since that last build, during which time the original culprits (sorry, creators) who had set it up had left.
Now, without any changes having been made to the code, the build was breaking. So it was left to me, a lowly front-end developer with very little experience with this sort of thing, to fix it myself.
In the build logs, the failure looked like this:
aws s3 cp --recursive packages/client/build s3://some-bucket-ap-southeast-2-82617ab122/branches/develop/1043/client/build
Traceback (most recent call last):
  File "/usr/local/bin/aws", line 27, in <module>
    sys.exit(main())
  File "/usr/local/bin/aws", line 23, in main
    return awscli.clidriver.main()
  File "/usr/local/lib/python2.7/dist-packages/awscli/clidriver.py", line 69, in main
    driver = create_clidriver()
  File "/usr/local/lib/python2.7/dist-packages/awscli/clidriver.py", line 79, in create_clidriver
    event_hooks=session.get_component('event_emitter'))
  File "/usr/local/lib/python2.7/dist-packages/awscli/plugin.py", line 44, in load_plugins
    modules = _import_plugins(plugin_mapping)
  File "/usr/local/lib/python2.7/dist-packages/awscli/plugin.py", line 61, in _import_plugins
    module = __import__(path, fromlist=[module])
  File "/usr/local/lib/python2.7/dist-packages/awscli/handlers.py", line 27, in <module>
    from awscli.customizations.cloudformation import initialize as cloudformation_init
  File "/usr/local/lib/python2.7/dist-packages/awscli/customizations/cloudformation/__init__.py", line 13, in <module>
    from awscli.customizations.cloudformation.package import PackageCommand
  File "/usr/local/lib/python2.7/dist-packages/awscli/customizations/cloudformation/package.py", line 26, in <module>
    from awscli.customizations.s3uploader import S3Uploader
  File "/usr/local/lib/python2.7/dist-packages/awscli/customizations/s3uploader.py", line 22, in <module>
    from s3transfer.manager import TransferManager
  File "/usr/local/lib/python2.7/dist-packages/s3transfer/__init__.py", line 134, in <module>
    import concurrent.futures
ImportError: No module named concurrent.futures
From this I could tell that some sort of Python-related error was happening when the AWS CLI was run. This was unfortunate for me, because I know very little about Python. Nor were there any Python experts around me. However, as often happens at times like these, a small, insistent voice in my head that sounded a lot like Mark Johnson started saying: “surely somebody must have had this problem before”.
After having a little giggle to myself about that, I did the logical thing and just dumped the last line of the error message into Google. Here’s what I saw:
Sure enough, it seemed that somebody had had this problem before. Furthermore, it looked like some sort of Python compatibility issue. However, I was unclear as to how it related to the AWS CLI. So I refined the search a little:
Yes! Somebody had had this problem before! I was not alone!
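The "dump the last line into Google" step can be sketched in a few lines of Python. This is only an illustration, not something from the original build: the helper name last_error_line is made up, and it assumes the final non-blank line of the log is the exception message, which holds for a plain Python traceback like the one above.

```python
def last_error_line(log_text):
    """Return the last non-blank line of a log.

    For a plain Python traceback this is the exception type and message,
    which is usually the most searchable part of the output.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


# Tail of the traceback from the build log above.
build_log = """\
  File "/usr/local/lib/python2.7/dist-packages/s3transfer/__init__.py", line 134, in <module>
    import concurrent.futures
ImportError: No module named concurrent.futures
"""

print(last_error_line(build_log))
```

Searching for that one line first, and only then adding a term like the tool name, mirrors the two searches described above.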
I also noticed that the first search result mentioned Pip, about which I also know very little, other than that it’s a package manager. So I did a global search of my codebase for “pip” (global code searches are another powerful debugging tool), and found a YML file with this fragment in it:
…
- name: Install awscli
  pip:
    name: awscli
    state: present
…
All I knew about this file was that it was used by Ansible. All I really know about Ansible is that it is fashionable amongst the devops crowd. However, whatever Ansible was doing, I was kind of perturbed by the fact that the file didn’t seem to be directing Pip to install a particular version of the awscli package. Might this mean that the version of the package that would be installed if I ran it today could be different from the version it would install if I ran it six months ago? If so, might this be a possible cause of the problem?
Yep, it was starting to look like this might be a floating dependency issue. However, my lack of Python chops meant that I wasn’t exactly in a position to confidently upgrade us to Python 3.6. But maybe I could just force the version of awscli to be that which was installed six months ago, when the script last ran successfully.
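To make the floating-versus-pinned distinction concrete, here is a rough Python sketch that checks whether a pip requirement string is locked to one exact version. The function name is_pinned is hypothetical, and the check is deliberately simplified: it only recognises the "name==version" form and ignores extras, environment markers and other specifier syntax.

```python
import re

# Matches an exact pin such as "awscli==1.18.188".
# A wildcard like "awscli==1.18.*" is rejected, because it still floats.
_EXACT_PIN = re.compile(r"^\s*[A-Za-z0-9._-]+\s*==\s*[^\s*]+\s*$")


def is_pinned(requirement):
    """Return True if the requirement names exactly one version."""
    return bool(_EXACT_PIN.match(requirement))


for requirement in ["awscli", "awscli>=1.18", "awscli==1.18.*", "awscli==1.18.188"]:
    print(requirement, "->", "pinned" if is_pinned(requirement) else "floating")
```

A bare name like "awscli" means "whatever is newest today", which is how a build can break six months later without a single code change.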
Further down the same page, there was a link to the AWS-CLI CHANGELOG. Long story short: after a bit of sleuthing around I was able to figure out the version of the package that had been in use at the time of the last successful build, and fix it to that value in the Ansible file:
…
- name: Install awscli
  pip:
    # Lock to a version that will work with Python 2.7
    name: awscli==1.18.188
    state: present
…
Sure enough the next build ran without incident.
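The sleuthing boils down to one question: which release was the newest on the day of the last good build? Below is a minimal Python sketch of that lookup. Everything in it is an assumption for illustration: the function name is invented, and the version numbers and dates are placeholders, not real awscli releases or the real build date.

```python
from datetime import date


def latest_release_on_or_before(releases, cutoff):
    """Return the newest version released on or before the cutoff date.

    releases is an iterable of (version, release_date) pairs, for example
    as read off a changelog. Returns None if nothing qualifies.
    """
    candidates = [(released, version) for version, released in releases
                  if released <= cutoff]
    if not candidates:
        return None
    # max() compares release dates first, so the most recent release wins.
    return max(candidates)[1]


# Placeholder changelog entries (hypothetical versions and dates).
changelog = [
    ("0.1.0", date(2020, 1, 10)),
    ("0.2.0", date(2020, 6, 1)),
    ("0.3.0", date(2021, 3, 15)),
]
last_good_build = date(2020, 12, 1)  # hypothetical date of the last green build

print(latest_release_on_or_before(changelog, last_good_build))
```

With the placeholder data this prints 0.2.0, the last release before the hypothetical build date. The result is only a candidate pin, so it still needs a test build to confirm it.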
My point here (apart from the fact that you should always fix your dependency versions where you can) is that, knowing very little about Python, Ansible or AWS CLI, I was able to resolve this problem by first dumping it into Google and seeing who else had experienced something like it before. Was it elegant? Nope. Was it effective? Yes.
And sure, you may be a Python expert who would have known how to resolve this particular problem without having to Google it. But that’s not the point. The point is that nobody can be an expert in everything these days. It could have just as well been a React error or a Java error. Whatever the error, you’re better off tapping into the collective hive-mind of all developers on earth than trying to resolve every problem yourself.
Fixing a problem by dumping it into Google has one big downside: you mightn’t really know why the solution works. This can lead to Cargo Cult Programming, and might even create more problems for other people in future.
So how much should you know? My basic rule of thumb is that if you can’t write an intelligent code comment about why you’ve done something, you probably need to understand it a bit better. Note, for example, the explanatory comment I added alongside my fix to the Ansible configuration file earlier:
# Lock to a version that will work with Python 2.7
This comment will give some context to the next developer who comes along, rather than them having to scratch their heads and wonder why a specific version number has been provided. They don’t necessarily need to understand the gory process of how we got to that version number, but they probably should be told why it’s there.
Asking the question “Who’s had this problem before?” can save you a bunch of time and effort. You might ask Google the question, or you might even just ask the people around you in your team or organisation. With practice, it’ll become a voice that comes into your head automatically as part of the problem-solving process. That voice may even sound like Mark Johnson.
Finally, don’t worry that asking the question will make you look stupid. Google doesn’t care, and most people don’t care either. The only truly stupid thing you can do is waste time solving a problem that’s already been solved before. That’s something a good engineer will avoid at all costs. Surely.