
26 Jul 2019 My Refactoring Style Is Unstoppable
Gather round disciples, your master is finally ready to reveal the ultimate secrets of the senior developers. Learn how to transform complex, unreadable code into poetry. Or, more accurately, learn what goes through an old developer’s head when he’s trying to clean up some dodgy code.
For the last nine years I have maintained an open-source logging library for node.js called log4js. It is, despite all my attempts to thwart this, quite popular – downloaded nearly two million times a week, used in 600k github projects and has earned 4000 stars. So I’m kind of a big deal. I have many leather-bound books and my apartment smells of rich mahogany.
Many years ago I split off the log file rotation code into a separate library: streamroller. Nine years of pull requests, bug fixes, enhancements, and a more recent merging of two different log rotation methods has resulted in some quite convoluted code in streamroller. It’s hard to read, and it puts off potential contributors – the code is too scary.
All code bases can end up like this, if they’ve been around a while. Bits get added to fix bugs, there’s pressure to release code, people add features without taking a step back and thinking about the design. At some point, you’ll be tempted to rewrite the whole thing from scratch. This is, almost always, a bad idea. I can hear some of you thinking “yeah, other people should never rewrite – but I know what I’m doing”. You don’t. You’re an idiot. You’ll just be exchanging a whole load of known issues for some brand new ones that nobody understands and delaying the release of useful features, but at least you’ll get to massage your enormous ego with your heroic new code.
The streamroller code is 453 lines long, a single Javascript class that implements a WritableStream. You can specify whether you want files to rollover when they get too big, or when the date changes, or – just to make things a bit more complicated – both. This allows you to say in your logging config that you want, for example, log files with a maximum size of 100Mb and you want to keep 5 old files around. Or you want your log files split by day. Or you want your log files split by hour, with a maximum size of 10Mb, keeping a total of 100 log files altogether, and you want them gzip compressed. Streamroller takes care of all that for you, and your code just writes to the stream interface.
Go ahead and take a look at the code as it existed before I refactored it. Scroll around and put yourself in the position of someone that wants to help out on the project. It’s complicated. There’s a class, a whole load of “private” functions (they’ve got an underscore in front of the name) on the class, it’s not clear where you would start. We want to make this easier to understand, how can we do that? Smaller things are easier to fit into your head – can we make this smaller? Or maybe move some things into different files? This is called changing the level of abstraction, but really it’s just about making things fit in your head. When you’re driving your car, you don’t need to know what each piston in the engine is doing – you just press the pedals and go faster.
Ok, let’s dive in and hack away at this dodgy code! Hold on, sparky – we covered this earlier – you’re an idiot. Everyone is an idiot, don’t feel bad about it. Any time you change code, you’ve got the potential to introduce bugs that weren’t there before. How do we guard against this?
Tests. Motherlovin’ tests. Forking shirtloads of tests. Streamroller has 100% test coverage, which gives us reasonable confidence that we will be able to tell if any changes we’ve made have altered the behaviour of the library. If you’re thinking of changing some code that doesn’t have tests, stop. Write some tests that verify that the code does what you think it does first. Measure your coverage.

Ok, we’ve got some tests. Where should we start? We can’t just start at the top and hack our way down, we need to have a well-defined area of code to work on. You need to know when you’re done – otherwise you’re heading down the path of rewriting, and we just talked about that a paragraph or so ago. There’s a couple of candidates: parsing file names (_parseFileName
), and formatting file names (_formatFileName
). They’re both really long functions that take a few arguments and output a value. They’ve both got a lot of branches (if-else statements) that make it difficult to follow. These are both code smells. Read up on them if the idea of code smells is new to you. Don’t treat them as a list of rules that if you follow them then you will automatically get great code, they’re more like warning signs that you might need to step back and think a bit more.
_formatFileName
is used more often in the code, and we have to start somewhere, so let’s focus on just that function for now. Here’s what it looks like:
_formatFileName({ date, index, isHotFile }) {
debug(
`_formatFileName: date=${date}, index=${index}, isHotFile=${isHotFile}`
);
const dateStr =
date ||
_.get(this, "state.currentDate") ||
format(this.options.pattern, newNow());
const indexOpt = index || _.get(this, "state.currentIndex");
const oriFileName = this.fileObject.base;
if (isHotFile) {
debug(
`_formatFileName: includePattern? ${this.options.alwaysIncludePattern}, pattern: ${this.options.pattern}`
);
if (this.options.alwaysIncludePattern && this.options.pattern) {
debug(
`_formatFileName: is hot file, and include pattern, so: ${oriFileName +
FILENAME_SEP +
dateStr}`
);
return this.options.keepFileExt
? this.fileObject.name + FILENAME_SEP + dateStr + this.fileObject.ext
: oriFileName + FILENAME_SEP + dateStr;
}
debug(`_formatFileName: is hot file so, filename: ${oriFileName}`);
return oriFileName;
}
let fileNameExtraItems = [];
if (this.options.pattern) {
fileNameExtraItems.push(dateStr);
}
if (indexOpt && this.options.maxSize < Number.MAX_SAFE_INTEGER) {
fileNameExtraItems.push(indexOpt);
}
let fileName;
if (this.options.keepFileExt) {
const baseFileName =
this.fileObject.name +
FILENAME_SEP +
fileNameExtraItems.join(FILENAME_SEP);
fileName = baseFileName + this.fileObject.ext;
} else {
fileName =
oriFileName + FILENAME_SEP + fileNameExtraItems.join(FILENAME_SEP);
}
if (this.options.compress) {
fileName += ZIP_EXT;
}
debug(`_formatFileName: ${fileName}`);
return fileName;
}
To help us focus, to help us fit this code in our heads, and to reinforce the idea of abstracting out this chunk of code let’s just copy and paste this code into a new file. Call it fileNameFormatter.js
. Don’t worry that it isn’t valid javascript now, we will sort that out in a minute. Leave the original function where it was for now, we’ll work out how to integrate our new code later on. By working on it in a separate file we know we can always go back to the original easily.
The first thing we need to do is write some tests that cover just what this function is meant to do. We already have tests that cover what the whole class is meant to do, but nothing specifically for this function – it was a private function of the class, and we don’t write unit tests for private functions. The class tests will help us when we come to replace the old function with the new one, but we need unit tests for this function too. Here’s what the first of those tests looks like:
it("should take an index and return a filename", () => {
fileNameFormatter({ index: 1 }).should.eql(
"/path/to/file/thefile.log.1"
);
});
We’re going to need a whole lot more tests, but this first one will help us work out how to structure the code now that it is not part of a class. (If that code looks weird, I’m using mocha and should). If we take a look at the code, we can see that it takes a lot of options from the instance variables of the class: this.options.pattern
, this.options.keepFileExt
, etc. Those aren’t available now, so there are two ways we could pass them in. We could pass them in as arguments every time we call the function, but that’s going to make the argument list pretty long (another code smell). These values don’t change once the class instance has been created, so if we could pass them in once somehow that would be better. Here’s how to do that:
module.exports = ({
file,
keepFileExt,
pattern,
compress
}) => {
return ({index, date, isHotFile }) => {
// the original _formatFileName function goes here.
};
};
We’ve made the module export a function which accepts the various set-once options, and return a function which can be called with the arguments that vary. The only change we’ve made to the _formatFileName
function so far is to replace this.options.pattern
with pattern
(for example).
Next step is to update the test code to call this new function module.
describe("fileNameFormatter", () => {
describe("without a date", () => {
const fileNameFormatter = require("../lib/fileNameFormatter")({
file: {
dir: "/path/to/file",
base: "thefile.log",
ext: ".log",
name: "thefile"
}
});
it("should take an index and return a filename", () => {
fileNameFormatter({ index: 1 }).should.eql(
"/path/to/file/thefile.log.1"
);
});
});
});
Now we keep adding tests until we have 100% coverage of this new code. Make sure the tests pass of course, without changing the original part of the function. Your tests are there to verify that you understand what the code is doing now. Only once we’ve got that safety net in place can we start changing the original part of the function.
From the test snippet above we can see that the fileNameFormatter takes a base filename (parsed into an object by node’s path.parse) and adds an index onto the end. Depending on the initial options and arguments the file can have the following components:
<dir>/<base>.<ext>.<date>.<index>.<gz>
If the keepFileExt
option is true, the file would look like this:
<dir>/<base>.<date>.<index>.<ext>.<gz>
Looking at the code, we can see that most of it involves adding parts to the filename string, if certain conditions are met. The gzip extension, for example, looks like this:
if (compress) {
fileName += ZIP_EXT;
}
The code would be easier to read if we could get rid of the if-statements. Let’s take out that gzip part and put it in a separate function.
const gzip = f => compress ? f + ZIP_EXT : f;
This function takes a string (f) and if the compress option was true it adds the “.gz” extension to the string, otherwise it returns the string unchanged. Okay, I know we haven’t really gotten rid of the if-statement we’ve just hidden it in the ternary operator, but bear with me. Now we have a function we can call every time in the main part of the code, regardless of whether compress
was true or not. The if-statement above now becomes:
fileName = gzip(fileName);
This looks promising. We may be able to replace this big function made up of nested if-statements with a series of functions applied to a string to build up the filename from discrete parts. Something that would look like this:
let fileName = dir(file.dir); // /path/to/file/
fileName = base(fileName); // /path/to/file/thefile
fileName = ext(fileName); // /path/to/file/thefile.log
fileName = date(fileName); // /path/to/file/thefile.log.2019-07-25
fileName = index(fileName); // /path/to/file/thefile.log.2019-07-25.1
fileName = gzip(fileName); // /path/to/file/thefile.log.2019-07-25.1.gz
That doesn’t look quite right. There’s a fair amount of repetition there. Could we do this with a loop?
let fileName = "";
const parts = [ dir, base, ext, date, index, gzip ];
for (let i = 0; i < parts.length; i += 1) {
fileName = parts[i](fileName);
}
return fileName;
That might work. We don’t have to know for sure that it will, we can experiment. Let’s extract the other functions with this pattern in mind and see what the code looks like. As we do it, keep running the tests, make sure we’re not straying too far away from having things working and keep our goal in mind which is to make the code easier to understand and easier to fit into our heads.
Next let’s look at the file extension. The rules around this are that we always add it to filename, but the location of it in the filename changes depending on the value of the keepFileExt
option. If it is true, then the extension goes towards the end, just before the gzip part. If false, then it goes before the date and index parts, just after the base file name. We could put all that logic into the ext
function, but the loop style we’ve got means we can do this instead:
let parts = [ dir, base, ext, date, index, gzip ];
if (keepFileExt) {
parts = [ dir, base, date, index, ext, gzip ];
}
We just change the order of the array of operations. This means that the ext
function becomes really simple:
const ext = f => f + file.ext;
The rules around the index
and date
parts of the filename are a bit more complicated, so I’m going to gloss over them and skip to the results of our refactoring.
module.exports = ({
file,
keepFileExt,
needsIndex,
alwaysIncludeDate,
compress
}) => {
// I ended up combining the dir and base functions into one
// and it's always at the start and doesn't depend on parameters
// so I just made it into a string, not a function.
const dirAndName = path.join(file.dir, file.name);
// ext stays the same as we had above
const ext = f => f + file.ext;
// index and date functions depend on the two parameters that
// _formatFileName originally used (index and date)
const index = (f, i, d) =>
(needsIndex || !d) && i ? f + FILENAME_SEP + i : f;
const date = (f, i, d) => {
return (i > 0 || alwaysIncludeDate) && d ? f + FILENAME_SEP + d : f;
};
const gzip = (f, i) => (i && compress ? f + ZIP_EXT : f);
// here I'm hiding an if-statement in a ternary again
// but the idea is the same as earlier - change the
// sequence of operations.
const parts = keepFileExt
? [date, index, ext, gzip]
: [ext, date, index, gzip];
return ({ date, index }) => {
debug(`_formatFileName: date=${date}, index=${index}`);
// I've replaced the for-loop with the reduce function.
// reduce takes an array, applies a function to each element
// and combines the results into a single value.
// Which is exactly what we want to do. We use the value of
// dirAndName as our initial starting value, and then add
// all the parts on in order. We're also taking advantage of
// the fact that javascript ignores extra arguments passed to
// functions, so we can call all the parts the same way.
return parts.reduce(
(filename, part) => part(filename, index, date),
dirAndName
);
};
};
This is a lot easier to keep in our heads. The individual functions are really simple and easy to reason about. The pattern of applying those functions in sequence is again much easier to understand.
The last part of the exercise is to change the original class to use our new functions instead of the _formatFileName
function. Again, we run all those tests to make sure we haven’t changed the behaviour. I won’t show that because it’s pretty straightforward.
Conclusion
We took a long complicated function, broke it down into smaller functions so that we could fit it all in our tiny little heads. The example above is javascript, but the ideas can be applied in any language. If something is too hard to understand, break it down into smaller parts that you can understand. Read up on code smells, so you can skip past needing years of experience to recognise bad code. Make sure you’ve got tests, because then trying things like this out becomes a lot less stressful.
Main photo by Craig Adderley from Pexels
No Comments