tl;dr: If you find you’re spending a lot of time integrating various pieces of software across multiple computers and are currently using a mixture of scripts, build systems, and manual methods to do that, look into configuration managers. They’re easy to pick up and automate the most common tasks. I’m using ansible, because it’s standard, simple, and written in python.
Research software typically requires integrating clusters, high-performance numerical libraries, 30-year-old Fortran applications written by geniuses, and 30-minute-old python scripts written by PhD students.
A consistent thorn in my side is downloading, building, installing, and deploying all of that stuff. For example, on a recent project, I needed to:
- Check out a Java (Maven) project from svn
- Build it with a particular build profile
- Unzip the built binaries
- Install the binaries at a specific location on the client machine
- Install the binaries at a specific location on a cluster
- Reconfigure Luigi to run the application with the correct arguments
- Copy some other binaries onto the cluster’s HDFS
- (Sometimes) Rebuild all the binaries from source, if the source was monkey-patched due to a runtime bug
- (Sometimes) Nuke all of the above and start fresh
Each step is simple enough, but designing a clean architecture around doing slightly different permutations of those steps is a struggle between doing something the easy way (e.g. a directory containing scripts, hard-coded arguments in the Luigi task) and doing something the correct way.
The correct way (or so I thought) to handle these kinds of problems is to use a build system. However, there is no agreed-upon "one way" to download, build, and install software, which is why build systems tend to be either extremely powerful/flexible (e.g. make, where anything is possible) or rigid/declarative (e.g. maven).
Because there’s so much choice out there, I concluded that researching each would obviously (ahem) be a poor use of my valuable time. So, over the years, I’ve been writing a set of scripts which have been gradually mutating:
- Initially they were bash scripts
- Then they were ruby scripts that mostly did the same thing as the bash scripts
- Then they were ruby scripts that integrated some build parts (e.g. pulling version numbers out of pom.xml files), but were still mostly doing the same thing as the bash scripts
- Then they were a mixture of structured YAML files containing some of the build steps and ruby filling in the gaps
- Then they were a mixture of YAML files containing metadata (description strings, version numbers), YAML files containing build steps, and Python filling in the gaps, because Python is easier to integrate with the researchers' and developers' existing work
After many months of this, I decided "this sucks, I'll develop a new, better way of doing this". So I spent an entire evening going through the weird, wonderful, and standard build systems out there, justifying why my solution would be better for this problem.
Well, it turns out this problem isn't suitable for a build system, despite having similar requirements (check inputs, run something, check outputs, transform files, etc.). Although my searches yielded a menagerie of weird software, what I actually needed was a configuration manager, and Ansible is a particularly straightforward one.
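To give a sense of why this fits better than a build system or a pile of scripts, here is a minimal sketch of how a few of the steps listed earlier might look as an ansible playbook. The host group, repository URL, paths, Maven profile, and file names are placeholders I've invented for illustration, not the real project's values.

```yaml
# deploy.yml - a minimal sketch, not the real playbook; hosts, URLs, paths,
# the Maven profile, and file names below are invented for illustration.
- hosts: localhost
  connection: local
  tasks:
    - name: Check out the Maven project from svn
      ansible.builtin.subversion:
        repo: https://svn.example.org/myproject/trunk
        dest: /tmp/build/myproject

    - name: Build it with a particular build profile
      ansible.builtin.command: mvn -q package -P cluster
      args:
        chdir: /tmp/build/myproject

    - name: Create a staging directory for the unzipped binaries
      ansible.builtin.file:
        path: /tmp/build/staging
        state: directory

    - name: Unzip the built binaries into the staging directory
      ansible.builtin.unarchive:
        src: /tmp/build/myproject/target/myproject-bin.zip
        dest: /tmp/build/staging
        remote_src: yes

- hosts: cluster
  tasks:
    - name: Install the binaries at a specific location on the cluster
      ansible.builtin.copy:
        src: /tmp/build/staging/myproject/
        dest: /opt/myproject/

    - name: Copy the other binaries onto HDFS
      ansible.builtin.command: hdfs dfs -put -f /opt/myproject/extras.jar /apps/myproject/
```

Re-running the playbook repeats the same steps consistently against whatever hosts you point it at, and the occasional "nuke everything and start fresh" case can live in a second playbook (or behind a tag) rather than in yet another mutant script.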
This rollercoaster of "there probably isn't a good solution already available for this problem", "I'll hack my own solution!", "My hacks are a mess, I should build an actual system", "oh, the system already exists" must be common among software developers. Maybe it's because the problem isn't actually about developing a solution: it's about understanding the problem well enough. Once a problem is truly understood, it's much easier to identify which libraries and algorithms already solve it, which makes developing the solution far simpler. Otherwise, you'll end up like me: Keeper of the Mutant Scripts.