2.5 years, 5 developers, 1 Django upgrade

This post originally appeared on the Texas Tribune tech blog

For the past two and a half years, our core Texas Tribune website has been running on Django 1.5. In the meantime, the Django project has made rapid strides: it’s now all the way up to version 1.9, bringing scores of new features, like native migrations, better testing tools, and exciting new APIs, along with plenty of performance and security improvements and an ecosystem of enticing new libraries.

Meanwhile, we at the Tribune had been stuck on 1.5. We wanted to upgrade, of course, but with each new Django feature and each change in personnel, the prospect became increasingly daunting. Our core product was a mix of our site, our CMS, data apps, low-level customizations, and (of course) many third-party libraries. Some of these, including the Tribune’s own Armstrong project, had fallen behind the Django update cycle. Others had updated versions available, but we’d been afraid to touch them for fear of incompatibilities. And our developer time was split by demands from our editorial, news apps, and business teams alike.

So by the time we got to developing a plan for upgrading, it had been over a year, half of our team had turned over (now consisting of Amanda, Chris, Kathryn, and Daniel), and Django was already on version 1.7. If we were ever to upgrade again, we needed a game plan.

Phase one: Diagnosis, Deletion, and Triage

We’d talked about it before, but the first written evidence of attempting to upgrade was in early December 2014. Chris sketched the beginnings of our upgrade workflow. The initial plan was to just try the upgrade and see what broke.

Chris’ first list included just three things. That seemed manageable, but of course, demons lurked below. Chris passed the task to Daniel, and Daniel soon discovered that two of our Armstrong apps were likewise not compatible with 1.6:

  • Donations, our app for handling member donations
  • Hatband, a tool that brings JavaScript UI enhancements to the Django admin

These projects, like many in Armstrong, had not been actively supported for a while. We chose to put the upgrade on the back burner while we figured out what to do with them.

In the meantime, we deleted, deleted, deleted. We deleted so much that we wrote a whole separate blog post about it. Our site has been up since 2009, and we had piles of obsolete and superseded code. What’s more, many of our old data apps and applications still lived in the core repository (although we now start new projects in separate repos). Kathryn and Amanda led the charge in upgrading and deleting scores of apps and tens of thousands of lines of code (rough estimate: 25,000). In the midst of deleting, Liam joined the team to replace Chris, who had left for greener pastures. Liam was grateful to have less code to wrangle off the bat.

Phase two: Armstrong

The Armstrong donations app was old and unsupported; moreover, our spring membership drive was around the corner. The timing was perfect to create a new donations app, which we knew we wanted to separate from our core Django site. So Kathryn built a shiny new app using Middleman, now living at support.texastribune.org (feel free to, you know, go there and become a member or donate!).

Armstrong’s Hatband, meanwhile, was used in a small but crucial corner of our CMS, and it depended on a Frankenstein’s monster of old, crusty JavaScript libraries. Liam first attempted to hack around it, but the interface was maligned by reporters and our performance was taking a hit from the weighty old tech; he ultimately decided to rip it out, swapping it for a lighter customization of the Django admin.

Phase three: Bigger problems

It seemed like we had cleared the way. We started deploying smaller changes that were backwards-compatible. But once we pushed the upgrade button on our staging site, more tragedy struck:

  • our version of raven (a client for Sentry, our log monitoring tool) was out-of-date…and an upgrade would require an upgrade to our Sentry server;
  • an upgrade was going to break our outdated version of django-compressor; and
  • our core MySQL database was giving some complaint about rolling back transactions…

We hadn’t discovered these earlier because of differences between our local and production environments. The first two were annoying, but the last turned out to be the worst: we were running a five-year-old MySQL database on the MyISAM engine, which does not support transactions and rollbacks. There was no way it would play well with Django 1.6’s new transactions API. It was time to upgrade our database.
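For anyone facing the same wall, it’s easy to find out how exposed you are by asking MySQL which tables still use MyISAM. Here’s a minimal sketch using Django’s database connection (the query is standard information_schema fare; the reporting around it is our own embellishment):

from django.db import connection

# List every table in the current database that still uses the MyISAM
# engine, i.e. tables with no support for transactions or rollbacks.
cursor = connection.cursor()
cursor.execute(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = DATABASE() AND engine = 'MyISAM'"
)
for (table_name,) in cursor.fetchall():
    print(table_name)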

Daniel treated this roadblock as an opportunity, and we ultimately switched our database from MySQL to PostgreSQL. Our reasons and process for this could fill multiple blog posts of their own (we just might write them). For now, suffice it to say that after many weeks of work, test runs, and false starts, we were ready to swap. The first maintenance window didn’t go so well; all of our timestamps were five hours behind, wreaking havoc on our sessions, and we had to roll back. But on Halloween morning, the day after an epic flood in Austin that left Liam without electricity (he watched from a coffee shop), we swapped out our ancient database for a fresh copy of Spooky Postgres.

While Daniel focused on the Postgres transition, Liam spent one sprint upgrading our Sentry server and improving our log monitoring, and another sprint (or two) dropping django-compressor from our codebase, replacing it with WhiteNoise as the manager of our static assets.
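For the curious, the core of a WhiteNoise setup is pleasantly small. A minimal sketch (the static root and exact wiring here are illustrative assumptions, not our production config):

# wsgi.py
from django.core.wsgi import get_wsgi_application
from whitenoise import WhiteNoise

# Wrap the Django WSGI app so WhiteNoise serves the files produced by
# `collectstatic` straight from the app servers.
application = WhiteNoise(get_wsgi_application(), root="staticfiles")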

Phase four: Finally…

On November 12th, after nearly a year of planning, thousands of hours of typing, and several barrels’ worth of coffee, we pulled the trigger. It was a disarmingly small code commit, a simple line change in our requirements file. Two or three small bugs emerged on old corners of the site that we hadn’t thought to test, but no big or systemic problems; we spent the next day squashing them, then getting celebratory drinks.
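The commit itself looked something like this (the exact version pins here are our guess, not pulled from the commit):

-Django==1.5.12
+Django==1.6.11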

Lessons learned

It took 30 months to get to 1.6, but less than 2 months to upgrade further to 1.7. It was a much less daunting task, not least because we had some upgrade experience under our belts. As we continue to modernize, we hope to carry forward some lessons from our last upgrade process.

Three choices: update, swap, kill

For every old feature or library that was about to break, we were generally faced with the same few choices for how to deal with it: update, swap, or kill. Our rough algorithm for dealing with this, in Python-flavored pseudocode:

if easy_update_exists(library):
    update(library)    # e.g. bump the version pin
elif maintained_alternative_exists(library):
    swap(library)      # e.g. django-compressor for WhiteNoise
else:
    kill(library)      # rip it out and roll a lightweight replacement
[Image: rock, paper, scissors. Update (rock), swap (paper), kill (scissors)]

In the case of Armstrong components, we chose to kill, and roll our own lightweight replacements. For django-compressor and MySQL, we swapped to WhiteNoise and Postgres respectively. And for Sentry and Reversion (along with many other smaller Python libraries), we updated. These decisions aren’t always easy, but we erred on the side of using the most recently updated libraries, and we looked for overlaps with our other goals. Then again, we’re still using the old test runner, so sometimes our solution was “put it off.”
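That last deferral, for what it’s worth, comes down to a single documented setting: Django 1.6 changed the default test runner to DiscoverRunner, and opting back into the old behavior looks like this (we assume our settings file does roughly the same):

# settings.py
# Keep the pre-1.6 test runner instead of the DiscoverRunner that
# became the default in Django 1.6.
TEST_RUNNER = 'django.test.simple.DjangoTestSuiteRunner'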

Look for overlaps

Many of these old versions and libraries were, unsurprisingly, in neglected corners of our codebase, where we were previously afraid to touch anything. The drive to upgrade gave developers an incentive to go in and clean out old cobwebs. Moreover, many of these projects served more purposes than the upgrade alone; we had wanted to switch to Postgres and drop old Armstrong projects anyway. While they were side-upgrades on the path to our Django mega-upgrade, they improved the performance and usability of our site in crucial ways. Our fresh Sentry and WhiteNoise implementations likewise improved our development workflow, allowing for benefits like integrating error logs with Slack and streamlining deployment of static assets. Finally, they were important benchmarks to show editorial and business staff that progress was being made along the way.

In short, keeping Django up-to-date is not just helpful for its new features, its security, and its performance: it’s also an effective way to audit our codebase for old smells, and provide incentive to developers and managers alike to tackle them.

Upgrade in stages

The majority of our roadblocks to upgrading were due to code deprecations; given the way Django’s release cycle works, this meant that most of the stuff we needed to change for 1.6 would also work with 1.5. So we made 95% of the necessary changes before actually upgrading; this let us identify problems piecemeal. By the time we upgraded, we had tackled most of the potential problems already, and the upgrade itself was mercifully anticlimactic.
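One trick that makes a staged upgrade tractable is escalating deprecation warnings to errors while still on the old version, so that soon-to-break code fails loudly in the test suite instead of in production. A sketch of the idea (where exactly you hook it in is up to you; we’re not claiming this is verbatim what we ran):

# At the top of manage.py or in your test settings: turn deprecation
# warnings into hard errors so 1.6-incompatible code surfaces while
# the site still runs on 1.5.
import warnings

warnings.simplefilter("error", DeprecationWarning)
warnings.simplefilter("error", PendingDeprecationWarning)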

Some people advised us to rip the Band-Aid off and upgrade three versions of Django at once, rather than just going to 1.6. But the scale of even this single-version change made a multi-version upgrade seem nigh-impossible. At best, we would have been left with smells and deprecations we weren’t aware of. At worst, it would have been daunting to the point of demoralizing.

A quick note on testing: it’s probably impossible to test for every single thing that could go wrong. We have automated tests that helped us identify many early problems. We navigated around every conceivable corner of our site, and went through a checklist of major editorial actions that happen in our CMS. We even tried to run a sample of our access logs through our test site to check for errors. But these still weren’t enough: we didn’t discover some of the biggest problems until they hit staging, and smaller ones even popped up in production. This might be unavoidable, but the earlier we can poke at an upgrade, the sooner we know what might go wrong.
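The log-replay idea is worth stealing, and it doesn’t take much code. A rough sketch (the log format, host, and error threshold are all assumptions):

import re
import requests

STAGING = "https://staging.example.com"  # hypothetical host
GET_LINE = re.compile(r'"GET (?P<path>/\S*) HTTP')

# Replay GET requests from a sampled access log against the staging
# site and report anything that fails server-side.
with open("access-sample.log") as log:
    for line in log:
        match = GET_LINE.search(line)
        if match:
            response = requests.get(STAGING + match.group("path"))
            if response.status_code >= 500:
                print(response.status_code, match.group("path"))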

With these lessons and patterns in mind, we have already upgraded to 1.7 and have a more robust roadmap toward Django 1.8 and 1.9; we have set up a more rigorous testing plan; and we expect to get there much faster than the 2.5 years it took us to get to 1.6. For anyone sitting on an old stack of technical debt, be encouraged: it can be done!