Python Application Deployment with Native Packages

After I’ve told you what not to do, I’d like to introduce you to the method we use to deploy a wide variety of services.

Preamble & Disclaimer

I understand your expectations are high, given the amount of feedback I have gotten. However, deployment is a highly individual process, and turnkey solutions aren’t really possible or useful. So in contrast to the previous article, this one covers an advanced topic for a more advanced breed of DevOps. In other words: you should already have some deployment experience under your belt to really benefit from this article.

To avoid excessive length, I’ll assume you’re at least loosely familiar with Fabric. If not, please try to get a rough grasp of it first. Also, I won’t be able to dive into Salt, Puppet or Chef. Please use this article as a starting point – it won’t end up being all-encompassing. I sincerely hope to get you started though, if you consider doing this kind of deployment.

And lastly, to reap all of the benefits, you’ll need to run a private Debian repository server for your packages. That’s not hard, but it takes some effort. Fortunately, you can avoid running your own repository and still gain most of the advantages: a Debian package (or an RPM package, for that matter) can also be installed by hand using dpkg -i your-package.deb (or rpm -Uhv your-package.rpm).

If you want to go really light (or don’t have sufficient privileges on the production servers to install packages), you can follow most of the guidelines here but use vanilla archives instead of system packages and do the work of the package manager with custom tooling. The key point I’m trying to make is that the best way to get painless and reproducible deployments is to package the whole virtual environment of the application you want to deploy, including all dependencies but without configuration. How you achieve this goal is up to you, your requirements and your use cases.
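To make the “vanilla archive” route a bit more concrete, here is a minimal sketch using only the standard library. The function name archive_app, the excluded suffixes and the example layout are assumptions for illustration, not part of our tooling:

```python
import os
import tarfile


def archive_app(app_root, archive_path,
                exclude_suffixes=('.git', '.bak', '.orig')):
    """Pack a whole app directory (code + virtualenv) into a .tar.gz.

    Hypothetical helper: entries whose name ends with one of
    `exclude_suffixes` are skipped, directories including their contents.
    """
    def skip_unwanted(tarinfo):
        # Returning None tells tarfile to leave the entry out.
        if tarinfo.name.endswith(exclude_suffixes):
            return None
        return tarinfo

    top_level = os.path.basename(os.path.normpath(app_root))
    with tarfile.open(archive_path, 'w:gz') as tar:
        tar.add(app_root, arcname=top_level, filter=skip_unwanted)
    return archive_path
```

Unpacking such an archive into place and restarting the daemon is then the job of your own tooling; that is exactly the work a package manager would otherwise do for you.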

Why Native Packages at All?

Both in public discussions and privately by mail, one of the most frequently asked questions was:

what’s wrong with Fabric+git-pull?

So let me clarify that first.

It doesn’t scale. As soon as you have more than a single deployment target, it quickly becomes a hassle to pull changes, check dependencies and restart the daemon on every single server. A new version of Django is out? Great, fetch it on every single server. A new version of psycopg2? Awesome, compile it on each of n servers.

It’s hard to integrate with Salt/Puppet/Chef. It’s easy to tell Puppet “on server X, keep package foo-bar up to date or keep it at a specific version!” That’s a one-liner. Try that while babysitting git and pip.

You have to install build tools on target servers. GCC and development files don’t belong on production servers. Not only are lightweight systems easier to manage and faster to set up, it’s also a security feature: many attacks require a working C compiler.

It can leave your app in an inconsistent state. Sometimes git pull fails halfway through because of network problems, or pip times out while installing dependencies because PyPI went away (I hear that happens occasionally, cough). At this point your app is – put simply – broken.

Rollbacks. Rolling back a git deploy is fairly easy (git reset --hard), but what about the virtual environment? What if dependencies changed? Re-creating the virtualenv on n servers is both time-consuming and annoying. To avoid it, you’ll have to resort to making backups of your source/venv tree or even file system snapshots.

To summarize: there are too many moving parts. Since deployments are the only stage of development that affects our customers, I want as few moving parts as possible so the process is fast, predictable and easily reversible.

On the other hand, deploying using self-contained native packages makes the update of an app a near-atomic, predictable operation. Rollbacks can be done easily by installing an older package version. You always know in what state your application is right now. You need to update an app on many servers? Build once, let Puppet deploy everywhere. No compiling of any dependencies, no compilers or development packages at all on production servers.

Some of the problems mentioned above can be mitigated by running a private PyPI server – which you should do anyway. Nevertheless, in the grand scheme of things, that’s just a short-term hack. Dan Bravender also wrote an article on how they overcome some of these problems; so if you still think Fabric-based deployments are the way to go, learn how to do it properly from him. For me, his approach has way too many moving parts just to avoid building a package, which takes a few minutes if you do it properly – but your mileage may vary, so make it an informed decision.

That said, if you have one app on one server and you know it will never change (although people tend to err here), feel free to keep it simple until you have a real need. That’s the reason why I gave context about my work in the previous article. Some points may be anti-patterns; however, you may get away with them if your situation is different from mine.

What a Deployment Looks Like in Practice

Before I dig into the actual packaging code (I will later in this article, I promise!), let me show you the end result using a simple Twisted application which is our whois server for ICANN domains (like .com).

Every application we deploy has one “fabfile.py” that describes the build process, one “requirements.txt” containing all of its run-time requirements, and “postinst” and “prerm” scripts. The latter two are Debian/Ubuntu-specific: postinst is executed after an installation or update, prerm before an uninstallation or update (there are more possible scripts and we use them too, but I’m trying to keep this article simple). After months of refining, all of them look really simple.

At the top level, all I do to build a new Debian package of the app I’m currently working on is fab deb. Typically, this run takes from 30 seconds to 2.5 minutes, depending on the number of dependencies that have to be processed before the actual packaging.

To deploy it to our repositories, I do a fab push. From now on, it can be installed using aptitude install <app> on our servers that carry the necessary apt configuration. That’s also where it gets picked up as soon as Puppet realizes the packages on the production servers are out of date. Usually I trigger at least the first server with a puppet agent --test or aptitude update && aptitude dist-upgrade <app>.

Let’s start going into more details with fabfile.py:

    from vrmd.fabric.ubuntu.deploy import Deployment


    app_name = 'whois'


    def deb(branch=None):
        deploy = Deployment(
            app_name,
            build_deps=[
                'libpq-dev',
            ],
            run_deps=[
                'supervisor',
                'libpq5',
            ]
        )

        deploy.prepare_app(branch=branch)
        deploy.build_deb()


    def push():
        Deployment(app_name).push_to_repo()

That’s all the programmatic information required to build a deb package. The instantiation of Deployment makes sure all build_deps are present and remembers to set run_deps as package dependencies. In this case we need “libpq-dev” for compiling psycopg2 while building and when deployed, “supervisor” is necessary for supervising (duh!) the daemon.

Deployment.prepare_app() creates the necessary directories on the build server, checks out the desired branch (None means current), creates a virtualenv and populates it with dependencies from requirements.txt. Make sure you use the pip cache so you don’t have to download all dependencies on each build, for example by adding the following to your ~/.pip/pip.conf:

    [install]
    use-mirrors = true
    download-cache = ~/.pip/download_cache

As a bonus, prepare_app() also fixes the shebangs (“#!”) of all scripts in the virtualenv so that they point to the correct Python path on the target system.

Now, Deployment.build_deb() takes the whole app including the virtualenv, packages it using fpm and downloads the result to my local host. The version of the package is the build number – which is just the latest package version in our Ubuntu repositories plus one. Finally, Deployment.push_to_repo() takes the now-local Debian package and pushes it to our mirrors.

Want a more involved example? Here’s a Django app including JavaScript minification, LESS compilation, i18n translation and several sub-apps:

    def deb(branch_e=None, branch_db='master'):
        deploy = Deployment(
            app_name,
            build_deps=[
                'gettext',
                'libpq-dev',
                'lessc',
                'yui-compressor',
            ],
            run_deps=[
                'supervisor',
                'libpq5',
            ]
        )

        deploy.prepare_django_app(branch=branch_e)
        deploy.add_sub_app('legacy_db', branch_db)
        deploy.add_sub_app('django_nav', 'master', 'nav')
        deploy.compile_less(['style.less'])
        deploy.compress_css([
            'bootstrap.min.css',
            'style.css',
        ])
        deploy.add_cache_busting(['base.html'])
        deploy.compile_i18n()
        deploy.collect_static([
            'css/styles.css',
            'js/html5.js',
            'resources/img/favicon.ico',
        ])

        deploy.build_deb()

Most of it should be rather obvious. I’d just like to point out Deployment.add_cache_busting(), which looks for a special string in the supplied files and replaces it with the package version. This makes crazy expiration headers for CSS and JS files possible.
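As a rough illustration of the idea (the actual method isn’t shown in this article), cache busting boils down to a string replacement. The marker @@VERSION@@ and the function name bust_cache are made up for this sketch:

```python
def bust_cache(template_text, version, marker='@@VERSION@@'):
    """Replace every occurrence of `marker` with the package version.

    `marker` is a hypothetical placeholder; use whatever special string
    your templates actually contain.
    """
    return template_text.replace(marker, str(version))


html = '<link rel="stylesheet" href="/static/css/style.css?v=@@VERSION@@">'
print(bust_cache(html, 42))
```

Because the URL changes with every package version, browsers fetch the new file even if the old one was cached with a far-future expiration header.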

So, what about “postinst” and “prerm”? Let’s start with prerm which is really simple:

    #!/bin/sh

    set -e

    APP_NAME=your-app

    case "$1" in
        upgrade|failed-upgrade|abort-install|abort-upgrade|disappear|purge|remove)
            supervisorctl stop $APP_NAME
        ;;

        *)
            echo "prerm called with unknown argument \`$1'" >&2
            exit 1
        ;;
    esac

This is mostly Debian boilerplate. We just tell supervisor to stop our app in line 9 (the supervisorctl stop call). Done.

postinst isn’t much more complicated, although there may be some catches. Let’s use a simple example:

    #!/bin/sh

    set -e

    APP_NAME=your_app

    case "$1" in
        configure)
            virtualenv /vrmd/$APP_NAME/venv
            supervisorctl start $APP_NAME
        ;;

        abort-upgrade|abort-remove|abort-deconfigure)
        ;;

        *)
            echo "postinst called with unknown argument \`$1'" >&2
            exit 1
        ;;
    esac    

In this case we have two interesting lines (9 & 10). First, we re-create the virtualenv so that it matches the Python installation on the target system. After that, we start the daemon. The additional call to virtualenv does not alter the installed packages – it’s only necessary because the path of the virtualenv changed from the build server to production.

At this point, all the necessary configuration files are already in place thanks to Puppet, even before this script has been run.

These two scripts and how they interact with configuration management might be the most critical part of the deployment, as some packages need special treatment. For example, Pylons and Pyramid apps have to be python setup.py install’ed. Also, depending on your uptime/HA needs, you may not be able to just stop, install and start again – although it’s usually much faster than a git pull && pip install -U -r requirements.txt.

I’ve whetted your appetite even more and the article is too long already. So let’s move on quickly.

Implementation

I’ll show you parts of my implementation and the reasoning behind it. In the long run, I’ll try to extract the reusable parts and open source them. But as I can’t guess when that’s going to be done, I offer this section as inspiration for rolling your own version.

Conventions

We use dedicated VMs for building packages for certain OSs. On these, we expect a user called “buildbot” with no special privileges, virtualenv 1.7 (the version is important because we rely on the --no-site-packages default) and fpm.

We use “vrmd” as a prefix for paths of our apps (for example “/vrmd/whois”) as well as for packages (for example “vrmd-whois”).

Every app has its own user, named after the app, which owns a home directory at “/vrmd/app-name”. This contains at least the virtualenv (for example “/vrmd/whois/venv”) and the app itself (for example “/vrmd/whois/whois” – this “double whois” is necessary as some apps need more than one directory for code or static files).

An example:

/vrmd/
└── whois
    ├── venv
    │   ├── bin
    │   ├── include
    │   ├── lib
    │   └── local -> /vrmd/whois/venv
    └── whois
        ├── setup.py
        └── …

Everything below “whois” belongs to the user “whois” – that’s ensured using Puppet rules.
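Expressed in code, these conventions could look like the following sketch. app_layout is a made-up helper, not something from our code base; only the resulting names and paths mirror the conventions above:

```python
import posixpath

PREFIX = 'vrmd'  # company prefix used for both paths and package names


def app_layout(app_name, prefix=PREFIX):
    """Derive package name, user and paths for an app (hypothetical helper)."""
    home = posixpath.join('/', prefix, app_name)
    return {
        'package': '{}-{}'.format(prefix, app_name).lower(),
        'user': app_name,                        # one system user per app
        'home': home,                            # owned by that user
        'venv': posixpath.join(home, 'venv'),    # the virtualenv
        'src': posixpath.join(home, app_name),   # the "double" app directory
    }


layout = app_layout('whois')
print(layout['package'], layout['venv'], layout['src'])
```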

Setting Up

The key class is Deployment. Its constructor makes sure everything is in place before starting the build:

    import os.path

    from fabric.api import settings, run, cd, put, get, local, env, with_settings
    from fabric.contrib.files import sed


    class Deployment(object):
        def __init__(self, app_name, build_deps=None, run_deps=None):
            with settings(user='root'):
                # Refresh the apt cache so the version lookup below sees the
                # newest packages in our repository.
                run('apt-get update -qq')
                if build_deps:
                    run('apt-get install -qq {}'.format(' '.join(build_deps)))
                # Collect the leading integer of every known package version.
                v = run(
                    'apt-cache 2>/dev/null show vrmd-{} | '
                    'sed -nr "s/^Version: ([0-9]+)(-.+)?/\\1/p"'.format(app_name)
                )
                # Build number: highest known version plus one, or 1 if the
                # package is not in the repository yet.
                self.version = int(max(v.splitlines(), key=int)) + 1 if v else 1

            self.app_name = app_name
            self.run_deps = run_deps or []
            self.pkg_name = ('vrmd-' + app_name).lower()
            self.base_path = '/vrmd/buildbot/build/{}-{}'.format(
                self.pkg_name,
                self.version
            )
            self.app_path = os.path.join(self.base_path, 'vrmd', app_name)
            # [11:] strips the leading 'refs/heads/' from the ref name.
            self.current_branch = local('git symbolic-ref HEAD', capture=True)[11:]

As you can see, this code alone reeks of company conventions and would need some generalization I haven’t done yet. But – you know – YAGNI. :)

First, we update the apt cache and install our build dependencies. Then we determine our build number from the package versions already known to apt. Afterwards, all kinds of paths and names are set and normalized.
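The build number logic is easier to follow without the shell pipeline around it. Here is a sketch of the same idea in plain Python; next_build_number is a hypothetical name, and like the original it assumes that our package versions start with a plain integer:

```python
import re

# Same pattern as in the sed call: the integer right after "Version: ",
# optionally followed by a "-suffix".
VERSION_RE = re.compile(r'^Version: ([0-9]+)(-.+)?', re.MULTILINE)


def next_build_number(apt_cache_output):
    """Return the highest known package version plus one (1 if none)."""
    versions = [int(m.group(1)) for m in VERSION_RE.finditer(apt_cache_output)]
    return max(versions) + 1 if versions else 1


output = (
    'Package: vrmd-whois\n'
    'Version: 41\n'
    '\n'
    'Package: vrmd-whois\n'
    'Version: 42-1\n'
)
print(next_build_number(output))  # 43
```

An app that has never been packaged produces no Version lines at all, which is why the first build gets number 1.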

Building the Application

Now let’s look at one of the two key methods: the one that builds the whole virtualenv (and possibly more) so that it can be packaged later:

    @with_settings(user='buildbot')
    def prepare_app(self, branch=None):
        """Create default directories, create a virtualenv, check out src."""
        # Remove leftovers from previous builds of this app.
        run('rm -rf ~/build/vrmd-{}*'.format(self.app_name))
        self.src_path = os.path.join(self.app_path, self.app_name)
        if not branch:
            # Default to the current local branch; [11:] strips 'refs/heads/'.
            self.git_branch = local('git symbolic-ref HEAD', capture=True)[11:]
        else:
            self.git_branch = branch
        # Publish the branch before it gets cloned on the build server.
        local('git push origin ' + self.git_branch)
        # git_clone() is a helper that is not shown in this article.
        git_clone(self.app_name, self.git_branch, self.src_path)
        with cd(self.src_path):
            self.git_commit = run('git rev-parse --short HEAD')

        self.venv_path = os.path.join(self.app_path, 'venv')
        run('virtualenv {}'.format(self.venv_path))
        run('{} install -r {}'.format(
            os.path.join(self.venv_path, 'bin/pip'),
            os.path.join(self.src_path, 'requirements.txt'))
        )
        # Fix shebangs: point them from the build path to the path the
        # virtualenv will have on the target system.
        target_venv_bin = os.path.join('/vrmd', self.app_name, 'venv/bin')
        with cd(os.path.join(self.venv_path, 'bin')):
            for script in run('ls').split():
                sed(
                    script,
                    '#!' + os.path.join(self.venv_path, 'bin/(.+)'),
                    '#!' + os.path.join(target_venv_bin, r'\1')
                )

I believe this code is mostly self-explanatory. I may expand the explanations here if I get the feeling that something is unclear, but I think it’s straightforward.
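The shebang fix at the end of prepare_app() is probably the least obvious step, so here is a stand-alone sketch of the same rewrite in plain Python instead of Fabric’s sed(). The function name fix_shebang and the example paths are made up for illustration; unlike the sed call, this version only ever touches the first line of a script:

```python
import re


def fix_shebang(script_text, build_bin, target_bin):
    """Point a script's shebang from the build venv to the target venv.

    Scripts whose first line doesn't reference `build_bin` are returned
    unchanged.
    """
    pattern = re.compile(r'\A#!' + re.escape(build_bin) + r'/(.+)')
    return pattern.sub(
        lambda m: '#!' + target_bin + '/' + m.group(1),
        script_text,
        count=1,
    )


build_bin = '/vrmd/buildbot/build/vrmd-whois-42/vrmd/whois/venv/bin'
script = '#!' + build_bin + '/python\nimport sys\n'
print(fix_shebang(script, build_bin, '/vrmd/whois/venv/bin'))
```

Without this step, every console script inside the virtualenv would still try to run the interpreter from the build directory, which doesn’t exist on the production servers.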

Packaging

Now there’s only one thing left, the actual packaging of the deb. Which is – thanks to fpm – really, really simple:

    @with_settings(user='buildbot')
    def build_deb(self, dirs=('vrmd',)):
        """Build debian package."""
        with cd(self.base_path):
            # Move the hook scripts next to the directory tree to be packaged.
            run('mv {} .'.format(os.path.join(self.src_path, 'debian')))
            # The postinst hook re-creates the virtualenv on the target.
            self.run_deps.append('python-virtualenv')
            deps_str = '-d ' + ' -d '.join(self.run_deps)
            dirs_str = ' '.join(dirs)
            # Note: os.path.exists() checks the *local* checkout, so the
            # local debian/ directory has to match the branch being built.
            hooks_str = ' '.join(
                '{} {}'.format(opt, os.path.join('debian', fname))
                for opt, fname in [
                    ('--before-remove', 'prerm'),
                    ('--after-remove', 'postrm'),
                    ('--before-install', 'preinst'),
                    ('--after-install', 'postinst'),
                ]
                if os.path.exists(os.path.join('debian', fname))
            )
            rv = run(
                'fpm -s dir -t deb -n {0.pkg_name} -v {0.version} '
                '-a all -x "*.git" -x "*.bak" -x "*.orig" {1} '
                '--description "Automated build. '
                'Branch: {0.git_branch} Commit: {0.git_commit}" '
                '{2} {3}'
                .format(self, hooks_str, deps_str, dirs_str)
            )

            # Download the freshly built package into the local debian/ dir.
            get(rv.split('"')[-2], 'debian/%(basename)s')

One last convention I have to mention: every app has a sub-directory called “debian”. It may contain any of the aforementioned “prerm”, “postrm”, “preinst” and “postinst” shell hooks, which is why the debian directory is moved into the build directory at the top of the method (the mv call). These files are detected automatically and passed to fpm using the appropriate command line switches. By the way, building RPMs is just a matter of changing the fpm call from -t deb to -t rpm and adjusting the shell hooks to Red Hat standards.

The only “magic” in this method is rv.split('"')[-2], which makes total sense if you know that fpm returns a string like

Created deb package {"path":"vrmd-whois_42424_all.deb"}

on success. I could use regular expressions or split and parse json – but in this case, just splitting is easier. :)
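For comparison, the JSON variant would look roughly like this. It’s only a sketch that assumes the exact output format quoted above; other fpm versions may print something different, and package_path is a made-up name:

```python
import json


def package_path(fpm_output):
    """Extract the package file name from fpm's success message."""
    # The JSON object is everything from the first '{' to the last '}'.
    start = fpm_output.index('{')
    end = fpm_output.rindex('}') + 1
    return json.loads(fpm_output[start:end])['path']


line = 'Created deb package {"path":"vrmd-whois_42424_all.deb"}'
print(package_path(line))        # vrmd-whois_42424_all.deb
print(line.split('"')[-2])       # same result with the split trick
```

The JSON version fails loudly if the message ever changes shape, whereas the split silently picks whatever happens to sit between the last two quotes.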

One particularity I really like is the package description:

Automated build. Branch: master Commit: deadbeef

Using the commit id, we can later check the git history for the exact specifics of this package, and what has happened since then.

Epilogue

And that’s basically everything you need to know to build your own native package build system! Please let me know if anything is unclear and I’ll expand the article accordingly.

Update (2013-03-08)

By now, we have made some refinements to our deployment practices, and unfortunately I’ve been way too busy to update the article. But here is a quick list of changes:
