Python Application Dependency Management in 2018

We have more ways to manage dependencies in Python applications than ever. But how do they fare in production? Unfortunately this topic turned out to be quite polarizing and was at the center of a lot of heated debates. This is my attempt at an opinionated review through a DevOps lens.

2018 is coming to an end, and a lot has happened since the best and only way to pin your dependencies was running

$ pip freeze >requirements.txt

pip-tools has been around for a while and automates that very same process, including updates.


However, in the past year, two new contenders entered the scene that tried to fix packaging for good:

  1. Pipenv by Kenneth Reitz, which has been declared the officially recommended packaging tool by the PyPA and moved into the pypa GitHub organization.
  2. Poetry by Sébastien Eustace (author of pendulum, in my opinion the best Python date library), which is mostly flying under the radar but has its fans.

With pip-tools still working fine and being maintained by Jazzband, we now inevitably have more alternatives for managing our dependencies than ever. But how do they compare if your job is to run Python applications on servers?

My Context, or: Putting Python Apps on Servers in 60s

Whether we like it or not, the best Python build fragment for server platforms is the good old venv[1] – possibly compressed using tools like pex or shiv.

Such a build fragment can for example be built and packaged using Docker’s multi-stage builds or just tar’ed up and moved to the target server.

Thinking you can cheat yourself out of virtual environments using Docker leads to huge containers that ship with build environments, version conflicts with your system Python libraries, or wonky attempts to move around whole Python installations or user directories that will eventually fall over because some CPython internals changed or pip install --user was never intended for something like that.

Of course, with some dedication you can work around all of the above, but ask yourself why you’re so keen to ditch a reliable and blessed way, and how much bloat, complexity, and sketchiness attaining that goal is worth.

Docker is a great way to distribute virtual environments – but not a replacement for them.

Requirements

What all of this means is that whatever solution I pick, it needs to provide the following features:

  1. Let me specify my immediate dependencies (e.g. Django) for at least two environments (production and development, the latter usually being production plus test tools),
  2. resolve the dependency tree for me (i.e. recursively determine my dependencies’ dependencies) and lock all of them with their versions and ideally hashes[2],
  3. update all dependencies in one go and update all locks automatically and independently for each environment[3],
  4. integrate somehow with tox so I can run my tests and verify that the update didn’t break anything[4],
  5. and finally allow me to install a project with all its locked dependencies into a virtual environment of my choosing.
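To make requirement 2 concrete, here is a minimal sketch of a check that a requirements-style lock file is fully pinned and, optionally, hashed. It only understands the plain `name==version --hash=...` shape; the function names are made up for illustration and are not part of any of the tools discussed here.

```python
import re

# A pinned requirement looks like "name==1.2.3", optionally followed by an
# environment marker and one or more "--hash=sha256:..." options.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;\\]+")


def logical_lines(text):
    """Join backslash-continued lines and drop comments and blank lines."""
    joined = text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = raw.split(" #")[0].strip()
        if line and not line.startswith("#"):
            yield line


def unlocked_requirements(text, require_hashes=True):
    """Return the requirement lines that are not fully locked."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):  # options such as -r or --index-url
            continue
        if not PINNED.match(line):
            problems.append(line)  # a range or a bare name, not a pin
        elif require_hashes and "--hash=" not in line:
            problems.append(line)  # pinned, but without a hash
    return problems
```

An empty result means every dependency in the file is pinned; anything else is a line that could resolve to a different artifact on the next build.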

So for example I use dockerized build servers to create a virtual environment at /app, install the application with all its dependencies into it, and then

COPY --from=build /app /app

it into a minimal production container.

When I’m building Debian packages, I’ll do the exact same thing in the same build container, except the target path of the virtual environment becomes something like /vrmd/name-of-the-app and the whole thing gets packaged using fpm and uploaded to our package servers.
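The build step itself is small enough to sketch in Python. The following is a rough outline under a few assumptions: a POSIX layout (`bin/python`), a hashed lock file such as `requirements/main.txt`, and an installable project in the current directory. The function names are hypothetical.

```python
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir):
    """Path of the interpreter inside a virtual environment (POSIX layout)."""
    return Path(venv_dir) / "bin" / "python"


def install_commands(venv_dir, requirements, project_dir="."):
    """Commands that install the locked dependencies, then the project."""
    pip = [str(venv_python(venv_dir)), "-m", "pip", "install"]
    return [
        # Hash mode: pip refuses anything that is not pinned and hashed.
        pip + ["--require-hashes", "-r", str(requirements)],
        # The dependencies are already there, so install only the project.
        pip + ["--no-deps", str(project_dir)],
    ]


def build(venv_dir, requirements, project_dir="."):
    """Create the virtual environment and run the install commands."""
    venv.create(venv_dir, with_pip=True, clear=True)
    for command in install_commands(venv_dir, requirements, project_dir):
        subprocess.run(command, check=True)
```

Calling `build("/app", "requirements/main.txt")` inside the build container would produce the directory that is then copied or packaged. Using the environment's own interpreter with `-m pip` avoids depending on whatever `pip` happens to be first on the `PATH`.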


DISCLAIMER: The following technical opinions are mine alone and if you use them as a weapon to attack people who try to improve the packaging situation you’re objectively a bad person. Please be nice.

Pipenv

I usually try to follow community best practices and so I looked at the PyPA-endorsed Pipenv first. One notable feature is that it introduces Pipfile and Pipfile.lock as means to declare and lock dependencies for the first time in a mainstream package[5].

And Pipenv tries really hard to do everything for you. It doesn’t just manage your dependencies, it also takes care of your virtual environments, and along the way tries to guess what you might want (like detecting whether you’re in an active virtual environment, installing missing interpreter versions using pyenv, or checking for preexisting requirements.txts).

If you look into Pipenv’s patched and vendor directories, you’ll realize how it achieves that: they took what’s battle-tested, put a nice interface in front of it, and patched what didn’t work. In a way, it carries the burden of many years of Python packaging in its guts. And yes: there’s a patched pip-tools inside of Pipenv.

While being user-friendly and relying on mature work sounds great, it’s also its biggest weakness: Pipenv took a much bigger bite than the maintainers can realistically swallow and it grew so complex that no mortal has a chance to understand – let alone control – it. If you look at the changelog, maintenance mostly means intense firefighting.


I have used Pipenv for almost a year and still remember the dread of updating it. There were times when each release introduced a new breaking regression – including the last two as of this writing.

And those were the final two straws that broke my camel’s back. I have really tried to make this work and I have the utmost respect and sympathy for the maintainers, who have to fight a hydra of complexity which – if anything – is growing more necks. But I personally have lost faith that this project will ever become stable enough for me to stake my sleep on it.

On backchannels I’ve heard from friends at various $BIG_CORPS that they also had to back away because they ran into blocking bugs that sometimes just got closed without ceremony.


As it stands today, I unfortunately have to disagree with the decision to make Pipenv the recommended way to install and manage dependencies in Python. It is a project that looks great on paper but cannot hold its own weight. I’m afraid a rewrite from the ground up is what it would take to put it on solid footing[6].


If you’re more optimistic than I am and want to try it for yourself, Pipenv actually fulfills all needs outlined above:

  1. Install immediate dependencies using pipenv install Django or pipenv install --dev pytest.
  2. They are locked automatically on installation.
  3. Update and re-lock everything using pipenv update --dev.
  4. While pipenv has suggestions on direct usage with tox, I prefer the old way and transform my Pipfiles into old school requirements.txts. Updating/re-locking all dependencies then looks like this:

       $ pipenv update --dev
       ...
       $ pipenv lock -r >requirements/main.txt
       ...
       $ pipenv lock --dev -r >requirements/dev.txt
       ...
    

    Which can be used in tox as usual:

       [testenv]
       deps =
            -rrequirements/main.txt
            -rrequirements/dev.txt
       commands =
            coverage run --parallel -m pytest {posargs}
            coverage combine
            coverage report
    
  5. Thanks to the approach taken in step 4, you don’t have to touch your build/deploy system at all, and possible Pipenv bugs only affect you in development, which is usually a lot less critical.

    I appreciate a lot that Pipenv lets me export the state into a common standard and doesn’t force me to use it throughout all of my workflows.
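If you'd rather script the re-locking from step 4 than type it, a sketch could look like the one below. It runs exactly the three commands shown above; the `relock` and `run_capture` names and the injectable `run` parameter are my own additions for illustration.

```python
import subprocess
from pathlib import Path

# The same three commands as in step 4, paired with the file that
# captures their output (None: nothing to capture).
STEPS = [
    (["pipenv", "update", "--dev"], None),
    (["pipenv", "lock", "-r"], "main.txt"),
    (["pipenv", "lock", "--dev", "-r"], "dev.txt"),
]


def run_capture(command):
    """Run a command and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(
        command, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout


def relock(out_dir="requirements", run=run_capture):
    """Re-lock with Pipenv and export both environments to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for command, filename in STEPS:
        output = run(command)
        if filename is not None:
            target = out_dir / filename
            target.write_text(output)
            written.append(target)
    return written
```

One small advantage over the shell redirect: a file is only written after its command has succeeded, so a failing Pipenv run doesn't leave you with a truncated requirements file.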

Poetry

The next project I tried was Poetry which has a very different approach. Instead of reusing what’s already there, it makes a self-written dependency resolver its core. That may sound like a straightforward problem but in reality it’s anything but.

Poetry also embraces the standard pyproject.toml file for both immediate and intermediate dependencies. Additionally, it offers first-class support for packaging Python libraries and uploading them to PyPI. It explicitly does not use setuptools to achieve any of that.

Given that it’s consciously modern and tries to make a more or less clean cut in Python packaging, I am intrigued by this, in my opinion, under-appreciated project whose biggest sin probably is that it doesn’t have a big name attached to it. I’m also a big fan of it not insisting on managing my virtualenvs (nothing beats virtualfish!).

Unfortunately it fails one of my requirements, and that’s point 5. As of writing (Poetry 0.12.10), there is no way to tell Poetry to install the current package along with its dependencies into an explicitly specified virtual environment. Mind you, if you run poetry install within an active virtual environment, it will do the right thing. However, that is problematic on build servers and in CI.
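To illustrate what "active virtual environment" means for a non-interactive build, here is a sketch that fakes activation for a single subprocess. It rests on an assumption I have not verified against every release: that Poetry recognizes an active environment by the `VIRTUAL_ENV` variable that activate scripts set. The helper names are hypothetical and a POSIX layout is assumed.

```python
import os
import subprocess
from pathlib import Path


def activated_environ(venv_dir, base=None):
    """Return a copy of the environment that mimics `source bin/activate`."""
    env = dict(os.environ if base is None else base)
    venv_dir = Path(venv_dir)
    env["VIRTUAL_ENV"] = str(venv_dir)
    # Put the environment's bin directory first (POSIX layout assumed).
    parts = [str(venv_dir / "bin")]
    if env.get("PATH"):
        parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(parts)
    # Activate scripts unset PYTHONHOME, so do the same here.
    env.pop("PYTHONHOME", None)
    return env


def poetry_install(venv_dir, project_dir="."):
    """Run `poetry install` as if venv_dir were the active environment."""
    subprocess.run(
        ["poetry", "install"],
        cwd=str(project_dir),
        env=activated_environ(venv_dir),
        check=True,
    )
```

This is exactly the kind of kludge that works until it doesn't: it depends on detection behavior rather than on a supported option.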

Since pyproject.toml is just TOML, it would be trivial to extract a requirements.txt from it (and a future release of Poetry will support it out of the box!). However it’d feel like going against the grain. More importantly, if you go all in on Poetry, you also drop your setup.py. If your Python applications aren’t packages you’re not gonna care, but mine are[7] and they become uninstallable.


All of this is possible to work around, but I’m not comfortable building core workflows around kludges. But unlike Pipenv, I can see Poetry becoming my tool of choice, because it only lacks one feature and the rest appears to be very solid with a slick UX. Furthermore its scope is narrow enough to give me confidence that the complexity will not get out of hand anytime soon.

So I hope Sébastien will either add a way to install into explicitly specified virtual environments or even offer some first class support for bundling applications into virtual environments, shivs, etc. Then I will check back and report.

pip-tools – Everything Old Is New Again

Turns out, pip-tools still works mostly great! Well, except for pip-sync, which proved unreliable for me, but pip-compile does the job in most cases. If you want to take the lipstick-on-a-pig approach and don’t mind adding your dependencies to a file by hand, you’re golden:


update-deps:
	pip-compile --upgrade --generate-hashes --output-file requirements/main.txt requirements/main.in
	pip-compile --upgrade --generate-hashes --output-file requirements/dev.txt requirements/dev.in

init:
	pip install --editable .
	pip install --upgrade -r requirements/main.txt  -r requirements/dev.txt
	rm -rf .tox

update: update-deps init

.PHONY: update-deps init update

The downside: pip-tools has no proper resolver. Thus it has to run under the same Python version as the project it’s locking, or else conditional dependencies will not work correctly[8]. It literally just runs pip install/pip freeze for you with all its up- and downsides.
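The effect of conditional dependencies is easy to see in a toy model. The sketch below is a deliberately simplified stand-in for marker evaluation (it only handles `python_version < "X.Y"`), applied to pytest's conditional dependency on funcsigs. The dependency list is an illustrative subset, and none of this is how pip-tools is actually implemented.

```python
import sys


def python_version_below(bound, running):
    """Simplified stand-in for the marker `python_version < "<bound>"`."""
    wanted = tuple(int(part) for part in bound.split("."))
    return tuple(running[:2]) < wanted


# (name, upper bound from the marker, or None when unconditional).
# Illustrative subset of pytest's dependencies, not the full list.
PYTEST_DEPENDENCIES = [
    ("attrs", None),
    ("funcsigs", "3.0"),
]


def names_to_lock(dependencies, running):
    """Names a locker would record when it runs under `running`."""
    return [
        name
        for name, bound in dependencies
        if bound is None or python_version_below(bound, running)
    ]


# What gets locked depends on the interpreter doing the locking.
print(names_to_lock(PYTEST_DEPENDENCIES, sys.version_info))
```

Under `(3, 7)` the result is `['attrs']`, under `(2, 7)` it is `['attrs', 'funcsigs']`: the marker is evaluated against the interpreter running the locker, not against the interpreter that will later install from the lock file.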


Between “breaks regularly” and “can’t use” it’s still the “best.” And while it’s not perfect, if it’s good enough for all the companies that I know of, it’s probably also good enough for you, while we’re waiting for the One True Solution™ together.

Post Scriptum

This post is less polished and edited than my usual work. That’s because I started writing it out of frustration, achieved catharsis, and decided to keep it to myself to avoid the Drama Llama. Then the whole Internet started pestering me to send it to them privately. So I did some scanty edits, pushed it out, and went for Glühwein.

Please excuse the rough edges, and please excuse the later edits in which I added clarifications because some people didn’t fully understand the requirements I had – the post was assuming too much context.

Footnotes


  1. Or of course virtualenv if you still run anything resembling Python <3.3 (legacy CPython, IronPython, Jython, …).
  2. I say “ideally” because we run our own PyPI server. If you don’t, you should make it a prerequisite.
  3. This is where a plain pip freeze falls short – if you don’t use a fresh environment, you end up with pytest in production lock files.
  4. SemVer is a lie.
  5. The spec itself is already from 2016 and there at least used to be the plan to add it to pip too.
  6. I’ve been told that a rewrite of at least some of its parts is the plan.
  7. To be clear: they’re packages in the sense of pip install --editable . in development and pip install . on deployment. They never go to a packaging index.

    Making an application a package has a bunch of benefits but they go beyond this article.

  8. For example, pytest has the conditional dependency funcsigs;python_version<"3.0". If pip-tools runs under Python 3, funcsigs will never be added to your lock file.

    Which means that if you use it to lock your dependencies for a Python 2 project and then run pip install in hash mode (which you should), the installation will fail because it didn’t lock the hash of funcsigs.

    And the other way around: if your pip-tools runs on Python 2, locking will keep pulling in dependencies that you don’t need in Python 3 projects.