Testing & Packaging

How to ensure that your tests run the code you think they are running, and how to measure your coverage over multiple tox runs (in parallel!).


I used to scoff when I saw Python projects that put their packages into a separate src directory. It seemed unnecessary and reminded me of Java. But it also seemed to be dying out.

Imagine my surprise when I saw cryptography – one of Python’s most modern projects – adopt a src directory subsequently (if you’re not surprised, feel free to skip to Combined Coverage)!

Soon after, I got a bug report that my tox and coverage setup worked only by accident: my tests didn’t run against the version of my app that got installed into tox’s virtualenvs. They ran against the actual directory. Let me reinforce this point once more:

Your tests do not run against the package as it will be installed by its users. They run against whatever the situation in your project directory is.

In general, that’s not a big deal. In the end, it’s the same code. But you can miss packaging issues, which are especially frustrating to track down: ever forgotten to include a resource (like templates) or a package? It demonstrated to me that there’s likely more that can go wrong than I thought and that isolating the code into a separate – un-importable – directory might be a good idea [1].

To achieve that, you add a where argument to find_packages() and tell setup() about it:

    packages=find_packages(where="src"),
    package_dir={"": "src"},
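To convince yourself that a test run really imports the installed copy, you can check where a module would be loaded from. The following is only a minimal sketch: the helper names `module_origin` and `looks_installed` are made up for illustration, and the `site-packages` check is a heuristic that will not hold on every platform (some distributions use a different directory name).

```python
import importlib.util
from pathlib import PurePath


def module_origin(module_name):
    """Return the file a top-level module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError("cannot locate {!r}".format(module_name))
    return spec.origin


def looks_installed(path):
    """Heuristic: installed copies live below a site-packages directory."""
    return "site-packages" in PurePath(path).parts
```

Inside a test suite, an assertion such as `assert looks_installed(module_origin("attr"))` would then fail whenever the tests pick up the project directory instead of the copy in the virtualenv.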

Coping with that rather minor issue exposed me to a more interesting problem: measuring coverage over multiple tox runs.

Combined Coverage

The combo of running your tests against various versions and configurations using tox and measuring the coverage is popular.

In my experience though, most projects only consider the coverage for one version (depending on where you stand on the Python 2 vs 3 debate) and either ignore the coverage for other versions or push it to 100% there too using # pragma: no cover.

But it’s much nicer to have the combined coverage computed over all your tox runs like codecov will do for you. No more guessing and reasoning; you get an accurate report on which lines and branches have been executed and which not.

As of coverage 4.2, this feat is easily achieved if you run your tests directly against your source directory (i.e. without a src directory in between).

Run coverage in parallel mode:

coverage run --parallel -m pytest tests

And add an environment to your tox configuration that will combine and report the coverage over all runs at the end [2]:

[tox]
envlist = py27,py35,pypy,coverage-report

# ...

[testenv:coverage-report]
deps = coverage
skip_install = true
commands =
    coverage combine
    coverage report

It gets more complicated if your tests run against the installed version of your package though, because now the paths of the actually executed modules point into the site-packages directory of each tox environment. That is as it should be, since we want to test what’s installed by your package, not what’s lying around in your directory!

So you end up with a very long coverage output with all site-packages of all tox environments.

Fortunately there is a solution in coverage which is the [paths] configuration section. It allows you to tell coverage which paths it should consider equivalent:

[run]
branch = True
source = attr

[paths]
source =

Now coverage combine will fold the coverage data of these paths together and you get what you expect:

Name                     Stmts   Miss Branch BrPart  Cover   Missing
src/attr/__init__.py        17      0      0      0   100%
src/attr/_compat.py         15      0      2      0   100%
src/attr/_config.py          9      0      2      0   100%
src/attr/_funcs.py          35      0     18      0   100%
src/attr/_make.py          202      0     92      0   100%
src/attr/filters.py         15      0      3      0   100%
src/attr/validators.py      33      0     12      0   100%
TOTAL                      326      0    129      0   100%
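Conceptually, the folding that a [paths] section asks for can be pictured as rewriting every measured file name to one canonical prefix and then merging the line data. The sketch below is not coverage’s implementation; the function names, the wildcard handling, and the example patterns and paths are all assumptions chosen for illustration.

```python
import re


def canonicalize(path, canonical, patterns):
    """Rewrite `path` to start with `canonical` if a pattern matches its prefix."""
    for pattern in patterns:
        # In this sketch, "*" matches any run of characters within one directory name.
        regex = "[^/]*".join(re.escape(piece) for piece in pattern.split("*"))
        match = re.match(regex + r"(?=/|$)", path)
        if match:
            return canonical + path[match.end():]
    return path


def fold(measurements, canonical, patterns):
    """Merge executed-line sets of files that map to the same canonical path."""
    folded = {}
    for path, lines in measurements.items():
        key = canonicalize(path, canonical, patterns)
        folded.setdefault(key, set()).update(lines)
    return folded
```

With a hypothetical pattern like `.tox/*/lib/python*/site-packages/attr` and the canonical path `src/attr`, the data for `_make.py` from every tox environment ends up under the single name `src/attr/_make.py`, which is why the report above lists each module only once.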

Speeding Up With Parallelization

This approach works but it relies on the order in which the environments are run.

Thus if you want to use something like detox that runs your tox environments in parallel, you’ll have to make sure that reporting runs separately after all other environments are done.

To illustrate how much of a difference detox makes: the modest structlog test suite in all its variations takes about 50s serially and about 20s when parallelized using detox. I think running roughly 2.5 times as fast is pretty sweet and worth coming up with a solution [3].

So I wrote a simple tool called detoxize that reads a list of environments from stdin (probably tox -l output) and prints a command line that runs detox with all environments except the last, and finally tox with the last environment:

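#!/usr/bin/env python3

import sys

if __name__ == "__main__":
    # Read one environment name per line, e.g. the output of `tox -l`.
    envs = [env.strip() for env in sys.stdin.readlines()]
    # Run all but the last environment with detox, then the last one with tox.
    print(
        "detox -e " + ",".join(env for env in envs[:-1]) + "; "
        "tox -e " + envs[-1]
    )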

In other words if you have:

envlist = py27,py35,pypy,coverage-report

in your tox.ini and pipe tox -l into detoxize, it will print

detox -e py27,py35,pypy; tox -e coverage-report
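The script assumes that tox lists at least two environments and that the last one is the reporting step. A slightly more defensive variant might look like the sketch below; the function name `build_command` and the handling of the single-environment case are my assumptions here, not part of the tool itself.

```python
def build_command(lines):
    """Build the shell command line from `tox -l`-style lines."""
    # Drop surrounding whitespace and skip blank lines.
    envs = [line.strip() for line in lines if line.strip()]
    if not envs:
        raise ValueError("no tox environments given")
    if len(envs) == 1:
        # Nothing to parallelize: run the single environment with tox.
        return "tox -e " + envs[0]
    # Everything but the last environment runs in parallel, the last one after.
    return "detox -e " + ",".join(envs[:-1]) + "; tox -e " + envs[-1]
```

Keeping the logic in a function also makes it easy to check the generated command without piping anything through a shell.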

If you set up a tox project like I’ve sketched out before, you can run

tox -l | detoxize | sh

and you get the benefits of parallel speed together with a combined coverage report. You may want to add a shell alias for that if you’re lazy like me:


alias dt="tox -l | detoxize | time sh"

or, in shells such as fish where alias takes the definition as a separate argument:

alias dt "tox -l | detoxize | time sh"

Travis & codecov

Turns out™ that codecov doesn’t work properly if you run coverage in parallel mode. Therefore you have to coverage combine your coverage data after each run, before you submit it:

  - tox -e coverage-report
  - codecov


You can have a look at the setup.py, tox.ini and .coveragerc of attrs if you want a simple yet complete and functional example.


  1. There are more good reasons. ↩︎
  2. Before coverage 4.2, you used to have a second tox environment that cleaned up previous coverage data before you ran your tests. This has been addressed. ↩︎
  3. There’s an open issue on tox’s bug tracker about this. Feel free to vote! ↩︎