How to ensure that your tests run the code you think they are running, and how to measure your coverage over multiple tox runs (in parallel!).

src

I used to scoff when I saw Python projects that put their packages into a separate src directory. It seemed unnecessary and reminded me of Java. But it also seemed to be dying out.

Imagine my surprise when I saw cryptography – one of Python’s most modern projects – subsequently adopt a src directory (if you’re not surprised, feel free to jump ahead to Combined Coverage)!

Quick interlude from 2021: while there is still disagreement on the src topic[1], a significant number of projects have moved to src since this article was published. Of the four projects above that I used to demonstrate how uncommon it is, three switched to a src layout (Flask, Pyramid, and Twisted). It has become much more difficult to find a major project with a plain layout outside scientific packages.

Another interlude from 2021: NASA landed another robot on Mars, and they use src directories in their Python code. My job here is done.


Less than a year later, I received a bug report that my tox and coverage setup works by accident: my tests didn’t run against the version of my app that got installed into tox’s virtual environments. They ran against the actual directory. Let me reinforce this point once more:

Your tests do not run against the package as it will be installed by its users. They run against whatever the situation in your project directory is.

But this is not just about tests: the same is true for your application. The behavior can change completely once you package and install it somewhere else.

All of this makes you miss packaging issues which are especially frustrating to track down: ever forgotten to include a resource (like templates) or a sub-package? Ever uploaded an empty package to PyPI? That one issue demonstrated to me that there’s likely more that can go wrong than I thought and that isolating the code into a separate – un-importable – directory might be a good idea[2].
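One way to check which copy of a package your tests would pick up is to ask the import system where a module comes from. The following is only a minimal sketch: the helper names module_origin and is_installed_copy are made up for illustration, and looking for a site-packages path component is a heuristic (some system Pythons use a differently named directory). It uses the standard library's json module so that it runs anywhere.

```python
import importlib.util
from pathlib import Path


def module_origin(name: str) -> Path:
    """Return the file that `import name` would load (hypothetical helper)."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        raise ImportError(f"no file-based module named {name!r}")
    return Path(spec.origin).resolve()


def is_installed_copy(name: str) -> bool:
    """Heuristic: does the module live below a site-packages directory?"""
    return "site-packages" in module_origin(name).parts


# json ships with Python, so it only serves as a stand-in here.
origin = module_origin("json")
print(origin, is_installed_copy("json"))
```

Inside a tox environment, you could call is_installed_copy() with your own package name from a test to confirm that the installed copy, and not the one in your working directory, is the one being imported.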

In hindsight, it looks better to have src, tests, and docs directories in your project root instead of a mumbo-jumbo of directories whose purpose is unclear. Don’t let past Java nightmares cloud your judgement!


To achieve that, you just move your packages into a src directory and add a where argument to find_packages() in your setup.py:

setup(
    [...]
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)
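To see why this layout makes the package un-importable from the project root, here is a small sketch that builds a throwaway src layout in a temporary directory and asks Python's path-based finder whether the package can be found. The package name demo_pkg and both helper functions are made up for illustration.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_src_project(root: Path, package: str) -> None:
    """Create root/src/<package>/__init__.py, i.e. a bare src layout."""
    package_dir = root / "src" / package
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")


def importable_from(directory: Path, package: str) -> bool:
    """Would `package` be found if only `directory` were searched?"""
    return PathFinder.find_spec(package, [str(directory)]) is not None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_src_project(root, "demo_pkg")  # "demo_pkg" is a made-up name
    from_root = importable_from(root, "demo_pkg")
    from_src = importable_from(root / "src", "demo_pkg")

print(from_root, from_src)  # False True
```

With the project root on the import path, the package is invisible, so tests can only reach it through an actual installation.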

Coping with that minor issue exposed me to a more interesting problem: measuring coverage over multiple tox runs.

Combined Coverage

The combo of running your tests against various versions and configurations using tox and measuring the coverage is popular.

In my experience though, most projects only consider the coverage for one version and either ignore the coverage for other versions or push it to 100% too using # pragma: no cover.


But it’s much nicer to have the combined coverage computed over all your tox runs like codecov will do for you. No more guessing and reasoning – you get an accurate report on which lines and branches have been executed and which have not.

As of coverage 4.2, this feat is easily achieved if you run your tests directly against your source directory (i.e. without a src directory in between).

Run coverage in parallel mode:

coverage run --parallel -m pytest tests

And add an environment to your tox configuration that will combine and report the coverage over all runs at the end[3]:

[tox]
envlist = py27,py35,pypy,coverage-report

# ...

[testenv:coverage-report]
deps = coverage
skip_install = true
commands =
    coverage combine
    coverage report

It gets more complicated if your tests run against the installed version of your package, though. Because now the paths of the actually executed modules look like

.tox/py35/lib/python3.5/site-packages/attr/__init__.py

as they should, since we want to test what’s installed by your package and not what’s lying around in your directory!

So, you end up with a very long coverage output with all site-packages of all tox environments.

Fortunately, coverage has a solution: the [paths] configuration section. It allows you to tell coverage which paths it should consider equivalent:

[run]
branch = True
source = attr

[paths]
source =
   src
   .tox/*/site-packages

Now coverage combine will fold the coverage data of these paths together, and you get what you expect:

Name                     Stmts   Miss Branch BrPart  Cover   Missing
--------------------------------------------------------------------
src/attr/__init__.py        17      0      0      0   100%
src/attr/_compat.py         15      0      2      0   100%
src/attr/_config.py          9      0      2      0   100%
src/attr/_funcs.py          35      0     18      0   100%
src/attr/_make.py          202      0     92      0   100%
src/attr/filters.py         15      0      3      0   100%
src/attr/validators.py      33      0     12      0   100%
--------------------------------------------------------------------
TOTAL                      326      0    129      0   100%
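Conceptually, the folding works by rewriting every recorded path that matches one of the aliases to the canonical src path before merging the line data. The sketch below only illustrates that idea and is not coverage's actual implementation: remap and combine are hypothetical helpers, the line numbers are made up, and the * wildcard is deliberately allowed to span path separators, which is what the pattern above needs.

```python
import re
from collections import defaultdict


def remap(path, canonical, aliases):
    """Rewrite `path` so that a matching alias prefix becomes `canonical`."""
    for alias in aliases:
        # Escape the literal parts; let '*' match anything, including '/'.
        body = ".*?".join(re.escape(part) for part in alias.split("*"))
        match = re.match(body + "/", path)
        if match:
            return canonical + "/" + path[match.end():]
    return path


def combine(runs, canonical, aliases):
    """Merge {path: executed line numbers} mappings from several runs."""
    merged = defaultdict(set)
    for run in runs:
        for path, lines in run.items():
            merged[remap(path, canonical, aliases)] |= set(lines)
    return dict(merged)


# Made-up data for two tox environments.
runs = [
    {".tox/py27/lib/python2.7/site-packages/attr/__init__.py": {1, 2, 3}},
    {".tox/py35/lib/python3.5/site-packages/attr/__init__.py": {1, 2, 5}},
]
combined = combine(runs, "src", [".tox/*/site-packages"])
print(combined)
```

Both environments end up under the single key src/attr/__init__.py, and a line counts as covered if any of the runs executed it.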

Bonus coverage tip: set skip_covered = true in your [report] section to filter out all files with 100% coverage from the output.

Speeding Up With Parallelization

This approach works, but it relies on the order in which the environments are run.

Therefore, if you want to run your tox environments in parallel using the --parallel option (or -p for short, not related to coverage’s option of the same name), you’ll have to make sure that reporting runs separately after all other environments are done.

To illustrate how much of a difference tox -p makes: the modest structlog test suite in all its permutations takes about 1m47s serially and about 43s when parallelized using tox -p. I think finishing roughly 2.5 times faster is pretty sweet and worth coming up with a solution.
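For the curious, the arithmetic behind those two timings can be spelled out in a few lines. This only restates the numbers quoted above; nothing here was measured anew.

```python
serial = 1 * 60 + 47  # 1m47s expressed in seconds
parallel = 43         # seconds with tox -p

speedup = serial / parallel                 # how many times faster
time_saved = (serial - parallel) / serial   # fraction of wall-clock time saved

print(f"{speedup:.2f}x faster, {time_saved:.0%} less wall-clock time")
# prints "2.49x faster, 60% less wall-clock time"
```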

So, I wrote a tiny tool called detoxize[4] that reads a list of environments from stdin (probably tox -l output) and prints a command line that runs tox -p with all environments except the last, and finally tox with the last environment:

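#!/usr/bin/env python3

import sys


if __name__ == "__main__":
    # One environment name per line (e.g. from `tox -l`); skip blank lines.
    envs = [env.strip() for env in sys.stdin if env.strip()]
    # Everything but the last environment runs in parallel; the last one
    # (the coverage report) runs on its own afterwards.
    print(
        "tox -p -e " + ",".join(envs[:-1]) + " && "
        "tox -e " + envs[-1]
    )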

In other words if you have:

[tox]
envlist = py27,py35,pypy,coverage-report

in your tox.ini and pipe tox -l into detoxize, it will print

tox -p -e py27,py35,pypy && tox -e coverage-report

If you set up a tox project like I’ve sketched out before, you can run

$ tox -l | detoxize | sh

and you get the benefits of parallel speed together with a combined coverage report.

A More Modern Approach

As Bernát Gábor – the maintainer of tox and virtualenv extraordinaire – pointed out to me: there’s a simpler way now.

Since tox -p is now built into tox itself, tox could integrate it more deeply and has added the concept of dependencies between environments. Therefore, you can achieve the above as follows:

[testenv:coverage-report]
deps = coverage
skip_install = true
parallel_show_output = true
depends =
    py27
    py35
    pypy
commands =
    coverage combine
    coverage report

You can use the same patterns as in your envlist.

I’ve left the detoxize approach because it can still be useful for filtering out broken interpreters like PyPy on macOS Big Sur.
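As a rough sketch of what such filtering could look like, the command-building logic can be pulled into a function that takes a set of environments to skip. The function name build_command and its skip parameter are assumptions made for this illustration and are not part of the original tool.

```python
def build_command(envs, skip=()):
    """Build the two-step tox command line, leaving out skipped environments."""
    wanted = [env for env in envs if env and env not in skip]
    if not wanted:
        raise ValueError("no environments left to run")
    # All but the last environment run in parallel; the last one runs alone.
    *parallel, last = wanted
    if not parallel:
        return "tox -e " + last
    return "tox -p -e " + ",".join(parallel) + " && tox -e " + last


envs = ["py27", "py35", "pypy", "coverage-report"]
command = build_command(envs, skip={"pypy"})
print(command)  # tox -p -e py27,py35 && tox -e coverage-report
```

Feeding it the lines read from stdin instead of a hard-coded list gives you the same pipeline as before, minus the interpreters you want to leave out.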

Travis CI & codecov

Turns out™ that codecov doesn’t work properly if you run coverage in parallel mode. Therefore, you have to coverage combine your coverage data after each run and before you submit it:

after_success:
  - tox -e coverage-report
  - codecov

N.B. I recommend switching away from Travis CI as soon as possible, and I’ve written guides for both GitHub Actions (recommended) and Azure Pipelines. They also touch on how to get this coverage setup working.

Summary

You can have a look at the setup.py, tox.ini and pyproject.toml of attrs if you want a simple yet complete and functional example.


  1. Mostly about easy vs correct. I for one will proudly die on the correct hill.

  2. There are more good reasons.

  3. Before coverage 4.2, you needed a second tox environment that cleaned up previous coverage data before you ran your tests. This has been addressed.

  4. Before tox got its --parallel option, we had to use a separate tool called detox, hence the name. But detox is deprecated now and doesn’t work with modern tox versions anymore.