Maintaining a Python Project When It’s Not Your Job

PyPI is a gold mine of great packages but those packages have to be written first. More often than not, projects that millions of people depend on are written and maintained by only one person. If you’re unlucky, that person is you! This talk tries to lighten the burden by giving you useful tools and approaches.

So far, I’ve held it at PyCon US 2019 in Cleveland, OH, USA, PyCon Lithuania 2019, EuroPython 2019, and PyGotham 2019.

Slides on Speaker Deck

Abstract

The goal it remove as much friction as possible. Both for you both mainly for your contributors since any friction for them falls back on you in form of support labor and work that’s simply not done.

Act 1: Development

Here is an example of a CONTRIBUTING.rst. Please note how it does all of the following:
- Encouragement to dispel any notion of perceived necessary status. Everyone can contribute.
- Workflow outline to give the willing contributor the feeling what expects them.
- Code, test, docs, and changelog standards.
- Local development environment for a quick feedback loop.
  - Feedback Loop, a definition in the context of tests and development. I consider it the #1 factor in development ergonomics; especially when trying to understand a new code base.
  - Extras are a great way share extra dependencies for running tests, building documentation, or all of the above.
- Expectations in behavior, linking the Code of Conduct.
  - Setting the tone for your project’s community is probably the biggest privilege of being the head of one. Code of conducts are a way to codify them.
    - GitHub has direct support for adding CoCs. The linked article also provides more rationale.
    - Contributor Covenant is a good starting point.
    - CoCs are pointless though, if you don’t enforce them.
  - Setting expectations for open source participation.
Running all tests should only be a matter of running tox.
- Or Nox. :)
- Thea Flowers – the author of Nox – gave a deeper dive on tox, Nox, and invoke: Break the Cycle: Three Excellent Python Tools to Automate Repetitive Tasks
- By also using tox in your CI, you make sure that the user runs exactly the same test commands as the CI which significantly lowers the risk of frustrating broken pushes.
- pyenv is a great way on UNIX-style systems to install all the Python versions that you need.
Having a high test coverage, is an investment not only in the code quality right now but also for you in a few months.
Notable checkers:
- flake8 makes sure your code mostly follows PEP 8 which is nice for readability (automatic formatting is nicer though). It also checks for errors like unused imports.
- check-manifest prevents the infamous “Fix MANIFEST.in” commit. Or at least it could. 🙈
- twine can check your PyPI long description.
- mypy: static typing can do wonders to your understanding of how part of your code interact
Anything that is automatically formatted, cannot be formatted wrong and saves frustrating review comments over minutiae.
- Black formats your code into a nice and deterministic format.
  - Contrary to Black’s default of 88, I’m a big fan of 79 characters line lengths. The main reason is that it forces you to write your code in a way that I find much more readable. Craft Your Python Like Poetry sums up my thoughts very well.
    Don’t be afraid to hit that enter key!
- isort formats your imports into nicely separated and sorted blocks.
  - If you use the correct settings (if you’re a depraved individual who doesn’t break after 79 characters, you can also use the black profile or configure it yourself altogether).
  - Before isort 5.0, you also had to configure your third-party packages by hand or using seed-isort-config. That is not necessary anymore.
- prettier offers automatic formatters for other file types you might run into.
pre-commit offers a framework for running hooks before committing code.
- It’s Python-aware, but not Python-specific. It will manage your Python-based tools in appropriate virtualenvs, but you can also use it with many types of hooks including running Docker containers.
- Here’s a config file to get started. It will:
  - Format your code with black (and fail if it has changed something so you can stage the changes before committing).
  - Format your imports using isort (same).
  - Seed the list of known third party imports (using the aforementioned seed-isort-config).
  - Check your code with flake8.
  - Check for:
    - trailing whitespace
    - bogus end of files
    - stray debug statements
- All you have to do is copy it into your project and run pre-commit run --all-files. pre-commit will do the rest.
As much as possible of code quality should be automated. Let the robots do the pestering. This talk only scratched on the surface, but there are deeper dives:
- Automating Code Quality: Next Level

Documentation

Sphinx is so good that Apple uses it for its Swift docs. Modern versions even support Markdown so writing and updating a very long README is more work long term.
- But for documentation, I’d still recommend going for reST: Why You Shouldn’t Use “Markdown” for Documentation
Don’t host your documentation yourself. Use the amazing Read the Docs!
- It’s very easy to get started.
- Please become Gold members, it’s worth it.
If you have regular conflicts in your changelog, you should try out towncrier.
Did you know that you can slice and dice your README in your Sphinx docs to avoid information duplication?
```
.. include:: ../README.rst
     :start-after: string-1
     :end-before: string-2
```
Will insert ../README.rst, but only whatever is between string-1 and string-2. By using comments (lines that start with two dots), you can use arbitrary unique strings.
Since you can include the same file multiple times, you can extract everything you want in the order that is best.
There are some extensions that ship with Sphinx and that minimize code duplication and maximize maintenance comfort. You just have to activate them in your conf.py.
- sphinx.ext.autodoc: write API docstrings once and then just include them. This allows them to live with the code so the risk of being out of date is lower and they are actually helpful when developing.
  - API docs without examples are incomplete but examples in docstrings are tedious. The solution is to indent additional content that belongs to the autodoc entry. Check out how attrs does it.
- sphinx.ext.doctest: write your examples as doctests to make sure that your documentation is not lying to your users.
- sphinx.ext.intersphinx: enables you to link directly to the API docs of other projects, including the standard library if you tell it where to find them. E.g. `:func:logging.getLogger``` will link to the logging docs.
Make sure your documentation builds and your doctests pass using a tox environment.
Good docstrings for tests are good idea: How to write docstrings for tests

Act 2: Pull Request

Keep in mind that when you send a pull request you’re saying, “I wrote some code. I think you should maintain it.”
— Nicholas C. Zakas, Tweet

Some of the stress around open source can be shed with a shift in one’s own thinking: Sustainable Open Source: The Maintainers Perspective or: How I Learned to Stop Caring and Love Open Source
Talks, blog posts, and interviews about the experience of being an open source maintainer
Automate as much as possible again. One game changer for CPython core development were Mariatta’s GitHub bots. She gave a talk on them: Don’t Be a Robot, Build the Bot

Continuous Integration

Travis CI has been the undisputed champion for FOSS CI for years.
- They got bought by Idera and laid off many people. Nobody knows what the future has in store for the (free) service.
Azure Pipelines is smelling blood. Their FOSS offerings are very generous (10 parallel jobs!) and they offer Linux, macOS, and Windows builds. Sadly it’s hard to find simple examples to replicate what we had in Travis.
- Migrate from Travis to Azure Pipelines is the official guide and its length is sadly a testament to its complexity. It also ignores the prevalent workflow of using tox in the Python FOSS community and focuses on corporate users.
- Azure Pipelines with Python — by example tries to close that gap.
- azure-pipeline-templates
- Currently there’s also no integration with codecov so there’s always the danger that someone steals your codecov token (even though it’s an harmless leak) by opening a pull request and exposing the secret variable.
  - Azure Pipelines have their own coverage system so it’s unclear whether or not this will ever remedied.
codeship is venturing in FOSS but it’s hard to find information on it on their homepage and they currently lack public build logs (your contributors learn that a build failed, but not why).
AppVeyor always was the standard if you needed to run tests on Windows. It used to be a bit slow but is quite good now!
Circle CI has been around for a while and has been valued by some. It’s a bit stingy in their free offerings though.

Community

One Of The Team: Cory Benfield on community building and avoiding elite in-crowds.
Jazzband is a collaborative community to share the responsibility of maintaining Python-based projects. If you can’t maintain your project anymore or need help, you should consider contacting them.
Once a project grows, one person is seldom able to own all of the code. GitHub offers a special file to map paths to its maintainers.

Act 3: Release

A PR-plus-CI driven workflow will keep your project in an always releasable state.
If you need help to get your package to PyPI, I have written a blog post for you: Sharing Your Labor of Love: PyPI Quick and Dirty. It’s from 2013 but I’m keeping it up to date.
It’s become increasingly popular to use various CI solutions to publish to PyPI. Some examples:
I like more control and want to centralize my tools and knowledge.
- All my setup.pys have the some rough shape like this one. The fact that all data is available as global variables (e.g. NAME) allows me to import this file (hence the if __name__ == "__main__": block) and access all the data from within my release script that works with all of my projects.
- The canonical package metadata lives in the main __init__.py of each package.
  - Both my setup.py and my Sphinx config file docs/conf.py load that file and parse it using simple regexps.
  - There is no possibility for data inconsistencies and there’s not runtime overhead unlike the solution that was suggested to me on the poetry bug tracker.
    Please note that the recommendations did not come from Sébastien himself (who is currently busy) and I hope that better solutions will emerge eventually.
- Releasing a package is only a matter of:
  - removing the .dev0 suffix that I use as an in-dev marker
  - and – depending on the project – either removing the UNRELEASED date in my changelog, or removing a warning and running towncrier to assemble a new release section.
    - Conventions also allow me to extract the changelog entries for only the current release using a simple regexp and add it to the long description on PyPI. See the bottom of attrs’s page for an example.
  - committing the changes
    - here an example from structlog 19.1.0
  - tagging the version
  - double-checking
    - CI
    - fully rendered long PyPI description
    - documentation links by running make linkcheck (or make.bat linkcheck)
      - Make sure you exclude GithHub issues lest you get rate limited.
  - building the package
  - uploading them using twine
  - starting the new development cycle
    - bumping the version and adding the .dev0 suffix
    - preparing a new changelog section
    - here’s an example after structlog 19.1.0 has been released.
  - All of this is trivially scriptable. Don’t punt on it just because it’s trivial. Even the smallest amounts of friction make one’s life miserable in the long run.

Epilogue

CalVer > SemVer, don’t @ me.