Python Packaging Metadata

Since this topic keeps coming up, I’d like to briefly share my thoughts on Python package metadata because it’s – as always – more complex than it seems.

When I say metadata, I mostly mean the version, so I will use the two terms interchangeably. But the description, the license, or the project URL are also part of the game.

The overarching problem is that we have two places where we may need that metadata:

In the packaging mechanism. setuptools, pip, flit, poetry, et al. need to know what version your package is. This is the package metadata.
But sometimes it’s useful to be able to introspect that data at runtime too, either to write code that can work with more than one version of a library, or because you’re debugging and want to double-check what you are using. The common way is to use dunder attributes on the package’s root. For example, structlog.__version__ will tell you its version.
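To make the runtime use case concrete, here is a minimal sketch of version-dependent code. The helpers version_tuple and at_least are names I made up for illustration, and fake_lib is a stand-in for a real imported library. The parsing is deliberately naive; for real version comparisons, packaging.version.Version is the more robust choice.

```python
from types import SimpleNamespace


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as "dev0"
        parts.append(int(piece))
    return tuple(parts)


def at_least(module, minimum):
    """Check a module's __version__ attribute against a minimum version tuple."""
    return version_tuple(module.__version__) >= minimum


# Stand-in for an imported library (hypothetical); a real one would be
# something like `import structlog`.
fake_lib = SimpleNamespace(__version__="21.3.0")

if at_least(fake_lib, (20, 2)):
    print("new code path")
else:
    print("fallback for older releases")
```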

You don’t want to have conflicting information in those two places. That matters most for the version, because it would be very confusing if your package code reported a different version number than your package metadata.

Until recently we had no standard way to introspect the package metadata from within our packages, so different strategies emerged:

All kinds of setup.py automation that take advantage of the fact that it’s Python code. This is my preferred approach because I also do some more magic like extracting the changelog for the latest version. Thus it’s unlikely my preference is gonna change anytime soon.
Refer to the package from the packaging toolchain: flit has always extracted your_package.__version__ and if you use a setup.cfg-only setuptools project, you can use fully qualified names to refer to metadata.
Use setuptools-scm to use git metadata for versioning.
Use a tool like bump2version to ensure consistency between files.
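As an illustration of the first strategy, this is roughly what such setup.py automation can look like: a regular expression pulls a dunder assignment out of the package’s source text. Treat it as a sketch. The function name find_meta and the example source string are assumptions for this post, and it only handles simple quoted string assignments.

```python
import re


def find_meta(name, source):
    """Extract the value of a `__name__ = "value"` assignment from source text."""
    match = re.search(
        r"^__{name}__\s*=\s*['\"]([^'\"]*)['\"]".format(name=re.escape(name)),
        source,
        re.M,
    )
    if match is None:
        raise RuntimeError(f"Unable to find __{name}__ string.")
    return match.group(1)


# In a real setup.py, `source` would be the contents of
# your_package/__init__.py, and the result would be passed on as
# setup(version=find_meta("version", source), ...).
example_source = '__version__ = "21.3.0"\n__license__ = "MIT"\n'
print(find_meta("version", example_source))
```

The point of this design is that the version lives in exactly one place (the package) and the packaging side reads it without importing the package.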

Broadly speaking, all of these approaches make sure that there’s always a static, up-to-date version number inside of your package. Due to the historic lack of introspection options, the vast majority of mature packages on PyPI offer this.

However, we now do have a way to introspect installed packages at runtime: importlib.metadata, which landed in Python 3.8 and is also available as a backport from PyPI.
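From the user’s side, looking up metadata takes only a few lines. The sketch below assumes Python 3.8 or later. The helper name describe is mine, and whether "pip" is installed depends on your environment, so the function returns None for anything it can’t find.

```python
from importlib import metadata  # Python 3.8+


def describe(dist_name):
    """Return version and summary of an installed distribution, or None."""
    try:
        info = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None  # not installed in this environment
    return {
        "version": metadata.version(dist_name),
        "summary": info.get("Summary"),
    }


print(describe("pip"))  # None if pip is not installed here
```

Note that the lookup goes by distribution name (what you pip install), which is not necessarily the same as the import name.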

While there is merit to the argument that static data inside your package makes for better debuggability, I think it’s fair to say that if you release a new project today, you can skip in-code metadata altogether and tell your users to use importlib.metadata.

However that is usually not an option for established projects whose users might have been relying on the existence of in-code metadata. Removing that metadata could break your users for questionable benefits.

So if we wanted to stop duplicating metadata, we’d have to

1. Conditionally depend on the backport package importlib-metadata on Python versions older than 3.8.
2. Extract package information in our your_package.__init__ modules.
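Taken together, those two steps could look roughly like this inside your_package/__init__.py. It’s a sketch under assumptions: read_version and its "unknown" fallback are my own illustration, and the dependency line in the comment uses a standard environment marker to pull in the backport only where needed.

```python
import sys

# The backport would be declared as a conditional dependency, e.g.:
#   importlib-metadata; python_version < "3.8"
if sys.version_info >= (3, 8):
    from importlib import metadata
else:
    import importlib_metadata as metadata


def read_version(dist_name, default="unknown"):
    """Look up the installed version, falling back if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # E.g. running from a source checkout that was never installed.
        return default


# In your_package/__init__.py this would become:
# __version__ = read_version("your_package")
```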

Since every dependency in the end means a liability, point 1 is not ideal. Point 2 is even worse because here we have to choose between:

Extract unconditionally on import, which is a waste of resources and makes your imports slower, although the feature is almost never needed.
Build a lazy loader system using sketchy module-level __getattr__ contraptions that add complexity where there used to be a simple variable assignment. A Rube Goldberg machine that probably also isn’t worth the effort.
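For the curious, the lazy variant can be sketched with a module-level __getattr__ (PEP 562, Python 3.7+). The factory make_module_getattr and its lookup parameter are hypothetical names I’m using so the example can be exercised without an installed package. This is one possible shape, not a recommended recipe.

```python
from importlib import metadata  # Python 3.8+; the backport is importlib_metadata


def make_module_getattr(dist_name, lookup=metadata.version):
    """Build a module-level __getattr__ that resolves __version__ lazily.

    `dist_name` and `lookup` are illustrative parameters, not an established API.
    """
    cache = {}

    def __getattr__(name):
        if name != "__version__":
            raise AttributeError(f"module has no attribute {name!r}")
        if name not in cache:
            # Only touch the installed package metadata on first access.
            cache[name] = lookup(dist_name)
        return cache[name]

    return __getattr__


# In your_package/__init__.py this would read:
# __getattr__ = make_module_getattr("your_package")
```

Compare that to the single line `__version__ = "21.3.0"` it replaces and the complexity trade-off becomes obvious.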

As with many things, it feels like we’re in a transition phase without easy answers – at least not for existing projects with strong backward-compatibility promises.