Your Python applications are running but you’re wondering what they are doing? The only clue about their current state is the server load after ssh-ing into the servers? Let’s change that!

So far, I’ve given this talk at PiterPy 2015 in Saint Petersburg, Russia; PyCon US 2015 in Montreal, Canada (short version); EuroPython in Bilbao, Spain; and PyCon JP 2015 in Tokyo, Japan.

Slides of the EuroPython version on Speaker Deck.

Video of the EuroPython version:

Beyond grep – EuroPython

Five Easy Steps To Get Started

  1. Get a free Sentry account.
  2. Install sentry-sdk into your project.
  3. Configure logging to forward exceptions to Sentry. (Bonus: look at other integrations that are just as easy.)
  4. Install a time series database:
  5. Use uWSGI and tell it to ship its metrics¹ to your database.
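
Steps 2 and 3 could look roughly like the sketch below. It assumes the sentry-sdk package and its logging integration; the DSN you pass in, the logger name "myapp", and the safe_divide function are placeholders made up for illustration. The integration is normally enabled by default, so the explicit configuration is only there to make the two levels visible.

```python
import logging
from typing import Optional

logger = logging.getLogger("myapp")  # hypothetical logger name


def configure_sentry(dsn: str) -> None:
    """Initialize sentry-sdk so that ERROR-level log records become Sentry events."""
    # Imported lazily so this sketch still loads where sentry-sdk is not installed.
    import sentry_sdk
    from sentry_sdk.integrations.logging import LoggingIntegration

    sentry_sdk.init(
        dsn=dsn,  # placeholder: use the DSN from your own Sentry project
        integrations=[
            LoggingIntegration(
                level=logging.INFO,         # INFO and above are kept as breadcrumbs
                event_level=logging.ERROR,  # ERROR and above are sent as events
            )
        ],
    )


def safe_divide(a: float, b: float) -> Optional[float]:
    """Hypothetical business function that logs failures instead of hiding them."""
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and attaches the current traceback.
        logger.exception("division failed", extra={"dividend": a, "divisor": b})
        return None
```

Call configure_sentry once at startup; after that, every logger.exception call in your code should show up in Sentry with its traceback attached.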

Now you won’t miss any errors anymore and you have a general idea of what your system is doing. You can start refining.

Please note that nowadays (2021) I prefer Prometheus for metrics storage.

Errors

Proper error logging is awesome :)

Metrics

Metrics are the only way to get your job done.

— Jeff Hodges

General

Collecting

  • Send events as they happen, aggregate in separate service:
    • StatsD, together with Graphite, was part of the original rise of metrics.
      • Popularized by Etsy’s now infamous blog post.
      • Gives you a super-easy way to add metrics to any app: just send it a UDP packet whenever you measure something.
      • Written in Node.js, and unfortunately neither is part of Ubuntu.
      • There’s a crapload of Python clients.
      • UDP may not be the best way to collect metrics.
    • riemann, written by a super-smart person for other super-smart people.
  • Aggregate within your app and send to DB:
    • Prometheus (see further down for more details) has an official Python client.
    • scales is still the most popular package and inspired by Coda Hale’s original Metrics package for Java. Can export data to Graphite.
    • yunomi is promising but needs a new maintainer. Volunteers?
    • Yelp wrote a uWSGI-specific port called uwsgi_metrics.
    • faststat is a more minimalist and manual approach.
  • Before adding metrics boilerplate to your business logic, check out whether you can’t observe some of them from the outside:
  • Dig deeper: New Relic
  • System metrics collectors (can also be used to fetch metrics from your apps):
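
The two collecting styles from the list above can be contrasted in a minimal sketch using only the standard library. The first half sends StatsD-formatted UDP datagrams as events happen; the second half aggregates counts inside the process for a later export. The function and class names are made up for illustration, and a real project would more likely use one of the existing client packages.

```python
import socket
import threading
from collections import Counter


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the StatsD line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, milliseconds: float) -> bytes:
    """Encode a timing in the StatsD line format: <name>:<value>|ms"""
    return f"{name}:{milliseconds:g}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Approach 1: send one UDP datagram per event, fire and forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


class InAppCounters:
    """Approach 2: aggregate inside the process and export the totals later."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def incr(self, name: str, value: int = 1) -> None:
        with self._lock:
            self._counts[name] += value

    def snapshot(self) -> dict:
        """Return a copy of the current totals, e.g. for a periodic export."""
        with self._lock:
            return dict(self._counts)
```

For example, send_metric(format_counter("logins")) emits a single increment, assuming a StatsD daemon listens on its default port 8125 on localhost. Because UDP gives no delivery guarantee, a packet sent while nobody is listening is silently lost, which is one reason it may not be the best transport for metrics.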

Storing & Viewing

  • Metrics are saved into a time series database.

  • Librato Metrics, a paid solution that is beautiful and easy to use.

  • SoundCloud released their time series database that integrates monitoring: Prometheus.

    • It’s pull based instead of push based, which means Prometheus will harvest exposed metrics from your apps.
      • There is a push gateway if you can’t expose the metrics yourself.
    • There are plenty of exporters to instrument other software like nginx or to integrate existing metrics systems like collectd, statsd, Graphite, …
    • Julius Volz, one of the project founders, gave a nice overview of Prometheus in an interview with FLOSS Weekly.
    • One of the philosophical pillars of Prometheus is to aggregate metrics within your app, but only in the simplest possible way. So it’s mostly just incrementing and adding numbers, not computing derivatives or percentiles. You get the best of both worlds.
    • I found the community very active, responsive, and friendly.
  • Graphite is still probably the most popular time series database.

    • Python-based (Twisted & Django)
    • Its network protocol (implemented by carbon) is a widely supported de-facto standard.
    • There are hosted options too.
    • The main downside is the lack of multidimensional data. Modern TSDBs allow you to tag your data like load,server="server1",time="5m" 0.5, while Graphite forces you to put everything into the metric name like server1.load.5m = 0.5.
  • InfluxDB, a next generation time series database heavily inspired by Graphite.

  • OpenTSDB is the final big TSDB player.

    • Requires an HBase cluster to work.
  • Grafana gives you beautiful dashboards for Prometheus, OpenTSDB, Graphite, and InfluxDB.

  • InfluxDB also built their own visualization tool called Chronograf:

    In short, we love Grafana and want to ensure that InfluxDB and Chronograf users can also be happy Grafana users. However, we also needed the ability to iterate on ideas for visualization tools ourselves.

  • Etsy’s anomaly detection system: Kale.

  • You want to look at percentiles and not just at averages.
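
To see why averages alone mislead, here is a small sketch with made-up latency samples. The percentile function uses the simple nearest-rank method, which is only one of several common definitions; your time series database may compute percentiles differently.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up data: most requests are fast, a few are painfully slow.
latencies_ms = [20] * 90 + [30] * 8 + [2000] * 2

mean = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean:.1f} ms  p50={p50} ms  p99={p99} ms")
```

With these samples the mean comes out at about 60 ms, which describes no actual request: the typical request (p50) takes 20 ms, while the slowest two percent take 2000 ms. Only the percentiles make that visible.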

Logging

  • Don’t write prose: the primary consumer of logs is a computer.
    • structlog helps you with structured logging without replacing your existing logging framework.
    • lithoxyl from PayPal offers a completely different approach to logging.
    • eliot is a similar tool although more opinionated and with support for causal chains.
  • Don’t reinvent the logging wheel within Python. If you run on a UNIX-like operating system, there is absolutely no need to add timestamps, let alone do log rotation within your applications. Keep it simple, keep concerns separated.
  • Debugging logging can be frustrating. One nice helper is logging_tree by the fabulous Brandon Rhodes.
  • Log to standard out, then…
  • Example of a logging infrastructure: Gondor.
  • Aggregate your logs in a central place to make them easily searchable.
    • Paid solutions:
    • Most popular open/free solution: ELK
      • Elasticsearch: a distributed database with focus on realtime searching.
      • Logstash: log processor that parses your log entries and saves them to Elasticsearch.
      • Kibana: web frontend to search, correlate, and visualize your log entries.
    • Graylog is another popular open/free logging solution.
    • Another solution is fluentd.
  • Before you get your aggregation in place, you may want to check out The Log File Navigator which is a colorful CLI tool.
  • Often enough, you don’t need to actually store logs but only process them. The CDN provider CloudFlare has 10 trillion log lines each month and uses them to detect attacks and anomalies but doesn’t store them because it’s too much data and to avoid being asked for that data. This is an advanced yet fascinating talk.
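
The first points of this list (machine-readable events, standard out, no timestamps or rotation inside the app) can be sketched with nothing but the standard library. This is not how structlog, lithoxyl, or eliot work internally; it is only a minimal illustration, and passing the key/value pairs under a "context" key in extra is a convention invented for this sketch.

```python
import json
import logging
import sys


class JSONFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
        }
        # Key/value pairs passed as extra={"context": {...}} (convention of this sketch).
        event.update(getattr(record, "context", {}))
        # No timestamp on purpose: whatever collects stdout can add it.
        return json.dumps(event, sort_keys=True)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (standard out by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JSONFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = make_logger("shop")  # hypothetical logger name
log.info("order_placed", extra={"context": {"order_id": 42, "user": "alice"}})
```

Each call produces one line of JSON with the event name and its key/value pairs, which a log processor can parse without regular expressions.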

Credits


  1. In order to build uWSGI together with the StatsD plugin, you need to set UWSGI_EMBED_PLUGINS="stats_pusher_statsd" while pip-installing it. Be careful if you pre-wheel your dependencies. For Graphite/Carbon support, no special action is necessary.