Your Python applications are running but you’re wondering what they are doing? The only clue about their current state is the server load after ssh-ing into the servers? Let’s change that!
So far, I’ve given this talk at PiterPy 2015 in Saint Petersburg, Russia; PyCon US 2015 in Montreal, Canada (short version); EuroPython in Bilbao, Spain; and PyCon JP 2015 in Tokyo, Japan.
Slides of the EuroPython version on Speaker Deck.
Video of the EuroPython version:
Five Easy Steps To Get Started
- Get a free Sentry account.
- Install sentry-sdk into your project.
- Configure logging to forward exceptions to Sentry. (Bonus: look at other integrations that are just as easy.)
- Install a time series database:
- Use uWSGI and tell it to ship its metrics¹ to your database.
Now you won’t miss any more errors and you have a general idea of what your system is doing. You can start refining.
Please note that nowadays (2021) I prefer Prometheus for metrics storage.
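Step three boils down to hooking a handler into Python's logging tree that picks up every record carrying an exception. The following is only a sketch using the standard library: `ErrorForwardingHandler` and its `report` callable are hypothetical stand-ins for what an error tracker's SDK does for you, and they are not Sentry's API.

```python
import logging


class ErrorForwardingHandler(logging.Handler):
    """Forward log records that carry an exception to a reporter callable.

    Hypothetical stand-in for an error tracker integration; `report` can be
    any callable that accepts a dict.
    """

    def __init__(self, report, level=logging.ERROR):
        super().__init__(level=level)
        self._report = report

    def emit(self, record):
        # Skip records without an attached exception.
        if not record.exc_info or record.exc_info[0] is None:
            return
        exc_type, exc, _tb = record.exc_info
        self._report({
            "logger": record.name,
            "message": record.getMessage(),
            "exception": exc_type.__name__,
            "value": str(exc),
        })


captured = []  # a real integration would ship these over the network
logger = logging.getLogger("shop.checkout")  # hypothetical logger name
logger.addHandler(ErrorForwardingHandler(captured.append))
logger.propagate = False  # keep the demo quiet

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and attaches the active exception.
    logger.exception("payment failed")

print(captured)
```

The point is that your application code only ever calls `logger.exception()`; where the errors end up is a matter of configuration.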
Errors
Proper error logging is awesome :)
- Logs are Liars so focus on error tracking and metrics.
- Alternatives to Sentry:
- Sentry has no official Twisted integration, but it’s straightforward.
Metrics
Metrics are the only way to get your job done.
General
- Coda Hale: Metrics, Metrics Everywhere
- Jeff Hodges: Distributed Systems in Production
Collecting
- Send events as they happen, aggregate in separate service:
  - StatsD was, together with Graphite, part of the original rise of metrics.
    - Popularized by Etsy’s now infamous blog post.
    - Gives you a super-easy way to add metrics to any app: just send it UDP packets whenever you measure something.
    - Written in Node.js, and unfortunately neither is part of Ubuntu.
    - Crapload of Python clients.
    - UDP may not be the best way to collect metrics.
  - riemann, written by a super-smart person for other super-smart people.
    - Basically a stream-oriented StatsD + monitoring + dashboards using protocol buffers.
    - Configured using Clojure, rather high barrier of entry.
    - Plenty of clients, including for Python.
- Aggregate within your app and send to DB:
  - Prometheus (see further down for more details) has an official Python client.
  - scales is still the most popular package and is inspired by Coda Hale’s original Metrics package for Java. It can export data to Graphite.
  - yunomi is promising but needs a new maintainer. Volunteers?
  - Yelp wrote a uWSGI-specific port called uwsgi_metrics.
  - faststat is a more minimalist and manual approach.
- Before adding metrics boilerplate to your business logic, check whether you can observe some of them from the outside:
  - Tweens
  - Django Middleware
  - Pull out of logs.
  - Leverage monitoring.
- Graham Dumpleton of mod_wsgi fame wrote a series of articles on monitoring performance of WSGI applications:
- Dig deeper: New Relic
- System metrics collectors (can also be used to fetch metrics from your apps):
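To make two of the points above concrete (StatsD is just UDP packets, and request metrics can often be observed from outside your business logic), here is a minimal sketch of a WSGI middleware that times each request and sends the result in StatsD's line format (`name:value|ms` for timers, `name:value|c` for counters). The class name, the `myapp` prefix, and the tiny demo app are made up for illustration; 8125 is StatsD's usual default port. A maintained StatsD client is the better choice for real use.

```python
import socket
import time


class StatsDTimingMiddleware:
    """WSGI middleware that reports request durations to StatsD over UDP.

    Only the call into the wrapped app is timed, not the iteration over
    the response body.
    """

    def __init__(self, app, host="127.0.0.1", port=8125, prefix="myapp"):
        self._app = app
        self._addr = (host, port)
        self._prefix = prefix
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        try:
            return self._app(environ, start_response)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self._send(f"{self._prefix}.request:{elapsed_ms:.3f}|ms")
            self._send(f"{self._prefix}.requests:1|c")

    def _send(self, line):
        try:
            self._sock.sendto(line.encode("ascii"), self._addr)
        except OSError:
            pass  # metrics must never break a request


def hello_app(environ, start_response):
    """Trivial WSGI app used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Stand-in for a StatsD server so the example runs without one.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)
port = receiver.getsockname()[1]

app = StatsDTimingMiddleware(hello_app, port=port)
body = app({}, lambda status, headers: None)
packets = [receiver.recv(1024).decode("ascii") for _ in range(2)]
print(packets)
```

The business logic in `hello_app` knows nothing about metrics, which is exactly the point of measuring from the outside.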
Storing & Viewing
Metrics are saved into a time series database.
Librato Metrics, a paid solution that is beautiful and easy to use.
- recognizer, a Graphite proxy.
- Using StatsD with Librato.
SoundCloud released their time series database that integrates monitoring: Prometheus.
- It’s pull based instead of push based, which means Prometheus will harvest exposed metrics from your apps.
- There is a push gateway if you can’t expose the metrics yourself.
- There are plenty of exporters to instrument other software like nginx or to integrate existing metrics systems like collectd, statsd, Graphite, …
- Julius Volz, one of the project founders, gave a nice overview of Prometheus in an interview with FLOSS Weekly.
- One of the philosophical pillars of Prometheus is to aggregate metrics within your app, but only in the simplest possible way. So it’s mostly just incrementing and adding numbers, not derivations or percentiles. You get the best of both worlds.
- I found the community very active, responsive, and friendly.
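The "only incrementing and adding" idea can be sketched without any dependencies. The following is not the official Prometheus client, just an illustration under my own assumptions: the `Counter` class, the `render` helper, and the metric names are made up, and the output merely follows the style of Prometheus' text exposition format. In practice you would use the official client and serve the text over HTTP for Prometheus to scrape.

```python
import threading


class Counter:
    """A number that only goes up; the only aggregation done in the app."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters only go up")
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value


def render(counters):
    """Render counters as text in the style of Prometheus' exposition format."""
    lines = []
    for c in counters:
        lines.append(f"# HELP {c.name} {c.help_text}")
        lines.append(f"# TYPE {c.name} counter")
        lines.append(f"{c.name} {c.value}")
    return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Handled HTTP requests.")
request_seconds_sum = Counter(
    "http_request_seconds_sum", "Total time spent handling requests."
)

# Pretend we handled three requests with these durations in seconds.
for duration in (0.25, 0.5, 0.25):
    requests_total.inc()
    request_seconds_sum.inc(duration)

print(render([requests_total, request_seconds_sum]))
```

Note that the app never computes a rate or an average. It exposes a count and a sum, and the time series database derives everything else from how those two numbers change over time.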
Graphite is probably still the most popular time series database.
- Python-based (Twisted & Django)
- Its network protocol (implemented by carbon) is a widely supported de-facto standard.
- There are hosted options too.
- The main downside is the lack of multidimensional data. Modern TSDBs allow you to tag your data like
  load,server="server1",time="5m" 0.5
  while Graphite forces you to put everything into the metric name like
  server1.load.5m = 0.5
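What that flattening looks like in practice: carbon's plaintext protocol takes one line per data point consisting of a dotted metric path, a value, and a Unix timestamp. The helper below is a hypothetical sketch (the function and its `prefix_tags` parameter are my invention) that squeezes a tagged measurement into such a line. The order of the path segments is a pure convention that you have to pick once and stick to.

```python
def graphite_line(measurement, tags, value, timestamp, prefix_tags=("server",)):
    """Flatten a tagged measurement into carbon's plaintext format.

    The result is "<metric.path> <value> <unix timestamp>" followed by a
    newline. Tags listed in `prefix_tags` go before the measurement name,
    all other tags after it, in the order they appear in `tags`.
    """
    before = [tags[key] for key in prefix_tags if key in tags]
    after = [val for key, val in tags.items() if key not in prefix_tags]
    path = ".".join(before + [measurement] + after)
    return f"{path} {value} {int(timestamp)}\n"


line = graphite_line(
    "load", {"server": "server1", "time": "5m"}, 0.5, 1700000000
)
print(line, end="")  # server1.load.5m 0.5 1700000000
```

Once the tags are baked into the name, queries like "load of all servers" have to rely on wildcards over the path instead of filtering by tag.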
InfluxDB, a next-generation time series database heavily inspired by Graphite.
- SQL-like query language.
- Friendlier storage configuration and easy to cluster.
- Backed by a company selling support and hosting.
OpenTSDB is the final big TSDB player.
- Requires an HBase cluster to work.
Grafana gives you beautiful dashboards for Prometheus, OpenTSDB, Graphite, and InfluxDB.
InfluxDB also built its own visualization tool called Chronograf:
In short, we love Grafana and want to ensure that InfluxDB and Chronograf users can also be happy Grafana users. However, we also needed the ability to iterate on ideas for visualization tools ourselves.
Etsy’s anomaly detection system: Kale.
You want to look at percentiles and not just at averages.
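A quick illustration with made-up numbers of why averages mislead: if 95 out of 100 requests are fast and 5 are painfully slow, the mean describes none of them. The `percentile` helper is a simple nearest-rank implementation written for this sketch; your time series database will have its own way of computing percentiles.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating point surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds: 95 fast requests, 5 very slow ones.
timings_ms = [20] * 95 + [2000] * 5

mean = sum(timings_ms) / len(timings_ms)
p50 = percentile(timings_ms, 50)
p99 = percentile(timings_ms, 99)
print(f"mean={mean:.0f}ms p50={p50}ms p99={p99}ms")
# prints: mean=119ms p50=20ms p99=2000ms
```

No single request took anywhere near 119 ms. The median shows what most users experience and the 99th percentile shows how bad it gets for the unlucky ones.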
Logging
- Don’t write prose: the primary consumer of logs is a computer.
- Don’t reinvent the logging wheel within Python. If you run on a UNIX-like operating system, there is absolutely no need to add timestamps, let alone do log rotation within your applications. Keep it simple, keep concerns separated.
- Debugging logging can be frustrating. One nice helper is logging_tree by the fabulous Brandon Rhodes.
- Log to standard out, then…
  - …timestamp and write to a file.
  - …send to syslog. Be careful how you configure your syslog though.
  - …pipe it to a log harvester like logstash-forwarder.
- Personally, I prefer the reliability of writing to ext4 + watching and processing of files.
- Example of a logging infrastructure: Gondor.
- Aggregate your logs in a central place to make them easily searchable.
  - Paid solutions:
    - loggly (SaaS)
    - papertrail (SaaS)
    - splunk (SaaS possible, popular in enterprises)
  - Most popular open/free solution: ELK
    - Elasticsearch: a distributed database with focus on realtime searching.
    - Logstash: log processor that parses your log entries and saves them to Elasticsearch.
    - Kibana: web frontend to search, correlate, and visualize your log entries.
  - Graylog is another popular open/free logging solution.
    - Why they think they’re superior (probably to ELK without naming it :)).
    - Graypy allows for logging directly into it from Python.
    - graystruct integrates Graypy and structlog.
  - Another solution is fluentd.
- Before you get your aggregation in place, you may want to check out The Log File Navigator which is a colorful CLI tool.
- Often enough, you don’t need to actually store logs but only process them. The CDN provider CloudFlare has 10 trillion log lines each month and uses them to detect attacks and anomalies but doesn’t store them because it’s too much data and to avoid being asked for that data. This is an advanced yet fascinating talk.
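The first points of this list (no prose, no timestamps, just standard out) can be sketched with nothing but the standard library. The formatter name and the `fields` convention passed via `extra` are assumptions made for this example; structlog, mentioned above, gives you the same result with far less ceremony.

```python
import json
import logging
import sys


class KeyValueJSONFormatter(logging.Formatter):
    """Render each record as one JSON object per line, without a timestamp."""

    def format(self, record):
        entry = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
        }
        # `fields` is a hypothetical convention for this sketch: pass
        # extra={"fields": {...}} to attach structured data to an entry.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def configure_logging(stream=sys.stdout):
    """Send all log entries as JSON lines to `stream` (standard out by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(KeyValueJSONFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO)


configure_logging()
logging.getLogger("shop.orders").info(
    "order_placed", extra={"fields": {"order_id": 42, "items": 3}}
)
```

Timestamps, rotation, and shipping are deliberately missing: whatever consumes standard out (syslog, a file writer, a log harvester) takes care of those.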
Credits
- Icons: Symbolicons
- Picture of Hindenburg disaster: Wikipedia
- Picture of logs: Wikipedia
- Picture of Weird Al Yankovic: YouTube
- Application icons and screenshots from the respective project homepages.
- Django picture from http://www.djangopony.com/
- Picture of me at PyCon Russia: Facebook [only PiterPy]
- Picture of pintxos: Wikipedia [only EuroPython]
- Picture of Schloss Neuschwanstein: Wikipedia [only PyCon JP]
- Picture of Takoyaki: Wikipedia [only PyCon JP]
1. In order to build uWSGI together with the StatsD plugin, you need to set UWSGI_EMBED_PLUGINS="stats_pusher_statsd" while pip-installing it. Be careful if you pre-wheel your dependencies. For Graphite/Carbon support, no special action is necessary.