To get real-time insight into running applications, you need to instrument them and collect metrics: count events, measure times, expose numbers. That used to be a clusterfuck of technologies and approaches. Prometheus changes that.

So far, I’ve given this talk at PiterPy 2016 and WGDF 2016 in Saint Petersburg, PyCon US 2016 in Portland, EuroPython 2016 in Bilbao, PyCon ZA 2016 in Cape Town, and DevOpsPro Moscow 2016.

Slides on Speaker Deck

Get Instrumented – EuroPython 2016

Metrics In General

Prometheus

Another record with the server improvements of the upcoming 0.18 release: 800k samples/s ingestion rate with 1.7M series and 2100 targets.

Official Prometheus Twitter Account, Tweet
  • Homepage and documentation.
  • Thoughts on push vs pull based metrics/monitoring:
  • It might be tempting to just use the Pushgateway and leave everything push-based. However, the Pushgateway is intended for a very specific use case, and it’s important to know When to Use the Pushgateway. If the success of your instrumentation depends on using Prometheus for use cases outside this realm, Prometheus may not be the best choice for you. However, those cases are rather rare, so you may want to double-check preconceived notions.
  • Prometheus offers two different (modern) strategies for storing sample data: “double delta” (default) and “varbit” (new in 0.18, referred to in the tweet above). They let you trade disk space and I/O load for query runtime: When (not) to use varbit chunks.
    • If you’re interested in how the TSDB behind Prometheus works, this talk by its author from PromCon 2016 is fascinating.
  • Prometheus proper is intended to be scaled using federation. However, third parties have started building horizontally scalable solutions:
    • DigitalOcean’s Vulcan that builds on Kafka, Elasticsearch, and Cassandra.
    • Weaveworks’ Prism that relies on AWS services.
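To make the storage trade-off mentioned above a bit more tangible, here is a minimal sketch of the general delta-of-delta idea that names like “double delta” allude to. It is an illustration only and makes no claim to match Prometheus’ actual chunk formats; the function names are made up for this example. Regularly spaced samples collapse into tiny numbers that are cheap to store, but getting a value back means walking the sequence from the start, which hints at why denser encodings can cost query time.

```python
def double_delta_encode(values):
    """Return [first value, first delta, then the deltas of the deltas].

    Hypothetical helper for illustration, not Prometheus' on-disk format.
    """
    if not values:
        return []
    encoded = [values[0]]
    previous_delta = 0
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        # Store only how much the step size changed since the last step.
        encoded.append(delta - previous_delta)
        previous_delta = delta
    return encoded


def double_delta_decode(encoded):
    """Invert double_delta_encode by re-accumulating the deltas."""
    if not encoded:
        return []
    values = [encoded[0]]
    delta = 0
    for delta_of_delta in encoded[1:]:
        delta += delta_of_delta
        values.append(values[-1] + delta)
    return values
```

With scrape timestamps such as `[1000, 1015, 1030, 1045, 1061]`, everything after the first two entries is zero or close to it, which is what makes this family of encodings compact.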

PromQL

alertmanager

Instrumenting Your Environment

  • node_exporter for instrumenting from inside (metal, KVM, LXC, jails, …)
  • cAdvisor for instrumenting from outside (mainly Docker; but also LXC).
  • mtail for extracting metrics from log files based on regular expressions.
    • The apache_metrics example can extract better metrics than the Apache status-based exporter.
    • If you like grok, you may also be interested in the grok_exporter that will allow you to reuse your patterns.
  • Prometheus can be introduced step by step by using one of the bridging exporters. Here’s an example for Graphite.
  • More official and unofficial exporters can be found here.
  • Sometimes you want to know the load on your database servers. That’s when machine roles come in handy.
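The idea behind extracting metrics from log files with regular expressions, as mtail does, can be sketched in a few lines of Python. This is not mtail’s own language or implementation: the pattern, the function names, and the metric name `http_requests_total` are assumptions chosen for this example, and the pattern expects access-log lines in the common log format.

```python
import re
from collections import Counter

# Hypothetical pattern for a common-log-style access line; adjust to your format.
LINE = re.compile(r'"(?P<method>[A-Z]+) \S+ [^"]*" (?P<status>\d{3})\b')


def count_requests(lines):
    """Count requests per (method, status) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue
        counts[(match["method"], match["status"])] += 1
    return counts


def render(counts, name="http_requests_total"):
    """Render the counts as labelled samples in the Prometheus text format."""
    out = [f"# TYPE {name} counter"]
    for (method, status), value in sorted(counts.items()):
        out.append(f'{name}{{method="{method}",status="{status}"}} {value}')
    return "\n".join(out) + "\n"
```

A real setup would tail the file continuously and serve the rendered text over HTTP so Prometheus can scrape it; the sketch only covers the parsing and counting part.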

Adding Prometheus to Your App

Don’t shy away from adding instrumentation code. It takes a while to get used to it, but it should be an integral part of your software, not an afterthought.
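To give a feeling for how little code “count events, measure times, expose numbers” takes, here is a standard-library-only sketch. It is a toy and not the official Prometheus client library, which you would use in a real application; the class names, the metric names, and `handle_request` are all made up for this example. The output loosely follows the Prometheus text exposition format.

```python
import time
from contextlib import contextmanager


class Counter:
    """Toy counter that only goes up (hypothetical, not the official client)."""

    def __init__(self, name, documentation):
        self.name = name
        self.documentation = documentation
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

    def render(self):
        return (
            f"# HELP {self.name} {self.documentation}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self.value}\n"
        )


class Summary:
    """Toy summary that tracks the number of observations and their sum."""

    def __init__(self, name, documentation):
        self.name = name
        self.documentation = documentation
        self.count = 0
        self.sum = 0.0

    def observe(self, amount):
        self.count += 1
        self.sum += amount

    @contextmanager
    def timer(self):
        """Observe how long the wrapped block took, in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.observe(time.perf_counter() - start)

    def render(self):
        return (
            f"# HELP {self.name} {self.documentation}\n"
            f"# TYPE {self.name} summary\n"
            f"{self.name}_count {self.count}\n"
            f"{self.name}_sum {self.sum}\n"
        )


REQUESTS = Counter("app_requests_total", "Total requests handled.")
LATENCY = Summary("app_request_duration_seconds", "Time spent handling a request.")


def handle_request():
    """Stand-in for real application code, instrumented inline."""
    REQUESTS.inc()
    with LATENCY.timer():
        return "ok"


def metrics_page():
    """Text that a /metrics endpoint could serve for scraping."""
    return REQUESTS.render() + LATENCY.render()
```

The instrumentation sits right next to the code it measures: one line to count, one `with` block to time. That is the habit worth building.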

Credits