Deploying web applications is hard. No shiny continuous deployment talk and no DevOps coolness can change that. Or to use DevOps Borat’s words: “Is all fun and game until you are need of put it in production.” There are some mistakes I see people making again and again, so I’d like to address them here.
This article also lays the groundwork for the second part, where I’ll describe the way we deploy Python applications.
Before I start preaching, let me tell you a bit about me and what I do, to give you an idea of the perspective I’m writing from.
I work for a German web hoster and domain registrar, and I’m deploying Python-based applications all the time. Most parts of our infrastructure are built using Python, and those that aren’t will be eventually.
The sizes range from tiny glue to mission-critical APIs. We have legacy Pylons, new Pyramid, some Django and a lot of Twisted apps. And everything is seasoned with a hint of Celery.
So if I say “application”, I don’t mean just some Django CRUD front end. Python lives in all layers here. And all layers have to be deployed somehow.
Deploying so many diverse applications requires solid and consistent deployment standards if you don’t want to go crazy. The main mantra is to go for simple solutions, not for easy ones. Something that is easy now can become a major PITA down the road.
Don’t use ancient system Python versions
Every time someone whines about lack of support for Python 2.4 in recent packages, I hear Kenneth Reitz saying:
Python 2.4 is not supported. It came out 8 years ago. That's older than Youtube. Upgrade.
If you’re serious about using Python, you should be prepared to roll your own RPMs/DEBs. We’re even running RHEL 4 on some of our servers, but we’re a Python company, so we use the best thing we can get – even if it means extra work.
We also have to compile our own Apaches and MySQLs for our customer servers (we don’t use any of them for our own systems, but our customers demand a solid LAMP stack) because we need that fine-grained control. Why should Python be an exception? Rolling your own DEB/RPM is a lot less of a nuisance than writing code for Python < 2.6.
This also works both ways. It’s totally possible that you have some mission-critical web app that isn’t compatible with any Python newer than 2.4. Are you going to install a single server with an ancient OS just to accommodate it? Key infrastructure must not be dictated by third parties.
On the other hand, I’m not saying that you have to compile Python yourself! Ubuntu Oneiric and later have Python 2.7 on board, so there’s absolutely no reason to build it yourself. The stress in this heading is on “ancient”, not on “system”.
Use virtual environments
Gentlepeople, if you’re deploying software, always use virtualenv. Actually, the same goes for local development – look into virtualenvwrapper, which makes handling them a breeze. So never install into your global site-packages! The only exception is the aforementioned virtualenv – which in turn installs pip into each environment it creates.
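One way to enforce the "never install into your global site-packages" rule is a small guard at the top of your own deployment or build scripts. The following is only a sketch: the function names are hypothetical, and it relies on the fact that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    return base_prefix != sys.prefix


def require_virtualenv() -> None:
    """Abort early instead of touching the global site-packages."""
    if not in_virtualenv():
        raise SystemExit("Refusing to continue: activate a virtualenv first.")


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
```

Calling `require_virtualenv()` before any install step turns an easy-to-miss mistake into an immediate, loud failure.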
Test your software against specific versions of packages, pin them using pip freeze and be confident that the identical Python environment is just a pip install -r requirements.txt away. For the record, I split up my requirement files; more on that in the next installment.
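If you want to double-check that a deployed environment really matches what you tested against, you can compare the pins with what is installed. This is a minimal sketch, not part of pip: `parse_pins` and `find_mismatches` are hypothetical helpers, the parser only understands plain `name==version` lines, and the lookup uses `importlib.metadata` from the standard library (Python 3.8+).

```python
from importlib import metadata


def parse_pins(lines):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed or None)} for every deviation."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # "package" is the placeholder name used in this article.
    print(find_mismatches(parse_pins(["package==1.3"])))
```

An empty result means the environment matches the pins; anything else is worth investigating before you ship.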
Also, use real version pinning like package==1.3. Don’t do package>=1.3; it will bite you eventually, just as it has bitten me and many others.
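A small check can catch loose requirements before they bite. The sketch below flags every requirement line that is not an exact `name==version` pin; the function name and the regular expression are my own simplification and do not cover everything pip accepts (environment markers and URLs, for example, are reported as unpinned).

```python
import re

# A line counts as pinned only if it is exactly "name==version",
# optionally with extras such as "name[extra]==version".
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[A-Za-z0-9.!+_-]+$"
)


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to one version."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        if not PINNED.match(line.replace(" ", "")):
            loose.append(line)
    return loose


if __name__ == "__main__":
    print(unpinned_requirements(["package==1.3", "package>=1.3"]))
```

Running this in your test suite or as a pre-commit step keeps `>=` from sneaking back into a requirements file.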
Never use Python packages from your distribution
This one is in fact an extreme version of the previous anti-pattern.
First of all, there’s no reason to let your distribution dictate which version of a package you use. They don’t know your application. Maybe you need the latest version, maybe you need a slightly older one.
- If I write and test software, I do it against certain packages. Packages tend to change APIs, introduce bugs, etc.
- My software is supposed to run on any UNIXy platform as long as the Python it’s written against is present.
What if the next Ubuntu ships with a different SQLAlchemy by default? Do I have to fix all my applications before upgrading our servers? Or what if I need to deploy an app to an older server? Do I have to rewrite it so it runs with older packages? I prefer not to.
I really wish the Linux distributions wouldn’t ship anything more than the Python interpreter and virtualenv. Anything else just encourages bad behavior.
The only good they may be doing is automatically updating packages with security vulnerabilities that you may have missed. That said, I’m convinced that if you deploy software to the net, you have the responsibility to monitor your dependencies yourself anyway. Relying on your distribution just gives you a false sense of security – if your customer’s data gets stolen, they don’t care that Ubuntu was too slow to issue a security update.
Don’t run your daemons in a tmux/screen
It seems to be part of everyone’s evolution to do it, so be the first one to skip it!
Yes, tmux is full of awesome (and wayyy better than screen), but please don’t just ssh into your host and start the service in a tmux or screen session. You have nothing that brings the daemon back up if it crashes. You can’t restart it on 10 servers without ssh’ing into all 10 of them, attaching to the session and hitting Ctrl-C. Granted, it’s easy in the beginning, but it doesn’t scale and lacks basic features that simple-to-use tools have to offer.
My favorite one is supervisord. A definition for a service looks as simple as:
[program:yourapp]
command=/path/to/venv/bin/gunicorn_django --config deploy/gunicorn-config.py settings/production.py
user=yourapp
directory=/apps/yourapp
You add the file to /etc/supervisor/conf.d/, run supervisorctl update and your service is up and running. It’s a no-brainer and much easier than juggling rc.d scripts. Crash recovery and an optional web interface are included.
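If you run many similar services, the definitions themselves can be generated instead of written by hand. The sketch below renders a program section like the one above with Python's standard `configparser`; `render_program` is a hypothetical helper, and it only emits the three keys used in this article.

```python
import configparser
import io


def render_program(name, command, user, directory):
    """Render one supervisord [program:<name>] section as text."""
    # interpolation=None keeps any '%' characters in commands untouched.
    config = configparser.ConfigParser(interpolation=None)
    config[f"program:{name}"] = {
        "command": command,
        "user": user,
        "directory": directory,
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(
        render_program(
            name="yourapp",
            command=(
                "/path/to/venv/bin/gunicorn_django "
                "--config deploy/gunicorn-config.py settings/production.py"
            ),
            user="yourapp",
            directory="/apps/yourapp",
        )
    )
```

The returned text is what you would place in a file under /etc/supervisor/conf.d/ before running supervisorctl update. In practice, your configuration management tool is the better place to generate it.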
Configuration is not part of the application
Your production configuration doesn’t belong in the (same) source repository. There are configuration management tools like Salt, Puppet or Chef that do exactly that for you.
Just better and more reliably. While installing the configuration, Puppet can make sure that the directories always have certain permissions. Configuration templates make it perfect for mass deployments. Some service IP changed? Just fix it in Puppet’s repo and deploy the changes. Eventually all services will catch up. If you want, you can always trigger a run, for example using a simple Fabric script.
But don’t use Fabric for actual deployments! This is the perfect example of the battle between “simple” and “easy”. At first, it’s easier to put everything inside the repo and run a Fabric script that does a git pull and restarts your daemon. In the long run, you’ll regret it like many before you did.
Just to stress this point: I love Fabric and couldn’t live without it. But it’s not the right tool for orchestrating deployments – that’s where Puppet and Chef step in.
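To illustrate why configuration templates work so well for mass deployments, here is a tiny stand-in written with Python's `string.Template`. It is not how Puppet, Salt or Chef implement templating; the host names, addresses and settings are made up for the example. The point is that one shared value lives in exactly one place and every host's file is derived from it.

```python
from string import Template

# Hypothetical application config; $placeholders are filled in per host.
CONFIG_TEMPLATE = Template(
    "[database]\n"
    "host = $db_host\n"
    "port = $db_port\n"
    "\n"
    "[app]\n"
    "workers = $workers\n"
)

# Values shared by all hosts: change the service IP here, and only here.
SHARED = {"db_host": "10.0.0.5", "db_port": 5432}

# Per-host overrides (made-up hosts).
HOSTS = {
    "web1.example.com": {"workers": 4},
    "web2.example.com": {"workers": 8},
}


def render_all(template, shared, hosts):
    """Return {hostname: rendered config text} for every host."""
    return {
        host: template.substitute({**shared, **overrides})
        for host, overrides in hosts.items()
    }


if __name__ == "__main__":
    for host, text in render_all(CONFIG_TEMPLATE, SHARED, HOSTS).items():
        print(f"# {host}\n{text}")
```

`Template.substitute` raises a `KeyError` when a placeholder has no value, so a forgotten setting fails at render time rather than on the server.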
Look into alternatives to Apache + mod_wsgi setups
Many people go for Apache and mod_wsgi by default, because everybody has already heard about Apache.
To me, Apache feels like a big ball of mud and I find the modular combination of gunicorn or uwsgi together with nginx much more pleasing and easier to control.
YMMV, but have a look around before you settle.
I don’t claim that I’ve discovered the sorcerer’s stone; however, I’ve developed a system for us that has proved solid and simple in the long run.
The trick is to build a Debian package (but it can be done using RPMs just as well) with the application and the whole virtualenv inside. The configuration goes into Puppet, and Puppet also makes sure that the respective servers always have the latest version of the DEB.
The advantage is that such a DEB is totally self-contained and doesn’t require build tools and libraries on the target servers. Paired with solid Puppet configuration, it makes consistent deployments over a wide range of hosts easy, fast and reliable. But you have to do your homework first.
If you find this approach intriguing, make sure you check out my article where I describe it! I've also presented at PyCon about Python deployments and the talk notes may have some value to you.
Meanwhile, if you wish to comment or add something, my contact details can be found on the about page. I will expand this article if parts seem unclear or misunderstood.
After this article hit Hacker News and made it to #1 (thanks a lot everyone!), there has been an interesting discussion over there. So if you want to read more about deployment and maybe some counter-arguments to my approach, make sure to have a look at http://news.ycombinator.com/item?id=3879926. Also, there has been a discussion on Reddit which didn’t get as big though.
Another achievement unlocked: this article has been featured in my favorite magazine Hacker Monthly, Issue 26, July 2012. Yes, I’m bursting with pride. ;)