Thursday, 11 July 2013

Deploying Moodle - continued

One thing that I didn't really touch upon in my previous blog post was the method in which we actually deploy Moodle to our live servers.
Our production Moodle system is... well, complicated. We only have 14,000 students at present, but we're trying to really push use of the Moodle and the last thing that we want to happen is for it to go down.
To the same aim, we want to be able to patch servers without any downtime, and be able to take servers out of our load balancing pools to enable us to prepare updates seamlessly to our end users.

I will try and improve this blog posting a bit more when I have some time...

Systems

Architecture

We run our deployment on a purely virtualised infrastructure, currently that's VMWare VSphere 5.1. We have a separate infrastructure team who provide that infrastructure. The VMs sit on a pair of fully-redundant and replicate Storage Area Networks (SANs), and our moodledata is served over NFS by a NAS (Network Attached Storage) server.
Rather than having on beefy server to handle all of the load, we've found it's more efficient to have lots of smaller servers. Our VMWare specialists (Matt, and Graham) inform me that it's far better for scheduling if we have fewer processor cores than too many. If we have more than 4 processor cores on a VM, then VMWare has to try a lot harder to allocate the resources, and it's harder to migrate those VMs around the various blades.
At present, our architecture looks something like this (sorry, it's a little out of date, but should give you a fair idea):

Web servers

We have five live web servers (moodle-web[0,1,2,3,4]). These currently have two cores allocated, and 3GB RAM. They only need a small disk (20GB). Looking at our statistics throughout the year, we're likely to relinquish 1G of this memory on each VM as it's largely going unutilised.
We went for five servers because we want to be able to theoretically lose a whole blade which may have a couple of web servers on it, and not lose service. Theoretically VMWare should handle this automagically, but we've seen cases where this hasn't happened as it should.
Futhermore, we frequently pull one or two of these servers out of the pool to perform maintenance. I'll be doing this next week in our Moodle 2.5 upgrade. I'll take two of the servers out of the pool, prepare them with the upgrade and make sure that everything is all funky dory, and then I'll perform the upgrade, swing over to the new servers, and upgrade the old ones.
These web servers serve their content using Apache2 and we currently use mod_php5 rather than fastcgi. We didn't find any particularly stunning performance improvements with any of the CGI methods, but this may be something we re-evaluate in the future.
We currently ues APC, but we may consider switching to PHP Opcache when we upgrade to PHP 5.5 at some point. We follow Debian releases so we're unlikely to see PHP 5.5 for a couple of years yet.

Load balancers

To handle all of these web servers, we have a pair of load balancers. I should point out that we're Debian nuts and we love Open Source. We do our load balancing in Software with two software VMs.
These are also low-powered with 2 cores, and 1GB RAM (actually, one of them has 2GB but we intend to reduce this back to 1GB).
We currently terminate our SSL connections on these load balancers with nginx. When we made this decision, we were in two minds as to whether it was the 'right' thing to do, but in retrospect it has worked very well for us. It's a toss-up between being able to scale vertically at the web server, or at the load balancer. The web servers don't need public IP addresses, whilst the load balancers do. However, our web servers cost more (in terms of resources), and Lancaster University is extremely fortunate enough to have access to an entire slash-16 address range with 65534 globals.
In addition to terminating the SSL with nginx, we also do an amount of caching. Any images, and javascript, served by certain intelligient endpoints in Moodle are cached more aggressively in memory on the load balancer. This way, we don't even hit the load balancing software to serve them, and we don't hit the web servers either. Handy!
We also use X-Accel-Redirect to serve many of the files from our NFS server using nginx directly; rather than having PHP buffer them from disk. This is much more efficient and saw a huge drop in our CPU usage on the web servers with only a minor increase in CPU usage on our frontends. Basically, web serving software is designed for serving bytes off disk, whilst php is not. Again, it may seem a bit strange to serve the files on the load balancer rather than the web servers, and this is something that we may change in due course, but at present we are forced to use Apache on our web servers, and the mod_sendfile Module for Apache2 which does the same thing as X-Accel-Redirect for nginx is much less mature.
After traffic has been terminated, and cached content retrieved and served, it's then passed to our load balancing software, haproxy.
haproxy is an awesome little tool, which supports a range of really powerful features including different allocation methods, session stickiness, and it is also protocol aware for some protocols. It also has handy logging.
I'll briefly mention that we use keepalived to manage the VRRP layer of our stack. Each of our load balancers has a dedicated management IP, and a virtual service IP. The management IP never changes and reflects the name of the server providing service (e.g. moodle-fe0.lancs.ac.uk). Meanwhile, the VIPs are free to fly wherever they need.
We also use round-robin DNS to direct traffic to the two load balancers. I'd consider something like multicast DNS, but in our current environment all of our servers are in the same pair of datacentres and are only a mile apart and use the same IP range as everything else on campus. There's really very little point at this time.
At any point, we can take a load balancer out of service for maintenance. We frequently do so and our end users shouldn't notice at all. They'll still get sent to the same web servers that were handling their request before.

Software distribution

As I mentioned before, we're Debian nuts. We love Debian. We use it for pretty much everything (I think we have one Ubuntu box for BigBlueButton, but that's a Debian derivative anyway).

Server configuration

We also have a configuration management suite called configutil. It was written by a former employee, Chris Allen, and was originally based on Oxford University's configtool. However, we've pretty much rewritten it now and it does some pretty cool stuff. This includes distributed iptables, user management, package management, service management, and file deployment. A large chunk of this is actually handed over to puppet, but we build the puppet manifest with configbuild, and deploy the files with configutil/luns-config.
We keep all of our server configuration in git too (did I mention, we like git), build the configuration using configbuild. Servers have a deployment tool called luns-config which syncs against the configuration server.
In addition to liking Moodle, Mahara, Debian, and git, we also like security. In fact, we really really like security.
I'm not just talking about security in getting onto our systems (all of our servers are behind our corporate firewall, plus have strict iptables. We then enforce ssh keys on all servers). We also like our configuration to be safe. Our configuration is served over SSL, with client-side key verification too using our internal certificate authority. We generate revocation lists frequently and if the list goes out of date (6 monthly IIRC), we stop serving any configuration. A server can only retrieve configuration for the server named in it's configuration management certificate. On our package management server, we employ the same type of client-side certificate requirement so only systems with a valid SSL certificate and key-pair can access our configuration.

Software deployment

So now we've got all of that out of the way... I did mention that we really like Debian right? Right, good. Because we deploy all of our software in the form of Debian Packages. I mean all of it.
We've gone down this route for a number of reasons, some of them theoretical advantages, and some of them learned from experiences. They really come down to these though:
  • we want to be sure that we know what software is on a server;
  • we want to be sure that each server in a group is identical;
  • we want to be able to install a new server quickly and easily;
  • we want the ability to roll back to a previous version if we screw up; and
  • we have a tendency to twitch if we come across things out of place.
Basically, what it comes down to, is that we want to be able to quickly and easily build replacement servers, add new servers, re-install servers, etc. Most of our servers are entirely disposable. We try to keep all data on dedicated storage. As I mentioned before, our moodledata is on NFS. In reality, most of our data across all servers is stored on our NAS and served over NFS.
So if we discover that we're breaching the limits of our server configuration, we can scale horizontally (that is the right one isn't it?) and have a server built with a known configuration in a very short period of time (typically about an hour).
To this aid, we package all of our software. So each Moodle installation is a separate Debian package. Debian packages are awesome.
When we upgrade Moodle, we update the code using git (see thamblings.blogspot.com/2013/07/upgrading-moodle-from-git.html for my post on that topic). Once we've done that, we merge our new deployment branch into a new git branch - luns-moodle-lu_2.5.
This has the debian packaging information in it and this is where we create our package from.
Why not just keep our packaging data in our LUVLE-2-5.deployment branch? Well, we could do, but we feel that this is cleaner. I mean cleaner both in terms of separation of processes, and history.
For example, if we discover a bug in our package (like a missing dependency), then we want to make that change on our packaging branch. We don't want that change mixed up with the history of our Moodle codebase.
If you're interested in our package skeleton, I've put it in a gist at https://gist.github.com/andrewnicols/ae439676d116e9a6582f. These files all go into the debian directory, and then you can run dch --create to create an empty Changelog. You will, of course, need to update the control file to reflect your package name.
So once we've made our chnages, we merge the deployment branch into our packaging branch; we incremement the version number; build the package; add it to our package server; and deploy each of the frontends. Here's a summary of that process:

Hmm - that looks very long winded, but in manys ways it's just lots of small repetitive tasks which separate concerns, and make our lives easier in the long-run.
Now we've done that, to deploy on our five web servers, and our cron server we just run:
sudo apt-get update; sudo apt-get upgrade
Nice, and easy.