Thursday 11 July 2013

Deploying Moodle - continued

One thing that I didn't really touch upon in my previous blog post was the method in which we actually deploy Moodle to our live servers.

Our production Moodle system is... well, complicated. We only have 14,000 students at present, but we're trying to really push use of Moodle and the last thing that we want is for it to go down.

To the same end, we want to be able to patch servers without any downtime, and to take servers out of our load balancing pools so that we can prepare updates without our end users noticing.

I will try and improve this blog posting a bit more when I have some time...

Systems

Architecture

We run our deployment on a purely virtualised infrastructure; currently that's VMware vSphere 5.1. We have a separate infrastructure team who provide that infrastructure. The VMs sit on a pair of fully-redundant and replicated Storage Area Networks (SANs), and our moodledata is served over NFS by a NAS (Network Attached Storage) server.

Rather than having one beefy server to handle all of the load, we've found it's more efficient to have lots of smaller servers. Our VMware specialists (Matt and Graham) inform me that it's far better for scheduling to have too few processor cores than too many. If we have more than 4 processor cores on a VM, then VMware has to try a lot harder to allocate the resources, and it's harder to migrate those VMs around the various blades.

At present, our architecture looks something like this (sorry, it's a little out of date, but should give you a fair idea):

Web servers

We have five live web servers (moodle-web[0,1,2,3,4]). These currently have two cores allocated, and 3GB RAM. They only need a small disk (20GB). Looking at our statistics throughout the year, we're likely to relinquish 1GB of this memory on each VM as it's largely going unutilised.

We went for five servers because we want to be able to theoretically lose a whole blade which may have a couple of web servers on it, and not lose service. Theoretically VMware should handle this automagically, but we've seen cases where this hasn't happened as it should.

Furthermore, we frequently pull one or two of these servers out of the pool to perform maintenance. I'll be doing this next week in our Moodle 2.5 upgrade. I'll take two of the servers out of the pool, prepare them with the upgrade and make sure that everything is all hunky-dory, and then I'll perform the upgrade, swing over to the new servers, and upgrade the old ones.

These web servers serve their content using Apache2 and we currently use mod_php5 rather than fastcgi. We didn't find any particularly stunning performance improvements with any of the CGI methods, but this may be something we re-evaluate in the future.

We currently use APC, but we may consider switching to PHP OPcache when we upgrade to PHP 5.5 at some point. We follow Debian releases so we're unlikely to see PHP 5.5 for a couple of years yet.

Load balancers

To handle all of these web servers, we have a pair of load balancers. I should point out that we're Debian nuts and we love Open Source. We do our load balancing in software, on two VMs.

These are also low-powered with 2 cores, and 1GB RAM (actually, one of them has 2GB but we intend to reduce this back to 1GB).

We currently terminate our SSL connections on these load balancers with nginx. When we made this decision, we were in two minds as to whether it was the 'right' thing to do, but in retrospect it has worked very well for us. It's a toss-up between being able to scale vertically at the web server, or at the load balancer. The web servers don't need public IP addresses, whilst the load balancers do. However, our web servers cost more (in terms of resources), and Lancaster University is fortunate enough to have access to an entire slash-16 address range with 65534 globals.

In addition to terminating the SSL with nginx, we also do an amount of caching. Any images and JavaScript served by certain intelligent endpoints in Moodle are cached more aggressively in memory on the load balancer. This way, we don't even hit the load balancing software to serve them, and we don't hit the web servers either. Handy!

We also use X-Accel-Redirect to serve many of the files from our NFS server using nginx directly, rather than having PHP buffer them from disk. This is much more efficient and we saw a huge drop in CPU usage on the web servers with only a minor increase in CPU usage on our frontends. Basically, web serving software is designed for serving bytes off disk, whilst PHP is not. Again, it may seem a bit strange to serve the files on the load balancer rather than the web servers, and this is something that we may change in due course, but at present we are forced to use Apache on our web servers, and the mod_xsendfile module for Apache2, which does the same thing as X-Accel-Redirect does for nginx, is much less mature.

After traffic has been terminated, and cached content retrieved and served, it's then passed to our load balancing software, haproxy.

haproxy is an awesome little tool, which supports a range of really powerful features including different allocation methods, session stickiness, and it is also protocol aware for some protocols. It also has handy logging.

I'll briefly mention that we use keepalived to manage the VRRP layer of our stack. Each of our load balancers has a dedicated management IP, and a virtual service IP. The management IP never changes and reflects the name of the server providing service (e.g. moodle-fe0.lancs.ac.uk). Meanwhile, the VIPs are free to fly wherever they need.

We also use round-robin DNS to direct traffic to the two load balancers. I'd consider something like anycast, but in our current environment all of our servers are in the same pair of datacentres, are only a mile apart, and use the same IP range as everything else on campus. There's really very little point at this time.

At any point, we can take a load balancer out of service for maintenance. We frequently do so and our end users shouldn't notice at all. They'll still get sent to the same web servers that were handling their request before.

Software distribution

As I mentioned before, we're Debian nuts. We love Debian. We use it for pretty much everything (I think we have one Ubuntu box for BigBlueButton, but that's a Debian derivative anyway).

Server configuration

We also have a configuration management suite called configutil. It was written by a former employee, Chris Allen, and was originally based on Oxford University's configtool. However, we've pretty much rewritten it now and it does some pretty cool stuff. This includes distributed iptables, user management, package management, service management, and file deployment. A large chunk of this is actually handed over to puppet, but we build the puppet manifest with configbuild, and deploy the files with configutil/luns-config.

We keep all of our server configuration in git too (did I mention, we like git), and build the configuration using configbuild. Servers have a deployment tool called luns-config which syncs against the configuration server.

In addition to liking Moodle, Mahara, Debian, and git, we also like security. In fact, we really really like security.

I'm not just talking about security in getting onto our systems: all of our servers are behind our corporate firewall and have strict iptables rules, and we enforce ssh keys on all servers. We also like our configuration to be safe. Our configuration is served over SSL, with client-side key verification too, using our internal certificate authority. We generate revocation lists frequently and if the list goes out of date (6 monthly IIRC), we stop serving any configuration. A server can only retrieve configuration for the server named in its configuration management certificate. On our package management server, we employ the same type of client-side certificate requirement so only systems with a valid SSL certificate and key-pair can access our configuration.

Software deployment

So now we've got all of that out of the way... I did mention that we really like Debian right? Right, good. Because we deploy all of our software in the form of Debian Packages. I mean all of it.

We've gone down this route for a number of reasons, some of them theoretical advantages, and some of them learned from experiences. They really come down to these though:

we want to be sure that we know what software is on a server;
we want to be sure that each server in a group is identical;
we want to be able to install a new server quickly and easily;
we want the ability to roll back to a previous version if we screw up; and
we have a tendency to twitch if we come across things out of place.

Basically, what it comes down to, is that we want to be able to quickly and easily build replacement servers, add new servers, re-install servers, etc. Most of our servers are entirely disposable. We try to keep all data on dedicated storage. As I mentioned before, our moodledata is on NFS. In reality, most of our data across all servers is stored on our NAS and served over NFS.

So if we discover that we're breaching the limits of our server configuration, we can scale horizontally (that is the right one isn't it?) and have a server built with a known configuration in a very short period of time (typically about an hour).

To this end, we package all of our software. So each Moodle installation is a separate Debian package. Debian packages are awesome.

When we upgrade Moodle, we update the code using git (see thamblings.blogspot.com/2013/07/upgrading-moodle-from-git.html for my post on that topic). Once we've done that, we merge our new deployment branch into a new git branch - luns-moodle-lu_2.5.

This has the debian packaging information in it and this is where we create our package from.

Why not just keep our packaging data in our LUVLE-2-5.deployment branch? Well, we could do, but we feel that this is cleaner. I mean cleaner both in terms of separation of processes, and history.

For example, if we discover a bug in our package (like a missing dependency), then we want to make that change on our packaging branch. We don't want that change mixed up with the history of our Moodle codebase.

If you're interested in our package skeleton, I've put it in a gist at https://gist.github.com/andrewnicols/ae439676d116e9a6582f. These files all go into the debian directory, and then you can run dch --create to create an empty Changelog. You will, of course, need to update the control file to reflect your package name.

So once we've made our changes, we merge the deployment branch into our packaging branch; we increment the version number; build the package; add it to our package server; and deploy each of the frontends. Here's a summary of that process:

Hmm - that looks very long winded, but in many ways it's just lots of small repetitive tasks which separate concerns, and make our lives easier in the long-run.
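As a rough sketch of those steps (not our exact commands): the script below only prints each step by default. The use of reprepro for the package server, and the REPO_DIR, CODENAME and DEB values, are assumptions made up for illustration.

```shell
#!/bin/sh
set -eu
# DRY_RUN=1 (the default here) only prints each step; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical values, for illustration only.
REPO_DIR=/srv/apt
CODENAME=wheezy
DEB=../moodle-example_1.0_all.deb

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

release() {
    # Merge the deployment branch into the packaging branch.
    run git checkout luns-moodle-lu_2.5
    run git merge LUVLE-2.5-deployment
    # Increment the package version in debian/changelog (opens an editor).
    run dch --increment
    # Build an unsigned binary package.
    run dpkg-buildpackage -us -uc -b
    # Add the package to the package server (reprepro is an assumption).
    run reprepro -b "$REPO_DIR" includedeb "$CODENAME" "$DEB"
}

release
```

Running it as-is just lists the plan, which is a handy way to check the order of the steps before doing anything for real.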

Now we've done that, to deploy on our five web servers, and our cron server we just run:

sudo apt-get update; sudo apt-get upgrade

Nice, and easy.

Saturday 6 July 2013

Upgrading Moodle from Git

Upgrading Moodle with Git

Background

I've been working on a new deployment of Moodle for Lancaster University for the past two years or so. Our project started out with Moodle 2.1, and we upgraded for our initial pilot to Moodle 2.2.
Since then, we've upgraded from Moodle 2.2 to 2.3; and we're now planning the upgrade from 2.3 to 2.5.
We manage all of our upgrades with git, and our deployment using Debian packages.
I've been asked a couple of times to write about our upgrade methodology and reasoning so hopefully others will find this useful.
We use a variety of git features, but new features are added to git all the time which change our deployment methodology from time-to-time. At present we use:

branches

Branches, Tags, and Remotes

We have quite a few of these, but they really do make our life easier. Git is a fantastic tool, and if used to its full extent, having a large number of branches actually makes your life much easier, and less complicated. Having multiple remotes helps to separate various concerns too, so it's harder to inadvertently publish your institution's IP.

Branches

In summary, we have one branch per feature, hack, or change in core Moodle - no matter how small. These are named in a (hopefully) sensible naming scheme to help identify them from one another easily and quickly. The name describes the project/customer (usually LUVLE in our case), the version, the type of the change, and the frankenstyle name for that change. For some changes, we have an optional short name to describe the branch further. Our naming scheme works out as:


        {customer}-{major-version}-{change type}-{frankenstyle}[-{shortname}]

Where we have several related features which must co-exist and cannot be used without one another, we use a custom frankenstyle name of set_{name}.
As an example, these are some of the branches for our impending Moodle 2.5.0 upgrade:

LUVLE-2.5-feature-mod_ouwiki
LUVLE-2.5-feature-block_panopto
LUVLE-2.5-feature-local_luassignment
LUVLE-2.5-feature-set_bigbluebuttonbn
LUVLE-2.5-hack-mod_resource-singlefiles
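To make the scheme concrete, here's a tiny helper that assembles a branch name from its parts. The branch_name function is hypothetical; it's only here to illustrate the pattern, and isn't something we actually use.

```shell
#!/bin/sh
# Hypothetical helper: assemble a branch name from the scheme's components.
# Usage: branch_name CUSTOMER VERSION TYPE FRANKENSTYLE [SHORTNAME]
branch_name() {
    name="$1-$2-$3-$4"
    # The short name is optional; append it only when one was given.
    if [ -n "${5:-}" ]; then
        name="$name-$5"
    fi
    printf '%s\n' "$name"
}

branch_name LUVLE 2.5 feature mod_ouwiki
branch_name LUVLE 2.5 hack mod_resource singlefiles
```

The two calls print the first and last branch names from the list above.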

All of these branches are based on the same upstream tag - in this case, v2.5.0 for the 2.5.0 version of Moodle. We always use this tag. Even when 2.5.32 has come out we will still use 2.5.0 (though hopefully we won't ever get that far behind!). This may seem a touch strange at first, but when it comes to merging all of our features and changes together into a single testing or deployment branch, we want to avoid any merge conflicts created by different versions. It's also much easier when it comes to subsequent newer versions of Moodle in the future.
By having each feature in its own branch, we're able to develop, and test that branch entirely in isolation.
In the rare cases that we are working with a feature which needs a minimum release version which includes a minor increment (e.g. 2.3.1), we check out from that tag instead, but we try to avoid this to make things simpler.
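A minimal sketch of starting feature branches from the release tag, run in a throwaway repository. The version.txt file and the commits are stand-ins made up for illustration; in a real checkout the v2.5.0 tag comes from upstream.

```shell
#!/bin/sh
set -eu
# Throwaway repository standing in for a Moodle checkout (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Stand-in for the upstream release commit and its tag.
echo "2.5.0" > version.txt
git add version.txt
git commit -q -m "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# Later upstream work; the feature branches deliberately ignore it.
echo "2.5.1" > version.txt
git commit -q -am "Moodle 2.5.1 (stand-in)"

# Each feature branch starts from the release tag, not from the latest commit.
git checkout -q -b LUVLE-2.5-feature-mod_ouwiki v2.5.0
git checkout -q -b LUVLE-2.5-feature-block_panopto v2.5.0

# Both branches share v2.5.0 as their common base: these print the same hash.
git merge-base LUVLE-2.5-feature-mod_ouwiki LUVLE-2.5-feature-block_panopto
git rev-parse v2.5.0
```

Because every branch shares the same base, merging them together later only ever brings in our own changes, never a mix of upstream versions.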
In addition to all of the feature and hack branches, we also have a range of testing and deployment branches. Generally, we have a main test branch which contains the same branches as our deployment environment. This is updated frequently when we want to test an upgrade to a whole branch in combination with the rest of our installation, or to test an upgrade to Moodle. Meanwhile, we typically only have a single deployment branch - LUVLE-2.5-deployment. This is to avoid any confusion and potential for dropped branches.

Remotes

As I mentioned before, one great reason for multiple remotes is to give you a separation of concerns. There are times where you don't wish to push some of your branches to the public, other times where you're working on bug fixes you don't really need to push to an internal repository, and all manner of other reasons besides.

I have the following remotes to make my life easier:

origin - git.moodle.org/moodle.git - my main upstream;
integration - git.moodle.org/integration.git - the moodle.org integration branch. Useful when fixing issues that crop up during integration;
public - github.com/andrewnicols/moodle.git - the repository I push any bug fixes and new features for the community to; and
cis - ciggit.lancs.ac.uk/moodle.git - our main internal repository.

As a general policy, and from experience of making oopsies, I've found it best to have each remote start with a different letter - it also makes tab completion much less frustrating.
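Setting those up might look something like this. The https:// scheme on each URL is an assumption (use whatever your servers actually offer), and I'm using a throwaway repository so nothing real is touched. Adding a remote doesn't contact the server.

```shell
#!/bin/sh
set -eu
# Throwaway repository so that no real checkout is modified.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The URL schemes below are assumptions, for illustration only.
git remote add origin https://git.moodle.org/moodle.git
git remote add integration https://git.moodle.org/integration.git
git remote add public https://github.com/andrewnicols/moodle.git
git remote add cis https://ciggit.lancs.ac.uk/moodle.git

# List each remote with its fetch and push URL.
git remote -v
```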

The process

When a new release of Moodle comes out, we've typically taken a bit of a laborious approach to things. Whilst there are a lot of steps, I feel that in the long run they've been less frustrating than trying to resolve any merge conflicts, and they've saved us time scratching our heads trying to work out where this change, or that whitespace conflict, came from.

Initial set-up

Once we've checked out a new branch for every single feature, and hack, we begin to bring them all together. That's nice and easy when you're just starting out - just git merge a lot:
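As a minimal sketch of that merge step, in a throwaway repository: the files and commits are stand-ins, and the LUVLE-2.5-testing branch name is made up for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway repository standing in for a Moodle checkout (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "core" > core.txt
git add core.txt
git commit -q -m "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# Two stand-in feature branches, each based on the release tag.
for feature in mod_ouwiki block_panopto; do
    git checkout -q -b "LUVLE-2.5-feature-$feature" v2.5.0
    echo "$feature" > "$feature.txt"
    git add "$feature.txt"
    git commit -q -m "Add $feature"
done

# Start the testing branch from the same tag and merge every feature into it.
git checkout -q -b LUVLE-2.5-testing v2.5.0
for feature in mod_ouwiki block_panopto; do
    git merge -q --no-ff --no-edit "LUVLE-2.5-feature-$feature"
done

# Once testing passes, the deployment branch is cut from the testing branch.
git checkout -q -b LUVLE-2.5-deployment LUVLE-2.5-testing
```

Using --no-ff keeps one merge commit per feature, so the history shows exactly which branches went into the build.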
And hey-presto... we should have our Moodle 2.5 installation ready for testing and deployment. Once we're happy with our installation, we usually then create a deployment branch from that testing branch.

Grabbing a fix from upstream

We frequently come across issues which have already been fixed in upstream Moodle, or which we ourselves have helped to fix. Sometimes we also backport features from newer branches onto our production branch if we really really want them.

We do all of this with the fantastic git cherry-pick command which allows you to pick a commit, or a number of commits, and apply them to your current branch.
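Here's a small sketch of picking a single upstream fix onto a production branch. It runs in a throwaway repository; the "upstream" branch, the files and the commits are all stand-ins made up for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway repository; "upstream" stands in for the upstream Moodle branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"
git config user.email "example@example.invalid"

echo "buggy" > lib.txt
git add lib.txt
git commit -q -m "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# Our production branch, based on the release tag.
git branch LUVLE-2.5-deployment v2.5.0

# Upstream moves on: one fix we want, one change we do not.
echo "fixed" > lib.txt
git commit -q -am "Fix the bug (stand-in for an upstream fix)"
fix=$(git rev-parse HEAD)
echo "unrelated" > other.txt
git add other.txt
git commit -q -m "Unrelated upstream work"

# Apply only the fix to our branch; -x records the original commit hash
# in the new commit message.
git checkout -q LUVLE-2.5-deployment
git cherry-pick -x "$fix"
```

The deployment branch ends up with the fix but without the unrelated commit. For several consecutive commits, cherry-pick also accepts a range such as A^..B.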

Updating our local branches

For updating one of our feature branches, we simply make our changes to that specific branch, and then merge them back in again:
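A minimal sketch of that cycle, again in a throwaway repository with stand-in files and commits (assumptions for illustration, not our real history):

```shell
#!/bin/sh
set -eu
# Throwaway repository standing in for a Moodle checkout (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "core" > core.txt
git add core.txt
git commit -q -m "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# A feature branch, already merged into the deployment branch once.
git checkout -q -b LUVLE-2.5-feature-mod_ouwiki v2.5.0
echo "v1" > ouwiki.txt
git add ouwiki.txt
git commit -q -m "Add mod_ouwiki"
git checkout -q -b LUVLE-2.5-deployment v2.5.0
git merge -q --no-ff --no-edit LUVLE-2.5-feature-mod_ouwiki

# Later: make the change on the feature branch only...
git checkout -q LUVLE-2.5-feature-mod_ouwiki
echo "v2" > ouwiki.txt
git commit -q -am "Update mod_ouwiki"

# ...and then merge the feature branch back in again.
git checkout -q LUVLE-2.5-deployment
git merge -q --no-ff --no-edit LUVLE-2.5-feature-mod_ouwiki
```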

Upgrading Moodle - Minor releases

In reality, a minor update to Moodle is just the same as an update to one of our local branches. If anything, it's probably simpler:
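Something along these lines, sketched in a throwaway repository. The "upstream" branch, the files and the v2.5.1 commit are stand-ins made up for illustration; in a real checkout you'd fetch the new tag from upstream first.

```shell
#!/bin/sh
set -eu
# Throwaway repository; "upstream" stands in for the upstream Moodle branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"
git config user.email "example@example.invalid"

echo "2.5.0" > version.txt
git add version.txt
git commit -q -m "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# Our deployment branch, carrying a local change.
git checkout -q -b LUVLE-2.5-deployment v2.5.0
echo "local" > local.txt
git add local.txt
git commit -q -m "Local change"

# Upstream publishes a minor release.
git checkout -q upstream
echo "2.5.1" > version.txt
git commit -q -am "Moodle 2.5.1 (stand-in)"
git tag v2.5.1

# Merge the new release tag into the deployment branch.
git checkout -q LUVLE-2.5-deployment
git merge -q --no-edit v2.5.1
```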

Upgrading Moodle - Major releases

This is where things get much more complicated, and where the number of steps rapidly increases. That said, in my opinion those extra steps also reduce the confusion later on.

Externally provided code

We start by grabbing the latest version of the externally provided plugins and starting brand new branches for them. There's usually very little point in keeping the history for those branches as it doesn't contain any of the upstream commit messages.

Local branches which need updating

For our local branches, we want to preserve the history of our changes. We also want to remove any confusion with merges to newer versions to keep the history as clear as possible.

To do so, we use the wonderful git rebase --onto command.

With a normal rebase command, git takes every commit since your branch diverged from the new upstream, and attempts to replay each of them on top of the new head.

The --onto option tells rebase where to put the commits, and the argument after it tells rebase where to start taking them from. That's to say, if you only have one commit since you branched from the tag, it grabs that commit and immediately replays it on top of the target version. Without --onto, git works out the range for itself from the point where the two histories diverged, which can mean replaying commits that aren't yours.
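As a minimal sketch, in a throwaway repository: the "upstream" branch, the files and the commits are stand-ins, and creating a new 2.5 branch as a copy of the old 2.3 one before rebasing is an assumption for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway repository; "upstream" stands in for the upstream Moodle branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"
git config user.email "example@example.invalid"

echo "2.3.0" > version.txt
git add version.txt
git commit -q -m "Moodle 2.3.0 (stand-in)"
git tag v2.3.0

# Our feature branch for 2.3, with a single local commit.
git checkout -q -b LUVLE-2.3-feature-mod_ouwiki v2.3.0
echo "ouwiki" > ouwiki.txt
git add ouwiki.txt
git commit -q -m "Add mod_ouwiki"

# Upstream moves on to 2.5.0.
git checkout -q upstream
echo "2.5.0" > version.txt
git commit -q -am "Moodle 2.5.0 (stand-in)"
git tag v2.5.0

# Start the 2.5 branch as a copy of the 2.3 one, leaving the old branch intact.
git checkout -q -b LUVLE-2.5-feature-mod_ouwiki LUVLE-2.3-feature-mod_ouwiki

# Replay only the commits made after v2.3.0 directly on top of v2.5.0.
git rebase -q --onto v2.5.0 v2.3.0
```

In this toy history a plain rebase would give the same result; the explicit old and new bases matter when the old tag isn't a simple ancestor of the new one.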
That's it. It's really simple, but it needs to be done for each and every branch that you have.

Finally, once all branches have been updated, we merge them into a new testing branch and begin our testing phase.

Tracking things

In order to make sure that we don't lose track of anything, or forget a branch during an upgrade, we make use of our issue tracking software Redmine.
When we start to put a version together, for example Moodle 2.2, we create a new task for the next upgrade - in this case, 2.3.
As we include each of our feature branches, we create a new subtask under the 2.3 task.
When we come to upgrade to Moodle 2.3, we then go through each of those subtasks and make any relevant notes from the upgrade process. We also create a new task for the subsequent upgrade (e.g. Moodle 2.5).
If we are no longer including a branch because it is now redundant or we have decided to change the functionality offered, we also note this in the relevant issue.
This all helps to ensure that we don't forget an issue and that we keep a record of all changes.
