12/11/2015

Balancing Act

There comes a time with most IT systems when you have to call it a day and upgrade. Reasons often include increased load and equipment failure, and in the case of UWCS we have both.

Our main server, codd (named after Edgar F. Codd), has been going for a long time now. A lot of things have changed since it was built, including what users expect their hosting solutions to provide for them. Coupled with the improvements in hardware and software since then, this means there isn't any excuse for not providing a good service. No matter how 'executive series' the motherboard in codd is, it doesn't make up for the fact that it's less powerful than my laptop. Sure, it still works, but we can do better.


As luck would have it, we were gifted some PowerEdge servers last year that pack a bit more of a punch. Along with some of the others that we've bought, we ought to have enough metal for a decent setup. But a rack's worth of servers wouldn't be much without somewhere to put them! Some grovelling to the Department of Computer Science got us a rack and saved us from what would have amounted to a very pricey purchase. It didn't fit the servers though, so we had to take it apart and bolt it back together in a more accommodating configuration.


With the servers racked and imaged came the task of actually figuring out the details of this grand plan. Or at least it would have, if we hadn't spent six months procrastinating. Once the shiny new toys had come out of the packaging, things slowed right down. Thankfully we've started this year with some fire in our collective belly and come up with a plan we might actually be able to implement! The design features a number of application servers behind a load balancer, with each app server holding a copy of user files and databases. Dedicated storage and database clusters might make more sense if we had more traffic, but the focus here is on something that holds up for the next ten years whilst being doable in between lectures and assignments. And being intelligible to the next tech officer who has to inherit all of this :P
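In haproxy terms, that design really boils down to a frontend and a pool of backend servers. The hostnames and addresses below are placeholders rather than our actual config, but it gives a flavour of how little is needed to get a load-balanced pair of app servers going:

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        default_backend app_servers

    backend app_servers
        # round-robin requests across however many app servers end up in the rack
        balance roundrobin
        server app1 10.0.1.11:80 check
        server app2 10.0.1.12:80 check

Adding a third app server later should just be a matter of imaging it and dropping another server line into the backend.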

Ceph is a distributed file system, and it's how we've chosen to store the files that users keep in their hosting accounts. We'll need some way of letting users interact with the system whilst keeping the app servers in sync, so all SSH sessions will terminate at nodd, which will have the Ceph drive mounted. Any changes made will be propagated to their actual location on disk, which will be on all of the cluster nodes. Currently there are two, but in the future we'll be able to spin up some more and just add them to the haproxy configuration. SSL connections are also terminated at nodd, which then uses regular http to retrieve files from the app servers across the internal network. This way we can add a cookie to http sessions, ensuring that users are served by the same backend server for the entire session (important for managing things like logins).
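To make that concrete, nodd would mount the Ceph file system with something like the kernel client (the monitor address, mount point and key file here are all placeholders):

    mount -t ceph 10.0.1.1:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret

and the haproxy sketch from earlier would grow an SSL bind and a cookie per backend, roughly like this (certificate path and cookie name are made up for illustration):

    frontend www
        # terminate SSL at nodd; the app servers only ever see plain http
        bind *:443 ssl crt /etc/haproxy/certs/uwcs.pem
        default_backend app_servers

    backend app_servers
        balance roundrobin
        # pin each visitor to one backend for their whole session via a SERVERID cookie
        cookie SERVERID insert indirect nocache
        server app1 10.0.1.11:80 check cookie app1
        server app2 10.0.1.12:80 check cookie app2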

It's been a long road, but I think we might just make it!