Tuesday, April 27, 2010

Notes on simple design

Clusters and server farms are complex things, and so are the applications that run on them (these are truisms, aren't they?). So why add even more complexity with management? The sysadmin's job is to simplify things as much as possible; the sysadmin's work should leave the smallest possible fingerprint on the whole setup. Simple cluster designs may not have a high-tech look & feel. They don't run bleeding-edge, untested software with funky features. They follow the KISS rule and are based on primitive designs, and they don't require as much attention as complicated clusters do.

Here are some interesting quotes I've come across:

"Autopilot: Automatic Data Center Management"
Michael Isard
Microsoft Research
"We believe that simplicity is as important as fault-tolerance when building a large-scale reliable, maintainable system. Often this means applying conservative design principles: in many cases we rejected a complex solution that was more efficient, or more elegant in some way, in favor of a simpler design that was good enough. This requires constant discipline to avoid unnecessary optimization, and unnecessary generality. ―Simplicity‖ must always be considered in the context of the entire system, since a solution that looks simpler to a component developer may cause nightmares for integration, testing, or operational staff.

21st Large Installation System Administration Conference (LISA ’07)
On Designing and Deploying Internet-Scale Services
James Hamilton
"Keep things simple and robust. Complicated algorithms and component interactions multiply the difficulty of debugging, deploying, etc. Simple and nearly stupid is almost always better in a high-scale service-the number of interacting failure modes is already daunting before complex optimizations are delivered. Our general rule is that optimizations that bring an order of magnitude improvement are worth considering, but percentage or even small factor gains aren’t worth it."

Here's what I do to keep the setup simple:

Automation
When automating (for example server deployment), I usually avoid writing one big script which does everything. Deploying a server usually takes several steps and a few different scenarios. You choose different installations based on machine types, change the machine's vlan, make it network boot, supply the image, etc. Scenarios also differ: sometimes a new server comes and you need to introduce it to the cluster, sometimes you need to redeploy a machine which is broken, and sometimes you redeploy a machine and assign it to a different server farm. A script to tackle all of this would be error-prone and also difficult to test in production (you don't test error-prone things in production, right?).
Instead, I break the automated action into small pieces and write a small script for each of them (one script to take care of network setup, one for PXE-booting a server, one for updating cluster membership, and so on). Then I test these scripts; it's easy, as they are simple and unlikely to have errors. I also try to base them on primitive protocols, e.g. telnet or ssh with an expect library. Then I glue them together into bigger stuff.
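To give a flavour of that glue layer, here is a minimal Python sketch. The helper script names (set-vlan, pxe-boot, wait-for-ssh, update-membership) are made-up placeholders for the small, separately tested tools described above.

#!/usr/bin/env python
# Minimal sketch of gluing small, single-purpose scripts into one scenario.
# The helper script names below are hypothetical placeholders.
import subprocess
import sys

def run_step(cmd):
    """Run one small script and stop the whole scenario on the first failure."""
    print("-> running: " + " ".join(cmd))
    rc = subprocess.call(cmd)
    if rc != 0:
        sys.exit("step failed: %s (exit code %d)" % (cmd[0], rc))

def redeploy(host):
    """One scenario (redeploying a broken machine) composed of small steps."""
    run_step(["set-vlan", host, "deploy"])        # move the port to the PXE/tftp vlan
    run_step(["pxe-boot", host])                  # power-cycle into network boot
    run_step(["wait-for-ssh", host])              # block until the fresh OS answers
    run_step(["set-vlan", host, "production"])    # move the port back
    run_step(["update-membership", host, "add"])  # register the node in the cluster

if __name__ == "__main__":
    redeploy(sys.argv[1])

A different scenario (say, introducing a brand-new server) just reorders or drops steps instead of growing one monolithic script.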

Operating systems
Operating systems are simple when they are standard installations. With a large number of servers, you should not customize any element of them by hand. Instead, I've found it works well to package all your custom scripts into a software package like rpm or deb and roll it out. For example, you may have a package called "my-custom-setup-.rpm" which places custom scripts into /usr/local. It also installs all the dependencies - rpm handles that. In general, it's also a nice idea to have a local copy of your distro's repo.
To distribute /etc/ contents, puppet from Reductive Labs is a great tool. I usually use puppet to ensure the latest version of my custom rpm is installed and to make changes to /etc. (Puppet is not good at distributing large files, so syncing binaries with it is no good - rpms handle this much better.)
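Puppet expresses this declaratively in its own manifests; purely to illustrate the idempotent "ensure" idea (this is not puppet itself, and the package name and file path are assumptions), a rough Python sketch could look like this:

#!/usr/bin/env python
# Rough sketch of the idempotent "ensure" pattern: nothing changes unless
# the system differs from the desired state.  The package name and the
# managed file path are assumptions for illustration.
import subprocess

def ensure_package_latest(name):
    # 'yum -y update <pkg>' is a no-op when the package is already current;
    # rpm/yum also pull in the dependencies.
    subprocess.check_call(["yum", "-y", "update", name])

def ensure_file(path, content):
    # Only rewrite the file when it differs from the desired content.
    try:
        current = open(path).read()
    except IOError:
        current = None
    if current != content:
        with open(path, "w") as f:
            f.write(content)
        print("updated " + path)

if __name__ == "__main__":
    ensure_package_latest("my-custom-setup")
    ensure_file("/etc/cluster-role", "webfarm\n")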

Networking
If possible, it's desirable to have all the machines within a single network without vlans. Vlans introduce complexity into the server deployment process. Usually you deploy a server within one vlan where you have PXE and tftp. When deployment is finished you need to change the vlan, which involves changing it on your switch. You run a large number of machines, so you need to automate this process, and you also need to keep a mapping between your servers and switch ports. After you change the vlan, you cut off access to your server, so you then need to reach its management processor and reset it so that it gets a new address - which means you also need to know the management address of every server. In most cases vlans are a must and cannot be avoided, but if you don't need them, don't use them ;-)
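As a sketch of what that automation has to juggle (the inventory file layout, the vlan number, the switch CLI syntax and the ipmitool credentials below are all assumptions - vendor CLIs differ):

#!/usr/bin/env python
# Sketch: look up where a server is patched in, print the commands that would
# move its switch port to the production vlan, and print the management
# processor reset.  Inventory layout, vlan number and switch syntax are
# assumptions; review before automating any of it.
import csv
import sys

PRODUCTION_VLAN = "20"   # assumed vlan numbering

def load_inventory(path="servers.csv"):
    # assumed columns: hostname, switch, port, mgmt_ip
    with open(path) as f:
        return dict((row["hostname"], row) for row in csv.DictReader(f))

if __name__ == "__main__":
    entry = load_inventory()[sys.argv[1]]
    print("# on switch " + entry["switch"] + " (Cisco-like syntax, adjust for your vendor):")
    print("interface " + entry["port"])
    print("switchport access vlan " + PRODUCTION_VLAN)
    print("# then reset the host via its management processor:")
    print("ipmitool -I lanplus -H %s -U ADMIN -P secret chassis power cycle"
          % entry["mgmt_ip"])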

Agents
To effectively manage the servers, you usually install some agents on them. For me the best agent is sshd with pubkey authentication. It's rock solid, which gives you two things: first, you don't have problems accessing your servers; second, if you do, you can safely assume that the whole machine is down for some reason (swapped out, turned off, etc.). OK - I also run puppet clients. But the general idea is that management agents should be simple, known-for-years stuff.
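A minimal sketch of what "sshd as the agent" looks like in practice (the host list and timeout are assumptions; a pubkey-authenticated, non-interactive ssh either works or tells you the machine is down):

#!/usr/bin/env python
# Sketch: plain sshd as the management agent.  If BatchMode ssh with a
# pubkey fails, assume the whole machine is down.
# Host list and timeout are assumptions.
import subprocess

HOSTS = ["node01", "node02", "node03"]

def alive(host):
    # BatchMode=yes: never prompt for a password, fail fast instead
    rc = subprocess.call(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return rc == 0

if __name__ == "__main__":
    for host in HOSTS:
        print("%-10s %s" % (host, "ok" if alive(host) else "DOWN - investigate"))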

Reliability
A cluster's reliability is based on its multitude of nodes. As a rule, I don't guarantee my users any failover mechanism; it's up to the application to handle it. I don't build any HA into the systems themselves.

Information
It's common that in the lifecycle of a cluster some machines come and go and some migrate between clusters. It's crucial to keep this information somewhere. A simple, "primitive" approach is to use DNS: it's a database that is simple, widely used and guaranteed to work (replication built in ;-), and the client is built into any Linux.
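For example (the per-cluster name and zone below are assumptions - the convention being one A record per member under a name like webfarm.example.com):

#!/usr/bin/env python
# Sketch: DNS as the cluster-membership database.  The naming convention
# (one A record per member under a per-cluster name) and the zone are
# assumptions; plain 'host' or 'dig' gives you the same answer.
import socket

def cluster_members(cluster_name):
    # gethostbyname_ex returns (canonical_name, aliases, list_of_addresses)
    name, aliases, addresses = socket.gethostbyname_ex(cluster_name)
    return addresses

if __name__ == "__main__":
    for ip in cluster_members("webfarm.example.com"):
        print(ip)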

It's very easy to become "innovative" in the bad sense when it comes to clusters; being "bleeding edge" often means "complicated". Many ready-to-use datacenter solutions are all-in-one, bulky software which promises everything but is in fact very complicated and tailored to specific systems and platforms (take a look at HP SIM or IBM Director).

In fact, most sysadmins I know end up with custom tools built on open-source components with simplicity in mind.

2 comments:

  1. I am a big fan of the KISS method. I've seen many bloated cluster installs with overly complex setups. I tend to see this among less experienced system administrators. They must find the complexity appealing as it poses a new challenge.

    What are your thoughts on virtualization? I've found that it is very useful in some cases while merely adding another layer of complexity in others.

  2. Hi,
    In my opinion virtualization also simplifies things. It standardizes your hardware platform (you no longer care about which hdd drivers to compile into the initrd). On the other hand, you introduce the hypervisor, which complicates stuff (mainly inventory - you need to keep hw->vm mappings). But machines nowadays are so powerful that it's a real waste not to use virtualization ;-)
    I use it mainly to partition powerful hardware. I don't pay much attention to live migration (it requires shared storage, which is expensive). Besides, clusters provide error resilience through a multitude of nodes, not single-server failover.
    My personal choice right now is Citrix XenServer (easy for massive deployments, reliable).
