Wednesday, April 14, 2010

Linux clusters - first aid kit

Here is a list of tools for building and managing Linux clusters and server farms. Some of them I have used a lot; others I only know to be respected among fellow sysadmins. All of them are free and/or GPL-ed.

Distros

-> OSCAR - quoting the webpage: "OSCAR allows users, regardless of their experience level with a *nix environment, to install a Beowulf type high performance computing cluster. It also contains everything needed to administer and program this type of HPC cluster." If you are in the HPC business and need MPI, scientific libraries etc. installed out of the box - this one is for you ;-)

-> ROCKS - quote: "Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Hundreds of researchers from around the world have used Rocks to deploy their own clusters." AFAIK this one is based on Red Hat. A job queueing system, MPI and scientific apps are included in so-called "rolls" (see the docs).

Administrator toolbox

-> cobbler - a deployment tool using kickstart/preseed. Red Hat-centric.

-> systemimager - one of my personal favourites. A tool for very large-scale server deployment using rsync and BitTorrent. It supports many cool features, like cloning a server without the need to bring it down. Hardware-independent imaging is possible with a few hacks. When you run a homogeneous server farm it's also a very convenient server management tool - images are stored as directory structures, so it's possible to chroot into them, make changes, and sync those changes to live systems! ;-) Some killer apps, like a parallel multithreaded shell and a file distribution tool, are also included.

-> capistrano - an automation tool. I have not used it personally, but I plan to check it out some day.

-> TORQUE resource manager - no HPC cluster can live without a proper queueing system. Torque maintains your job queue, determines free resources (CPUs, RAM) on your compute nodes and distributes computations across the cluster nodes.

-> mcollective - A job execution framework. I have not used it personally, but plan to give it a shot. From the webpage: "The Marionette Collective aka. mcollective is a framework to build server orchestration or parallel job execution systems. Primarily we'll use it as a means to programmatically execute actions on clusters of servers."

-> puppet - a language to describe your datacenter. Just one thing to say about it - A MUST-HAVE. It's hard to describe all the things this tool does (definitely expect a detailed post on it soon). In it you describe server configuration: you can group machines in classes and describe which services to run and which users to have on various types of nodes with various operating systems. If you seek a simpler replacement for cfengine - this is the choice ;-)

-> ganglia - a monitoring framework designed for clusters. Supports aggregation functions for multiple nodes.

-> freeIPA - an authentication framework based on Kerberos and LDAP.
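To make the TORQUE entry above a bit more concrete, here is a toy Python sketch of what a queueing system does: keep a job queue, look at the free CPUs and RAM on each node, and place jobs where they fit. This is not how TORQUE itself is implemented; the Node and Job classes, the dispatch function and the first-fit policy are all made up for illustration, and real schedulers add priorities, fairness, walltime limits and much more.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int
    free_ram_gb: int


@dataclass
class Job:
    name: str
    cpus: int
    ram_gb: int


def dispatch(queue, nodes):
    """Place queued jobs on the first node with enough free CPUs and RAM.

    Returns a dict of job name -> node name. Jobs that fit nowhere are
    skipped and put back on the queue in their original order.
    """
    placements = {}
    waiting = deque()
    while queue:
        job = queue.popleft()
        for node in nodes:
            if node.free_cpus >= job.cpus and node.free_ram_gb >= job.ram_gb:
                # Reserve the resources on this node for the job.
                node.free_cpus -= job.cpus
                node.free_ram_gb -= job.ram_gb
                placements[job.name] = node.name
                break
        else:
            # No node had room; the job keeps waiting.
            waiting.append(job)
    queue.extend(waiting)
    return placements


# Hypothetical two-node cluster and three queued jobs.
nodes = [Node("node01", 4, 8), Node("node02", 8, 16)]
queue = deque([Job("sim-a", 4, 4), Job("sim-b", 6, 12), Job("sim-c", 4, 8)])
placements = dispatch(queue, nodes)
print(placements)
print([job.name for job in queue])
```

With these made-up numbers the first two jobs get a node each and the third one stays in the queue until resources are released.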
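The idea behind the puppet entry above (group machines in classes, then say which services and users each class should have) can be sketched in a few lines of Python. This is only an illustration of the declarative approach, not Puppet's own language or API; the class names, host names and the desired_state and plan helpers are hypothetical.

```python
# Hypothetical node classes: each lists the services and users it requires.
CLASSES = {
    "base": {"services": {"sshd", "ntpd"}, "users": {"admin"}},
    "compute": {"services": {"pbs_mom"}, "users": {"hpcuser"}},
    "web": {"services": {"httpd"}, "users": {"deploy"}},
}

# Hypothetical hosts and the classes assigned to each of them.
NODES = {
    "node01": ["base", "compute"],
    "www01": ["base", "web"],
}


def desired_state(host):
    """Merge all classes assigned to a host into one desired state."""
    state = {"services": set(), "users": set()}
    for class_name in NODES[host]:
        for key, values in CLASSES[class_name].items():
            state[key] |= values
    return state


def plan(host, running_services):
    """Return the services that would have to be started on the host."""
    return sorted(desired_state(host)["services"] - set(running_services))


print(desired_state("node01"))
print(plan("www01", ["sshd"]))
```

The point of the sketch is that you describe the end state once per class, and the tool works out what has to change on each machine.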

If you think the list is not complete, please comment ;-)

1 comment:

  1. Thanks for the information. I am currently maintaining the server back in the office, and when I research servers it's really helpful to read blogs just like this. Thanks for sharing.

