Achieving harmony with Rocks and Lustre

Thankfully, the Rocks v4.3 kernel roll and Lustre v1.6.1 appear to be based on the same kernel version. This saved me from having to get the kernel source and patch it for Lustre. I've done that before with some success, but I'm happy to be able to avoid doing it again. The following applies to setting up Rocks so that Lustre is installed and ready on compute nodes. I also put Lustre on the frontend node, but that was done manually, simply by installing the RPMs and configuring grub.

Steps I followed to get the Lustre kernel, modules and tools installed on the compute nodes

The next goal is to create a Rocks node type for the Lustre OSSs. It could be useful in general to create a node type for a Lustre client, but that isn't actually necessary on the development cluster, where all nodes are Lustre clients, and I can just use the above method to extend the compute node configuration. OSSs are a different animal, however, and need their own appliance type.

Steps needed for the Compute/OSS appliance type:

I set up the frontend node as the MGS/MDS manually, simply following the Lustre instructions: install the RPMs and edit /etc/fstab as needed.

That should be enough to set up Lustre on a new Rocks installation. Note that the Lustre filesystem will not be ready to use yet, because each OST has to be formatted while the node is running the Lustre-patched kernel, and that only happens after the node reboots at the end of the installation process. To facilitate that process, first create the following file at /home/install/sbin/ost_init.sh:

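#!/bin/bash
# Format the OST, mount it, and record its label in /etc/fstab.
mkfs.lustre --reformat --ost --mgsnode=bes-00@tcp0 /dev/sda5
mkdir -p /mnt/ost0
mount -t lustre /dev/sda5 /mnt/ost0
LABEL=$(e2label /dev/sda5)
# Uncomment the placeholder fstab entry, substituting the real label.
sed -e "s/#LABEL=lustre-OSTXXXX/LABEL=$LABEL/" /etc/fstab > /etc/fstab.tmp
mv -f /etc/fstab.tmp /etc/fstab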

Then use ssh or tentakel to execute the script on all Compute/OSS nodes. Note that the script has a hard-coded device name (/dev/sda5), which works on the development cluster but may not work elsewhere. Ideally, the script should be made a bit more flexible at some point in the future.
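
As a rough idea of what a more flexible version might look like, here is a minimal sketch that takes the device, mount point and MGS node as arguments instead of hard-coding them. The function names (set_fstab_label, init_ost) are my own invention for illustration and are not part of Lustre or Rocks. I have not run this on the cluster, so treat it as a starting point only.

```shell
#!/bin/bash
# Sketch only: ost_init.sh with the device, mount point and MGS node passed in.
# set_fstab_label and init_ost are hypothetical helper names.

# Replace the commented-out placeholder entry in an fstab file with the real label.
set_fstab_label() {
    local label="$1" fstab="$2"
    sed -e "s/#LABEL=lustre-OSTXXXX/LABEL=$label/" "$fstab" > "$fstab.tmp" &&
        mv -f "$fstab.tmp" "$fstab"
}

# Format the OST, mount it, and record its label in /etc/fstab.
init_ost() {
    local dev="$1" mnt="$2" mgs="$3"
    mkfs.lustre --reformat --ost --mgsnode="$mgs" "$dev" || return 1
    mkdir -p "$mnt"
    mount -t lustre "$dev" "$mnt" || return 1
    set_fstab_label "$(e2label "$dev")" /etc/fstab
}

# Example call, left commented out so that sourcing this file has no side effects:
# init_ost /dev/sda5 /mnt/ost0 bes-00@tcp0
```

Keeping the fstab edit in its own function means it can be tried out on a copy of /etc/fstab before anything gets formatted.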

At this point, the Lustre filesystem can be mounted by all Compute/OSS nodes, as well as the frontend node. ssh or tentakel can be used to do that efficiently on the compute nodes.
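
For the ssh route, a loop along these lines is one way to do it. This is a sketch under assumptions: the filesystem name "lustre", the client mount point /mnt/lustre, and the node names in the example call are placeholders I picked for illustration, so substitute whatever the cluster actually uses. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: mount the Lustre filesystem on a list of nodes over ssh.
# FSNAME and CLIENT_MNT are assumed values; MGS_NID matches ost_init.sh.

MGS_NID="bes-00@tcp0"
FSNAME="lustre"            # assumption: the filesystem name
CLIENT_MNT="/mnt/lustre"   # assumption: client mount point

# Print the command that is to be run on each node.
mount_cmd() {
    echo "mkdir -p $CLIENT_MNT && mount -t lustre $MGS_NID:/$FSNAME $CLIENT_MNT"
}

# Run the mount command on every node given, or just print it when DRY_RUN=1.
mount_on_nodes() {
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $node '$(mount_cmd)'"
        else
            ssh "$node" "$(mount_cmd)" || echo "mount failed on $node" >&2
        fi
    done
}

# Example call with placeholder node names, commented out on purpose:
# DRY_RUN=1 mount_on_nodes compute-0-0 compute-0-1
```

A failure on one node is reported on stderr and the loop carries on, so a single unreachable node does not stop the rest from mounting.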

I have decided to make all compute nodes on the development cluster into OSSs as well, since the (advised) restriction of keeping OSSs separate from clients has been removed in Lustre 1.6. This does expose the cluster to a known, but unlikely, problem with clients and OSSs on the same node.