Thankfully, the Rocks v4.3 kernel roll and Lustre v1.6.1 appear to be based on the same kernel version. This saved me from having to get the kernel source and patch it for Lustre; I've done that before with some success, but I'm happy to avoid doing it again. The following describes setting up Rocks so that Lustre is installed and ready on the compute nodes. I also put Lustre on the frontend node, but that was done manually, simply by installing the RPMs and configuring grub.
Steps I followed to get the Lustre kernel, modules, and tools installed on the compute nodes:
Add the following package lines to extend-compute.xml:

<package>e2fsprogs</package>
<package>lustre</package>
<package>lustre-modules</package>
<package>lustre-ldiskfs</package>
<package>kernel-lustre-smp</package>

Then put the following at the end of the extend-compute.xml file:
/sbin/grubby --grub --add-kernel=/boot/vmlinuz-2.6.9-55.EL_lustre-1.6.1smp \
    --initrd=/boot/initrd-2.6.9-55.EL_lustre-1.6.1smp.img --make-default \
    --config-file=/boot/grub/grub-orig.conf --copy-default --title=Rocks-Lustre
mkdir -p /mnt/lustre
mkdir /etc/modprobe.d
<file name="/etc/modprobe.d/lustre" mode="create">
options lnet networks=<var name="Lustre_NetType"/>0
</file>
<file name="/etc/fstab" mode="append">
<var name="Kickstart_PrivateHostname"/>@<var name="Lustre_NetType"/>0:/lustre /mnt/lustre lustre defaults,_netdev,noauto 0 0
</file>
The "post" section I've used only changes the grub-orig.conf file, not the rocks.conf file. This should be OK, since rocks-grub, which switches the grub config file between these two, will still do its job (i.e. triggering reinstallation on a crash), even though rocks.conf does not exactly agree with the boot options in grub-orig.conf.
Also add the Lustre_NetType variable used above to the Rocks database:

rocks add var service=Lustre component=NetType value=tcp
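For concreteness, with Lustre_NetType set to tcp as above, the two <file> sections in the post section expand to the following on a compute node ("frontend" here is a placeholder for the value of Kickstart_PrivateHostname):

```
# /etc/modprobe.d/lustre
options lnet networks=tcp0

# Line appended to /etc/fstab ("frontend" stands in for Kickstart_PrivateHostname)
frontend@tcp0:/lustre /mnt/lustre lustre defaults,_netdev,noauto 0 0
```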
The next goal is to create a Rocks node type for the Lustre OSSs. It could be useful in general to create a node type for a Lustre client, but that isn't actually necessary on the development cluster, where all nodes are Lustre clients and I can just use the above method to extend the compute-node configuration. OSSs are a different animal, however, and need their own appliance type.
Steps needed for the Compute/OSS appliance type:
mkdir -p /mnt/ost0
/sbin/grubby --grub --config-file=/boot/grub/grub-orig.conf \
    --update-kernel=/boot/vmlinuz-2.6.9-55.EL_lustre-1.6.1smp \
    --args='elevator=deadline'
grep -v '/state/partition1' /etc/fstab > /etc/fstab.tmp
cat << EOF >> /etc/fstab.tmp
#LABEL=lustre-OSTXXXX /mnt/ost0 lustre defaults,_netdev 0 0
EOF
mv -f /etc/fstab.tmp /etc/fstab

Note that the line added to /etc/fstab is commented. The reason is that I'm assuming the node is being installed on a system without a Lustre-patched kernel, so the file system cannot be formatted at the time of installation. The formatting must occur after booting the new kernel.
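The fstab surgery above is easy to rehearse on a scratch copy before trusting it to a node. A minimal sketch, using a throwaway file in place of /etc/fstab (the two starting entries are invented for the demonstration):

```shell
# Rehearse the fstab edit on a scratch file instead of the real /etc/fstab.
# The two starting entries below are made up for this demonstration.
cat > fstab.test << 'EOF'
LABEL=/ / ext3 defaults 1 1
LABEL=/state/partition1 /state/partition1 ext3 defaults 1 2
EOF

# Drop the /state/partition1 entry and append the commented-out placeholder,
# exactly as the appliance's post section does.
grep -v '/state/partition1' fstab.test > fstab.tmp
cat << 'EOF' >> fstab.tmp
#LABEL=lustre-OSTXXXX /mnt/ost0 lustre defaults,_netdev 0 0
EOF
mv -f fstab.tmp fstab.test
```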
Next, create the graph file for the new appliance:

<graph>
  <description>
  Compute/Lustre OSS appliance.
  </description>
  <changelog>
  </changelog>
  <edge from="compute-oss">
    <to>compute</to>
  </edge>
  <order gen="kgen" head="TAIL">
    <tail>compute-oss</tail>
  </order>
</graph>

Finally, add the appliance to the Rocks DB:
rocks add appliance compute-oss membership="Compute/OSS" node=compute-oss graph=default short-name=co
I set up the frontend node as the MGS/MDS manually, simply following the Lustre instructions: install the RPMs and edit /etc/fstab as needed.
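For reference, the manual MGS/MDS setup amounts to something like the following. The device name is purely illustrative for my setup, and the fsname matches the "/lustre" filesystem name used in the compute-node fstab entries above:

```
# Format a combined MGS/MDT (example device only):
#   mkfs.lustre --fsname=lustre --reformat --mgs --mdt /dev/sdb1
# Corresponding /etc/fstab entry:
/dev/sdb1  /mnt/mgs-mds  lustre  defaults,_netdev  0 0
```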
That should be enough to set up Lustre on a new Rocks installation. Note that the Lustre filesystem will not yet be ready to use: it must be formatted while a node is running the Lustre-patched kernel, which only happens after the node reboots at the end of the installation process. To facilitate that process, first create the following file at /home/install/sbin/ost_init.sh:
mkfs.lustre --reformat --ost --mgsnode=bes-00@tcp0 /dev/sda5
mount -t lustre /dev/sda5 /mnt/ost0
LABEL=`e2label /dev/sda5`
sed -e "s/#LABEL=lustre-OSTXXXX/LABEL=$LABEL/" /etc/fstab > /etc/fstab.tmp
mv -f /etc/fstab.tmp /etc/fstab
Then use ssh or tentakel to execute the script on all Compute/OSS nodes. Note that the script has a hard-coded device name (/dev/sda5), which works on the development cluster, but may not work elsewhere. Ideally, the script should be made a bit more flexible...at some time in the future.
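In that spirit, here is a sketch of a more flexible ost_init.sh, with the device, MGS nid, and mount point passed as arguments rather than hard-coded. The defaults and the DRYRUN knob are my own additions, not part of the original script, and this version has only been rehearsed in dry-run mode:

```shell
# Parameterized sketch of ost_init.sh. Setting DRYRUN=echo prints the
# privileged commands instead of executing them, which makes the logic
# easy to rehearse on a machine without Lustre installed.
ost_init() {
    dev=${1:-/dev/sda5}          # OST block device
    mgs=${2:-bes-00@tcp0}        # MGS nid (the frontend node)
    mnt=${3:-/mnt/ost0}          # OST mount point

    $DRYRUN mkfs.lustre --reformat --ost --mgsnode=$mgs $dev
    $DRYRUN mount -t lustre $dev $mnt
    # Swap the commented fstab placeholder for the real label.
    label=`$DRYRUN e2label $dev`
    $DRYRUN sed -i "s/#LABEL=lustre-OSTXXXX/LABEL=$label/" /etc/fstab
}

# Rehearsal only -- prints the commands instead of running them:
DRYRUN=echo
ost_init /dev/sdb1 mgs-node@tcp0 /mnt/ost1
```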
At this point, the Lustre filesystem can be mounted by all Compute/OSS nodes, as well as the frontend node. ssh or tentakel can be used to do that efficiently on the compute nodes.
I have decided to make all compute nodes on the development cluster into OSSs as well, since the (advised) restriction of keeping OSSs separate from clients has been removed in Lustre 1.6. This does expose the cluster to a known, but unlikely, problem with clients and OSSs running on the same node.