There is a cluster deployment quick start tutorial that does not really work for Fedora.
We have a heterogeneous cluster of various Intel and AMD boxes, and some of them run different kernel versions. GraphLab does not tolerate a mix of CPUs (or kernels or software versions; I haven’t really investigated which) and will complain that the binary checksums don’t match if you attempt to run it. Fortunately, a subset of our cluster has the same hardware.
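To find out which machines actually match, it can help to collect the kernel version and CPU model from every host before building anything. The following is only a sketch: the helper names describe_host and count_configs and the host names in the example are made up for illustration, and it assumes SSH access to each Linux machine already works. It is also a guess that kernel and CPU model are what the checksum complaint is about.

```shell
# Print "host<TAB>kernel<TAB>cpu model" for one host, queried over SSH.
# Assumes a Linux host that exposes /proc/cpuinfo.
describe_host() {
    ssh -o BatchMode=yes "$1" '
        cpu=$(grep -m1 "model name" /proc/cpuinfo | cut -d: -f2- | sed "s/^ *//")
        printf "%s\t%s\t%s\n" "$(hostname)" "$(uname -r)" "$cpu"
    '
}

# Read describe_host lines on stdin and print how many distinct
# (kernel, cpu model) combinations they contain.
count_configs() {
    cut -f2,3 | sort -u | wc -l | tr -d ' '
}

# Example (hypothetical host names):
#   for h in node01 node02 node03; do describe_host "$h"; done | count_configs
```

A result of 1 would mean every queried host reports the same kernel and CPU model; anything larger suggests narrowing the machines file to a matching subset.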
Here is what I did on our cluster running Fedora 19:
Install your developer tools, build environment, and OpenMPI.
yum install cmake openmpi openmpi-devel
Set up passwordless SSH. You should be able to log in from one machine to another using SSH keys.
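One possible way to do this is sketched below. The helper names ensure_key and check_login, the SSH_CMD override, and the host name in the example are assumptions made for illustration. The key is created with an empty passphrase for simplicity; a passphrase plus ssh-agent may suit your site better.

```shell
# Command used to reach a host. Override SSH_CMD for a dry run.
SSH_CMD=${SSH_CMD:-ssh}

# Create an RSA key pair with an empty passphrase if none exists yet.
ensure_key() {
    [ -f "$HOME/.ssh/id_rsa" ] && return 0
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" &&
        ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
}

# Report whether a key-based login to host $1 works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_login() {
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}

# Example (hypothetical host name):
#   ensure_key && ssh-copy-id node02 && check_login node02
```

Running check_login against every host in the machines file before involving MPI makes it easier to tell SSH problems apart from MPI problems.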
Add the OpenMPI binaries to your PATH environment variable:
export PATH=/usr/lib64/openmpi/bin:${PATH}
Note that the environment variables don’t stick when using non-interactive SSH. To change that, we’ll create and edit a few files:
echo '. $HOME/.bashrc' >> ~/.ssh/environment
By default, .bashrc will return early if the shell is not running interactively, so we need to set PATH before it checks. Add the following to the beginning of your ~/.bashrc:
PATH=/usr/lib64/openmpi/bin:${PATH}
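Because ~/.bashrc can be sourced more than once in a session, a plain prepend may leave the same directory in PATH several times. A small guard avoids that. This is a sketch, not a required step: the function name prepend_path and the variable OPENMPI_BIN are invented for this example.

```shell
# Directory from the text above; adjust if OpenMPI lives elsewhere.
OPENMPI_BIN=/usr/lib64/openmpi/bin

# Prepend directory $1 to PATH unless it is already listed.
prepend_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, do nothing
        *) PATH="$1:${PATH}" ;;       # otherwise put it first
    esac
}

prepend_path "$OPENMPI_BIN"
export PATH
```

To check that the setting survives a non-interactive login, something like ssh node02 'command -v mpiexec' (node02 being a placeholder host) should print the path to mpiexec.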
Compile GraphLab. The tutorial tells you to rsync the binaries to all the machines, but we (for better or worse) have a folder mounted over NFS that contains GraphLab.
Configure your machines file and test it:
mpiexec \
  -mca btl ^openib -n 3 \
  -hostfile /nfs/machines.txt \
  date
(We use -mca btl ^openib to suppress warnings about not being able to find InfiniBand.)
If all goes well, we can execute a test application:
mpiexec -mca btl ^openib -hostfile /nfs/machines.txt \
  /nfs/graphlab/release/toolkits/graph_analytics/pagerank \
  --powerlaw=1000000
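The machines file passed to -hostfile is plain text with one host per line, optionally followed by slots=N, and lines starting with # are treated as comments. The -n 3 in the date test has to be kept in step with that file by hand. A helper along these lines could derive the number instead. The function name count_hosts is made up for this sketch, and it counts host entries only, ignoring any slots= values.

```shell
# Count host entries in an Open MPI hostfile ($1),
# skipping blank lines and lines that start with '#'.
count_hosts() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}

# Example, using the hostfile path from the text:
#   mpiexec -mca btl ^openib -n "$(count_hosts /nfs/machines.txt)" \
#       -hostfile /nfs/machines.txt date
```

With one process per host, the date test should then print one timestamp for each machine listed in the file.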