GraphLab on a Fedora cluster

GraphLab has a cluster deployment quick-start tutorial, but it does not quite work on Fedora as written.

We have a heterogeneous cluster of various Intel and AMD boxes, some of which run different kernel versions. GraphLab does not like this: if you try to run it across machines with different CPUs (or kernels, or software versions; I haven’t really investigated which), it complains that the binary checksums don’t match. Fortunately, a subset of our cluster has identical hardware.
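
One way to find such a matching subset is to collect a kernel and CPU "signature" from each machine and group the hosts by it. The following is only a rough sketch: the function names are made up for illustration, it assumes passwordless SSH already works and that /proc/cpuinfo has a "model name" line (true on the x86 boxes described here), and I have not confirmed that kernel and CPU model are exactly what GraphLab's checksum depends on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting which hosts share a kernel and CPU model.

# Read "host<TAB>signature" lines on stdin and print
# "count<TAB>signature<TAB>hosts", largest group first.
group_by_signature() {
  awk -F'\t' '
    {
      if ($2 in hosts) hosts[$2] = hosts[$2] " " $1
      else             hosts[$2] = $1
      count[$2]++
    }
    END { for (s in hosts) printf "%d\t%s\t%s\n", count[s], s, hosts[s] }
  ' | sort -rn
}

# Ask every host in a hostfile for its kernel release and CPU model.
# Only the first field of each line is used, so "host slots=N" entries work.
collect_signatures() {
  local hostfile=$1 host sig
  while read -r host _; do
    case $host in ''|'#'*) continue ;; esac
    # -n keeps ssh from swallowing the rest of the hostfile on stdin
    sig=$(ssh -n -o BatchMode=yes "$host" \
      'printf "%s |%s\n" "$(uname -r)" "$(grep -m1 "model name" /proc/cpuinfo | cut -d: -f2-)"')
    printf '%s\t%s\n' "$host" "$sig"
  done < "$hostfile"
}

# Usage (not run automatically):
#   collect_signatures /nfs/machines.txt | group_by_signature
```

The first line of the output is the largest group of identical machines, which is a reasonable candidate for the hostfile used later.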

Here is what I did on our cluster running Fedora 19:

  1. Install your developer tools, build environment, and OpenMPI.

    yum install cmake openmpi openmpi-devel
    
  2. Set up passwordless SSH. You should be able to log in from one machine to another using SSH keys, without being prompted for a password.

  3. Add OpenMPI to your PATH environment variable:

    export PATH=/usr/lib64/openmpi/bin:${PATH}
    

    Note that environment variables exported in an interactive shell don’t carry over to non-interactive SSH sessions, which is how mpiexec starts processes on the other machines. To change that, we’ll create and edit a few files:

    echo '. $HOME/.bashrc' >> ~/.ssh/environment
    

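    .bashrc returns early by default when the shell is not interactive, so PATH has to be set before that check. (sshd only reads ~/.ssh/environment when PermitUserEnvironment is enabled, and it expects plain NAME=value lines there, so on a default configuration this .bashrc change is likely the part that takes effect.) Add the following to the beginning of your ~/.bashrc: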

    PATH=/usr/lib64/openmpi/bin:${PATH}
    
  4. Compile GraphLab. The tutorial tells you to rsync the binaries to all the machines, but we (for better or worse) keep GraphLab in a folder mounted over NFS, so every machine already sees the same build.

  5. Create your machines file (an MPI hostfile listing the machines to use) and test it:

    mpiexec \
      -mca btl ^openib -n 3 \
      -hostfile /nfs/machines.txt \
      date
    

    (We use -mca btl ^openib to exclude OpenMPI’s openib transport, which suppresses the warnings about not being able to find InfiniBand hardware.)

    If all goes well, we can execute a test application:

    mpiexec -mca btl ^openib -hostfile /nfs/machines.txt \
      /nfs/graphlab/release/toolkits/graph_analytics/pagerank \
      --powerlaw=1000000
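
Before launching anything bigger, it can help to confirm steps 2 and 3 on every machine in the hostfile. The script below is a minimal sketch, not part of GraphLab or OpenMPI: the function names are hypothetical, and it assumes the hostfile has one host per line, optionally followed by "slots=N", with "#" starting a comment.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the machines file.

# Print just the hostnames from a hostfile: skip blank lines and
# comments, and drop anything after the first field (e.g. slots=4).
hostfile_hosts() {
  awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Verify one host: key-based login works, and mpiexec is on the PATH
# that a non-interactive SSH session sees.
check_host() {
  local host=$1
  # BatchMode makes ssh fail instead of prompting for a password
  if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    echo "$host: passwordless SSH failed"
    return 1
  fi
  if ! ssh -n -o BatchMode=yes "$host" 'command -v mpiexec' >/dev/null 2>&1; then
    echo "$host: mpiexec not found on non-interactive PATH"
    return 1
  fi
  echo "$host: ok"
}

# Check every host; return non-zero if any of them failed.
preflight() {
  local hostfile=$1 host status=0
  for host in $(hostfile_hosts "$hostfile"); do
    check_host "$host" || status=1
  done
  return "$status"
}

# Usage (not run automatically):
#   preflight /nfs/machines.txt
```

A host that fails the second check usually means the PATH line did not end up above the interactive-shell test in ~/.bashrc on that machine.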