Tuesday, May 17, 2011

Set Up an MPICH2 Cluster With Ubuntu Server 11.04

This is a short note on setting up an MPI cluster with Ubuntu. Assume four machines named ub0-ub3, with ub0 being the master node, connected through a private local network. At the time of writing, these machines run Ubuntu Server 11.04.

1. Define hostnames in /etc/hosts

Edit /etc/hosts like this (change the IP addresses accordingly):

127.0.0.1 localhost

192.168.133.100 ub0

192.168.133.101 ub1

192.168.133.102 ub2

192.168.133.103 ub3



2. Share files through NFS

2.1. Install NFS server on the master node:

user@ub0:~$ sudo apt-get install nfs-kernel-server

2.2. Make a directory to be exported:

user@ub0:~$ sudo mkdir /mirror

2.3. Edit file /etc/exports by adding a line like this:

/mirror ub1_hostname(rw,sync,no_subtree_check) ub2_hostname(…) …
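With the hostnames defined in step 1, a complete /etc/exports line might look like this (a sketch; adjust the export options to your needs):

```
/mirror ub1(rw,sync,no_subtree_check) ub2(rw,sync,no_subtree_check) ub3(rw,sync,no_subtree_check)
```

After editing /etc/exports, running "sudo exportfs -ra" reloads the export table without restarting the NFS service.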

2.4. Restart the NFS service:

user@ub0:~$ sudo service portmap restart

user@ub0:~$ sudo service nfs-kernel-server restart

2.5. Configure the firewall to open ports 111 and 2049 if necessary:

user@ub0:~$ sudo iptables -I INPUT num -p tcp --dport 111 -s ub1_hostname -j ACCEPT

user@ub0:~$ sudo iptables -I INPUT num -p tcp --dport 2049 -s ub1_hostname -j ACCEPT

#num is the position at which the rule is inserted in the INPUT chain; repeat for each slave node



2.6. Mount the directory on client machines:

user@ub1:~$ sudo apt-get install nfs-common

user@ub1:~$ sudo mount ub0:/mirror /mirror
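To make the mount persist across reboots, a line can be added to /etc/fstab on each client (a sketch, assuming the default mount options are acceptable):

```
ub0:/mirror /mirror nfs defaults 0 0
```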



3. Set up a user account for running MPI

3.1. Create a user with the same name and the same user ID on all nodes, with a home directory in /mirror:

user@ub0:~$ sudo useradd -d /mirror/mpiuser -m -s /bin/bash mpiuser

user@ub0:~$ sudo passwd mpiuser

… #repeat the same process on the other nodes (the password need not be the same on all nodes)

3.2. Change the owner of /mirror to mpiuser:

user@ub0:~$ sudo chown mpiuser /mirror



4. Install an SSH server on all nodes if SSH is not available

user@ub0:~$ sudo apt-get install openssh-server



5. Set up SSH with no passphrase for communication between nodes

5.1. Log in with the new user on the master node:

user@ub0:~$ su mpiuser

5.2. Generate ssh key:

mpiuser@ub0:~$ ssh-keygen -t dsa # Leave the passphrase empty. A folder called .ssh will be created in the home directory, containing the public key in a file id_dsa.pub.
mpiuser@ub0:~$ ssh-keygen -t rsa # Leave the passphrase empty. The .ssh folder will be created if it did not exist previously; it will contain the public key in a file id_rsa.pub and the private key in a file id_rsa.

(Update on 11/05/2014. SSH can use either RSA (Rivest-Shamir-Adleman) or DSA (Digital Signature Algorithm) keys. Both were considered state-of-the-art algorithms when SSH was invented, but DSA has come to be seen as less secure in recent years. RSA is the only recommended choice for new keys. Please see ref. 3.)

5.3. If, as in this note, the home directory of user mpiuser is shared over NFS between the master and slave nodes, run the following commands on the master node to append the public keys to ~/.ssh/authorized_keys, then go to step 5.5: (Updated on 24 August 2012)

mpiuser@ub0:~$ cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
mpiuser@ub0:~$ cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys

#please note the file ~/.ssh/authorized_keys should be in mode 644 (chmod 644 ~/.ssh/authorized_keys)

5.4. If the home directory of user mpiuser is not shared over NFS between the master and slave nodes, transfer the public key files id_dsa.pub and id_rsa.pub to the slave nodes. On each slave node, log in as mpiuser and append the keys generated on the master node to ~/.ssh/authorized_keys:

mpiuser@ub1:~$ cat id_dsa.pub >>~/.ssh/authorized_keys
mpiuser@ub1:~$ cat id_rsa.pub >>~/.ssh/authorized_keys

#please note the file ~/.ssh/authorized_keys should be in mode 644 (chmod 644 ~/.ssh/authorized_keys)


5.5. To test passwordless SSH login, run this command on the master node:

mpiuser@ub0:~$ ssh ub1 hostname

#It should return the remote hostname without asking for a password or passphrase.

(Update 2: to resolve file-permission issues, ensure the user accessing the NFS-shared files has the same UID and GID on both server and clients. Edit /etc/passwd and /etc/group where necessary.)

6. Install GCC and other compilers

mpiuser@ub0:~$ sudo apt-get install build-essential



7. Install MPICH2

7.1. Install through the package manager or Synaptic:

mpiuser@ub0:~$ sudo apt-get install mpich2

7.2. Test the installation by running:

mpiuser@ub0:~$ which mpiexec

mpiuser@ub0:~$ which mpirun

#If successful, the paths to the two executables should be returned

7.3. Set up Hydra Process Manager (which replaces MPD) on master node:

7.3.1. Create a file with the names of the nodes:

mpiuser@ub0:~$ cat hosts

ub1:8 #8 is the number of cpu cores available

ub2:8

ub3:8

ub0:7

7.3.2. The hosts file can also be designated by setting variable HYDRA_HOST_FILE:

mpiuser@ub0:~$ echo "export HYDRA_HOST_FILE=/etc/hydra/hosts" >> ~/.bashrc
#Note: without "export", the variable is visible only to the shell itself, not to child processes such as mpiexec. Do not set HYDRA_HOST_FILE if the MPI environment is to be controlled by a cluster queueing system such as SGE. (Updated on 17 May 2012)


7.4. Test parallel computation

7.4.1. Compile an application with mpicc:

mpiuser@ub0:~$ mpicc cpi.c -o cpi

#cpi.c is an example provided by the MPICH2 library

#On my machine, running mpicc produced an error:

#/usr/bin/ld: cannot find -lcr

#collect2: ld returned 1 exit status

#This problem is due to the missing package 'libcr-dev'

#Install libcr-dev: mpiuser@ub0:~$ sudo apt-get install libcr-dev

7.4.2. Run the application using mpiexec:

mpiuser@ub0:~$ mpiexec -np 1 ./cpi #using single core

mpiuser@ub0:~$ mpiexec -np 16 ./cpi #using 16 cores

#Note the path to the executable ('./' here) must be specified; otherwise an error is produced: HYDU_create_process (./utils/launch/launch.c:69): execvp error on file cpi (No such file or directory) (Updated on 17 October 2011)

#If hydra is not the default process manager for MPICH2, running an MPI program with mpiexec may cause an error in which all processes have rank 0. To overcome this, use mpiexec.hydra instead of mpiexec (Updated on 17 October 2011)

#Configure the firewall if the connection is blocked, eg,

#mpiuser@ub0:~$ sudo iptables -I INPUT 1 -s ub1 -j ACCEPT

(Update 1: I've posted a relevant topic "Install Sun Grid Engine (SGE) on Ubuntu Server 11.04")

References
1. Setting Up an MPICH2 Cluster in Ubuntu (https://help.ubuntu.com/community/MpichCluster)
2. MPICH2 Developer Documentation (http://wiki.mcs.anl.gov/mpich2/index.php/Developer_Documentation)
3. https://help.ubuntu.com/community/SSH/OpenSSH/Keys

30 comments:

durutti said...

Hi when I try
sudo chown mpiuser /mirror

on the node (e.g ub1 at your post)

i get the following message


chown: changing ownership of mirror: Invalid argument

webappl said...

To durutti:
You do not need to run "sudo chown mpiuser /mirror" on client machines. For NFS sharing, ensure the user of accessing NFS shared files have same UID and GID on both server and client. You can use command "id username" to check UID and GID on all machines. Edit /etc/passwd and /etc/group when there is difference. For more information about NFS file permission, please refer to NFS documentations. Or you can also try NIS, LDAP central authentication.

SriVarma said...

hi!

really good post,

for a noob like me, I went through your install, and when I run mpiexec.hydra or mpiexec -np 12 ./cpi it executes all on one host

example:
mpiuser@ubuntu:/mirror$ mpiexec.hydra -np 12 ./cpi
Process 11 of 12 is on ubuntu
Process 1 of 12 is on ubuntu
Process 10 of 12 is on ubuntu
Process 3 of 12 is on ubuntu
Process 2 of 12 is on ubuntu
Process 0 of 12 is on ubuntu
...

pi is approximately 3.1415926544231252, Error is 0.0000000008333321
wall clock time = 0.004670
[example end]

any pointers?

thanks in advance

webappl said...

To SriVarma:
Have you completed step 7.3 "Set up Hydra Process Manager (which replaces MPD) on master node"?

Run "source .bashrc" to bring the change into effect after setting the variable HYDRA_HOST_FILE in .bashrc.

Alternatively, you can also specify the host file using -hostfile option when calling mpiexec, eg:
mpiexec -hostfile ./hosts -np 16 ./cpi

Albert said...

Hi,
I am facing the similar kind of problem. My cpi program is running on single machine only.
Here is my sample output:

[root@beowulf ~]# mpiexec -n 4 /opt/mpich2-1.4.1p1/examples/./cpi
Process 2 of 4 is on beowulf.master
Process 3 of 4 is on beowulf.master
Process 1 of 4 is on beowulf.master
Process 0 of 4 is on beowulf.master
Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbfd69028, rbuf=0xbfd69020, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed

MPIR_Reduce_impl(1087)..........:
MPIR_Reduce_intra(895)..........:
MPIR_Reduce_binomial(144).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 2
MPIR_Reduce_binomial(144).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
^CCtrl-C caught... cleaning up processes
[root@beowulf ~]#

webappl said...

To Albert,
Sorry for the late reply; the blog has no alert for new comments. I hope you have solved the problem. How does the program run with one node, eg:
mpiexec -n 1 /opt/mpich2-1.4.1p1/examples/cpi

Can you please post your update here?

webappl said...

Thanks to those who commented on my post. Unfortunately, I may not be able to regularly check for new comments, hence my late responses to you.

To new readers who would like to post a comment here: I will try to respond timely should you send me an email as well. You would be able to find out my email, wouldn't you?

Thanks.

srikanth ayalasomayajulu said...

Thank you for the detailed tutorial.
When setting up, I successfully mounted the /mirror directory on the slave nodes. But when I try to create a public key for passwordless SSH, the key gets generated on all nodes (master and slaves), and whatever I do on the master is reflected on the slaves as well. Because of this I am not able to do passwordless SSH.

webappl said...

To srikanth ayalasomayajulu,
As /mirror is the shared home directory for the same user on master and slave nodes, there is no need to transfer the public key file id_dsa.pub to slave nodes. Just add the public key generated from master node to file ~/.ssh/authorized_keys by running:
cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys

Alex

ChristianZ said...

Hi,

I got the following error messages when I try to run my program on a remote node:

[proxy:unset@ubuntu-server-vm] HYDT_bsci_init (./tools/bootstrap/src/bsci_init.c:180): unrecognized RMK: user
[proxy:unset@ubuntu-server-vm] HYD_pmcd_pmip_get_params (./pm/pmiserv/pmip_utils.c:747): proxy unable to initialize bootstrap server
[proxy:unset@ubuntu-server-vm] main (./pm/pmiserv/pmip.c:187): bad parameters passed to the proxy


Both, master and client node can SSH each other without password.
I am not running cpi but the simple Hello World program from https://help.ubuntu.com/community/MpichCluster

Can somebody help? Google returns nothing helpful on extracts of the error messages.

webappl said...

Kabel Deutschland Test,
Were you using rsh instead of ssh? I found the following post that may be helpful:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-February/009080.html

Unknown said...

Hi,

I need some help.

I'm having this error every time I use mpirun/mpiexec with host file

/mirror/mpich2-install/bin/hydra_pmi_proxy: error while loading shared libraries: libcr.so.0: cannot open shared object file: No such file or directory

I tried installing libcr-dev but it was already installed and still got the same problem.

Any idea what's the cause of the error? Thanks! :D

webappl said...

Mark Francis C. Flores,

Have you tried passing the path to libcr.so.0 when running mpiexec?
(1) Find libcr.so.0:
locate libcr.so.0
(2) Add the location of libcr.so.0 to the environment variable LD_LIBRARY_PATH, eg:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/point/to/libcr.so.0/location
(3) Test an MPI program on a single node before calling multiple nodes:
mpiexec -n 1 ./a.out
(4) Then try running the MPI program on multiple nodes.

Please make sure all nodes have the same libcr configuration/installation

Unknown said...

webappl,

Thanks, it's now working. Actually, all the slave nodes weren't installed with libcr-dev. So I just had to install it.

Also, I'm looking for a benchmarking application/tool for MPICH2 but having a hard time finding one that works perfectly. By any chance do you know any application you would recommend that works properly and is easy to install?

webappl said...

Mark Francis C. Flores,

I am glad you have solved the problem. For benchmark, I would suggest
the OSU benchmarks or the Intel MPI Benchmarks.

Unknown said...

Thanks a lot! I really appreciate your help! Hopefully we could get our thesis project done in no time! :D

Unknown said...

Hi again! One more question if it's alright with you, aside from benchmarking, are there any application you would suggest to use in running it on MPICH2 that works perfectly?

I found some applications like MM5 but had no luck on installing it since it has too many configurations which seemed to complicating.

Thanks! :D

webappl said...

Mark Francis C. Flores,

Sorry, I can't help you with MPI applications. It is best to practice your own MPI programming on a parallel computing problem of interest.

JcA said...

Hello,
Thanks for this article. I followed the recommendations but I still have problems. I can compile but I can't execute on the nodes. I have this issue:


[proxy:0:1@raspberrypi] version_fn (./pm/pmiserv/pmip_utils.c:470): UI version string does not match proxy version
[proxy:0:1@raspberrypi] match_arg (./utils/args/args.c:115): match handler returned error
[proxy:0:1@raspberrypi] [mpiexec@w500] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec@w500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@w500] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:179): error waiting for event
[mpiexec@w500] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion

I don't know where to look. If I run locally on one node, it runs...

webappl said...

Hello JcA,

Can you post here your command to run your mpi program? Please make sure the executable file is accessible by all the computing nodes.

JcA said...

thanks webappl for your answer,

I try with :

mpiexec -np 10 -host raspberrypi ./a.out
or
mpirun -n 10 -host raspberrypi ./a.out
or
mpiexec.hydra -np 10 -host raspberrypi ./a.out

I have the shared folder /mirror with a mpiuser folder. This is the same on each nodes.

I've two nodes. The master (w500) and a "slave" (raspberrypi).

webappl said...

Hello JcA,
Have you installed multiple implementations of MPI libraries on the same system? I am afraid the error was due to use of different MPI libraries between the compilation and execution. If this is the case, the simplest solution is to retain only one MPI library and remove the others.

JcA said...

Thanks for answering webappl,

I install with Synaptic. If I look in the Synaptic Manager, these packages (which contains "mpi") are installed : mpich2, libmpich2-3, libmpich2-dev, libmpich1.0gf.

One of these packages could be a problem ?

I had installed qtoctave, which uses mpich-mdp-bin, but I uninstalled it and the problem persists.

webappl said...

JcA,
libmpich1.0gf is suspicious because it is for mpich1 rather than mpich2. You can try removing it and then compiling and executing your MPI program again. If the problem is still there, remove the mpich2 library and make a fresh install of mpich2 on all nodes:
sudo apt-get remove mpich2
sudo apt-get install mpich2

Please let me know whether this solution works for you.

JcA said...

Ok webappl,
I realize I had two different versions. On my Raspberry I have 1.4.1 (with Debian) and on the w500 I have 1.4 (with Ubuntu 11.10). I suppose that could be a problem? I'm upgrading Ubuntu and will try again. Then I'll follow your instructions about libmpich1.0gf.

I'll give you news as soon as possible.
Many thanks.

JcA said...

The problem is resolved if I use the same version on all nodes.

However I still have a question: I have different architectures, 32 and 64 bits. I tried to compile on the master node (64 bits) with the option -m32 to get a 32-bit application. Unfortunately it doesn't work. What is the best solution? Is it possible to install both versions of mpich2 on the same node?

webappl said...

JcA,
Please read this discussion:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-February/009234.html

AtoMerZ said...

Hello,
Great article!
When I try to execute cpi program I get the following:

mpiuser@ubuntu3:~/mpich/mpich-3.0.1/examples$ mpiexec -f ~/mpich/hosts -n 16 ./cpi
Process 1 of 16 is on ubuntu3
Process 7 of 16 is on ubuntu3
Process 0 of 16 is on ubuntu2
Process 10 of 16 is on ubuntu2
Process 5 of 16 is on ubuntu3
Process 4 of 16 is on ubuntu2
Process 9 of 16 is on ubuntu3
Process 6 of 16 is on ubuntu2
Process 8 of 16 is on ubuntu2
Process 12 of 16 is on ubuntu2
Process 11 of 16 is on ubuntu3
Process 15 of 16 is on ubuntu3
Process 13 of 16 is on ubuntu3
Process 2 of 16 is on ubuntu2
Process 3 of 16 is on ubuntu3
Process 14 of 16 is on ubuntu2
Fatal error in PMPI_Reduce: A process has failed, error stack:
PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0xbff3fe28, rbuf=0xbff3fe30, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(779)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(144).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(612): Communication error with rank 1
MPIR_Reduce_intra(799)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(206).......: Failure during collective

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Running it on a single node goes fine but when on different nodes I got problem. What can I do? This is the original problem:
http://stackoverflow.com/q/14571203/279982

webappl said...

To AtoMerZ

How about running executable by:

mpiexec.hydra -f ~/mpich/hosts -n 16 ./cpi

AtoMerZ said...

Same results. I've tried MPICH and OpenMPI.
One thing that might be worth mentioning is that I'm using 2 Virtual machines On a Windows 7.
Tried both bridged and NAT networking: under both circumstances SSH works fine. Both machines can ping, and I can mount my shared directory on client.