Using bcMPI
After logging into login.hpc.ncsu.edu, type
add bcmpi
at the command line. If you get the error ": No such file or directory",
type the following instead:
source /home/gwhowell/scripts/bcmpi.html
Also create a .tcshrc file in your home directory (if you do not already have one) and add the following line to it:
setenv PATH /usr/bin:$PATH
This puts /usr/bin first on your path, so that 32-bit machines use the 32-bit python required by bcMPI.
To run the example codes,
copy the example directory to your home space and make sure it works.
The examples show which MPI commands have been implemented to run from
within matlab. Run the following commands from your home directory
(or, if you already have a matlab directory in your home directory,
from some other directory):
cp -R /usr/local/apps/bcmpi/gnu32/regress/matlab .
cd matlab
./create_job_files.sh
ls -lrt
Among the most recent files listed should be param-regress.lsf,
which is used to submit the reg_* scripts. param-regress.lsf
begins with
#!/bin/csh
#BSUB -n 4
#BSUB -W 00:05
#BSUB -R xeon
#BSUB -J param-regress
#BSUB -o /home/gwhowell/matlab/o.%J
#BSUB -e /home/gwhowell/matlab/e.%J
setenv INSTALL_DIR /usr/local/apps/bcmpi/ParaM-gnu32-mpich-p4-mexmpi
source ${INSTALL_DIR}/bin/matlab-environ.csh
setenv mpi_setup ${MPI_SETUP_BIN}
setenv mpiexec "${MPIEXEC_BIN} ${MPIEXEC_FLAGS}"
cd /home/gwhowell/matlab
${mpi_setup} reg_init matlab MATLAB_COMMAND=reg_init
${mpiexec} ./reg_init
where your own user name should appear in place of gwhowell.
Before submitting a job, be sure that the directory named on the
#BSUB -o and #BSUB -e lines actually exists.
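One way to make that check automatic is sketched below. It is a minimal POSIX sh helper, not part of bcMPI or LSF; the function name ensure_log_dirs is made up for illustration. It reads the #BSUB -o and #BSUB -e lines of a job file and creates the directories they point to.

```shell
# Sketch (POSIX sh): create the directories named on the "#BSUB -o" and
# "#BSUB -e" lines of a job file.
# ensure_log_dirs is a hypothetical helper name, not a bcMPI or LSF command.
ensure_log_dirs() {
    job_file=$1
    # Print the third field (the path) of each "#BSUB -o" or "#BSUB -e" line.
    awk '$1 == "#BSUB" && ($2 == "-o" || $2 == "-e") { print $3 }' "$job_file" |
    while read -r log_path; do
        # Strip the file name (o.%J or e.%J) and create what is left.
        mkdir -p "$(dirname "$log_path")"
    done
}
```

After pasting the function into an sh session, it could be called as ensure_log_dirs param-regress.lsf before the job is submitted.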
The sample jobs (the .m files in the directory) can be submitted to
run on the cluster with
bsub < param-regress.lsf
The standard output file o.xxxxxx and the standard error file e.xxxxxx
(where xxxxxx is the job number) show
information about the job. Lines in the e.xxxxxx file such as
could not open
could not open
Warning: No xauth data; using fake authentication data for X11 forwarding.
connect login02 port 6024: Connection refused
X connection to localhost:10.0 broken (explicit kill or server shutdown).
Warning: No xauth data; using fake authentication data for X11 forwarding.
connect login02 port 6024: Connection refused
X connection to localhost:10.0 broken (explicit kill or server shutdown).
Warning: No xauth data; using fake authentication data for X11 forwarding.
connect login02 port 6024: Connection refused
X connection to localhost:11.0 broken (explicit kill or server shutdown).
could not open
could not open
Warning: No xauth data; using fake authentication data for X11 forwarding.
connect login02 port 6024: Connection refused
are normal on the cluster.
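Because these harmless lines can bury a real error, it may help to filter them out when reading an e.xxxxxx file. The following is a minimal POSIX sh sketch; the function name show_real_errors is made up for illustration, and the patterns cover only the messages listed above.

```shell
# Sketch (POSIX sh): hide the X11 and "could not open" messages shown above
# so that any other line in an e.* file stands out.
# show_real_errors is a hypothetical helper name.
# grep -v exits with status 1 when every line was filtered out, that is,
# when the file holds nothing but the expected messages.
show_real_errors() {
    grep -v \
        -e '^could not open' \
        -e '^Warning: No xauth data' \
        -e '^connect .* port [0-9]*: Connection refused' \
        -e '^X connection to .* broken' \
        "$1"
}
```

It could be called as show_real_errors e.xxxxxx, with xxxxxx replaced by the job number.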
Options in the ParaM.pjc file
The file param-regress.lsf can be edited by hand. For instance, you
might want to specify a longer execution time or the use of more
processors.
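As an alternative to opening an editor, the two lines in question can be rewritten with sed. This is only a sketch in POSIX sh; the function name set_bsub_limits is made up for illustration, and the values passed to it are examples, not recommendations for this cluster.

```shell
# Sketch (POSIX sh): rewrite the "#BSUB -n" (processors) and "#BSUB -W"
# (run time) lines of a job file, leaving all other lines alone.
# set_bsub_limits is a hypothetical helper name.
set_bsub_limits() {
    job_file=$1 nprocs=$2 walltime=$3
    sed -e "s/^#BSUB -n .*/#BSUB -n $nprocs/" \
        -e "s/^#BSUB -W .*/#BSUB -W $walltime/" \
        "$job_file" > "$job_file.new" &&
    mv "$job_file.new" "$job_file"
}
```

For example, set_bsub_limits param-regress.lsf 8 00:30 would request 8 processors and 30 minutes.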
You can control which .m scripts are executed by editing the file
create_job_files.sh. To do so, edit the m_files list:
m_files='reg_init.m,'\
'reg_size_rank.m,'\
'reg_barrier.m,reg_hostinfo.m,'\
'reg_cell.m,reg_bcast.m,'\
'reg_broadcast.m,reg_relay.m,'\
'reg_send_recv.m,reg_buffer_send_recv.m,'\
'reg_probe.m,reg_reduce.m,'\
'reg_comm_functions.m'
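If a directory holds many .m files, typing the list by hand is tedious. The sketch below prints every .m file in a directory as one comma-separated line, the same form as the m_files value; the result can then be pasted into create_job_files.sh. The function name list_m_files is made up for illustration. The files come out in alphabetical order, so if the order of execution matters the list should still be arranged by hand.

```shell
# Sketch (POSIX sh): print the .m files of a directory as a single
# comma-separated line, the form used for m_files.
# list_m_files is a hypothetical helper name.
list_m_files() {
    # The subshell keeps the cd from changing the caller's directory.
    ( cd "$1" && ls *.m | paste -s -d, - )
}
```

It could be called as list_m_files . from inside the matlab directory.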
The other file that controls parameters in the param-regress.lsf
script is ParaM.pjc, which has the contents
#
# ParaM.pjc - installation specific configuration file
#
# This file is read by batch file generator script.
#
[DEFAULT]
batch_shell = csh
batch_system = LSF
interpreter = matlab
machine_name = gnu32
#output_dir = /share/gwhowell/pmlab
walltime = 00:05:00
processes = 4
If you uncomment the output_dir line and point it at a /share
directory of your own (you do not have permission to write to mine),
the standard error and output files are written there instead of
cluttering your matlab directory.
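The change to output_dir can be sketched as follows, in POSIX sh. The function name set_output_dir is made up for illustration, and the directory passed to it is whatever /share location you have chosen for yourself. The helper creates the directory and then sets the output_dir line, whether or not it is still commented out.

```shell
# Sketch (POSIX sh): create an output directory and make the output_dir
# line of ParaM.pjc point to it, removing the leading "#" if present.
# set_output_dir is a hypothetical helper name.
set_output_dir() {
    config=$1 out_dir=$2
    mkdir -p "$out_dir" &&
    # "|" is the sed delimiter because the path contains slashes.
    sed "s|^#*output_dir *=.*|output_dir = $out_dir|" "$config" > "$config.new" &&
    mv "$config.new" "$config"
}
```

Since ParaM.pjc is read by the batch file generator, create_job_files.sh presumably has to be run again afterwards for param-regress.lsf to pick up the new directory.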
To run .m files not on the list, either copy them to this directory
and add them to the m_files list in create_job_files.sh,
or copy ParaM.pjc and create_job_files.sh to the directory
containing the .m files.
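The second approach amounts to copying two files. A minimal POSIX sh sketch follows; the function name stage_param_files is made up for illustration, and both directories are supplied by you.

```shell
# Sketch (POSIX sh): copy the two generator files from the example
# directory into the directory that holds your own .m files.
# stage_param_files is a hypothetical helper name.
stage_param_files() {
    src=$1 dest=$2
    cp "$src/ParaM.pjc" "$src/create_job_files.sh" "$dest"/
}
```

After the copy, the m_files list in the new copy of create_job_files.sh still has to be edited to name your own scripts.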