Use of the IBM compilers is strongly recommended.
The GNU compilers are also installed; however, they
produce significantly less efficient
code than the IBM compilers.
- Fortran
The IBM Fortran compiler is invoked with the
xlf command. There are also xlf90
and xlf95 commands available. All the
commands invoke the same compiler, just with different
parameters. The file /etc/xlf.cfg is used
to determine the options used by each of the commands.
The following command line is recommended as a
starting point for compiling serial executables
on the p690:
xlf -O3 -qstrict -qarch=pwr5 -qtune=pwr5 [your_source_file]
Note that the default size for reals is 32 bits. Also,
the default data segment size is small, less than one
gigabyte. For programs that require more memory for data,
use the -bmaxdata option to specify the amount
of memory required in bytes. The following would request
2 gigabytes (which is the maximum data size available
without going to 64-bit addresses):
xlf -O3 -qstrict -qarch=pwr5 -qtune=pwr5 -bmaxdata:0x80000000 [your_source_file]
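As a check of the arithmetic, 2 gigabytes is 2 x 2^30 = 2^31 = 2147483648 bytes, which is 0x80000000 in hexadecimal. The C sketch below computes the value for a request given in gigabytes. The helper names maxdata_bytes and maxdata_flag are hypothetical, and the 2 GB cap only mirrors the limit quoted above; it is not a check performed by the linker itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes for a -bmaxdata request given in gigabytes
 * (1 GB taken as 2^30 bytes). Returns 0 for a zero request or for one
 * above the 2 GB limit quoted for 32-bit addressing. */
uint64_t maxdata_bytes(unsigned gigabytes)
{
    if (gigabytes > 2)
        return 0;
    return (uint64_t)gigabytes << 30;   /* gigabytes * 2^30 */
}

/* Hypothetical helper: write the matching linker flag, e.g.
 * "-bmaxdata:0x80000000" for 2 GB, into buf. Returns the snprintf
 * result, or -1 if no valid flag can be produced. */
int maxdata_flag(char *buf, size_t len, unsigned gigabytes)
{
    uint64_t bytes = maxdata_bytes(gigabytes);
    if (bytes == 0)
        return -1;   /* zero or over-limit request */
    return snprintf(buf, len, "-bmaxdata:0x%llX", (unsigned long long)bytes);
}
```

For example, maxdata_flag(buf, sizeof buf, 1) would produce the flag for 1 gigabyte, -bmaxdata:0x40000000.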
Flags for the xlf compiler are documented in the man page
accessible by:
man xlf
The man page documentation is fairly terse. More complete
IBM documentation of XL Fortran is available at XL Fortran User's Guide 9.1.
The NERSC
XLF Fortran web page gives a shorter introduction.
- Compiling C/C++
The command to invoke the IBM C compiler is xlc and
to invoke the IBM C++ compiler is xlC. The following
command line is recommended as a starting point for
compiling serial executables on the p690:
xlc -O3 -qstrict -qarch=pwr5 [your_source_file]
Flags for the xlc compiler are documented in the man page
accessible by:
man xlc
IBM documentation of C/C++ is available at
Developing and Porting C and C++ Programs on AIX.
The NERSC
IBM C/C++ web page gives a shorter introduction.
- MPI
Queues on the BladeCenter are expected to be shorter, so
users may typically prefer the BladeCenter for MPI-based
message-passing codes.
Compiling and linking with mpxlf enables MPI in Fortran codes.
If "pmonte" is an executable compiled to run with whatever number
of processors is assigned, and the following file
#! /bin/csh
#BSUB -W 5
#BSUB -n 4
#BSUB -a poe
#BSUB -o /scratch/foouser/pmonte.out.%J
#BSUB -e /scratch/foouser/pmonte.err.%J
#BSUB -J pmonte
./pmonte
is saved as bscript, then if foouser types (from the directory
where pmonte exists)
bsub < bscript
the parallel job pmonte is submitted to use 4 CPUs.
Output will appear in /scratch/foouser.
A difference from a bsub script used on
the BladeCenter is the line
#BSUB -a poe
which appears here. Also, on the Power5 the name of the MPI
parallel executable should not be preceded by "mpiexec".
Finally, /scratch is mounted on the Power5, but not on the
BladeCenter.
- OpenMP
Users writing parallel codes for the Power5 shared memory system are
encouraged to use OpenMP. OpenMP is a set of compiler directives
(with a small supporting runtime library) that can be inserted into
Fortran or C/C++ codes to enable use of more than one thread. Inserting
OpenMP directives does not prevent the code from running in serial, but
enables the code to use more than one processor, provided that all the
processors can access the same memory space. On the NCSU Power5 machines,
up to 8 CPUs share memory.
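A minimal C sketch of this idea, assuming nothing beyond standard C: the hypothetical function array_sum below contains a single OpenMP directive. Compiled with -qsmp=omp, the loop is divided among threads and the reduction clause combines each thread's partial sum; compiled without OpenMP, the unrecognized pragma is ignored (as the C standard requires) and the same loop runs serially.

```c
/* Sum n doubles. With OpenMP enabled the iterations are split across
 * threads; each thread keeps a private copy of total, and the copies are
 * added together at the end of the loop. Without OpenMP the pragma is
 * ignored and this is an ordinary serial loop. */
double array_sum(const double *x, int n)
{
    double total = 0.0;
    int i;   /* the loop variable of a parallel for is private to each thread */

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += x[i];

    return total;
}
```

Because floating-point addition is not associative, a parallel run may differ from a serial run in the last few bits for general data; for small integer values, as in a simple test, the result is exact either way.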
To run an OpenMP code, set an environment variable giving the
number of threads the code will use. In tcsh,
> setenv OMP_NUM_THREADS 4
would allow a program to use 4 threads.
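A program can ask the OpenMP runtime how many threads its next parallel region may use. The sketch below (the function name max_threads is hypothetical) calls the standard omp_get_max_threads() routine, whose value reflects OMP_NUM_THREADS when that variable is set. The _OPENMP guard, a macro defined by compilers with OpenMP enabled, keeps the same source compilable and runnable in serial.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads the next parallel region may use. Under OpenMP this
 * comes from the runtime (and so from OMP_NUM_THREADS, if set); in a
 * serial build it is always 1. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```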
An introduction to OpenMP can be found in
Lectures 7-10 CSC_783. More extensive on-line references are
NERSC
OpenMP Tutorial and
LLNL OpenMP Tutorial.
A text with many Fortran examples is "Parallel Programming in OpenMP", by
Chandra, Dagum, Kohr, Maydan, McDonald, and Menon. Published by
Morgan Kaufmann, 2001. For a FAQ and some sample codes, see
OpenMP FAQ.
The following sections show how to compile and run a simple example code
with OpenMP directives (first Fortran, then C).
- Compiling an OpenMP FORTRAN code
Consider the following code, "hello.f"
      use omp_lib
      print *, "Hello parallel world from threads:"
!$omp parallel
      print *, omp_get_thread_num()
!$omp end parallel
      print *, "Back to the sequential world"
      end
A user who wanted Fortran 77 code could replace "use omp_lib" by
"integer omp_get_thread_num" (but note that the OpenMP standard is
specified in terms of Fortran 90, so a compiler that supports OpenMP
should also support the Fortran 90 "use" statement).
The code was compiled with the following command lines (taking > as the
system prompt):
>xlf -c -qsmp=omp hello.f
>xlf hello.o -lxlsmp -o hello
After the OMP_NUM_THREADS environment variable has been set with
> setenv OMP_NUM_THREADS 2
the program can be executed on the head node by
>./hello
which returns the output (the order of the thread numbers may vary from run to run)
Hello parallel world from threads:
0
1
Back to the sequential world
Note that at most 4 processors are available on the head node;
setting more than 4 threads will simply cause several threads to
share a processor. It is not polite to run large jobs on the head
node, and sysadmins routinely and without warning kill parallel
jobs taking a significant amount of CPU time there. Production
jobs should be run in the LSF queue (allowing access to the two
8-CPU computational nodes).
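If a code chooses its own thread count, it can avoid oversubscribing the processors. A minimal sketch, assuming the widely available (but not strictly POSIX) sysconf(_SC_NPROCESSORS_ONLN) query; choose_threads is a hypothetical helper, not part of any library, and under LSF the number of CPUs actually allocated to a job may be smaller than the number online on the node.

```c
#include <unistd.h>

/* Hypothetical helper: limit a requested thread count to the number of
 * processors currently online, so that threads are not forced to share
 * a CPU. If the processor count cannot be determined, the request is
 * returned unchanged (but never less than 1). */
int choose_threads(int requested)
{
    long ncpu = -1;

#ifdef _SC_NPROCESSORS_ONLN
    ncpu = sysconf(_SC_NPROCESSORS_ONLN);   /* -1 if the query fails */
#endif
    if (requested < 1)
        requested = 1;
    if (ncpu > 0 && requested > ncpu)
        requested = (int)ncpu;
    return requested;
}
```

The result could then be passed to the standard omp_set_num_threads() routine before the first parallel region.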
If the following lines are saved in a file "bhello"
#!/usr/bin/csh
#BSUB -n 2
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -W 5
./hello
Then the shared memory hello job can be submitted to run under LSF by
entering
>bsub < bhello
For more information on running jobs under LSF, see the Running Jobs
section below.
If the parallel job
needs the MPI library, the bhello script also needs the line #BSUB -a poe
(see the example bsub script in the MPI section).
- Compiling an OpenMP C code
Consider the following code, "hello.c"
/* A hello world program */
#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Hello parallel world from threads:\n");
    #pragma omp parallel
    printf("%d\n", omp_get_thread_num());
    printf("Back to the sequential world\n");
    return 0;
}
The code was compiled with
>xlc -c -qsmp=omp hello.c
>xlc hello.o -lxlsmp -o hello
After the OMP_NUM_THREADS environment variable has been set with
> setenv OMP_NUM_THREADS 2
the program can be executed on the head node by
>./hello
which returns the output
Hello parallel world from threads:
0
1
Back to the sequential world
It is not polite to run large jobs on the head node, and the sysadmins routinely
and without warning delete such jobs. Production jobs should be run through
the LSF queue. If the following lines are saved in a file "bhello"
#!/usr/bin/csh
#BSUB -n 2
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -W 5
./hello
Then the hello job can be submitted to the LSF queue by
entering
>bsub < bhello
The following section gives some more information on how
to submit jobs to the LSF batch facility.
If the parallel job
needs the MPI library, the bhello script also needs the line #BSUB -a poe
(see the example bsub script in the MPI section).
- Running Jobs
All parallel jobs and long serial jobs (more than
about 15 minutes) should be submitted through
the batch system, LSF.
An initial set of queues has been enabled. These
queues will be adjusted in response to
usage patterns and user needs. The primary
resources controlled by the queues are the number
of processors and the CPU time for the job.
- Single Processor LSF Jobs
- Multiprocessor LSF Jobs
For more information on LSF, try the man pages, e.g.,
>man bsub
There are useful LSF tutorials on-line (we decline to
give specific links here because LSF installations are
not completely compatible with each other or with ours).