High Performance and Grid Computing
   

  Compiling and Running Jobs on the Power5 Shared Memory System at NC State ...

  • Logging In
  • Compiling
  • Compiling Fortran
  • Compiling C/C++
  • MPI
  • OpenMP
  • Compiling an OpenMP FORTRAN code
  • Compiling an OpenMP C code
  • Running Jobs
  • Single Processor LSF Jobs
  • Multiprocessor LSF Jobs

    The following instructions are preliminary.

  • Logging In

    The IBM Power5 login node can be accessed as p5login.hpc.ncsu.edu.

    Only SSH access is permitted for login sessions. File transfers between the Power5 and other systems must use SFTP (or SCP) rather than FTP.

      Free SSH clients are available from various sources. Links to some commonly used versions are included here:
    • Windows
    • Unix, Linux

    Using some applications that run on the p5 from a Windows desktop may require an X-Window server to be installed on the desktop system. X-Win32 is licensed for NC State users. See the ITECS remote access page for additional information and to download X-Win32 (Unity ID is required).

    From a Linux or Unix box, a user would typically log in with the command line

    >ssh -l fooname p5login.hpc.ncsu.edu
    

    Here the ">" is taken as the command line prompt, and the username "fooname" will typically be the user's Unity username, with the HPC account also inheriting the Unity password.

    To transfer a file "foofile" to p5login, a user could perform the following

    >sftp fooname@p5login.hpc.ncsu.edu
    (enter password to get prompt). 
    >put foofile
    >quit
    

    The login nodes run under AIX (IBM's version of Unix), with a default tcsh shell. For information on how to use UNIX, see for example Unix Tutorial, and for some other references relevant to parallel computing languages and machines see References.

  • Compiling

    Use of the IBM compilers is strongly recommended. The GNU compilers are installed, but they produce significantly less efficient code than the IBM compilers.

    • Fortran
      The IBM Fortran compiler is invoked with the xlf command. There are also xlf90 and xlf95 commands available. All the commands invoke the same compiler, just with different parameters. The file /etc/xlf.cfg is used to determine the options used by each of the commands.

      The following command line is recommended as a starting point for compiling serial executables on the Power5:

          xlf -O3 -qstrict -qarch=pwr5 -qtune=pwr5 [your_source_file] 

      Note that the default size for reals is 32 bits. Also, the default data segment size is small, less than one gigabyte. For programs that require more memory for data, use the -bmaxdata option to specify the amount of memory required in bytes. The following would request 2 gigabytes (which is the maximum data size available without going to 64-bit addresses):

          xlf -O3 -qstrict -qarch=pwr5 -qtune=pwr5 -bmaxdata:0x80000000 [your_source_file] 
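
      The hexadecimal byte count passed to -bmaxdata can be worked out with a little shell arithmetic. The following is a minimal sketch, assuming a POSIX sh (not the default tcsh) with 64-bit arithmetic; maxdata_flag is a hypothetical helper name, not part of the IBM toolchain.

```shell
#!/bin/sh
# maxdata_flag (hypothetical helper): print the -bmaxdata option for a
# data segment of N gigabytes, with the byte count in hexadecimal.
maxdata_flag() {
    gb=$1
    bytes=$((gb * 1024 * 1024 * 1024))
    printf '%s0x%X\n' '-bmaxdata:' "$bytes"
}

maxdata_flag 2   # prints -bmaxdata:0x80000000
```

      The helper only formats the option; per the text above, 2 gigabytes is the largest value that applies without 64-bit addressing.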

      Flags for the xlf compiler are documented in the man page accessible by:
          man xlf 

      The man page documentation is fairly terse. More complete IBM documentation of XL Fortran is available at XL Fortran User's Guide 9.1. The NERSC XLF Fortran web page gives a shorter introduction.

    • Compiling C/C++
      The command to invoke the IBM C compiler is xlc and to invoke the IBM C++ compiler is xlC. The following command line is recommended as a starting point for compiling serial executables on the Power5:
         xlc -O3 -qstrict -qarch=pwr5 [your_source_file] 

      Flags for the xlc compiler are documented in the man page accessible by:
        man xlc 

      IBM documentation of C/C++ is available at Developing and Porting C and C++ Programs on AIX. The NERSC IBM C/C++ web page gives a shorter introduction.
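
      The starting-point command lines in this Compiling section can be collected into one small helper that picks the compiler from the file suffix. This is only a sketch in POSIX sh (an assumption, since the login shell is tcsh); compile_cmd is a hypothetical name, and applying the xlc flags to xlC is also an assumption, since the text gives no separate C++ command line. The helper prints the command rather than running it.

```shell
#!/bin/sh
# compile_cmd (hypothetical helper): print the recommended serial compile
# command for a source file. It does not invoke the compiler.
compile_cmd() {
    src=$1
    case "$src" in
        *.f)       echo "xlf -O3 -qstrict -qarch=pwr5 -qtune=pwr5 $src" ;;
        *.c)       echo "xlc -O3 -qstrict -qarch=pwr5 $src" ;;
        # Assumption: the same flags are reused for the C++ compiler.
        *.C|*.cpp) echo "xlC -O3 -qstrict -qarch=pwr5 $src" ;;
        *)         echo "unknown source type: $src" >&2; return 1 ;;
    esac
}

compile_cmd hello.f
compile_cmd hello.c
```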

    • MPI
      Queues on the BladeCenter are expected to be shorter, so users may typically prefer the BladeCenter for MPI-based message-passing codes.

      Compiling and linking with mpxlf enables MPI in Fortran codes. Suppose "pmonte" is an executable compiled to run on whatever number of processors it is assigned, and the following file is saved as bscript:

      #! /bin/csh
      #BSUB -W 5
      #BSUB -n 4
      #BSUB -a poe
      #BSUB -o /scratch/foouser/pmonte.out.%J
      #BSUB -e /scratch/foouser/pmonte.err.%J
      #BSUB -J pmonte
      ./pmonte

      If foouser then types (from the directory containing pmonte)

      bsub < bscript

      the parallel job pmonte is submitted to use 4 CPUs. Output will appear in /scratch/foouser.

      A difference from a bsub script used on the BladeCenter is the line

       
      #BSUB -a poe 
      
      which appears here. Also, on the Power5 the name of the MPI parallel executable should not be preceded by "mpiexec". Finally, /scratch is mounted on the Power5, but not on the BladeCenter.
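
      For users who submit many similar jobs, the bscript example above could be generated from a few parameters. The following is a rough sketch in POSIX sh (an assumption; the login shell is tcsh), and poe_script is a hypothetical helper name. It writes the script text to standard output and never calls bsub itself.

```shell
#!/bin/sh
# poe_script (hypothetical helper): print an LSF script for a POE/MPI job.
# Arguments: executable name, number of tasks, run time in minutes,
# and the directory for the output and error files.
poe_script() {
    exe=$1; ntasks=$2; minutes=$3; outdir=$4
    cat <<EOF
#! /bin/csh
#BSUB -W ${minutes}
#BSUB -n ${ntasks}
#BSUB -a poe
#BSUB -o ${outdir}/${exe}.out.%J
#BSUB -e ${outdir}/${exe}.err.%J
#BSUB -J ${exe}
./${exe}
EOF
}

# Reproduces the pmonte example; redirect to a file, then "bsub < bscript".
poe_script pmonte 4 5 /scratch/foouser
```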

    • OpenMP
      Users writing parallel codes for the Power5 shared memory system are encouraged to use OpenMP. OpenMP is a set of compiler directives (plus a small runtime library) that can be inserted into Fortran or C/C++ codes to enable use of more than one thread. Inserting OpenMP directives does not prevent the code from running in serial, but enables the code to use more than one processor, provided that each processor can access the same memory space. On the NCSU Power5 machines, up to 8 CPUs share memory.

      To run an OpenMP code, one sets an environment variable for the number of threads the code will use. In tcsh,

      > setenv OMP_NUM_THREADS 4
      

      would allow a program to use 4 processors.
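
      Users who run their job scripts under sh rather than tcsh set the same variable with export. The sketch below is written in POSIX sh (an assumption) and also caps the requested thread count at the number of CPUs that share memory (8, per the text above); threads_for is a hypothetical helper name.

```shell
#!/bin/sh
# threads_for (hypothetical helper): print the requested thread count,
# capped at a maximum that defaults to 8 (the CPUs sharing memory).
threads_for() {
    requested=$1
    max=${2:-8}
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# sh equivalent of "setenv OMP_NUM_THREADS 4"
OMP_NUM_THREADS=$(threads_for 4)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints OMP_NUM_THREADS=4
```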

      An introduction to OpenMP can be found in Lectures 7-10 CSC_783. More extensive on-line references are NERSC OpenMP Tutorial and LLNL OpenMP Tutorial. A text with many Fortran examples is "Parallel Programming in OpenMP", by Chandra, Dagum, Kohr, Maydan, McDonald, and Menon, published by Morgan Kaufmann, 2001. For a FAQ and some sample codes, see OpenMP FAQ. The following sections show how to compile and run a simple example code with OpenMP directives (first Fortran, then C).

    • Compiling an OpenMP FORTRAN code

      Consider the following code, "hello.f"

       
            use omp_lib
            print *, "Hello parallel world from threads:"
      !$omp parallel
            print *, omp_get_thread_num()
      !$omp end parallel
            print *, "Back to the sequential world"
            end
      

      A user who wanted Fortran 77 code could replace "use omp_lib" by "integer omp_get_thread_num" (but note that the OpenMP standard specifies Fortran 90, hence the Fortran 90 statement "use" should be available whenever OpenMP is).

      The code was compiled by the command lines (taking > as the system prompt)

       
      >xlf -c -qsmp=omp hello.f
      >xlf hello.o -lxlsmp -o hello
      

      and having set the OMP_NUM_THREADS environment variable by
      > setenv OMP_NUM_THREADS 2 
      

      can be executed on the head node by
      >./hello 
      

      returning the output
       
       Hello parallel world from threads:
       0
       1
       Back to the sequential world
      

      Note that the maximal number of processors available on the head node is 4, so setting more than 4 threads will just cause multiple threads to share a processor. It is not polite to run large jobs on the head node, and sysadmins routinely and without warning kill parallel jobs taking a significant amount of CPU time on the head node. Production jobs should be run in the LSF queue (allowing access to the two 8-CPU computational nodes).

      If the following lines are a file "bhello"

      #!/usr/bin/csh
      #BSUB -n 2
      #BSUB -e err.%J
      #BSUB -o out.%J
      #BSUB -W 5
      ./hello
      

      Then the shared memory hello job can be submitted to run under LSF by entering

      >bsub < bhello
      

      For more information on running jobs under LSF, see the section below Running Jobs. If the parallel job needs the MPI library, the bhello script needs a line #BSUB -a poe (see the example bsub script at MPI.)

    • Compiling an OpenMP C code

      Consider the following code, "hello.c"

       
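      /* A hello world program */
      #include <stdio.h>
      #include "omp.h"
      int main(void) {
            printf("Hello parallel world from threads:\n");
      /* Each thread in the parallel region prints its own thread number. */
      #pragma omp parallel
            printf("%d  \n", omp_get_thread_num());
            printf(" Back to the sequential world\n");
            return 0;
      }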
      
      The code was compiled by
       
      >xlc -c -qsmp=omp hello.c
      >xlc hello.o -lxlsmp -o hello
      

      and having set the OMP_NUM_THREADS environment variable by
      > setenv OMP_NUM_THREADS 2 
      

      can be executed on the head node by
      >./hello 
      

      returning the output
       
       Hello parallel world from threads:
       0
       1
       Back to the sequential world
      

      It is not polite to run large jobs on the head node and the sys admins routinely and without warning delete such jobs. Production jobs should be run through the LSF queue. If the following lines are a file "bhello"

      #!/usr/bin/csh
      #BSUB -n 2
      #BSUB -e err.%J
      #BSUB -o out.%J
      #BSUB -W 5
      ./hello
      

      Then the hello job can be submitted to the LSF queue by entering

      >bsub < bhello
      

      The following section gives some more information on how to submit jobs to the LSF batch facility. If the parallel job needs the MPI library, the bhello script needs a line #BSUB -a poe (see the example bsub script at MPI.)

    • Running Jobs

      All parallel jobs and long serial jobs (more than about 15 minutes) should be submitted through the batch system, LSF.

      An initial set of queues has been enabled. These queues will be adjusted in response to usage patterns and user needs. The primary resources controlled by the queues are the number of processors and the CPU time for the job.

      • Single Processor LSF Jobs
        • Create a script file containing the commands to be executed for your job
        • Use the bsub command to submit the script to the batch system. In the following example two hours of run time are requested:
          bsub -W 2:00 < script.csh
        • The bjobs command can be used to monitor the progress of a job
        • When the job completes any standard output or standard error generated by the job will be placed in the directory that was the current working directory when the bsub command was issued.
        • The bpeek command can be used to view standard output and standard error for a running job.
        • The bkill command can be used to remove a job from LSF (regardless of current job status).
      • Multiprocessor LSF Jobs
        • Follow the same procedure as for single processor jobs. However, on the bsub command two additional arguments are required, -n number_of_tasks and -a poe:
           bsub -W 2:00 -n 8 -a poe < script.csh 
        • In this example two hours of run time and 8 tasks are requested. The maximal time that can be requested is -W 18:00 (18 hours).
        For more information on LSF, try the man pages, e.g.,
        >man bsub
        There are useful LSF tutorials on-line (declining to give specific links here as implementations are not completely compatible with each other or ours).
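
        The multiprocessor submission described above can be wrapped in a small check that refuses run times beyond the 18:00 limit before anything reaches LSF. This is a sketch only, in POSIX sh (an assumption; the login shell is tcsh); bsub_cmd is a hypothetical helper name, and it prints the command line instead of running bsub.

```shell
#!/bin/sh
# bsub_cmd (hypothetical helper): print a bsub command line for a POE job.
# Arguments: run time as H:MM, number of tasks, script file.
# Returns non-zero if the run time exceeds the 18:00 limit.
bsub_cmd() {
    wall=$1; ntasks=$2; script=$3
    hours=${wall%%:*}
    mins=${wall##*:}
    # Drop one leading zero so values such as "08" are not read as octal.
    hours=${hours#0}; hours=${hours:-0}
    mins=${mins#0};   mins=${mins:-0}
    total=$((hours * 60 + mins))
    if [ "$total" -gt $((18 * 60)) ]; then
        echo "run time $wall exceeds the 18:00 limit" >&2
        return 1
    fi
    echo "bsub -W $wall -n $ntasks -a poe < $script"
}

bsub_cmd 2:00 8 script.csh   # prints bsub -W 2:00 -n 8 -a poe < script.csh
```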

  • Copyright © 2003-2007 by NC State University and others, All Rights Reserved.

    Site contact: Eric Sills, E-mail: eric_sills at ncsu dot edu , Tel: 919-513-0324, Fax: 919-513-1893, HPC and Grid Operations, Information Technology Division, Box 7109, North Carolina State University, Raleigh, NC27695-7914, USA