Accessibility statement: we seek to make the HPC web pages accessible to all users. If you encounter accessibility issues with HPC web pages please send a description of the problem by email to eric_sills@ncsu.edu - thank you. NC State
Office of Information Technology
High Performance Computing

 Getting Started with NC State's Intel/IBM Linux Cluster at MCNC ...


  • Sam System Configuration
  • Logging onto the cluster
  • File Systems
  • Compiling
  • Portland Group Compilers
  • Running Jobs

    • Sam System Configuration

      There are approximately 1000 dual-Xeon nodes. Each node has two Intel Irwindale Xeon processors, 4 GB of memory, and a 40 GB disk. Some nodes are available for interactive code development. The compute nodes are managed by the LSF queuing system and may be accessed only through LSF; accounts that access a compute node directly are subject to immediate termination. Compute nodes are interconnected with gigabit Ethernet.

      Logins for the sam cluster are handled by the interactive nodes.

    • Logging onto the cluster

      SSH access is supported to the interactive nodes, which are reached using the hostname loginhpc.dcs.mcnc.org. Authentication uses Unity IDs and passwords.

        Free SSH clients are available from various sources. Links to some commonly used versions are included here:
      • Windows
      • Unix, Linux
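A minimal sketch of the login step from a Unix or Linux machine with an OpenSSH client, written in Bourne-shell (sh/bash) syntax. Only the hostname comes from this page; the helper name sam_target and the placeholder ID myuserid are illustrative assumptions.

```shell
# Hostname of the sam interactive (login) nodes, as given on this page.
SAM_LOGIN_HOST=loginhpc.dcs.mcnc.org

# Hypothetical helper: build the user@host SSH target for a Unity ID.
sam_target() {
    if [ -z "$1" ]; then
        echo "usage: sam_target UNITY_ID" >&2
        return 1
    fi
    printf '%s@%s\n' "$1" "$SAM_LOGIN_HOST"
}

# Interactive use (replace myuserid with your own Unity ID):
#   ssh "$(sam_target myuserid)"
```

ssh then prompts for your Unity password.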

    • File Systems

      AFS files are not available from the cluster. Users have a home directory that is shared by all the sam cluster nodes. This is not the same home directory as is used on the BladeCenter Linux cluster (henry2).

      The /usr/local file system is also shared by all nodes. Each node currently has its own local /scratch file system that is available to all users. There is also a shared parallel scratch file system, /gpfs_share (again, this is not the same file system as the one on the henry2 cluster).

      Currently no backups are being performed on any file system mounted on the sam cluster!
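Because /scratch is local to each node while /gpfs_share is visible from all nodes, working data for batch jobs is usually easier to manage in a directory on /gpfs_share. A minimal sketch in sh/bash syntax, assuming a per-user subdirectory named after your login (a convention this page does not state); the function name make_scratch_dir is hypothetical. Remember that nothing here is backed up.

```shell
# Hypothetical helper: create (if needed) and print a per-user working
# directory on the shared scratch file system. The base directory
# defaults to /gpfs_share; another base may be given as the first argument.
make_scratch_dir() {
    base=${1:-/gpfs_share}
    user=${USER:-$(id -un)}
    dir="$base/$user"
    mkdir -p "$dir" || return 1
    printf '%s\n' "$dir"
}

# Example:
#   work=$(make_scratch_dir)
#   cp input "$work/"
```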

    • Compiling

      There is currently one supported compiler suite, Portland Group, available for use on the sam cluster. GNU compilers are installed, but their use is strongly discouraged and is not supported.

      • Portland Group Compilers

        To use the 64-bit Portland Group compilers, some environment variables and paths must be configured. For csh/tcsh users, an alias named add has been created as a shortcut. The command
        add pgi
        
        configures the environment to use the Portland Group compilers.

        Once the environment has been configured, the Portland Group compilers may be invoked with the pgcc, pgcpp, pgf77, pgf90, and pghpf commands for the C, C++, Fortran 77, Fortran 90, and High Performance Fortran compilers, respectively.

        Parallel programs compiled with the Portland Group compilers should be linked with the Portland Group MPICH libraries.

        Having added the pgi environment with

        add pgi
        
        the following command line would compile an MPI Fortran 90 code with a high level of optimization:
        pgf90 -o exec -fastsse -Mmpi exec.f 
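After add pgi, one quick way to confirm the environment is set up is to check that the compiler commands listed above are on your PATH. A minimal sketch in sh/bash syntax (the add alias itself is for csh/tcsh, so run this from an sh or bash started after add pgi, which inherits the environment); the function name check_pgi is hypothetical.

```shell
# Hypothetical helper: report which Portland Group compiler commands
# (as listed on this page) are found on PATH. Returns non-zero if any
# is missing, which usually means the PGI environment is not configured.
check_pgi() {
    missing=0
    for cmd in pgcc pgcpp pgf77 pgf90 pghpf; do
        if loc=$(command -v "$cmd" 2>/dev/null); then
            printf 'found   %s -> %s\n' "$cmd" "$loc"
        else
            printf 'MISSING %s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"
}
```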
        

    • Running Jobs

      Access to the compute nodes is managed by LSF. All tasks for the compute nodes should be submitted to LSF.

      The following steps are used to submit jobs to LSF:

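      • Create a script file (for example, script.csh) containing the commands to be executed for your job:
        #BSUB -o standard_output
        #BSUB -e standard_error
        
        # Stage the input on the shared scratch file system
        # (myuserid stands for your own user ID)
        cp input /gpfs_share/myuserid/input
        cd /gpfs_share/myuserid
        # Run the program there (job.exe must also be in this directory)
        ./job.exe < input
        # Copy the results back to the home directory
        cp output /home/myuserid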
        
        
      • Use the bsub command to submit the script to the batch system. In the following example, two hours of run time are requested (-W takes hours:minutes):
        bsub -W 2:00 < script.csh
        
      • The bjobs command can be used to monitor the progress of a job.
      • The -e and -o options specify the files for standard error and standard output, respectively. If these are not specified, standard output and standard error will be sent by email to the account that submitted the job.
      • The bpeek command can be used to view standard output and standard error for a running job.
      • The bkill command can be used to remove a job from LSF, regardless of its current status.
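The submit-and-monitor cycle above can be scripted. bsub normally confirms a submission with a message of the form "Job <1234> is submitted to queue <normal>."; the sketch below extracts that job ID so it can be passed to bjobs, bpeek, or bkill. The exact wording of the bsub message is an assumption to verify on sam, and the helper name lsf_job_id is hypothetical (sh/bash syntax).

```shell
# Hypothetical helper: read bsub's confirmation message on stdin and print
# the numeric job ID from the first line that starts with "Job <NNN>".
lsf_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p' | head -n 1
}

# Example workflow (requires LSF; script.csh is the job script above):
#   jobid=$(bsub -W 2:00 < script.csh | lsf_job_id)
#   bjobs "$jobid"     # monitor the job's status
#   bpeek "$jobid"     # view output of the running job
#   bkill "$jobid"     # remove the job from LSF
```

If the submission is rejected, lsf_job_id prints nothing, so an empty jobid is a simple signal that bsub failed.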

      For parallel jobs, LSF must interface with the mpirun command to pass host and process information. To enable the LSF/MPI interface, a script, mpirun.lsf, has been provided in the LSF bin directory. The following batch script will run a parallel job; note that the number of MPI tasks will match the number of processors requested from LSF.

      #BSUB -n 4
      #BSUB -W 60
      #BSUB -J job
      #BSUB -o standard_output.%J
      #BSUB -e standard_error.%J
      #BSUB -a mpichp4
      
      mpiexec ./parjob.exe
      

      Alternatively, replace the "mpiexec" line with the following, giving the full path to the executable:

      mpirun.lsf /whateverthefullpathis/parjob.exe
      

      The #BSUB lines in the script pass options to the LSF bsub command. The -n option specifies the number of processors, -W specifies the run limit in minutes, -J provides a meaningful name for the job, -o specifies a file to hold standard output, -e specifies a file to hold standard error output, and -a mpichp4 tells LSF that the job uses MPI and which type of MPI is being used.

      The script can be submitted to LSF for execution using the command:

      bsub < script.csh
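When many parallel runs differ only in processor count or run limit, the batch script above can be generated rather than edited by hand. A minimal sketch in sh/bash syntax: the function name write_mpi_job and its argument order are hypothetical, while the #BSUB options and the mpiexec line are the ones shown above.

```shell
# Hypothetical helper: print an LSF batch script for an MPI job.
#   $1 = number of processors (-n)   $2 = run limit in minutes (-W)
#   $3 = job name (-J)               $4 = executable to run
write_mpi_job() {
    if [ "$#" -ne 4 ]; then
        echo "usage: write_mpi_job NPROCS MINUTES NAME EXECUTABLE" >&2
        return 1
    fi
    cat <<EOF
#BSUB -n $1
#BSUB -W $2
#BSUB -J $3
#BSUB -o standard_output.%J
#BSUB -e standard_error.%J
#BSUB -a mpichp4

mpiexec $4
EOF
}

# Example: a 16-processor job with a 120-minute run limit.
#   write_mpi_job 16 120 job ./parjob.exe > script.csh
#   bsub < script.csh
```

%J is left in the generated file for LSF to replace with the job ID, so each run writes its own output and error files.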
      

      LSF writes some intermediate files in the user's home directory while the job is running. If the home directory disk quota has been exceeded, the batch job will fail, often without any meaningful error message.
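Because a full home directory can make a job fail silently, it may be worth checking usage before submitting. A minimal sketch in sh/bash syntax using du, assuming you already know your home directory quota in kilobytes (the limit is not given on this page); the function name check_home_space and the 10 percent default margin are hypothetical choices.

```shell
# Hypothetical pre-submission check: warn if a directory's disk usage is
# within a safety margin of a quota limit.
#   $1 = quota limit in KB (required)
#   $2 = directory to measure (default: $HOME)
#   $3 = safety margin in percent (default: 10)
check_home_space() {
    [ -n "$1" ] || { echo "usage: check_home_space LIMIT_KB [DIR] [MARGIN]" >&2; return 2; }
    limit_kb=$1
    dir=${2:-$HOME}
    margin=${3:-10}
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    [ -n "$used_kb" ] || { echo "cannot measure $dir" >&2; return 2; }
    threshold=$(( $limit_kb * (100 - $margin) / 100 ))
    if [ "$used_kb" -ge "$threshold" ]; then
        echo "warning: $dir uses $used_kb KB of a $limit_kb KB quota" >&2
        return 1
    fi
    return 0
}
```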

Last modified: April 30 2009 08:54:21.
Office of Information Technology | NC State University | Raleigh, NC 27695 | Accessibility Statement | Policy Disclaimer | Contact Us