
 Getting Started with the IBM BladeCenter Linux Cluster (henry2) at NC State ...



  • Henry2 System Configuration

    There are more than 782 dual Xeon compute nodes in the henry2 cluster. Each node has two Xeon processors (a mix of single-, dual-, and quad-core), 2 GB of memory per core, and a 36 to 73 GB disk. Additional nodes are available for code development and debugging.

    The cluster contains a mix of 32-bit and 64-bit nodes. Generally, 32-bit executables will run correctly on either type of node, while 64-bit executables must be run on 64-bit nodes. A 64-bit executable is required in order to access more than about 3GB of memory for program data.
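
    As a rough illustration of the 32-bit versus 64-bit distinction, the following shell sketch classifies the architecture string printed by uname -m. The function name arch_bits and the list of architecture strings are assumptions made for this example; they are not part of the cluster configuration.

```shell
#!/bin/sh
# Hypothetical helper: map a `uname -m` string to a word size.
# The patterns cover common x86 names only (an assumption of this sketch).
arch_bits() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *) echo unknown ;;
    esac
}

# Report what the node this runs on says about itself.
machine=$(uname -m)
echo "uname -m reports: $machine (word size: $(arch_bits "$machine"))"
```

    On most Linux systems, running the file command on an executable (for example, file ./job.exe) also reports whether it is a 32-bit or 64-bit ELF binary.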

    The BladeCenter compute nodes are managed by the LSF resource manager and must not be accessed except through LSF (accounts that access compute nodes directly are subject to immediate termination).

    Logins for the cluster are handled by a set of login nodes, which are reached using ssh at login.hpc.ncsu.edu (32-bit login nodes) and login64.hpc.ncsu.edu (64-bit login nodes).

    Additional information on the university Linux cluster configuration is available in http://hpc.ncsu.edu/Documents/hpc_cluster_config.pdf

  • Logging onto the cluster

    SSH access is supported to the login nodes (login.hpc.ncsu.edu and login64.hpc.ncsu.edu). Logins are authenticated using Unity user names and passwords. NC State Windows users can obtain ssh clients from the ITECS remote access page. An X11 server for Windows is also available from the same ITECS site for users with Unity IDs.
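
    A minimal sketch of choosing between the two login host names is shown here. The helper name login_host and the placeholder user name myuserid are assumptions for illustration; the sketch prints the ssh command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: choose the login host for a 32- or 64-bit session.
login_host() {
    case "$1" in
        32) echo "login.hpc.ncsu.edu" ;;
        64) echo "login64.hpc.ncsu.edu" ;;
        *)  echo "usage: login_host 32|64" >&2; return 1 ;;
    esac
}

# "myuserid" stands in for a Unity user name.
# The command is printed rather than executed, so the sketch has no side effects.
echo "ssh myuserid@$(login_host 64)"
```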

    Login nodes should not be used for interactive jobs that take any significant fraction of system resources. The usual way to run CPU intensive codes is to submit them as batch jobs to LSF, which schedules them for execution on computational nodes. Example LSF job submission files can be found in Intel Compilers.

    Nevertheless, it is sometimes necessary to use interactive, GUI-based serial pre- and post-processors on data resident in the HPC environment. Interactive computing in the HPC environment should be performed by requesting a VCL HPC service. To request a VCL node with the HPC environment, go to the web page http://vcl.ncsu.edu.

    Click on "Make a VCL Reservation"

    From the list of environments, select "HPC(Redhat Linux)" for 32-bit environment or select "HPC(64-bit RedHat Linux)" for 64-bit environment.

    When a node is available, you will receive a message detailing how to log in. You can have exclusive use of the node for four hours (this can be extended by a few hours if the system is not busy). If you have an HPC account but have problems getting an HPC VCL node, send e-mail to gary_howell@ncsu.edu.

  • File Systems

    AFS files are not available from the cluster (but are available on the VCL HPC environments described above).

    Users have a home directory that is shared by all the cluster nodes. The /usr/local file system is also shared by all nodes. The home file system is backed up daily, with one copy of each file retained.

    Each node currently has its own local /scratch file system that is available to all users. Two shared scratch file systems, /share and /share3, are also available to all users. These file systems are not backed up, and files may be deleted from them automatically at any time. Use of these file systems is at the user's own risk.

    A parallel file system, /gpfs_share, is also available. Directories on /gpfs_share can be requested. There is a 1TB group quota imposed on /gpfs_share. The /gpfs_share file system is not backed up. Use is at the user's own risk.

    An HPC Storage Partner Program provides faculty the option of purchasing additional storage to directly connect to NC State HPC resources.

    Finally, the HPC mass storage system (/ncsu/volume1 and /ncsu/volume2) is available from the login nodes for storage in excess of what can be accommodated in /home. These file systems are also available from other NC State HPC login nodes (e.g. from the POWER5 shared memory system login node).

    User files in /home, /ncsu/volume1, and /ncsu/volume2 are backed up daily. A single backup version is maintained for each file. User files in all other file systems are not backed up.
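
    The backup rules above can be summarized in a small shell sketch. The helper name is_backed_up and the example paths are assumptions for illustration. The check is a plain string match on the path prefix and does not resolve symbolic links.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a path lies in a file system that the
# text describes as backed up (/home, /ncsu/volume1, /ncsu/volume2).
is_backed_up() {
    case "$1" in
        /home|/home/*) return 0 ;;
        /ncsu/volume1|/ncsu/volume1/*) return 0 ;;
        /ncsu/volume2|/ncsu/volume2/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example paths are placeholders, not real user directories.
for path in /home/myuserid/results /share/myuserid/input /gpfs_share/project; do
    if is_backed_up "$path"; then
        echo "$path: backed up daily"
    else
        echo "$path: NOT backed up"
    fi
done
```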

    Important files should never be placed on storage that is not backed up unless another copy of the file exists in another location.

    HPC projects are allocated 100GB of storage in one of the HPC mass storage systems (volume1 or volume2). Additional backed-up space in these file systems can be purchased or leased.

    Additional information about storage on HPC resources is available from http://hpc.ncsu.edu/Documents/GettingStartedstorage.php

  • Compiling

    There are three compiler flavors available on the cluster: 1) the standard GNU compilers supplied with Linux, 2) the Intel compilers, and 3) the Portland Group compilers.

    The default GNU compilers are adequate for compiling utility programs but in most cases are not appropriate for computationally intensive applications. Overall, the best performance has been observed using the Intel compilers. The Portland Group compilers tend to be somewhat less syntactically strict and also provide somewhat better debugging capabilities.

    Additional information about use of each of these compilers is available from the following links. Generally objects and libraries built with different compiler flavors should not be mixed as unexpected behavior may result.

    Users whose programs require more than ~1GB of memory should review the following information.
    A note on compiling executables with large (> ~1 GB) memory requirements

    Also, programs that require more than ~3GB of memory are not supported on the 32-bit Xeon architecture used on most of the cluster nodes. A number of 64-bit Xeon EM64T nodes are available, along with 64-bit login nodes (login64.hpc.ncsu.edu). These nodes can support codes with larger memory requirements; however, the physical memory installed on the nodes is two gigabytes per core.
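
    To make the two-gigabytes-per-core figure concrete, the following shell sketch works out the physical memory of a node with two processors for each core count mentioned in the system configuration. The helper name node_memory_gb is an assumption for this example.

```shell
#!/bin/sh
# Physical memory per core, in GB, as stated in the text.
gb_per_core=2

# Hypothetical helper: $1 = processors per node, $2 = cores per processor.
node_memory_gb() {
    echo $(( $1 * $2 * gb_per_core ))
}

# Two processors per node; single-, dual-, and quad-core variants.
for cores in 1 2 4; do
    echo "2 processors x $cores core(s) each: $(node_memory_gb 2 "$cores") GB per node"
done
```

    A code whose memory requirement exceeds the figure for the node it lands on would not fit in physical memory, even when built as a 64-bit executable.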

  • Running Jobs

    The BladeCenter is designed to run computationally intensive jobs on compute nodes. Running jobs on the head node is possible, but if several users run computationally intensive jobs on the head node at the same time, the node can stall and require rebooting.

    Please limit your use of the head node to editing, compiling, and transferring files. Running more than one file transfer program (scp, sftp, cp) from the head node at a time is also not desirable.

    To run computationally intensive jobs on the BladeCenter, use the compute nodes. Access to the compute nodes is managed by LSF. All tasks for the compute nodes should be submitted to LSF.

    The following steps are used to submit jobs to LSF:

    • Create a script file containing the commands to be executed for your job:
      #! /bin/csh
      #BSUB -o standard_output
      #BSUB -e standard_error

      # Stage the input on shared scratch, run there, then copy the result home
      cp input /share/myuserid/input
      cd /share/myuserid
      ./job.exe < input
      cp output /home/myuserid
      
      
    • Use the bsub command to submit the script to the batch system. In the following example two hours of run time are requested:
      bsub -W 2:00 < script.csh
      
    • The bjobs command can be used to monitor the progress of a job
    • The -e and -o options specify the files for standard error and standard output respectively. If these are not specified the standard output and standard error will be sent by email to the account submitting the job.
    • The bpeek command can be used to view standard output and standard error for a running job.
    • The bkill command can be used to remove a job from LSF (regardless of current job status).
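
    The submit, monitor, peek, and kill steps above can be chained in a shell sketch like the following. The helper name extract_job_id is hypothetical, and the format of the confirmation line printed by bsub is an assumption that should be checked against the LSF version installed on the cluster. The LSF commands run only if bsub is found and script.csh exists.

```shell
#!/bin/sh
# Hypothetical helper: pull the numeric job ID out of bsub's confirmation
# line, assumed here to look like "Job <1234> is submitted to queue <debug>."
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Only talk to LSF when bsub is installed and the job script is present.
if command -v bsub >/dev/null 2>&1 && [ -f script.csh ]; then
    jobid=$(bsub -W 2:00 < script.csh | extract_job_id)
    bjobs "$jobid"      # monitor the progress of the job
    bpeek "$jobid"      # view its standard output and standard error so far
    # bkill "$jobid"    # uncomment to remove the job from LSF
fi
```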

    For parallel jobs it is necessary for LSF to interface with the mpirun command to pass host information. To simplify this process, an interface script, mpiexec, has been provided in the LSF bin directory. The following batch script will run a parallel job. Note that the number of tasks will match the number of processors requested from LSF. The path set when bsub is invoked must include the appropriate mpirun command.

    #! /bin/csh
    #BSUB -o standard_output
    #BSUB -e standard_error
    
    mpiexec ./parjob.exe
    

    To submit a multi-core (OpenMP) job, use the -n 1 -x options to request a single job slot with exclusive use of the node. An example submission file might be

    #! /bin/csh 
    #BSUB -o out.%J
    #BSUB -e err.%J
    #BSUB -n 1 -x 
    #BSUB -W 15
    #BSUB -q shared_memory 
    setenv OMP_NUM_THREADS 16
    ./exec
    

    If the above file is named btry, it could be submitted with the command

     
    bsub < btry 
    
    This job is limited to 15 minutes of run time by the #BSUB -W 15 line. Once a user is satisfied that a job is running well, more time will typically be requested. Partners who have bought blades in the last few years may elect to run on dual-core, dual-motherboard blades by using a #BSUB -R dc line along with #BSUB -q mypartnerqueue and requesting only 4 OpenMP threads. The revised job submission file would be

    #! /bin/csh 
    #BSUB -o out.%J
    #BSUB -e err.%J
    #BSUB -n 1 -x 
    #BSUB -R dc 
    #BSUB -W 15
    #BSUB -q "mypartnerqueue"
    setenv OMP_NUM_THREADS 4
    ./exec
    

    Revised to run a parallel MPI job, btry becomes

    #! /bin/csh 
    #BSUB -o out.%J
    #BSUB -e err.%J
    #BSUB -n 4  
    #BSUB -W 15
    #BSUB -q shared_memory 
    mpiexec ./exec
    

    Because the job asks for 4 or fewer processors and no more than 15 minutes of run time, it goes into the high-priority debug queue, so that turnaround is fast and mistakes can be corrected quickly.

    There are a number of queues currently configured. In general, the best queue will be selected automatically without the user specifying a queue to the bsub command. In some cases LSF may override user queue choices and assign jobs to a more appropriate queue.

    There is a queue that will schedule jobs on any of the blades and accepts jobs using up to 64 processors. The serial job queue will schedule jobs only on selected blades. The single_chassis queue will schedule jobs only on blades that are located within the same chassis. Each chassis holds 14 blades. Because many of the non-partner blades are in 28-processor chassis (2 processors per blade), the single_chassis queue is limited to a maximum of 28 processors.
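
    The 28-processor limit follows from the chassis size: 14 blades with 2 processors each. The shell sketch below reproduces that arithmetic and checks whether a requested processor count fits within one such chassis. The helper name fits_single_chassis is an assumption for this example.

```shell
#!/bin/sh
# Figures taken from the text: 14 blades per chassis, 2 processors per blade.
blades_per_chassis=14
processors_per_blade=2
single_chassis_max=$(( blades_per_chassis * processors_per_blade ))

# Hypothetical helper: succeed if a request of $1 processors fits in one chassis.
fits_single_chassis() {
    [ "$1" -le "$single_chassis_max" ]
}

echo "single_chassis limit: $single_chassis_max processors"
for n in 4 28 32; do
    if fits_single_chassis "$n"; then
        echo "$n processors: fits in one chassis"
    else
        echo "$n processors: needs more than one chassis"
    fi
done
```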

    A note on LSF job scheduling

    LSF writes some intermediate files in the user's home directory while the job is running. If the disk quota has been exceeded, then the batch job will fail, often without any meaningful error message.
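
    Because a full home directory can make a batch job fail silently, it may help to check home directory usage before submitting. The following shell sketch is one way to do that. The helper names and the 1 GB limit are assumptions made purely for illustration; the actual quota and the preferred way to query it are site specific and are not stated here.

```shell
#!/bin/sh
# Hypothetical helper: disk usage of a directory, in kilobytes.
dir_usage_kb() {
    du -sk "$1" 2>/dev/null | awk '{print $1}'
}

# Hypothetical helper: succeed if usage ($1) has reached the limit ($2), both in KB.
over_limit() {
    [ "$1" -ge "$2" ]
}

# ASSUMPTION: 1 GB (1048576 KB) is a made-up limit, not the real quota.
limit_kb=1048576
used_kb=$(dir_usage_kb "${HOME:-.}")

if over_limit "${used_kb:-0}" "$limit_kb"; then
    echo "Home directory uses ${used_kb} KB: free some space before submitting"
else
    echo "Home directory uses ${used_kb:-0} KB: below the assumed limit"
fi
```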

