Introduction to Parallel Computing and the NCSU Linux Cluster
To join the mailing list, send mail to
firstname.lastname@example.org with the one line
subscribe hpc
in the body.
This free short course meets for 4 hours in total, in 2 sessions on the Thursday and Friday of spring break: Thursday, March 13, 9:30 to 11:30 AM, and Friday, March 14, 9:30 to 11:30 AM. Both sessions will be held in ITTC Lab 2 in D.H. Hill, the main NCSU library. The path from the front desk to the lab is rather tortuous, so ask at the front desk for directions. If you e-mail me (Gary Howell, email@example.com) in advance, I can make sure there is enough space for you.
Graduate students, postdocs, faculty, and staff who are likely to use parallel computation in research projects or theses are particularly invited. Students who do not already have a Blade Center account are encouraged to ask their advisors to request one for them before class starts, so that they have a permanent account. Faculty can request accounts for themselves and for their students online at http://www.ncsu.edu/itd/hpc/About/Contact.php
The NC State Linux cluster is an IBM Blade Center with around ten thousand cores available for high performance computing. This short course introduces the use of the machines, starting with how to log on and submit jobs.
One focus is how to compile codes that call MPI (Message Passing Interface), the standard library for message-passing parallel computation, and how to link them to the MPI library. Calls to MPI are embedded in Fortran, C, or C++ codes, enabling many processors to work together.
Session 1. How to log into the HPC machines and submit jobs. Why to use parallel computation. Some simple MPI commands and example programs. The second half of the session will be spent getting an example code to run. A version of the lab is available as Lab 1.
Session 2. MPI collective communications. These can be simple and efficient. Considerations in efficient parallel computation. Running some more codes. The lab is available as Lab 2.
Some additional materials online show how to use OpenMP to speed computations on multi-core computers. OpenMP parallelization is often fairly straightforward. See OpenMP, OpenMP2, and OpenMP3.
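As a small example of how little code an OpenMP parallelization can require, the sketch below adds a single directive to an ordinary dot-product loop. The function name `dot` is an assumption for illustration and does not come from the linked materials.

```c
/* Dot product with the loop iterations shared among threads.
   If the code is built without OpenMP support, the pragma is ignored
   and the loop simply runs serially. */
double dot(const double *x, const double *y, long n)
{
    double sum = 0.0;

    /* Each thread accumulates into a private copy of sum; the copies
       are added together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The function is called as `dot(x, y, n)` from an otherwise serial program. With GCC, the `-fopenmp` flag enables the directive, and the `OMP_NUM_THREADS` environment variable sets the number of threads; other compilers use different flags.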
On the Blade Center, most blades have two motherboards. Each region of RAM is more easily accessible from one motherboard than from the other, an arrangement known as NUMA (Non-Uniform Memory Access). For OpenMP to scale across both motherboards effectively, some more advanced techniques are needed. See, for example, the tutorial from HPC2012 by Georg Hager, Gerhard Wellein, and Jan Treibig, University of Erlangen-Nuremberg, Germany.
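One widely described technique of this kind is first-touch placement, sketched below. It assumes the usual Linux default of placing a memory page near the thread that first writes to it, and it is offered as a general illustration rather than a summary of the tutorial cited above. The function names `alloc_first_touch` and `add_scaled` are hypothetical.

```c
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel. Under the assumed
   first-touch policy, each page lands in the memory nearest the thread
   that writes it first, so the array is spread across the NUMA regions. */
double *alloc_first_touch(long n)
{
    double *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] = 0.0;

    return a;
}

/* Later loops use the same static schedule, so each thread mostly
   works on the pages it touched first. */
void add_scaled(double *a, const double *b, double s, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] += s * b[i];
}
```

The benefit depends on threads staying on the same cores between loops, which usually means pinning them, for instance through the `OMP_PROC_BIND` environment variable.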
Some previous courses introduce parallel debugging, profiling, and OpenMP (shared memory programming). See Previous Courses for those courses and links to class notes.