High Performance and Grid Computing
   

  How do I use a Debugger on the HPC Machines?

  • A Typical Debug Session

    Typing

    >xclock &
    

    pops up a little clock, so I know X windows can reach my display (else look up how to get a GUI).
    >add gnu
    

    (so it can find the right mpirun to use; if you compiled with pgi, add pgi; if with intel, add intel)
    >add tv
    

    (so it can find Totalview)
    >bsubtv -n 4 ./fooexec 
    

    then it pops up 2 windows. If you can pop up the windows, then you're in good shape to use the vendor user guide, the Total View User Guide. There is also a "help" button. Two tutorials developed by national labs are the NERSC Totalview Tutorial and the LLNL Totalview Tutorial. Those tutorials are more detailed than the following discussion.

    I start hitting the "next" button on the top window.

    When it says parallel, I say okay and keep hitting the next button (as your next step of refining your sessions, learn how to set a break point). Even in successful sessions, there are some warning messages here.

    In the back window there are 4 threads. On a couple of these, hit the View button and choose "dive in new window".

    Then as I keep "nexting" the code progresses. If you want to follow the code into subroutines, use the "step" button instead of the "next". A number of variables are displayed, but you can also learn what other ones are. There is a built-in help menu that will help in learning to print variables, set break points, etc.

    It's often helpful to have another window open which has the code that's being stepped through. Once you know how to print variables in the GUIs, you find this is a lot faster than putting print statements in your code and recompiling.

    One reason you may not be able to see code or symbols is that you need to compile with -g and no optimization flags. The rest of this document shows step by step how to debug. The exposition here is mostly command line, but the buttons on the GUI perform these same functions. For pgi compiled code, the following also has links to using the pgdbg debugger, which some prefer to the Totalview debugger.

  • What Does a Debugger Do?

    One way to debug Fortran or C code is to write print statements and recompile and rerun. For instance, if you have just changed a bit of code and want to make sure that the new code executes as you think, you might print variables to see if the code modifies them in the way you predict. Or if having added or changed a subroutine, you find that the code fails to execute correctly, you might put print statements at the start of the subroutine to verify that variables are passed correctly.

    Using a debugger allows you to accomplish these tasks without repeatedly recompiling. So if you've had to change hundreds of lines of code without good test cases for each few lines, and want to monitor the code behavior line by line, perhaps comparing to a known test case, using a debugger can be helpful. Learning to use a debugger may be useful either for your own future projects or in aiding colleagues.

    Stepping through programs. A debugger allows you to step through a Fortran or C program. At each step the program listing is displayed and before going on, you can check current values of program variables. Before starting program execution under the debugger, the user specifies one or more break points. On command, the program runs till the first break point. The user can then go on step by step or can set a new break point and ask the program to continue execution to the next break. All the debuggers described below can be used in this fashion.

    Examining core files. When code execution fails, a core file is created, typically called "core" or "core.jobnumber". The core file is in binary format, so it is not viewable with an editor. Some debuggers allow you to examine a core file to see which subroutine crashed (and at what line), which program called that routine (and so on through the whole stack). The user can also print out values of program variables at each level of the stack. dbx on the p575 works well for examining core files. gdb and pgdbg on the Linux cluster do not allow examination of core files; whether Totalview does is an open question.

  • Compiling Code So You Can Use a Debugger

    The program should be compiled with the -g flag, constructing a symbol table that allows a line by line stepping through the source code. Also turn off the -O2 optimizations and all other optimizations. Compiler optimizations are quite a nice set of tricks, but they usually work by rearranging the order of operations, so they make it hard for the debugger to correlate program lines with code execution.

  • What Debuggers are Available?

    On the Linux BladeCenter, the Portland Group C and Fortran compilers work with the gdb and pgdbg debuggers. Totalview works with Intel as well as Portland Group compiled codes; whether it works with GNU compiled codes is an open question. pgdbg and Totalview are parallel debuggers. On the IBM p575, the IBM-supplied debuggers work well with the IBM xlf and xlc compilers. dbx is a good serial debugger and pdbx works well in parallel. dbx works well with core files.

  • Debuggers on the Linux BladeCenter

    On the Linux blade center, the gdb, pgdbg, and Totalview debuggers are available.
  • The GDB Debugger
  • The pgdbg Debugger
  • The Totalview Debugger

  • The GDB Debugger

    GDB is a classic open source program developed by Richard Stallman. It is limited to working only for serial codes and works via a command line interface (where some other available debuggers have GUIs and work for parallel debugging). But gdb is widely used so if you learn it you can use it elsewhere. And if you already are comfortable with gdb, it is available on henry2, working well with C and Fortran codes compiled with the PGI compiler.

    >info gdb

    gives a complete and fairly easy to follow set of instructions.

    The gdb debugger works well with pgi-compiled Fortran codes. For codes compiled with the pgi Fortran compiler, the pgdbg debugger is also available. The pgdbg debugger is very similar to gdb and has more features, including a GUI and the capability to debug parallel codes. You may prefer gdb if you want your debugging skills to be portable.

    Having newly logged in,

    >add pgi

    adds the Portland group compiler environment. For debugging purposes, compile with the -g flag and no optimization (optimizing can confuse things by rearranging code execution order). For example,

    >pgf77 foo.f -g -o foo

    compiles foo.f to produce the executable file foo, where the -g preserves the symbol table in such a way that the debugger can step through the source code, listing the current code line. Typically at run time, one sets a break point, lets the code execute to that point, then steps it through a suspect section of code, observing variables to see where they go astray.

    >gdb foo

    starts a gdb session.

    Suppose that you know the code's problem is in SUBROUTINE FOOSUB. At the prompt one can enter

    gdb>break foosub

    Then entering

    gdb>run

    will run the code till it enters SUBROUTINE FOOSUB.

    gdb>n

    will step through the code to the next executable line. 'n' (short for 'next') steps through an executable a line at a time, stepping past a subroutine or function call in one step. To step into a subroutine, use

    gdb>s

    (short for 'step'). If ivar is a variable inside foosub

    gdb> print ivar

    will display the current value of ivar. Suppose that A is a two dimensional matrix

    gdb> print a(2,3)@5

    would print five adjacent elements from memory, starting with a(2,3). In Fortran storage these are consecutive entries from a column. Alternatively, if A has leading dimension lda, a(2,3) sits at zero-based offset (3-1)*lda + (2-1) in column-major storage, so

    gdb> print *(a+2*lda+1)@5

    would work. gdb does not seem to have a good way to print a section of a Fortran matrix row (in C matrix rows are stored consecutively, so gdb would easily display a matrix row). So a Fortran row would have to be displayed one print statement at a time (where in pgdbg you could use matlab notation to print a matrix row).
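
    The storage argument above can be checked with a little index arithmetic. The following is only a sketch in Python, not part of any debugger: the function names and the leading dimension lda = 10 are assumptions made for illustration. It lists which matrix entries a run of consecutive memory locations covers in column-major (Fortran) storage, and shows that the entries of a row sit lda locations apart, which is why a single adjacent-elements request cannot display a row.

```python
def column_major_offset(i, j, lda):
    """Zero-based memory offset of the Fortran element a(i, j) (1-based indices)."""
    return (j - 1) * lda + (i - 1)


def element_at(offset, lda):
    """Inverse mapping: the 1-based (i, j) stored at a zero-based offset."""
    return (offset % lda + 1, offset // lda + 1)


def adjacent_elements(i, j, count, lda):
    """The (i, j) pairs covered by `count` consecutive locations starting at a(i, j)."""
    start = column_major_offset(i, j, lda)
    return [element_at(start + k, lda) for k in range(count)]


if __name__ == "__main__":
    lda = 10  # assumed leading dimension, for illustration only
    # Entries covered by 5 adjacent locations starting at a(2,3): all in column 3
    print(adjacent_elements(2, 3, 5, lda))
    # Offsets of the row entries a(2,3), a(2,4), a(2,5): each is lda apart
    print([column_major_offset(2, j, lda) for j in (3, 4, 5)])
```

    With lda = 10 the first line lists a(2,3) through a(6,3), and the second shows offsets 21, 31 and 41.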

    Once you're stepping through foosub, and want to leap to a breakpoint at line 1142, you can set a new breakpoint.

    gdb>break 1142

    and jump to it by

    gdb> cont

    (provided your code would execute this line). One way to tell where to put the next breakpoint is by opening another xterm with an edit session of the source code. Find the line number you want (in vi, you would park the cursor on the line you want and ascertain its line number by typing :.= ), say 1311, then

    gdb> break 1311

    would put a break at that line. You can get out of the debugger by typing

    gdb> quit

  • The pgdbg Debugger

    The pgdbg debugger uses most of the conventional gdb debugger commands. For some on-line documentation, see The Portland group user guide. The sample session for gdb will also work for pgdbg, where the session is initiated by

    >pgdbg foo

    Displaying a slice of a matrix is a bit easier. While the gdb notation still works, the easier column slice

    pgdbg> print a(2:6,3)

    and row slice

    pgdbg> print a(2,3:5)

    notations are also available. Pgdbg has man pages. Help is available from within the debugging sessions by typing

    pgdbg> help
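
    To make the column and row slice ranges shown earlier in this section concrete, here is a small Python sketch that imitates only the index ranges (1-based, with both ends included), not pgdbg's actual output format. The helper names and the 6 by 5 test matrix, filled so that a(i,j) = 10*i + j, are assumptions for illustration.

```python
def expand(spec):
    """Expand an int or an inclusive 1-based (lo, hi) pair into a list of indices."""
    if isinstance(spec, int):
        return [spec]
    lo, hi = spec
    return list(range(lo, hi + 1))


def fortran_slice(a, rows, cols):
    """Values of a(rows, cols) for a matrix stored as a list of rows."""
    return [a[i - 1][j - 1] for j in expand(cols) for i in expand(rows)]


if __name__ == "__main__":
    # Hypothetical 6 x 5 matrix with a(i,j) = 10*i + j, so entries are easy to recognize
    a = [[10 * i + j for j in range(1, 6)] for i in range(1, 7)]
    print(fortran_slice(a, (2, 6), 3))  # the column slice a(2:6,3)
    print(fortran_slice(a, 2, (3, 5)))  # the row slice a(2,3:5)
```

    The column slice picks rows 2 through 6 of column 3 (five values), and the row slice picks columns 3 through 5 of row 2 (three values).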

    What goes wrong with gdb when debugging with the g77 and Intel ifc Fortran compilers?

    There are three Fortran compilers available on the Linux cluster: the open source g77, the pgi Portland Group compiler, and the Intel compiler. Unfortunately, recent versions of the g77 compiler and gdb do not work as well together as those available a few years ago. (I've been told that the older versions built g77 on top of an f2c conversion, allowing gdb to actually work on the gcc code.) Specifically, it is no longer possible to view the numeric values stored in variable dimension arrays. For ifc compiled code, one can step through codes, but most variables cannot be displayed.



  • The Totalview Debugger

    Totalview works well with Intel ifc compiled codes and also works well in parallel. A Totalview tutorial is available at LANL Totalview Tutorial.

    For more information on running Totalview, see Totalview and GUIs

  • Debuggers on the IBM p575

    To start the dbx debugger, produce an executable foo.exe by compiling it with the IBM Fortran or C compilers with the -g flag.

    >xlf90 -o foo.exe -g foo.f

    You can start a debug session by

    >dbx foo.exe

    Breakpoints are set by the name and line number of the file containing them.

    (dbx) stop at "foo.f":1169

    This will set a break at line 1169 of foo.f.

    The syntax

    (dbx) print a(1,2)

    is valid, but there seems to be no way to show a slice of a matrix. Another problem can be that though scalar variables print quickly, there can be a long delay in printing elements of a matrix.

    (dbx) cont

    continues execution of the code to the next breakpoint.

    One virtue of the dbx debugger is convenience of examining core files.

    Suppose that a -g compiled code foo runs and dumps a core

    > foo

    Segmentation fault - core dumped

    To investigate the error,

    >dbx foo

    Dbx reports the line where the dump occurred. You can examine the stack (what program called the subroutine and what program called that routine, and so on), and can print variables on each level of the stack.

    (dbx) up

    (dbx) down

    move up and down the stack respectively.

    (dbx) quit

    exits dbx.

    The p575 has a long man page for dbx which includes example sessions.



  • Parallel Debuggers on the Linux BladeCenter

    Totalview is the premier parallel debugger. We now have a permanent license.

    To run Totalview,
    >add tv
    >add intel
    >bsubtv -n 4 -W 15 -R xeon ./fooexec

    A GUI should pop up showing code from the "main" program. Setting a break at MPI_Init and then clicking on the "go" button will cause the code to break at MPI_Init. The GUI should then tell you the code is parallel and ask if you want to stop it. Probably you do want to stop it. MPI_Init will start up the parallel processes. You can also pop up GUIs for these and follow their progress step by step by clicking the Next button.

    The Totalview GUI does not support PGI and gnu compiled codes. Nevertheless it often also works for them (use "add pgi" or "add gnu", instead of "add intel").

    Our current implementation of totalview does not work on 64 bit codes (only on the default 32 bit versions).

    The Portland Group pgdbg debugger works well for parallel codes and also has a nice GUI. It works on the Linux BladeCenter for codes compiled with the PGI compilers. If you can get the GUI, it's easy to open a different window for each parallel process, so you can step through each of them. The syntax is similar to that of the serial pgdbg debugger.

  • Parallel Debuggers on the IBM p575

    On the IBM p575, the pdbx command line debugger works in parallel.

    I typically start a parallel debugger from an interactive LSF session.

    >bsub -Is -n 4 -W 15 csh

    eventually returns an interactive shell giving access to 4 processors. Interactive shell sessions are limited to fifteen minutes.

    For the pdbx debugger,

    >pdbx fooexec -hostfile hostfile

    where for this case hostfile consists of

    mcrae
    mcrae
    mcrae
    mcrae

    (four lines since we asked for 4 processors). To track different processors, make groups

    >group add first 0

    puts task 0 in the group "first".

    >group add rest 1:3

    puts the other processors in the task group "rest".

    >group list

    lists the groups. The default group is "all". To add a breakpoint at line 31 on first

    >on first stop at 31
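
    The bookkeeping in this pdbx setup is simple enough to sketch in a few lines of Python. This is only an illustration, not a pdbx interface: the function names are made up, and reading "1:3" as the inclusive range of tasks 1 through 3 is inferred from the four-processor example above. The sketch builds the hostfile contents (one line per requested processor) and expands the task specifications used in the group commands.

```python
def hostfile_lines(host, ntasks):
    """One hostfile line per requested processor, all on the same host."""
    return [host] * ntasks


def expand_tasks(spec):
    """Expand a task spec such as "0" or "1:3" (inclusive range) into task numbers."""
    if ":" in spec:
        lo, hi = spec.split(":")
        return list(range(int(lo), int(hi) + 1))
    return [int(spec)]


if __name__ == "__main__":
    # Contents of the hostfile for 4 processors on the host named in the example
    print("\n".join(hostfile_lines("mcrae", 4)))
    # Task groups corresponding to "group add first 0" and "group add rest 1:3"
    groups = {"first": expand_tasks("0"), "rest": expand_tasks("1:3")}
    groups["all"] = sorted(groups["first"] + groups["rest"])
    print(groups)
```

    Together the groups "first" and "rest" cover tasks 0 through 3, the same set as the default group "all".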

    GUIs for debuggers.

    The pgdbg and Totalview parallel debuggers have convenient Graphical User Interfaces. These allow stepping through a code with a window for each process of interest. For details of how to pop a debugger GUI, see Debugger GUIs on the Blade Center.

  • Last modified: July 24 2006 09:56:53.
    Copyright © 2003-2007 by NC State University and others, All Rights Reserved.

    Site contact: Eric Sills, E-mail: eric_sills at ncsu dot edu , Tel: 919-513-0324, Fax: 919-513-1893, HPC and Grid Operations, Information Technology Division, Box 7109, North Carolina State University, Raleigh, NC27695-7914, USA