How do I use a Debugger on the HPC Machines?
A Typical Debug Session
Typing
>xclock &
pops up a little clock, which tells me that X forwarding works and I can get an xterm
(otherwise look up how to get a GUI).
>add gnu
(so it can find the right mpirun to use; if you compiled with
pgi, add pgi; if with intel, add intel)
>add tv
(so it can find Totalview)
>bsubtv -n 4 ./fooexec
then pops up 2 windows.
If you can pop up the windows, then you're in good shape to use
the vendor user guide, the Total View User Guide. There is also a "help"
button. Two tutorials developed by national labs are the NERSC Totalview Tutorial
and the LLNL Totalview Tutorial.
Those tutorials are more detailed than the following discussion.
I start by hitting the "next" button on the top window.
When it says the code is parallel, I say okay and keep
hitting the next button. (As the next step in
refining your sessions, learn how to set a break
point.) Even in successful sessions, some warning
messages appear here.
In the back window there are 4 threads. On a couple of these,
hit the View button and choose "dive in new window".
Then as I keep "nexting", the code progresses.
If you want to follow the code into subroutines, use
the "step" button instead of the "next".
A number of variables are displayed by default, but you
can also look up the values of others.
There is a built-in help menu that will help in learning
to print variables, set break points, etc.
It's often helpful to have another window
open which has the code that's being stepped through.
Once you know how to print variables in the GUIs,
you find this is a lot faster than putting
print statements in your code and recompiling.
One reason you may not be able to see code or symbols
is that you need to compile with -g and no optimization
flags. The rest of this document shows step by
step how to debug. The exposition here is mostly command line, but
the buttons on the GUI perform these same
functions. Also, for pgi compiled code, the following has links to using
the pgdbg debugger, which some prefer to the
Totalview debugger.
What Does a Debugger Do?
One way to debug Fortran or C code is to write print statements
and recompile and
rerun. For instance, if you have just changed a bit of code
and want to make sure that the new code executes as you think, you might
print variables to see if the code modifies them in the way you predict.
Or if, having added or changed a subroutine, you find that the code fails to
execute correctly, you might put print statements at the start of
the subroutine to verify that variables are passed correctly.
Using a debugger allows you to accomplish these tasks without
repeatedly recompiling. So if you've had to change hundreds of
lines of code without good test cases for each few lines,
and want to monitor the code behavior line by line, perhaps
comparing to a known test case, using a debugger
can be helpful. Learning to use a debugger may be useful
either for your own future projects or in aiding colleagues.
Stepping through programs.
A debugger allows you to step through a Fortran or C program.
At each step the program listing is displayed and before going on,
you can check current values of program variables. Before starting
program execution under the debugger, the user specifies one or
more break points. On command,
the program runs till the first break point. The user can then go on step by
step or can set a new break point and ask the program to continue
execution to the next break. All the debuggers
described below can be used in this fashion.
Examining core files.
When code execution fails, a core file is created, typically called
"core" or "core.jobnumber".
The core file is in binary format so is not viewable with an editor.
Some debuggers allow you to examine a core file to see what subroutine crashed
(and at what line),
what program called that routine (and so on through the whole stack).
Also the user can print out values of program variables at each
level of the stack. dbx on the p575 works well for examining core files.
gdb and pgdbg as installed on the Linux cluster have not allowed examination of core files.
Whether Totalview can examine them has not been checked.
Compiling Code So You Can Use a Debugger
The program should be compiled with
the -g flag, constructing a symbol table that allows a line by line
stepping through the source code. Also
turn off the -O2 optimizations and all other optimizations. Compiler optimizations are quite a nice set of tricks, but they usually work by rearranging the order of operations, so they make it hard for the debugger to correlate program lines with code execution.
What Debuggers are Available?
On the Linux Blade Center, the Portland Group C and Fortran compilers work with
the gdb and pgdbg debuggers.
Totalview works with Intel as well as Portland group compiled codes (whether it
works with gnu codes is less certain). pgdbg and Totalview are parallel debuggers.
On the IBM p575, the IBM supplied debuggers work well with IBM xlf and xlc
compilers. dbx is a good serial debugger and pdbx works well in parallel.
dbx works well with core files.
Debuggers on the Linux BladeCenter
On the Linux blade center, the gdb, pgdbg, and Totalview debuggers are available.
The GDB Debugger
The pgdbg Debugger
The Totalview Debugger
The GDB Debugger
GDB is a classic open source program developed by Richard
Stallman. It is limited to working only for serial codes
and works via a command line interface (where some other
available debuggers have GUIs and work for parallel debugging).
But gdb is widely used so if you learn it you can use it elsewhere.
And if you already are comfortable with gdb, it is available on henry2,
working well with C and Fortran codes compiled with the PGI
compiler.
>info gdb
gives a complete and fairly easy to follow set of instructions.
If you have compiled with the pgi Fortran compiler, the pgdbg debugger is also
available. The pgdbg debugger is very similar to gdb and has more
features, including a GUI interface and the capability to
debug parallel codes. You may prefer gdb if you want
your debugging skills to be portable.
Having newly logged in,
>add pgi
adds the Portland group compiler environment. For debugging
purposes, compile with the -g flag and no optimization (optimizing can
confuse things by rearranging code execution order). For example,
>pgf77 foo.f -g -o foo
compiles foo.f to produce the executable file foo, where the -g preserves
the symbol table in such a way that the debugger can step through the
source code, listing the current code line. Typically at run time,
one sets a break point, lets the code execute
to that point, then steps it through a suspect section of code, observing
variables to see where they go astray.
>gdb foo
starts a gdb session.
Suppose that you know the code's problem is in SUBROUTINE FOOSUB.
At the prompt one can enter
gdb>break foosub
Then entering
gdb>run
will run the code till it enters SUBROUTINE FOOSUB.
gdb>n
will step through the code to the next executable line.
'n' (short for 'next')
steps through an executable a line at a time, stepping past
a subroutine or function call in one step. To step into a subroutine,
use
gdb>s
(short for 'step'). If ivar is a variable inside foosub
gdb> print ivar
will display the current value of ivar. Suppose that A is a two
dimensional matrix. Then
gdb> print a(2,3)@5
would print a(2,3) followed by the next four elements in memory, five in all.
Since Fortran stores matrices by column, these are consecutive entries from
column 3 (provided the column has at least six rows). Alternatively, if A has
leading dimension lda, element a(2,3) lies (3-1)*lda + (2-1) elements past the
start of A, so an expression such as
gdb> print *(a+2*lda+1)@5
addresses the same five elements. gdb does not seem to have a good way to print a section of
a Fortran matrix row (in C, matrix rows are stored consecutively, so gdb
would easily display a matrix row). So a Fortran row would have to be
displayed one print statement at a time (where in pgdbg you could use
matlab notation to print a matrix row).
Once you're stepping through foosub and want to leap to line
1142, you can set a new breakpoint
gdb>break 1142
and jump to it by
gdb> cont
(provided your code would execute this line).
One way to tell where to put the next breakpoint
is by opening another xterm with an edit session of the source
code. Find the line number
you want (in vi, you would park the cursor on the line you want
and ascertain its line number by typing :.= ), say 1311. Then
gdb> break 1311
would put a break at that line. You can get out of the debugger by
typing
gdb> quit
The pgdbg Debugger
The pgdbg debugger uses most of the conventional gdb debugger commands.
For some on-line documentation, see The Portland group user guide.
The sample session for gdb will also work for pgdbg, where
the session is initiated by
>pgdbg foo
Displaying a slice of a matrix is a bit easier. While the gdb
notation still works, the easier column slice
pgdbg> print a(2:6,3)
and row slice
pgdbg> print a(2,3:5)
notations are also available.
Pgdbg has man pages. Help is available from within the
debugging sessions by typing
pgdbg> help
What goes wrong with gdb when debugging with the g77 and Intel
ifc Fortran compilers?
There are three Fortran compilers available on
the Linux cluster: the open source g77, the pgi Portland group compiler,
and the Intel compiler.
Unfortunately, recent versions of the g77 compiler and gdb do not work
as well together as those available a few years ago.
(I've been told that the older versions built g77 on top of an f2c conversion,
allowing gdb to actually work on the gcc code.)
Specifically, it is no longer possible to view
the numeric values stored in variable dimension arrays.
For ifc compiled code,
one can step through codes, but most variables cannot be displayed.
The Totalview Debugger
Totalview works well with Intel ifc compiled codes and also works well
in parallel. A Totalview tutorial is available at LANL Totalview
Tutorial.
For more information on running Totalview, see Totalview and
GUIs.
Debuggers on the IBM p575
To use the dbx debugger, first produce an executable foo.exe by compiling
with the IBM Fortran or C compilers and the -g flag.
>xlf90 -o foo.exe -g foo.f
You can start a debug session by
>dbx foo.exe
Breakpoints are set by the name and line number of the file containing them.
(dbx) stop at "foo.f":1169
This will set a break at line 1169 of foo.f.
The syntax
(dbx) print a(1,2)
is valid, but there seems to be no way to show a slice of a matrix.
Another problem can be that though scalar variables print quickly, there
can be a long delay in printing elements of a matrix.
(dbx) cont
continues execution of the code to the next breakpoint.
One virtue of the dbx debugger is convenience of examining core files.
Suppose that a code foo, compiled with -g, runs and dumps core:
>foo
Segmentation fault - core dumped
To investigate the error,
>dbx foo
Dbx reports the line where the dump occurred.
You can examine the stack (what program called the subroutine
and what program called that routine, and so on), and can print
variables on each level of the stack.
(dbx) up
(dbx) down
move up and down the stack respectively.
(dbx) quit
exits dbx.
The p575 has a long man page for dbx which includes example sessions.
Parallel Debuggers on the Linux BladeCenter
Totalview is the premier parallel debugger. We now have a permanent license.
To run Totalview,
>add tv
>add intel
>bsubtv -n 4 -W 15 -R xeon ./fooexec
A GUI should pop up showing code from the "main" program. Setting
a break at MPI_Init and then clicking on the "go" button
will cause the code to break at MPI_Init. The GUI should then
tell you the code is parallel and ask if you want to stop it.
Probably you do want to stop it. MPI_Init will start
up the parallel processes. You can also pop up GUIs for these
and follow their progress step by step by clicking the
Next button. The Totalview GUI does not officially support PGI and
gnu compiled codes. Nevertheless, it often works for them too
(use "add pgi" or "add gnu" instead of "add intel").
Our current implementation of Totalview does not work on 64 bit
codes (only on the default 32 bit versions).
The Portland group pgdbg debugger works well for parallel codes and
also has a nice GUI interface. It works on the Linux blade center
for codes compiled with the PGI compilers. If you can get the
GUI interface, it's easy to open a different window for each
parallel process, so you can step through each of them. The syntax
is similar to that of the serial pgdbg debugger.
Parallel Debuggers on the IBM p575
On the IBM p575, the pdbx command line debugger works in
parallel.
I typically start a parallel debugger from an interactive LSF session.
>bsub -Is -n 4 -W 15 csh
eventually returns an interactive shell giving access to 4 processors.
Interactive shell sessions are limited to fifteen minutes.
For the pdbx debugger,
>pdbx fooexec -hostfile hostfile
where for this case hostfile consists of
mcrae
mcrae
mcrae
mcrae
(four lines, since we asked for 4 processors).
To track different processors, make groups.
>group add first 0
puts task 0 in the task group "first".
>group add rest 1:3
puts the other tasks in the task group "rest".
>group list
lists the groups. The default group is "all". To add a breakpoint
at line 31 for the group first,
>on first stop at 31
GUIs for debuggers.
The pgdbg and Totalview parallel debuggers have convenient Graphical
User Interfaces.
These allow stepping through a code with a window for each process
of interest. For details of how to pop a debugger GUI, see
Debugger GUIs on the Blade Center.
Last modified: July 24 2006 09:56:53.
Copyright © 2003-2007 by
NC State University and
others, All Rights Reserved.
Site contact: Eric Sills, E-mail:
eric_sills at ncsu dot edu , Tel: 919-513-0324, Fax: 919-513-1893,
HPC and Grid Operations, Information Technology Division,
Box 7109, North Carolina State University, Raleigh,
NC27695-7914, USA