If you have any additions, corrections, ideas, or bug reports please stop by the Builder Academy at telnet://tbamud.com:9091 or email rumble@tbamud.com -- Rumble
The Art of Debugging Originally by Michael Chastain and Sammy
The following documentation is excerpted from Merc 2.0s hacker.txt file. It was written by Furey of MERC Industries and is included here with his permission. We have packaged it with tbaMUD (changed in a couple of places, such as specific filenames) because it offers good advice and insight into the art and science of software engineering. More information about tbaMUD, can be found at the tbaMUD home page http://tbamud.com.
1 Im running a Mud so I can learn C programming!
Yeah, right. The purpose of this document is to record some of our knowledge, experience and philosophy. No matter what your level, we hope that this document will help you become a better software engineer. Remember that engineering is work, and no document will substitute for your own thinking, learning and experimentation.
2 How to Learn in the First Place
Play with something. Read the documentation on it. Play with it some more. Read documentation again. Play with it some more. Read documentation again. Play with it some more. Read documentation again. Get the idea?
The idea is that your mind can accept only so much new data in a single session. Playing with something doesnt introduce very much new data, but it does transform data in your head from the new category to the familiar category. Reading documentation doesnt make anything familiar, but it refills your new hopper.
Most people, if they even read documentation in the first place, never return to it. They come to a certain minimum level of proficiency and then never learn any more. But modern operating systems, languages, networks, and even applications simply cannot be learned in a single session. You have to work through the two-step learning cycle many times to master it.
3 Basic Unix Tools
man gives you online manual pages.
grep stands for global regular expression print; searches for strings in text files.
vi, emacs, jove use whatever editor floats your boat, but learn the hell out of it; you should know every command in your editor.
ctags mags tags for your editor which allows you to go to functions by name in any source file.
, >>, <, | input and output redirection at the command line; get someone to show you, or dig it out of man csh
These are the basic day-in day-out development tools. Developing without knowing how to use all of these well is like driving a car without knowing how to change gears.
4 Debugging: Theory
Debugging is a science. You formulate a hypothesis, make predictions based on the hypothesis, run the program and provide it experimental input, observe its behavior, and confirm or refute the hypothesis.
A good hypothesis is one which makes surprising predictions which then come true; predictions that other hypotheses dont make.
The first step in debugging is not to write bugs in the first place. This sounds obvious, but sadly, is all too often ignored.
If you build a program, and you get any errors or any warnings, you should fix them before continuing. C was designed so that many buggy ways of writing code are legal, but will draw warnings from a suitably smart compiler (such as gcc with the -Wall flag enabled). It takes only minutes to check your warnings and to fix the code that generates them, but it takes hours to find bugs otherwise.
Desk checking (proof reading) is almost a lost art these days. Too bad. You should desk check your code before even compiling it, and desk-check it again periodically to keep it fresh in mind and find new errors. If you have someone in your group whose only job it is to desk-check other peoples code, that person will find and fix more bugs than everyone else combined.
One can desk-check several hundred lines of code per hour. A top-flight software engineer will write, roughly, 99% accurate code on the first pass, which still means one bug per hundred lines. And you are not top flight. So... you will find several bugs per hour by desk checking. This is a very rapid bug fixing technique. Compare that to all the hours you spend screwing around with broken programs trying to find one bug at a time.
The next technique beyond desk-checking is the time-honored technique of inserting print statements into the code, and then watching the logged values. Within tbaMUD code, you can call printf(), fprintf(), or log()to dump interesting values at interesting times. Where and when to dump these values is an art, which you will learn only with practice.
If you dont already know how to redirect output in your operating system, now is the time to learn. On Unix, type the command man csh, and read the part about the > operator. You should also learn the difference between standard output (for example, output from printf) and standard error (for example, output from fprintf(stderr, ...)).
Ultimately, you cannot fix a program unless you understand how it is operating in the first place. Powerful debugging tools will help you collect data, but they cant interpret it, and they cant fix the underlying problems. Only you can do that.
When you find a bug... your first impulse will be to change the code, kill the manifestation of the bug, and declare it fixed. Not so fast! The bug you observe is often just the symptom of a deeper bug. You should keep pursuing the bug, all the way down. You should grok the bug and cherish it in fullness before causing its discorporation.
Also, when finding a bug, ask yourself two questions: What design and programming habits led to the introduction of the bug in the first place? And: What habits would systematically prevent the introduction of bugs like this?
5 Debugging: Tools
When a Unix process accesses an invalid memory location, or (more rarely) executes an illegal instruction, or (even more rarely) something else goes wrong, the Unix operating system takes control. The process is incapable of further execution and must be killed. Before killing the process, however, the operating system does something for you: it opens a file named core and writes the entire data space of the process into it.
Thus, dumping core is not a cause of problems, or even an effect of problems. Its something the operating system does to help you find fatal problems which have rendered your process unable to continue.
One reads a core file with a debugger. The two most popular debuggers on Unix are adb and gdb, although occasionally one finds dbx. Typically one starts a debugger like this: gdb bin/circle or gdb bin/circle lib/core.
The first thing, and often the only thing, you need to do inside the debugger is take a stack trace. In adb, the command for this is $c. In gdb, the command is backtrace. In dbx, the command is where. The stack trace will tell you what function your program was in when it crashed, and what functions were calling it. The debugger will also list the arguments to these functions. Interpreting these arguments, and using more advanced debugger features, requires a fair amount of knowledge about assembly language programming.
6 Using GDB
When debugging NEVER EVER run the autorun script. It will mask your debugging output - the log output will be redirected, etc.
Either a) run your executable directly: "bin/circle". Crash the mud. Then run gdb on the core file: "gdb ../bin/circle core.#" if there is one. If there isn't, then
b) run your executable through gdb: "gdb bin/circle", "run" (**) (if you get a SIGPIPE stop here, "cont" once). Crash the mud.
Do a backtrace - "bt" - and repeat the following until you can't go higher: "list" followed by "up".
Thus, a typical debugging session looks like this:
gdb bin/circle gdb> run <mud log - it's usually helpful to include the last couple of lines.> Program received SIGSEV in some_function(someparamaters) somefile.c:1223 gdb> bt #0 0x234235 some_function(someparamaters) somefile.c:1223 #1 0x343353 foo(bar) foo.c:123 #2 0x12495b baz(0x0000000) baz.c:3 gdb> list 1219 1220 foobar = something_interesting(); 1221 1222 foobar = NULL; 1223 free(foobar); 1224 } 1225 1226 1227 next_function_in_somefile_c(void) gdb> up frame #1 0x343353 foo(bar) foo.c:123 gdb> list <similar output to the above, but different file> gdb> up frame #2 0x12495b baz(0x0000000) baz.c:3 gdb> list <similar output to the above, but different file> gdb> up You are already at the top frame.
If this doesn't solve your problem post it on the forums at http://tbamud.com and ask for help. Include all the gdb output and your tbaMUD version.
GDB has some online help, though it is not the best. It does at least give a summary of commands and what they're supposed to do. What follows is Sammy's short intro to gdb with some bughunting notes following it:
tbaMUD allows multiple core files to be created. Be sure to delete them when you are done. If you've got a core file, go to your lib directory and type:
dir
You should see the core files listed, something like: core.#
To use the core file type:
gdb ../bin/circle core.#
If you want to hunt bugs in real time (causing bugs to find the cause as opposed to checking a core to see why the MUD crashed earlier) use:
gdb bin/circle
If you're working with a core, gdb should show you where the crash occurred. If you get an actual line that failed, you've got it made. If not, the included message should help. If you're working in real time, now's the time to crash the MUD so you can see what gdb catches.
When you've got the crash info, you can type where'' to see which function called the crash function, which function called that one, and so on all the way up to
main()''.
I should explain about context'' You may type
print ch'' which you
would expect to show you the ch variable, but if you're in a function that
doesn't get a ch passed to it (real_mobile, etc), you can't see ch because
it's not in that context. To change contexts (the function levels you saw
with where) type up'' to go up. You start at the bottom, but once you go up, and up, and up, you can always go back
down''. You may be able to go
up a couple functions to see a function with ch in it, if finding out who
caused the crash is useful (it normally isn't).
The ``print'' command is probably the single most useful command, and lets you print any variable, and arithmetic expressions (makes a nice calculator if you know C math). Any of the following are valid and sometimes useful:
print ch (fast way to see if ch is a valid pointer, 0 if it's not) print *ch (prints the contents of ch, rather than the pointer address) print ch->player.name (same as GET_NAME(ch)) print world[ch->in_room].number (vnum of the room the char is in) etc..
Note that you can't use macros (all those handy psuedo functions like GET_NAME and GET_MAX_HIT), so you'll have to look up the full structure path of variables you need.
Type ``list'' to see the source before and after the line you're currently looking at. There are other list options but I'm unfamiliar with them.
There are only a couple of commands to use in gdb, though with some patience they can be very powerful. The only commands I've ever used are:
run well, duh. print [variable] also duh, though it does more than you might think list shows you the source code in context break [function] set a breakpoint at a function clear [function] remove a breakpoint step execute one line of code cont continue running after a break or ctrl-c
I've run into nasty problems quite a few times. The cause is often a memory problem, usually with pointers, pointers to nonexistent memory. If you free a structure, or a string or something, the pointer isn't always set to NULL, so you may have code that checks for a NULL pointer that thinks the pointer is ok since it's not NULL. You should make sure you always set pointers to NULL after freeing them.
Ok, now for the hard part. If you know where the problem is, you should be
able to duplicate it with a specific sequence of actions. That makes things
much easier. What you'll have to do is pick a function to break'' at. The ideal place to break is immediately before the crash. For example, if the crash occurred when you tried to save a mob with medit, you might be able to
break mobs_to_file''. Try that one first.
When you `medit save', the MUD will hang. GDB will either give you segfault info, or it will be stopped at the beginning of mobs_to_file. If it segfaulted, pick an earlier function, like copy_mobile, or even do_medit.
When you hit a breakpoint, print the variables that are passed to the function to make sure they look ok. Note that printing the contents of pointers is possible with a little playing around. For example, if you print ch, you get a hex number that shows you the memory location where ch is at. It's a little helpful, but try print *ch and you'll notice that it prints the contents of the ch structure, which is usually more useful. print ch->player will give you the name of the person who entered the command you're looking at, and some other info. If you get a no ch in this context it is because the ch variable wasn't passed to the function you're currently looking at.
Ok, so now you're ready to start stepping. When GDB hit your breakpoint, it showed you the first line of executable code in your function, which will sometimes be in your variable declarations if you initialized any variables (ex: int i = 0). As you're stepping through lines of code, you'll see one line at a time. Note that the line you see hasn't been run yet. It's actually the next line to be executed. So if the line is a = b + c;, printing a will show you what a was before this line, not the sum of b and c. If you have an idea of where the crash is occurring, you can keep stepping till you get to that part of the code (tip: pressing return will repeat the last GDB command, so you can type step once, then keep pressing return to step quickly). If you have no idea where the problem is, the quick and dirty way to find your crash is to keep pressing return rapidly (don't hold the eturn key or you'll probably miss it). When you get the seg fault, you can't step any more, so it should be obvious when that happens.
Now that you've found the exact line where you get the crash, you should start the MUD over and step more slowly this time. What I've found that works really well to save time is to create a dummy function. This one will work just fine:
void dummy(void){}
Put that somewhere in the file you're working on. Then, right before the crash, put a call to dummy in the code (ex: dummy();). Then set your breakpoint at dummy, and when you hit the breakpoint, step once to get back to the crashing code.
Now you're in total control. You should be looking at the exact line that gave you the crash last time. Print every variable on this line. Chances are one of them will be a pointer to an unaccessable memory location. For example, printing ch->player.name may give you an error. If it does, work your way back and print ch->player to make sure that one's valid, and if it isn't, try printing ch.
Somewhere in there you're going to have an invalid pointer. Once you know which one it is, it's up to you to figure out why it's invalid. You may have to move dummy() up higher in the code and step slowly, checking your pointer all the way to see where it changes from valid to invalid. You may just need to NULL a free'd pointer, or you may have to add a check for a NULL pointer, or you may have screwed up a loop. I've done all that and more :)
Well, that's it in a nutshell. There's a lot more to GDB that I haven't even begun to learn, but if you get comfortable with print and stepping you can fix just about any bug. I spent hours on the above procedure trying to get my ascii object and mail saving working right, but it could have taken weeks without gdb. The only other suggestion I have is to check out the online gdb help. It's not very helpful for learning, but you can see what commands are available and play around with them to see if you can find any new tools.
7 Profiling
Another useful technique is profiling, to find out where your program is spending most of its time. This can help you to make a program more efficient.
Here is how to profile a program:
-
Remove all the .o files and the circle executable: make clean
-
Edit your Makefile, and change the PROFILE=line: PROFILE = -p
-
Remake circle: make
-
Run circle as usual. Shutdown the game with the shutdown command when you have run long enough to get a good profiling base under normal usage conditions. If you crash the game, or kill the process externally, you wont get profiling information.
-
Run the profcommand: prof bin/circle > prof.out
-
Read prof.out. Run man prof to understand the format of the output. For advanced profiling, you can use PROFILE = -pg in step 2, and use the gprof command in step 5. The gprof form of profiling gives you a report which lists exactly how many times any function calls any other function. This information is valuable for debugging as well as performance analysis.
Availability of prof and gprof varies from system to system. Almost every Unix system has prof. Only some systems have gprof.
7 Books for Serious Programmers
Out of all the thousands of books out there, three stand out:
Kernighan and Plaugher, The Elements of Programming Style Kernighan and Ritchie, The C Programming Language Brooks, The Mythical Man Month