Developing a Graphics Driver II
In part 1, I briefly covered how things are structured. This entry is all about debugging the driver. Like I said, the driver lives partly in user mode and partly in kernel mode. As a practical matter, it's convenient to be able to access either part at will. In other words, a user-mode debugger isn't enough; we need a full kernel debugger. And because kernel debugging freezes the entire machine when you hit a breakpoint, it's absolutely necessary to have two machines: one runs the driver and all the programs, and the other debugs it. You can connect the two in any of several ways; FireWire is the most convenient. Serial works, if you want to suffer through that, and USB might work, with catches that I am not familiar with. Both computers had FireWire ports, so I used those.
Sadly, Visual Studio does not support kernel debugging. I have no idea why not. The tool of choice for most people who need to do Windows kernel debugging is WinDbg. This program is a psychotically powerful debugger with a terribly awkward and irritating UI attached. It's also kind of slow. Still, it does the job, and setting it up isn't too bad. Once you have the two machines connected, you need to configure the slave (the machine being debugged) for kernel debugging. Once that's done, you can fire up WinDbg on the master and boot the slave. WinDbg will automatically establish a connection when the machine comes up. (Note that things don't need to happen in this order; WinDbg can also connect to an already running machine.) After that, it functions like a normal debugger, except that everything on the machine is being debugged. Any process or module that invokes an int 3, the breakpoint interrupt, will be caught by WinDbg. There is one catch, though: the OS can't recover from a driver hitting a breakpoint when no debugger is there to catch it. So if you accidentally run a driver build with debugging stuff enabled but without a debugger attached, there's a good chance you'll hard lock the machine.
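One common way to guard against that last failure mode is to check for an attached debugger before raising a breakpoint. The following is a minimal sketch in C, not the actual driver code. The names `guarded_break`, `sim_attached`, and `sim_break` are made up for illustration, and the callbacks are user-mode stand-ins so the logic can be compiled and run anywhere. In a real Windows driver, the check would presumably wrap the WDK routine `KdRefreshDebuggerNotPresent()` and the break would call `DbgBreakPoint()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Callback types, so the guard logic can be exercised outside the kernel. */
typedef bool (*debugger_attached_fn)(void);
typedef void (*break_fn)(void);

/*
 * Raises a breakpoint only when a debugger is attached.
 * Returns true if the breakpoint was raised, false if it was skipped.
 *
 * Assumption: in a driver, `attached` would return
 * !KdRefreshDebuggerNotPresent() and `do_break` would call DbgBreakPoint().
 */
static bool guarded_break(debugger_attached_fn attached, break_fn do_break)
{
    if (attached == NULL || do_break == NULL || !attached())
        return false;   /* no debugger: skip the break instead of hanging */
    do_break();
    return true;
}

/* Hypothetical user-mode stand-ins for the kernel routines. */
static bool sim_debugger_attached = false;
static int  sim_break_count = 0;

static bool sim_attached(void) { return sim_debugger_attached; }
static void sim_break(void)    { sim_break_count++; }
```

With `sim_debugger_attached` left at `false`, `guarded_break(sim_attached, sim_break)` skips the break and returns `false`. Setting it to `true` lets the break through. The cost is one extra check per breakpoint, which is negligible next to a hung machine.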
I'll set breakpoints in the driver as necessary to inspect whatever I'm trying to debug. When a breakpoint is hit, it's pretty much like debugging in VS. All the same information is available, albeit in a much worse interface. It's particularly important to make sure that WinDbg knows where to find debug symbols. Sources of symbols include the PDB generated by the build and Microsoft's public symbol server. With those correctly set up, I can see correct call stacks through the NT kernel and the driver. WinDbg also knows how to find the driver's source files from those symbols, so when a breakpoint is hit, it can open the relevant code and point at what's going on. (Again, in a much worse way than VS. We're more at the WinDiff level of "prettiness" here.) What I don't usually get is symbols for the actual application being debugged. That's not surprising; we have access to a lot here, but debug builds of games plus all symbols, let alone source code, are not really part of that. So I can see what the application is doing externally, but not what is going on inside its head. It's an interesting role reversal, actually. It's also shown me that a lot of games, even major commercial AAA+ titles, behave rather badly with respect to the driver.
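To make the symbol setup concrete, here is a small sketch in C that assembles a symbol path combining a local PDB directory with Microsoft's public symbol server, using the `srv*<cache>*<server>` syntax that WinDbg understands. The function name `build_symbol_path` and the directories in the usage note are hypothetical; only the path syntax and the server URL are real.

```c
#include <stdio.h>
#include <string.h>

/* Microsoft's public symbol server. */
#define MS_SYMBOL_SERVER "https://msdl.microsoft.com/download/symbols"

/*
 * Writes "<pdb_dir>;srv*<cache_dir>*<server>" into out.
 * pdb_dir   - directory holding the PDB produced by the driver build
 * cache_dir - local directory where downloaded symbols are cached
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_symbol_path(char *out, size_t out_size,
                             const char *pdb_dir, const char *cache_dir)
{
    int n = snprintf(out, out_size, "%s;srv*%s*%s",
                     pdb_dir, cache_dir, MS_SYMBOL_SERVER);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with the made-up directories `C:\driver\build` and `C:\symcache`, this yields `C:\driver\build;srv*C:\symcache*https://msdl.microsoft.com/download/symbols`. A string of that shape can be given to WinDbg with the `.sympath` command or through the `_NT_SYMBOL_PATH` environment variable.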
Most of the hard work is really in isolating the conditions that cause a bug to occur and closing in on the source of the problem. Once you know why something has gone wrong, the fix is usually a fairly trivial change. Not always, of course. Occasionally it's a real pain, especially when dealing with a badly behaved or cruel application that hits a soft spot, or that expects certain behavior where no such behavior was guaranteed. (And of course, it worked on the small set of hardware the game developer has, which may not even be NVIDIA-based.)
I wrote this up fairly quickly. Feel free to ask questions, but be aware that there are very constricting limits on how much I can say. Don't let that stop you from asking; just don't be disappointed if I don't provide an answer.