Wednesday, September 3, 2014

How does a debugger work with a PDB file?

Debuggers are a big topic, but this page only describes how a debugger works with a PDB file.

As Wikipedia describes it, PDB (Program Database) is a file format developed by Microsoft for storing debugging information about a program. A debugger reads PDB files to find the line number and file path for a given symbol, or for the memory address where the program has stopped. Unfortunately, the PDB format is undocumented, so we normally access a PDB only through DIA (Debug Interface Access) or the DbgHelp library (dbghelp.dll). We can only assume that a PDB stores at least the following information.

  • Line Number
  • File Path
  • Offset (of Base Address)
  • Symbol Name

The most basic question is how to find the file path and line number for a given memory address. The answer is very simple: the debugger queries the PDB by memory address to get that information.

Let's write a test program with VC to verify this assumption. The following is a simple program that prints "Hello World!".


We build the program in debug mode. When we debug it, Visual Studio shows that VcApp.exe is loaded at base address 00400000h.


When we move the cursor to the function declarations of main() and PrintMessage(), Visual Studio displays their memory addresses as follows.

PrintMessage = 004113B0h
main         = 00411410h

This raises the second question: how does the debugger know the file path and line number of PrintMessage? To answer it, I suppose that there are two tables in the PDB.

Table1 = (Symbol Name, Offset)
Table2 = (Offset, Line Number, File Path)
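
The two supposed tables can be modelled in a few lines of Python. This is only a sketch of the assumption, not the real PDB layout. The names table1, table2, line_for_address and line_for_symbol are made up for illustration, and the values are the ones that appear on this page. Table2 holds only the single row confirmed here, so any address after it resolves to that row.

```python
import bisect

BASE_ADDRESS = 0x00400000  # load address Visual Studio reported for VcApp.exe

# Table1 (hypothetical): symbol name -> offset from the base address
table1 = {"PrintMessage": 0x113B0, "main": 0x11410}

# Table2 (hypothetical): (offset, line number, file path), sorted by offset.
# Only the one row confirmed on this page is listed; a real table has many more.
table2 = [(0x113B0, 7, r"d:\vcapp\vcapp.c")]


def line_for_address(address):
    """Return (file path, line number, displacement) or None."""
    offset = address - BASE_ADDRESS
    offsets = [row[0] for row in table2]
    # Find the last row whose offset is <= the requested offset.
    i = bisect.bisect_right(offsets, offset) - 1
    if i < 0:
        return None
    row_offset, line, path = table2[i]
    return path, line, offset - row_offset


def line_for_symbol(name):
    """Go through Table1 first, then Table2."""
    return line_for_address(BASE_ADDRESS + table1[name])


print(line_for_symbol("PrintMessage"))
```

The displacement in the result is the distance from the start of the matching row, similar in spirit to the displacement that SymGetLineFromAddr64 reports.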

Let's manually create Table1.

PrintMessage - Base Address = 000113B0h
main - Base Address         = 00011410h


Symbol Name  Offset
------------ ---------
PrintMessage 000113B0h
main         00011410h
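
The subtraction used to build this table can be checked with a short Python snippet. It is just the arithmetic from above; the variable names are chosen for illustration.

```python
BASE_ADDRESS = 0x00400000  # base address of VcApp.exe shown by Visual Studio

# Absolute addresses shown by Visual Studio for the two functions
addresses = {"PrintMessage": 0x004113B0, "main": 0x00411410}

# Offset = absolute address - base address
offsets = {name: addr - BASE_ADDRESS for name, addr in addresses.items()}

for name, offset in offsets.items():
    print("%-12s %08Xh" % (name, offset))
```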


When we run pdb_print_gvars.py from PDBparse, which was described on another page, it reads the PDB and displays the offsets of the symbols as follows.

> python pdb_print_gvars.py VcApp.pdb 0
__enc$textbss$begin,0x1000,0,.textbss
__enc$textbss$end,0x11000,0,.textbss
_PrintMessage,0x113b0,2,.text
__imp__printf,0x182bc,0,.idata
__RTC_CheckEsp,0x114e0,2,.text
__RTC_Shutdown,0x11720,2,.text
__RTC_InitBase,0x116e0,2,.text
_main,0x11410,2,.text
??_C@_0M@FKNCOEJD@Hello?5Word?6?$AA@,0x1573c,0,.rdata
___security_cookie,0x17000,0,.data

The output shows that the offset of PrintMessage is 000113B0h and the offset of main is 00011410h, which matches the table we built manually.
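
To compare this output with Table1 automatically, the comma-separated lines can be parsed into a dictionary. The sketch below assumes each line has exactly four fields (name, offset, a numeric field whose meaning is not explained on this page, and section). The function name parse_gvars is made up, and SAMPLE is a few lines copied from the output above.

```python
import csv
import io

# A few lines copied from the pdb_print_gvars.py output above
SAMPLE = """\
_PrintMessage,0x113b0,2,.text
_main,0x11410,2,.text
___security_cookie,0x17000,0,.data
"""


def parse_gvars(text):
    """Map each symbol name to (offset, section)."""
    symbols = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or unexpected lines
        name, offset, _unknown, section = row
        symbols[name] = (int(offset, 16), section)
    return symbols


symbols = parse_gvars(SAMPLE)
# The names carry a leading underscore (C name decoration on 32-bit x86).
print("%08Xh" % symbols["_PrintMessage"][0])
print("%08Xh" % symbols["_main"][0])
```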

This proves that Table1 really exists. How about Table2? Unfortunately, PDBparse has no example that displays the file path and line number, and I have not yet dug that information out of the PDB file directly. I decided to use dbghelp.dll to get the file path and line number from the PDB, so I wrote a Python program that calls the DLL. The following is part of the program.

The whole source code is published on the page "Write Python Program to Use DBGHELP.DLL to Access a PDB File".

...
BaseAddr = ctypes.c_uint64 (0x400000)
Status = SymLoadModule64 (ProcessHandle, 0, "VcApp.pdb", None, BaseAddr, SizeOfDll)
...
dwAddr = ctypes.c_uint64 (0x4113B0)
Status = SymGetLineFromAddr64 (ProcessHandle, dwAddr, ctypes.byref (pdwDisplacement), ctypes.byref (Line))
print ("Dump Line:")
print ("  SizeOfStruct = %d" % Line.SizeOfStruct)
print ("  Key = %s" % Line.Key)
print ("  LineNumber = %d" % Line.LineNumber)
print ("  FileName = %s" % Line.FileName)
print ("  Address = %xh" % Line.Address)
...
Status = SymUnloadModule64 (ProcessHandle, BaseAddr)
...

The program prints the following result:
Dump Line:
  SizeOfStruct = 0
  Key = None
  LineNumber = 7
  FileName = d:\vcapp\vcapp.c
  Address = 4113b0h

As a result, this program reports the file path (d:\vcapp\vcapp.c) and line number (7) for the given memory address (0x4113B0), where VcApp.exe is stopped at PrintMessage(). This proves that Table2 exists.
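
One detail in the dump is SizeOfStruct = 0. The DbgHelp documentation says the caller should set SizeOfStruct to the size of the IMAGEHLP_LINE64 structure before calling SymGetLineFromAddr64. The declaration of Line is not shown in the excerpt, so the following is only a sketch of how the structure might be declared with ctypes. It assumes the documented field order of the ANSI version and the default ctypes alignment; new_line_struct is a hypothetical helper name.

```python
import ctypes


class IMAGEHLP_LINE64(ctypes.Structure):
    # Field order follows the documented IMAGEHLP_LINE64 structure (ANSI version).
    # Alignment is left to the ctypes default, which is an assumption here.
    _fields_ = [
        ("SizeOfStruct", ctypes.c_uint32),  # DWORD, must be set by the caller
        ("Key", ctypes.c_void_p),           # PVOID, reserved
        ("LineNumber", ctypes.c_uint32),    # DWORD
        ("FileName", ctypes.c_char_p),      # PCHAR
        ("Address", ctypes.c_uint64),       # DWORD64
    ]


def new_line_struct():
    """Create a zeroed structure with SizeOfStruct filled in."""
    line = IMAGEHLP_LINE64()
    line.SizeOfStruct = ctypes.sizeof(IMAGEHLP_LINE64)
    return line


line = new_line_struct()
print("SizeOfStruct = %d" % line.SizeOfStruct)
```

A structure created this way could then be passed with ctypes.byref (line) in place of Line in the call shown earlier.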

This program also helps to answer the first question: how do we find the file path and line number for a given memory address? I consider that Table1 and Table2, or a single merged table, are enough to answer it.

That is how a debugger works with a PDB file.












