    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Microsoft Symbol Engine</title>
        <link>https://members.accu.org/index.php/journals/276</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #67 - Jun 2005 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Microsoft Symbol Engine</h1>
<p><strong>Author:</strong>&nbsp;Roger Orr</p>
<p>
<strong>Date:</strong> Thu, 02 June 2005 05:00:00 +01:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id=
"d0e18"></a>Introduction</h2>
</div>
<p>Last year, I wrote an article detailing some code to provide a
stack trace with symbols in Microsoft Windows. [<a href=
"#Orr2004">Orr2004</a>]</p>
<p>On reflection, I think the Microsoft symbol engine deserves
greater explanation so this article discusses more about the symbol
engine, what it does and where to get it from. The ultimate aim is
to provide useful information which helps you diagnose problems in
your code more easily.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e28" id="d0e28"></a>What Are
Symbols?</h2>
</div>
<p>When a program is compiled and linked into executable code, a
large part of the process is turning human readable symbols into
machine instructions and addresses. The CPU, after all, does not
care about symbolic names but operates on a sequence of bytes. In
systems that support dynamic loading of code, some symbols may have
to remain in the linked image in order for functions to be resolved
into addresses when the module is loaded. Typically though, even
this is only a subset of the names appearing in the source
code.</p>
<p>When everything works perfectly this is usually fine; the
difficulties occur when the program contains a bug as we would like
to be able to work back from the failing location to the relevant
source code, and identify where we are, how we got there and what
are the names and locations of any local variables. These pieces of
information can all be held as symbol data and interrogated,
usually by a debugger, to give human readable information in the
event of a problem.</p>
<p>Most programmers using Microsoft C++ on Windows are familiar
with the Microsoft Debug/Release paradigm (many other environments
have a similar split). In this model of development, you begin by
compiling a 'Debug' build of the code base in which there is no
optimisation and a full set of symbols are emitted for each
compiled binary. This generally gives the debugger the ability to
work backwards from a logical address and stack pointer to give the
source line, stack trace and contents of all variables. Later in
the development process you switch over to building the 'Release'
version of the code which typically has full optimisation and
generates no symbolic information in the output binaries.</p>
<p>There are several pitfalls with this approach. In my experience
the most serious is when you have problems which are only
reproducible in the release build and not in the debug build. Since
there are no symbols in the release build it can be very hard to
resolve the problem.</p>
<p>Fortunately this is easily resolved. It is relatively easy to
change the project settings to generate symbolic information for
the release build as well as for the debug build. An alternative
approach is to abandon (or at least modify) the Debug/Release
split, perhaps material for another article!</p>
<p>For Microsoft Visual C++ .NET 2003 you enable symbols in a release
build by setting options for the compile and link stages. First set
'Debug Information Format' to 'Program Database' in the C/C++
'General' folder. Then set the linker option 'Generate Debug Info'
to 'Yes' in the 'Debugging' folder, and specify a .PDB filename for
the program database file name. Finally you must set 'References'
to 'Eliminate Unreferenced Data' and 'Enable COMDAT Folding' to
'Remove redundant COMDATs' in the 'Optimization' folder because the
Microsoft linker changes its default behaviour for these two
options when debugging is enabled. (Equivalent settings exist in other
versions of the Microsoft C++ compiler, and also for VB.NET and C#.
See [<a href="#Robbins">Robbins</a>] for more details.)</p>
<p>I also recommend removing one other optimisation setting, that
of stack frame optimisation, to greatly improve the likelihood of
being able to get a reliable stack trace in a release build. If
performance is very important in your application, measure the
effect of this optimisation to see whether it makes a sufficient
difference to be worth retaining. With these settings applied to a
release build the compiler generates a PDB file for each built EXE
or DLL, in a similar manner to the default behaviour for a debug
build. The PDB file, also known as the symbol file, is referred to
in the header records in the EXE/DLL but none of the symbols are
loaded by default, so simply having a PDB file has no impact on
performance.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e48" id="d0e48"></a>The Symbol
Engine</h2>
</div>
<p>Microsoft do not document the format of the PDB file and it
often seems to change from release to release. However they do
provide an API for accessing most of the information held in the
PDB file and the key to this is a file <tt class=
"filename">DbgHelp.dll</tt>. This library contains functions to
unpack symbol information for addresses, local variables, etc. A
version of this DLL is present in Windows 2000, XP and 2003 but
Microsoft make regular updates available via their website as
'Debugging tools for Windows' [<a href="#DbgHelp">DbgHelp</a>].
Note that if you want to write code using the API you need to
install the SDK (by using the 'Custom' installation).</p>
<p>However it is hard to update <tt class=
"filename">DbgHelp.dll</tt> in place in a running system (and
attempts to do so can render some other Windows tools inoperable)
so it is recommended that you either:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>ensure the correct version of the DLL is placed with the EXE
which is going to use it, or</p>
</li>
<li>
<p>load the DLL explicitly from a configured location.</p>
</li>
</ul>
</div>
<p>Personally, I find both these solutions cause unnecessary
complications so I simply copy the DLL to <tt class=
"filename">DbgCopy.dll</tt> and generate a corresponding <tt class=
"filename">Dbgcopy.lib</tt> file from this DLL, which is included
at link time. The makefile included in the source code for this
article has a target dbgCopy which builds this pair of files.</p>
<p>The debug help API usually expects to find the PDB file for a
binary EXE/DLL by looking for the file in its original location, or
along the path. However the Debugging Tools for Windows package
also contains a DLL that can connect to a so-called 'Symbol Server'
to get the PDB file. Microsoft provide a publicly accessible symbol
server containing all the symbols for the retail versions of their
operating system, which lets you get symbolic names (and improved
stack walking) for addresses in their DLLs. This is invaluable when
you get problems inside a system DLL; usually, but not always,
caused by providing it with bad data!</p>
<p>This DLL, <tt class="filename">SYMSRV.DLL</tt>, is activated by
setting the environment variable <tt class=
"literal">_NT_SYMBOL_PATH</tt> to tell DbgHelp to use the symbol
server. Note that this only works correctly if the <tt class=
"filename">DbgHelp.DLL</tt> and <tt class=
"filename">SymSrv.DLL</tt> are both loaded from the same location
and are from the same version of 'Debugging Tools'.</p>
<p>The environment variable can be set from the command line for
the current windowed command prompt, or more typically set via the
control panel for the current user or even for the current machine.
An example setting to load symbols from the Microsoft site using
a local cache in <tt class="filename">C:\Symbols</tt> is:</p>
<pre class="programlisting">
set _NT_SYMBOL_PATH=SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
</pre>
<p>There are a couple of problems with this simple approach.
Firstly, the Microsoft site may not be available (for example, a
company firewall may not grant access to the location specified) so
the symbols for system DLLs are inaccessible to the symbol engine.
Secondly, the symbol engine tries to access the Microsoft site for
every EXE or DLL that it loads for which it cannot find local
symbols. This can take quite a long time if you have many DLLs that do
not have any debugging information.</p>
<p>As an alternative you can set up the path as above and use the
<tt class="literal">Symchk</tt> program to load symbols for a
number of common DLLs (for example <tt class=
"filename">KERNEL32</tt>, <tt class="filename">MSVCRT</tt>,
<tt class="filename">NTDLL</tt>), and then remove the <tt class=
"literal">http://...</tt> portion of the environment variable to
just access the local cache.</p>
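<p>As a rough illustration of the two settings just described, the
sketch below assembles a value for <tt class="literal">_NT_SYMBOL_PATH</tt>
from a cache directory and an optional server URL. The function name
<tt class="function">makeSymbolPath</tt> is hypothetical, and writing the
cache-only form as <tt class="literal">SRV*</tt> followed by just the
directory is an assumption about how the path looks once the server
portion has been dropped.</p>

```cpp
#include <string>

// Hypothetical helper: builds a value for _NT_SYMBOL_PATH.
// With a server URL it gives "SRV*<cache>*<server>"; with an empty URL it
// gives "SRV*<cache>", assumed here to mean "use the local cache only".
std::string makeSymbolPath(const std::string &cacheDir,
                           const std::string &serverUrl = std::string())
{
    std::string path = "SRV*" + cacheDir;
    if (!serverUrl.empty())
        path += "*" + serverUrl;
    return path;
}
```

<p>The resulting string could be placed in the environment before the
symbol engine is initialised, or passed as the search path argument of
<tt class="function">SymInitialize</tt>.</p>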
<p>A more advanced technique which is also available is to set up a
symbol server, running on your own network. You can then publish
symbol files, built in-house or arriving with third party
libraries, to this symbol server for use throughout your company
without needing to explicitly install them on every machine.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e123" id="d0e123"></a>Using the
Symbol Engine</h2>
</div>
<p>I present some basic code to use the symbol engine, show how to
convert an address to a symbol and show a simple example of the
stack walking API. Please refer to the help for the debugging DLL
(provided with the Debugging Tools SDK - <tt class=
"filename">DbgHelp.chm</tt>) for more information and description
of other methods that I am not covering in this introductory
article.</p>
<p>The symbol engine needs initialising for each process you wish
to access. Each call to the symbol engine includes a process handle
as one of the arguments. This does not have to be a genuine
process handle in every case but I find it much easier to
stick to that convention. Calls to initialise the symbol engine for
a given process 'nest' and only when each initialisation call is
matched with its corresponding clean up call does the symbol engine
close down the data structures for the process.</p>
<p>Note: there are a small number of resource leaks in <tt class=
"filename">DbgHelp.dll</tt>, some of which are retained after a
clean-up, so I would advise you to try and reduce the number of
times you initialise and clean up the symbol engine. My simple
example code uses the singleton pattern for this reason.</p>
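<p>One possible shape for such a singleton is a function-local static,
sketched below. This is an illustration rather than the code from the
article's download: the class name <tt class="literal">EngineSketch</tt>
is hypothetical, and a simple counter stands in for the real
initialisation call so that the example stays portable.</p>

```cpp
// Sketch only: a stand-in for a symbol engine wrapper, showing how a
// function-local static limits initialisation to once per process.
class EngineSketch
{
public:
    static EngineSketch &instance()
    {
        static EngineSketch theEngine; // constructed on first use only
        return theEngine;
    }

    // Number of times the (simulated) engine has been initialised
    static int initialiseCount() { return initCount(); }

    EngineSketch(const EngineSketch &) = delete;
    EngineSketch &operator=(const EngineSketch &) = delete;

private:
    // Real code would call ::SymInitialize here and ::SymCleanup in
    // the destructor; the counter is a portable substitute.
    EngineSketch() { ++initCount(); }

    static int &initCount()
    {
        static int count = 0;
        return count;
    }
};
```

<p>However many times <tt class="function">instance()</tt> is called, the
constructor body runs once, which keeps the number of initialise and
clean-up pairs to a minimum.</p>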
<p>Here is a class definition for a simple symbol engine:</p>
<pre class="programlisting">
/** Symbol Engine wrapper to assist with 
    processing PDB information 
*/
class SimpleSymbolEngine
{
public:
    /** Get the symbol engine for this process
    */
    static SimpleSymbolEngine &amp;instance();

    /** Convert an address to a string */
    std::string addressToString(
        void *address );

    /** Provide a stack trace for the
        specified stack frame 
    */
    void StackTrace(
        PCONTEXT pContext, 
        std::ostream &amp; os );

private:
   // not shown 
};
</pre>
<p>This class can be used to provide information about the calling
process like this:</p>
<pre class="programlisting">
void *some_address = ...;
std::string symbolInfo=
    SimpleSymbolEngine::instance().
        addressToString( some_address );
</pre>
<p>I've picked a simple format for the symbolic information for an
address - here is an example:</p>
<pre class="programlisting">
    0x00401158 fred+0x56 at testSimpleSymbolEngine.cpp(13)
</pre>
<p>The first field is the address, then the closest symbol found
and the offset of the address from that symbol and finally, if
available, the file name and line number for the address.</p>
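<p>To make the layout concrete, here is a small portable sketch that
builds a line in this format from pieces that have already been looked
up. The helper <tt class="function">formatSymbolInfo</tt> and its
parameters are hypothetical; it performs no symbol lookup itself and
assumes a 32-bit address printed as eight hex digits.</p>

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats already-resolved symbol details as
// "<address> <symbol>+<offset> at <file>(<line>)".
std::string formatSymbolInfo(std::uintptr_t address,
                             const std::string &symbol,
                             unsigned long displacement,
                             const std::string &fileName,
                             unsigned long lineNumber)
{
    std::ostringstream oss;

    // First the raw address, zero-padded to eight hex digits
    oss << "0x" << std::hex << std::setw(8) << std::setfill('0')
        << address << std::dec << std::setfill(' ');

    // Then the nearest symbol, with the offset only when non-zero
    if (!symbol.empty())
    {
        oss << " " << symbol;
        if (displacement != 0)
            oss << "+0x" << std::hex << displacement << std::dec;
    }

    // Finally the file name and line number, when available
    if (!fileName.empty())
        oss << " at " << fileName << "(" << lineNumber << ")";

    return oss.str();
}
```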
<p>Using the stack trace is more difficult as you must provide a
context for the thread you wish to stack trace.</p>
<p>The context structure is architecture-specific and can be
obtained using the <tt class="function">GetThreadContext()</tt> API
when you are trying to debug another thread.</p>
<pre class="programlisting">
CONTEXT context = {CONTEXT_FULL};
::GetThreadContext( hOtherThread, &amp;context );
SimpleSymbolEngine::instance().
    StackTrace ( &amp;context, std::cout );
</pre>
<p>You have to be slightly more devious to trace the stack of the
calling thread since the <tt class=
"function">GetThreadContext()</tt> API will return the context at
the point when the API was called, which will no longer be valid by
the time the stack trace function is executed.</p>
<p>One approach is to start another thread to print the stack
trace. Another approach, which is architecture-specific, is to use
a small number of assembler instructions to set up the instruction
pointer and stack addresses in the context registers. You have to
be careful if you wish to provide this as a callable method to
ensure the return address of the function is correctly obtained;
for this article I simply use some assembler inline.</p>
<p>Here is a simple way (for Win32) to use the symbol engine to
print the call stack at the current location:</p>
<pre class="programlisting">
CONTEXT context = {CONTEXT_FULL};
::GetThreadContext( 
    GetCurrentThread(), &amp;context );
_asm call $+5
_asm pop eax
_asm mov context.Eip, eax
_asm mov eax, esp
_asm mov context.Esp, eax
_asm mov context.Ebp, ebp

SimpleSymbolEngine::instance().
    StackTrace( &amp;context, std::cout );
</pre>
<p>In this case the tip of the call stack will be the <tt class=
"literal">pop eax</tt> instruction since this is the target of the
<tt class="literal">call $+5</tt> which I use to get the
instruction pointer.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e180" id=
"d0e180"></a>Implementation Details</h2>
</div>
<p>The constructor initialises the symbol engine for the current
process and the destructor cleans up.</p>
<pre class="programlisting">
SimpleSymbolEngine::SimpleSymbolEngine()
{
    hProcess = GetCurrentProcess();

    // Load line number information, and defer loading each module's
    // symbols until they are first needed
    DWORD dwOpts = SymGetOptions();
    dwOpts |=
        SYMOPT_LOAD_LINES |
        SYMOPT_DEFERRED_LOADS;
    SymSetOptions ( dwOpts );

    ::SymInitialize( hProcess, 0, true );
}

SimpleSymbolEngine::~SimpleSymbolEngine()
{
    ::SymCleanup( hProcess );
}
</pre>
<p>I am setting the flag to defer loads which delays loading
symbols until they are required. Typically symbols are only used
from a small fraction of the DLLs loaded when the process
executes.</p>
<p>The code to get symbolic information from an address uses two
APIs: <tt class="function">SymGetSymFromAddr</tt> and <tt class=
"function">SymGetLineFromAddr</tt>. Between them these APIs get the
nearest symbol and the closest available line number/source file
information for the supplied address.</p>
<pre class="programlisting">
   std::string SimpleSymbolEngine::addressToString( void *address )
   {
       std::ostringstream oss;

       // First the raw address
       oss &lt;&lt; &quot;0x&quot; &lt;&lt; address;

       // Then any name for the symbol
       struct tagSymInfo
       {
           IMAGEHLP_SYMBOL symInfo;
           char nameBuffer[ 4 * 256 ];
       } SymInfo = { { sizeof( IMAGEHLP_SYMBOL ) } };

       IMAGEHLP_SYMBOL * pSym = &amp;SymInfo.symInfo;
       pSym-&gt;MaxNameLength = sizeof( SymInfo ) - offsetof( tagSymInfo, symInfo.Name );

       DWORD dwDisplacement;
       if ( SymGetSymFromAddr( hProcess, (DWORD)address, &amp;dwDisplacement,  pSym) )
       {
           oss &lt;&lt; &quot; &quot; &lt;&lt; pSym-&gt;Name;
           if ( dwDisplacement != 0 )
               oss &lt;&lt; &quot;+0x&quot; &lt;&lt; std::hex &lt;&lt; dwDisplacement &lt;&lt; std::dec;
       }
        
       // Finally any file/line number
       IMAGEHLP_LINE lineInfo = { sizeof( IMAGEHLP_LINE ) };
       if ( SymGetLineFromAddr( hProcess, (DWORD)address, &amp;dwDisplacement, &amp;lineInfo ) )
       {
           char const *pDelim = strrchr( lineInfo.FileName, '\\' );
           oss &lt;&lt; &quot; at &quot; &lt;&lt; ( pDelim ? pDelim + 1 : lineInfo.FileName ) &lt;&lt; &quot;(&quot; &lt;&lt; lineInfo.LineNumber &lt;&lt; &quot;)&quot;;
       }
       return oss.str();
   }
</pre>
<p>The main complication with the two APIs used is that both need
the size of the data structures to be set up correctly before the
call is made.</p>
<p>Failure to do this leads to rather inconsistent results.
Particular care is needed for the IMAGEHLP_SYMBOL since the
structure is variable size.</p>
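<p>The variable-size pattern can be shown in isolation with a portable
stand-in. In the sketch below <tt class="literal">SymbolHeaderSketch</tt>
is a simplified, hypothetical substitute for
<tt class="literal">IMAGEHLP_SYMBOL</tt> (the real structure has more
fields); the point is only how the two size fields are derived before the
structure is handed to an API.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for IMAGEHLP_SYMBOL (an assumption, for
// portability): a fixed header whose last member is the first byte
// of a longer name.
struct SymbolHeaderSketch
{
    std::uint32_t SizeOfStruct;   // size of the fixed header only
    std::uint32_t MaxNameLength;  // bytes available starting at Name
    char Name[1];                 // name continues past the header
};

// The header followed directly by extra room for the name
struct SymbolWithBuffer
{
    SymbolHeaderSketch header;
    char nameBuffer[4 * 256];
};

// Fill in both size fields before the structure is used
inline void prepare(SymbolWithBuffer &symbol)
{
    symbol = SymbolWithBuffer();  // zero everything first
    symbol.header.SizeOfStruct = sizeof(SymbolHeaderSketch);
    // Everything from Name to the end of the wrapper can hold the name
    symbol.header.MaxNameLength = static_cast<std::uint32_t>(
        sizeof(SymbolWithBuffer)
        - offsetof(SymbolWithBuffer, header)
        - offsetof(SymbolHeaderSketch, Name));
}
```

<p>Here the header size describes only the fixed part, while the maximum
name length covers the tail of the header plus the whole of the extra
buffer.</p>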
<p>Note too that the documentation for DbgHelp refers to some newer
APIs (<tt class="function">SymFromAddr</tt>, <tt class=
"function">SymGetLineFromAddr64</tt>) which do the same thing as
these two.</p>
<p>I have used the older calls here since they are available on a
much wider range of versions of the DbgHelp API.</p>
<p>The stack walking code sets up the structure used to hold the
current stack location and then uses the stack walking API to
obtain each stack frame in turn.</p>
<div class="sidebar">
<p>All the source code for this article is available at:</p>
<p><a href=
"http://www.howzatt.demon.co.uk/articles/SimpleSymbolEngine.zip"
target=
"_top">http://www.howzatt.demon.co.uk/articles/SimpleSymbolEngine.zip</a></p>
<p>Copyright (c) Roger Orr - rogero@howzatt.demon.co.uk $Revision:
1.11 $ $Date: 2005/05/07 17:13:50 $</p>
</div>
<pre class="programlisting">
   void SimpleSymbolEngine::StackTrace( PCONTEXT pContext, std::ostream &amp; os )
   {
       os &lt;&lt; &quot;  Frame       Code address\n&quot;;

       STACKFRAME stackFrame = {0};

       stackFrame.AddrPC.Offset = pContext-&gt;Eip;
       stackFrame.AddrPC.Mode = AddrModeFlat;

       stackFrame.AddrFrame.Offset = pContext-&gt;Ebp;
       stackFrame.AddrFrame.Mode = AddrModeFlat;

       stackFrame.AddrStack.Offset = pContext-&gt;Esp;
       stackFrame.AddrStack.Mode = AddrModeFlat;

       while ( ::StackWalk(
          IMAGE_FILE_MACHINE_I386,
          hProcess,
          GetCurrentThread(), // this value doesn't matter much if previous one is a real handle
          &amp;stackFrame, 
          pContext,
          NULL,
          ::SymFunctionTableAccess,
          ::SymGetModuleBase,
          NULL ) )
       {
           os &lt;&lt; &quot;  0x&quot; &lt;&lt; (void*) stackFrame.AddrFrame.Offset &lt;&lt; &quot;  &quot;
              &lt;&lt; addressToString( (void*)stackFrame.AddrPC.Offset ) &lt;&lt; &quot;\n&quot;;
       }

       os.flush();
   }
</pre>
<p>The code provided here is specific to the x86 architecture -
stack walking is available for the other Microsoft platforms but
the code to get the stack frame structure set up is slightly
different.</p>
<p>The context record is used to assist with providing a stack
trace in certain 'corner cases'. Note that the stack walking API
may modify this structure and so for a general solution you might
take a copy of the supplied context record.</p>
<p>The stack walking API has a couple of problems. Firstly, it
quite often fails to complete the stack walk for EXEs or DLLs
compiled with full optimisation. The presence of the PDB files can
enable the stack walker to continue successfully even in such
cases, but this is not always successful. Secondly, the stack
walker assumes the Intel stack frame layout used by Microsoft
products and may not work with files compiled by tools from other
vendors.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e231" id=
"d0e231"></a>Conclusion</h2>
</div>
<p>I hope that this article enables you to get better access to
symbolic information when diagnosing problems in your code.</p>
<p>Various tools in the Windows programmer's arsenal use the
DbgHelp DLL. Examples are: the debugger 'WinDbg' from the Microsoft
Debugging Tools, the pre-installed tool 'Dr. Watson', and Process
Explorer, from <a href="http://www.sysinternals.com" target=
"_top">www.sysinternals.com</a>.</p>
<p>If you build symbol files for your own binaries, tools like
these can then provide you with additional information with no
additional programming effort.</p>
<p>You can also provide symbolic names for runtime diagnostic
information in a similar manner to these tools with a small amount
of programming effort. I have shown here a basic implementation of
a symbol engine class you can use to map addresses to names or
provide a call stack for the current process.</p>
<p>I intended it to be easy to understand both what the code does
and how it works. This example can be used as a basis for more
complicated solutions, which could also address the following
issues:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>the code is currently not thread-safe since the DbgHelp APIs
require synchronisation.</p>
</li>
<li>
<p>the code only handles the current process; it can be generalised
to cope with other processes. Incidentally this provides a good
example of why the singleton is sometimes described as an
anti-pattern!</p>
</li>
<li>
<p>no use is made of the APIs giving access to the local variables in
each stack frame.</p>
</li>
</ul>
</div>
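<p>For the first of these issues, one simple option is to funnel every
call into the symbol engine through a single lock. The sketch below is a
minimal, portable illustration of that idea; the class name
<tt class="literal">SerialisedCalls</tt> is hypothetical, and the callable
passed in stands for whatever DbgHelp calls need protecting.</p>

```cpp
#include <mutex>
#include <utility>

// Hypothetical helper: runs each callable while holding one mutex, so
// that calls made through the same object never overlap.
class SerialisedCalls
{
public:
    template <typename Fn>
    auto call(Fn &&fn) -> decltype(fn())
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::mutex mutex_;
};
```

<p>A wrapper class could hold one such object and route its address
lookup and stack walking through <tt class="function">call()</tt>. A
single coarse lock is the simplest choice and may well be sufficient,
since diagnostic code is rarely on a hot path.</p>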
<p>Happy debugging!</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e259" id="d0e259"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Orr2004" id="Orr2004"></a>
<p class="bibliomixed">[Orr2004] 'Microsoft Visual C++ and Win32
Structured Exception Handling', <span class="citetitle"><i class=
"citetitle">Overload 64</i></span>, Oct 2004</p>
</div>
<div class="bibliomixed"><a name="DbgHelp" id="DbgHelp"></a>
<p class="bibliomixed">[DbgHelp] <span class="bibliomisc"><a href=
"http://www.microsoft.com/whdc/devtools/debugging/default.mspx"
target=
"_top">http://www.microsoft.com/whdc/devtools/debugging/default.mspx</a></span></p>
</div>
<div class="bibliomixed"><a name="Robbins" id="Robbins"></a>
<p class="bibliomixed">[Robbins] <span class="citetitle"><i class=
"citetitle">Debugging Applications for Microsoft .NET and Microsoft
Windows</i></span>, John Robbins, Microsoft Press</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
