    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Run-time checking for C++</title>
        <link>https://members.accu.org/index.php/articles/1443</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Design of applications and programs + Overload Journal #4 - Feb 1994</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Run-time checking for C++</h1>
<p>
<strong>Date:</strong> 01 February 1994 08:54:00 +00:00</p>
<p><strong>Body:</strong>&nbsp;<h2>ABSTRACT</h2>
<p>This article describes a C++ compiler for the Extended DOS, Windows
and Windows NT environments with greatly enhanced run-time diagnostics.
Illegal references to memory, such as references through NULL or unset
pointers, which would normally corrupt a program, are detected
explicitly. A method of
trapping references to unset data is also discussed. The need to cater
for 'real' programs containing at least some 'dirty' code, and even
in-line assembler, is emphasised. It is argued that the ability to
debug C++ programs free from the insidious effects of memory corruption
is vital if the language is to enjoy continued popularity and be
capable of producing robust maintainable programs.</p>
<h2>INTRODUCTION</h2>
<p>Despite the many advantages of the C++ language over its parent C,
both languages suffer a serious flaw. Using either language it is
excessively easy to write programs which inadvertently access memory
outside their data structures. This problem is so acute that many
teachers prefer 'safer' languages such as Pascal. Even professional
programmers waste a great deal of effort chasing bugs associated with
corrupt programs.</p>
<p>As an example of how easy it is to write illegal C code, consider
the following fragment:</p>
<pre>char x[10]=&quot;1234567890&quot;;<br>int k=strlen(x);</pre>
<p>The value of <span style="font-style: italic; font-weight: bold;">k</span>
is undefined because the terminating null character is not contained
inside the array <span style="font-style: italic; font-weight: bold;">x</span>.
The value of <span style="font-style: italic; font-weight: bold;">k</span>
will depend on the bytes of data that follow the array <span
 style="font-style: italic; font-weight: bold;">x</span>. In more
complex contexts such bugs are extremely hard to spot, especially as
the value of <span style="font-style: italic; font-weight: bold;">k</span>
will in all probability vary unpredictably as debugging code is added
to the program in order to try to resolve the problem.</p>
<p>The author knows of at least one self-taught C programmer who
believed that the statement</p>
<pre>char* ptr;</pre>
<p>created a pointer variable <span
 style="font-style: italic; font-weight: bold;">ptr</span> and
initialised it to point at a newly created character variable. Using a
well known MS-DOS compiler, he had programmed for some months without
discovering his error!</p>
<p>Environments such as DOS and Windows compound the problem because
undefined automatic variables acquire values from the stack which may
have been used previously by interrupt code. This can make the program
extremely unpredictable.</p>
<p>It is possible to program in C++ with intelligent objects which
detect or prevent misuse at run-time[<a href="#1">1</a>]. Such objects
can be constructed in a known state, thus avoiding the problems of
unset variables. Array and pointer operations can then be overloaded
and implemented with suitable checks. It would even be possible to
selectively enable such checks depending on a macro setting (say).
However, few programs will actually get written this way, and even when
this approach is adopted, the original class writer will not have the
benefit of this mechanism.</p>
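<p>As a minimal sketch of such an intelligent object (not taken from
reference [1]), the following array class is constructed in a known state
and checks every index. The names CheckedArray and CHECKED_ARRAY_CHECKS
are assumptions made for illustration only; the macro stands in for the
"macro setting" mentioned above.</p>

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical macro: define CHECKED_ARRAY_CHECKS as 0 to compile the checks away.
#ifndef CHECKED_ARRAY_CHECKS
#define CHECKED_ARRAY_CHECKS 1
#endif

// A fixed-size array that starts in a known state and validates
// every index while checks are enabled.
template <typename T, std::size_t N>
class CheckedArray {
public:
    CheckedArray() : data_() {}  // value-initialise: no unset elements

    T& operator[](std::size_t i) {
        check(i);
        return data_[i];
    }

    const T& operator[](std::size_t i) const {
        check(i);
        return data_[i];
    }

    std::size_t size() const { return N; }

private:
    static void check(std::size_t i) {
#if CHECKED_ARRAY_CHECKS
        if (i >= N) throw std::out_of_range("CheckedArray: index out of range");
#else
        (void)i;  // checks disabled: behaves like a raw array
#endif
    }

    T data_[N];
};
```

<p>With the checks enabled, writing to element 4 of a four-element
CheckedArray throws instead of silently overwriting a neighbouring
object.</p>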
<p>Traditionally programmers have relied on static code analysis using
Lint[<a href="#2">2</a>], or on tools that detect references to memory
that does not belong to the program[<a href="#3">3</a>]. Some attempts
have been made to extend the C language in order to specify
restrictions that the compiler can enforce[<a href="#5">5</a>]. One
notable attempt to provide run-time pointer checks relied on changing
the size of pointers and also made several rather restrictive
assumptions about the code[<a href="#6">6</a>].</p>
<p>A solution to these problems is contained in the <span
 style="font-weight: bold;">32-bit Salford Software C++ compiler (SCC)</span>
[<a href="#4">4</a>], which runs under DOS and Windows, and is
currently being ported to Windows NT. This compiler can either compile
a program conventionally, for speed of execution, or can include the
run-time checks which are described in the rest of this paper.
As soon as an error is detected, the debugger is entered, before program
corruption can begin.</p>
<h2>CHECKING POINTERS AND REFERENCES</h2>
<p>A program which performs any of the following pointer/reference
operations can certainly be classed as illegal:</p>
<ol style="list-style-type: lower-alpha;">
  <li>A reference through a NULL pointer.</li>
  <li>A reference through an uninitialised pointer.</li>
  <li>A reference which does not point to a data object whose address
has been taken in the program (and which could therefore be a
legitimate pointer target).</li>
  <li>A reference to an automatic object which has ceased to exist.</li>
  <li>A reference to an allocated object which has since been deleted.</li>
  <li>A reference which would alter a program constant.</li>
  <li>An attempt to treat a pointer to a function as a pointer to data,
or vice-versa.<br>
  </li>
</ol>
<p>Because programmers are free to manipulate pointers in many ways -
such as casting them to integers and performing arithmetic on the
result - it is difficult to provide a more restrictive set of
conditions without causing some valid programs to fail. The SCC CHECK
mode mechanism handles all the above cases.</p>
<p>Despite the fact that a 'random' memory reference may modify some
other target of a pointer without detection, the above set of tests has
been found to be very effective in practice at detecting programming
errors. One reason for this is that most objects which are targets for
pointers end up allocated on the heap in CHECK mode (see below). This
means that no such object will be contiguous with another (because of
the red-tape required for storage allocation). Programs which contain
pointer references which stray off either end of a data structure (e.g.
using <span style="font-style: italic; font-weight: bold;">strcpy</span>)
will therefore always fail cleanly.</p>
<h2>REFERENCES THROUGH UNINITIALISED AND NULL POINTERS</h2>
<p>All programs which reference through a NULL pointer cause a fault
under SCC, whether compiled in CHECK mode or not. Our representation of
the NULL pointer is 0x80000000 (a negative 32-bit integer) and we use a
2 Gigabyte (31-bit) segment for both code and data. This means that
negative addresses, such as the NULL pointer, are illegal.</p>
<p>This representation of the NULL pointer is ordinarily invisible to
the programmer because the NULL pointer is converted in the case of
casts or in a simple statement such as:</p>
<pre>char* p=0;</pre>
<p>A reference through a NULL pointer is recognised by the debugger,
which produces the error message &quot;Reference through NULL pointer&quot;.
The same program run with an ordinary
real-mode DOS compiler and even some DOS-extender systems would destroy
part of the system interrupt table and continue executing!</p>
<p>In CHECK mode, uninitialised pointers will contain the bit pattern
0x80808080 (see the following section), which also causes a fault when
used as an address. Again, the debugger will recognise the particular
bit pattern and generate a specific error message - &quot;Reference through
uninitialised pointer&quot;.</p>
<p>Under Windows NT, the address 0x00000000 is illegal. The NULL
pointer will therefore be given its conventional value of zero in this
environment, and the exception associated with a reference through a
zero address will be interpreted appropriately.</p>
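<p>The way a debugger might turn a faulting address into one of these
messages can be sketched as follows. The two bit patterns and the two
message texts come from the description above; the function names and
the fallback message are assumptions. The sketch also only matches the
exact patterns, whereas a real debugger would presumably allow for small
offsets from them (for example when a member of a structure is accessed
through a NULL pointer).</p>

```cpp
#include <cstdint>
#include <string>

// Bit patterns taken from the text (DOS and Windows versions of SCC).
const std::uint32_t kNullPattern  = 0x80000000u;  // representation of NULL
const std::uint32_t kUnsetPattern = 0x80808080u;  // CHECK-mode fill for unset data

// With a 2 Gigabyte (31-bit) segment, every address with the top bit set is illegal.
bool is_illegal_address(std::uint32_t addr) {
    return (addr & 0x80000000u) != 0;
}

// Hypothetical helper: choose a diagnostic for an address that caused a fault.
std::string describe_fault(std::uint32_t addr) {
    if (addr == kNullPattern)  return "Reference through NULL pointer";
    if (addr == kUnsetPattern) return "Reference through uninitialised pointer";
    return "Reference through illegal pointer";  // fallback wording is an assumption
}
```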
<h2>INITIALISING UNSET VARIABLES</h2>
<p>When a function is entered, all its automatic variables are
initially unset. Since in general the initial value of such variables
will depend on the previous user of the stack (i.e. typically the
previous function call), the result of using an uninitialised automatic
variable can be very unpredictable. In order to fault the use of
uninitialised pointers in CHECK mode, code is planted at function entry
to set the entire stack area to 0x80 in every byte. This produces
illegal pointers and a large negative number in signed integer
variables.</p>
<p>In order to make this mechanism more powerful, at the expense of
faulting some valid programs, three compiler options are provided:</p>
<ol>
  <li>An option is provided to force unset static and public data to
the unset bit pattern, rather than 0/NULL, as the ANSI C standard
requires. Since it is generally considered to be bad practice to rely
on such default values, it is anticipated that many users will use this
option.</li>
  <li>An option (UNDEF) is provided to check the value of every
non-character variable before use to ensure that it does not contain
the unset value. This check will obviously fail programs which
manipulate numbers corresponding to the unset bit pattern. In practice, because
the default integer size in SCC is 32 bits (so that the probability of
a 'random' integer having the value 0x80808080 is very low) this check
is useful for the vast majority of programs.</li>
  <li>An option is provided to extend the UNDEF check to include
character variables. Clearly, this option is not appropriate for
programs which perform arithmetic in character variables, but is very
useful otherwise, since 0x80 is not a frequently used character.</li>
</ol>
<p>Unfortunately, programs which use uninitialised bit-packed data will
not be faulted by the above scheme (although the result will become
predictable).</p>
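<p>The effect of the fill pattern, and of an UNDEF-style test on a
32-bit integer, can be illustrated in ordinary C++. This is only a
model of what the compiler-planted code achieves; the function names
are assumptions, and the real checks are generated by the compiler
rather than called by the programmer.</p>

```cpp
#include <cstdint>
#include <cstring>

const unsigned char kUnsetByte = 0x80;  // fill byte described in the text

// Fill a block of storage with the unset byte, as CHECK mode does
// for the stack area on function entry.
void mark_unset(void* storage, std::size_t bytes) {
    std::memset(storage, kUnsetByte, bytes);
}

// UNDEF-style test: true if a 32-bit integer still holds the unset pattern.
bool is_unset(std::int32_t value) {
    return static_cast<std::uint32_t>(value) == 0x80808080u;
}
```

<p>An integer filled in this way reads as a large negative number, so
even without the UNDEF option an unset loop bound or array index tends
to fail quickly rather than appear to work.</p>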
<h2>GENERAL POINTER MISUSE - THE STORAGE ACCESSIBILITY INDEX</h2>
<p>In order to identify illegal pointer references of types c to g
(above) it is necessary to keep track of every piece of store which can
legally be the target of a pointer reference. This includes all global
or local constants and variables whose address is taken implicitly or
explicitly. Other variables cannot be validly referenced by a pointer.
This information is recorded in a table known as the Storage
Accessibility Index (SAI). Every reference through a pointer is
compiled into code which includes a system call to check that a
suitable SAI entry exists. Operations which acquire store from the
system, such as <span style="font-style: italic; font-weight: bold;">new</span>,
add an entry in the SAI which is removed when the storage is freed.</p>
<p>The SAI also records if storage is read-only. This enables the
system to diagnose programs containing mistakes such as:</p>
<pre>char* p=&quot;Test&quot;; <br>*p=0;</pre>
<p>Data is considered read-only if it is ultimately derived from a
constant or from some system object which is meant to be read-only
(such as the arguments supplied to the function <span
 style="font-weight: bold;">main</span>). The <span
 style="font-style: italic; font-weight: bold;">const </span>qualifier
is not used for this purpose because the user is permitted to override
the effect of <span style="font-weight: bold; font-style: italic;">const
</span>using a cast. The type data associated with SAI entries is not
recorded (because casts make such information unreliable); however, SAI
entries associated with function pointers are distinguished, so that a
function pointer cannot be cast to a data pointer and used to corrupt
the code of the function. Likewise, a pointer to data cannot be cast to
a function pointer and be subjected to a call (this would normally send
a program totally out of control).</p>
<p>Most of the cost of checking C/C++ code reduces to maintaining the
SAI and checking pointer references against it. Typically the use of
CHECK will increase execution times by a factor of 10. This cost is
acceptable in a debugging context, and can be eliminated automatically
by recompilation.</p>
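<p>The internal layout of the SAI is not described here, so the
following is only a sketch of the idea under stated assumptions: entries
are keyed by base address in an ordered map, each records a size, a
read-only flag and a function flag, and every access is validated
against the entry that contains it. All names (SaiEntry, StorageIndex,
Access) are hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// One legal pointer target. No type information is kept, as in the text.
struct SaiEntry {
    std::size_t size;
    bool read_only;
    bool is_function;
};

enum class Access { Read, Write, Call };

class StorageIndex {
public:
    void add(std::uintptr_t base, std::size_t size,
             bool read_only = false, bool is_function = false) {
        entries_[base] = SaiEntry{size, read_only, is_function};
    }

    void remove(std::uintptr_t base) { entries_.erase(base); }

    // True if [addr, addr + len) lies wholly inside one entry that permits the access.
    bool check(std::uintptr_t addr, std::size_t len, Access how) const {
        auto it = entries_.upper_bound(addr);  // first entry starting after addr
        if (it == entries_.begin()) return false;
        --it;                                  // entry with the greatest base <= addr
        const SaiEntry& e = it->second;
        const std::uintptr_t offset = addr - it->first;
        if (offset >= e.size || len > e.size - offset) return false;
        if (how == Access::Call) return e.is_function;
        if (e.is_function) return false;       // data access to function code
        return how == Access::Read || !e.read_only;
    }

private:
    std::map<std::uintptr_t, SaiEntry> entries_;
};
```

<p>A lookup in such a structure costs a logarithmic search on every
pointer reference, which gives some feeling for why checked code runs
so much more slowly than unchecked code.</p>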
<h2>C++ REFERENCES</h2>
<p>In general, the use of a reference variable can cause program
corruption in the same way as a reference through a pointer. Thus, for
example, it is possible to initialise reference variables to objects
which disappear before the reference is used:</p>
<pre>char* p=new(char[10]); <br>char&amp; q=*(p+5); <br>delete[] p;   //q now refers to storage that no longer exists<br>q=0;</pre>
<p>Furthermore, reference members of classes can become corrupted
because currently the class object is entered into the SAI as a whole.
For these reasons, accesses to reference variables are also protected
by the SAI.</p>
<h2>DANGLING POINTERS</h2>
<p>A common mistake is to use a pointer to an object which is no longer
available. The simplest example of this involves code such as:</p>
<pre>char* p=new(char[10]); <br>delete[] p;<br>...<br>if(*p==0)foo();   //p is dangling here</pre>
<p>Usually the delete operation and the subsequent pointer reference
are well separated in the code, making this type of bug particularly
insidious. If the SAI were implemented exactly as described above, many
such faults might never be diagnosed, because the freed storage might
be reallocated by a subsequent <span
 style="font-weight: bold; font-style: italic;">new</span> (or <span
 style="font-style: italic; font-weight: bold;">malloc</span>) to an
entirely different object which would then become corrupted.</p>
<p>As a partial solution to this problem, storage released by <span
 style="font-style: italic; font-weight: bold;">delete</span> (or <span
 style="font-style: italic; font-weight: bold;">free</span> etc.) is
kept in a special table of obsolete objects rather than being returned
to the memory pool. Only when memory becomes short is the store
returned to the pool on a first in first out basis.</p>
<p>Any reference to an obsolete object can therefore be diagnosed
explicitly as a reference through a dangling pointer. Furthermore, the
more memory available on the machine (SCC runs in 32-bit mode, so there
is no 640K limit) the less danger there is of a piece of freed storage
being reused before a spurious pointer reference.</p>
<p>The common mistake of corrupting the C heap storage by freeing a
region of store more than once is a special case of a dangling pointer
reference, and is detected at the line where it occurs.</p>
<p>An option is available to force the run-time system never to release
dangling storage, thus ensuring that every dangling reference is
caught. Since storage is never re-used in this mode, this is only
useful for testing small examples, or where very large amounts of
memory are available.</p>
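<p>The table of obsolete objects can be pictured as a first in first
out queue with a limit on how much freed storage it may hold. The
sketch below models only the bookkeeping; the class name, the byte
budget used to stand in for "memory becomes short", and the interface
are all assumptions.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

struct FreedBlock {
    std::uintptr_t base;
    std::size_t size;
};

class ObsoleteTable {
public:
    explicit ObsoleteTable(std::size_t budget_bytes)
        : budget_(budget_bytes), held_(0) {}

    // Record a freed block. If the table now holds more than its budget,
    // the oldest blocks are dropped first. Returns how many were dropped.
    std::size_t retire(std::uintptr_t base, std::size_t size) {
        blocks_.push_back(FreedBlock{base, size});
        held_ += size;
        std::size_t released = 0;
        while (held_ > budget_ && !blocks_.empty()) {
            held_ -= blocks_.front().size;
            blocks_.pop_front();  // a real system would return this store to the pool
            ++released;
        }
        return released;
    }

    // True if addr falls inside a block that has been freed but not yet reused.
    bool is_dangling(std::uintptr_t addr) const {
        for (const FreedBlock& b : blocks_)
            if (addr >= b.base && addr - b.base < b.size) return true;
        return false;
    }

private:
    std::deque<FreedBlock> blocks_;
    std::size_t budget_;
    std::size_t held_;
};
```

<p>In this model a second attempt to free the same block can be
detected by calling is_dangling on the address before retiring it, and
an unlimited budget corresponds to the option that never releases
dangling storage.</p>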
<h2>DANGLING REFERENCES TO STACK VARIABLES</h2>
<p>The following function illustrates a slightly different form of
dangling reference:</p>
<pre>char* func() <br>{<br>  char buf[10];<br>  ...<br>  return buf; <br>}</pre>
<p>This function will return a pointer to an object which has ceased to
exist. Because the storage lies on the stack it will be immediately
reused by the next function to be invoked, leading to total confusion.
The example above would produce a compiler warning, but this is not
possible in more complex cases. In general, a pointer to an automatic
(stack) variable may be assigned to a global variable, or even placed
inside a structure, and then referenced after the automatic storage has
been reclaimed.</p>
<p>Because variable <span
 style="font-style: italic; font-weight: bold;">buf</span> is not a
heap variable, it would appear at first sight that such a dangling
pointer reference could not be detected by the CHECK mechanism. The
solution to this problem is that variables which would ordinarily be
allocated on the stack (in non-CHECK mode), and whose addresses are
taken (and which must therefore be placed in the SAI) are treated
specially. These variables are allocated on the heap and placed in the
SAI on function entry (more generally on entry to the scope in which
they are defined). They are deallocated on exit from the function and
are added to the table of dangling references.</p>
<p>Since the arguments of a C function are copies (i.e. passed by
value), the entire argument strip is placed on the heap in CHECK mode
so that pointers to arguments also become dangling after the function
exits.</p>
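<p>The treatment of such variables can be imitated by hand in ordinary
C++. In the sketch below, a small class stands in for an automatic
array whose address is taken: it obtains its storage from the heap on
scope entry and marks it as dangling on scope exit. The two registries
are simplified stand-ins for the SAI and the table of dangling
references, and every name is an assumption. The storage is
deliberately never released, which mirrors the option that keeps
dangling storage for ever.</p>

```cpp
#include <cstddef>
#include <set>

// Hypothetical registries standing in for the SAI and the dangling table.
inline std::set<const void*>& live_targets() {
    static std::set<const void*> s;
    return s;
}
inline std::set<const void*>& dangling_targets() {
    static std::set<const void*> s;
    return s;
}

// Stand-in for an automatic char array whose address is taken.
class HeapAutomatic {
public:
    explicit HeapAutomatic(std::size_t n) : p_(new char[n]()) {
        live_targets().insert(p_);      // entered on scope entry
    }
    ~HeapAutomatic() {
        live_targets().erase(p_);       // no longer a legal target
        dangling_targets().insert(p_);  // kept so that later use can be diagnosed
    }
    HeapAutomatic(const HeapAutomatic&) = delete;
    HeapAutomatic& operator=(const HeapAutomatic&) = delete;

    char* get() { return p_; }

private:
    char* p_;
};

// The func() example above, rewritten by hand in this style.
char* func() {
    HeapAutomatic buf(10);
    return buf.get();  // the pointer becomes dangling as func returns
}

bool is_dangling(const void* p) {
    return dangling_targets().count(p) != 0;
}
```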
<p>Operations which unwind the stack, such as the C++
exception-handling mechanism, or the traditional C <span
 style="font-style: italic; font-weight: bold;">setjmp</span>/<span
 style="font-style: italic; font-weight: bold;">longjmp</span> must
take special actions in CHECK mode to deallocate automatic variables
placed on the heap when the stack is unwound. The problem is similar to
that of ensuring that destructors for automatic objects are executed as
the stack is unwound, and is implemented in the same way.</p>
<h2>MIXING CHECKED AND UNCHECKED CODE</h2>
<p>SCC's sister compiler, FTN77, implements similar checks for the
FORTRAN 77 language. The FTN77 run-time system goes to some trouble to
ensure that checked and unchecked code can be mixed in one program.
This facility was considered essential because FORTRAN programs often
make use of scientific libraries, which would usually be supplied
precompiled.</p>
<p>Unfortunately, it is easy to see that no such mixing is possible
with C++ checked code because of the existence of pointers in the
language. For example, an unchecked portion of code could return a
pointer to a piece of store which would not be present in the SAI (even
worse, the unchecked code might communicate the pointer via a global
variable). A reference to such a pointer would generate a spurious
diagnostic.</p>
<p>To enforce the restriction that a whole C program must be compiled
in CHECK mode, we append __C to all names sent to the linker. This has
a very beneficial side effect - a library may contain checked and
unchecked versions of the same code and the linker will extract the
right copy automatically.</p>
<p>Functions compiled with the UNDEF option to check for the use of
undefined data can be freely mixed with functions compiled with the
CHECK option.</p>
<h2>THE SYSTEM LIBRARY</h2>
<p>At first sight it might be thought that the system library need only
be compiled in CHECK mode to be suitable for linking into CHECK mode
programs. However, if that approach were taken, program errors which
manifested themselves inside library routines would produce very
obscure diagnostics. Furthermore, some routines, such as <span
 style="font-style: italic; font-weight: bold;">new</span> and <span
 style="font-style: italic; font-weight: bold;">delete</span>, must be
implemented specially for CHECK mode, since they must adjust the
contents of the SAI.</p>
<p>In order to provide high quality diagnostics for faults manifesting
themselves inside library functions (and to avoid substantial
performance costs), CHECK versions of library routines perform the
necessary checks on their arguments before passing control to the
normal function to perform the bulk of the work. Sometimes checks are
performed after the usual routine has executed.</p>
<p>As an example, consider the <span
 style="font-style: italic; font-weight: bold;">strlen</span> function
in string.h, which calculates the length of a string of characters
terminated by a null byte. The CHECK version of <span
 style="font-style: italic; font-weight: bold;">strlen</span> checks
that its pointer argument is a legal address (a negative address would
cause a protection fault inside <span
 style="font-style: italic; font-weight: bold;">strlen</span>) and
passes it to the standard <span
 style="font-style: italic; font-weight: bold;">strlen</span> routine.
When the length result is returned a check is performed to ensure that
the pointer argument points to an object in the SAI, and that the
string (including the terminating null byte) is completely contained within
the same object. A few functions, such as <span
 style="font-style: italic; font-weight: bold;">memcmp</span> are
implemented specially in order to detect comparisons which run off the
end of a valid data block.</p>
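<p>A containment check of this kind can be sketched as below. Note one
deliberate difference from the scheme just described: rather than
calling the ordinary routine first and validating the result afterwards,
the sketch searches for the terminator within the bounds of the object,
so that it never reads outside it. ObjectBounds is a hypothetical
description of the SAI entry containing the argument, and an exception
stands in for entry to the debugger.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>

// Hypothetical description of the object (SAI entry) that holds the string.
struct ObjectBounds {
    const char* base;
    std::size_t size;
};

// Returns the length of s, or throws if s lies outside the object or
// no terminating null byte occurs before the end of the object.
std::size_t checked_strlen(const char* s, const ObjectBounds& obj) {
    std::less<const char*> before;  // well-defined ordering for any two pointers
    if (s == nullptr || before(s, obj.base) || !before(s, obj.base + obj.size))
        throw std::out_of_range("strlen: argument is not inside a known object");

    const std::size_t room = static_cast<std::size_t>(obj.base + obj.size - s);
    const void* end = std::memchr(s, '\0', room);
    if (end == nullptr)
        throw std::out_of_range("strlen: string runs off the end of its object");

    return static_cast<std::size_t>(static_cast<const char*>(end) - s);
}
```

<p>Applied to a ten-character array holding ten digits and no
terminator, as in the example in the introduction, this version reports
the fault instead of returning an unpredictable length.</p>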
<p>Some traditional C library functions, such as <span
 style="font-style: italic; font-weight: bold;">printf</span> and <span
 style="font-style: italic; font-weight: bold;">scanf</span>, take a
variable number of arguments depending on one of the arguments (e.g. a
format string). Mistakes in calls to these functions are very common.</p>
<p>As explained above, a function's arguments are contained in a single
SAI entry. It is therefore easy for functions such as <span
 style="font-style: italic; font-weight: bold;">printf</span> to report
if too few arguments are supplied. Unfortunately no check is possible
that the arguments are indeed of the required type (although SCC
'knows' about the standard C functions which take format arguments, and
will warn at compile time if there is a format/data mismatch).</p>
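<p>The "too few arguments" test amounts to counting the conversions in
the format string and comparing the total with the number of arguments
actually present. The following is a simplified sketch, not the SCC
implementation: it understands "%%" and '*' widths but does not
validate the conversion letters, and the function names are
assumptions.</p>

```cpp
#include <cstddef>
#include <cstring>

// Counts the arguments that a printf-style format string consumes.
std::size_t count_format_args(const char* fmt) {
    std::size_t count = 0;
    for (const char* p = fmt; *p != '\0'; ++p) {
        if (*p != '%') continue;
        ++p;                      // look at the character after '%'
        if (*p == '\0') break;    // stray '%' at the end of the format
        if (*p == '%') continue;  // "%%" prints a percent sign, no argument
        // Skip flags, width, precision and length modifiers.
        // Each '*' takes its value from an extra int argument.
        while (*p != '\0' && std::strchr("-+ #0123456789.*hlLjzt", *p) != nullptr) {
            if (*p == '*') ++count;
            ++p;
        }
        if (*p == '\0') break;    // format ended before a conversion letter
        ++count;                  // the conversion itself
    }
    return count;
}

// True if at least as many arguments were supplied as the format needs.
bool enough_args(const char* fmt, std::size_t supplied) {
    return supplied >= count_format_args(fmt);
}
```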
<p>A further problem with the traditional C system library is that
most error conditions (for example a negative argument to <span
 style="font-style: italic; font-weight: bold;">sqrt</span>) are
handled by setting the global variable <span
 style="font-style: italic; font-weight: bold;">errno</span> and
returning. Programs which do not check <span
 style="font-style: italic; font-weight: bold;">errno</span> after
every library function call which could set it (i.e. almost all
programs) can be hard to debug. Other languages, such as FORTRAN,
either permit, or require the implementation to fault on such errors.
SCC provides a function which may be called to activate certain classes
of error, such as arithmetic domain errors, so that the program fails
after such an error and enters the debugger.</p>
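<p>The effect of activating such an error class can be imitated in
portable C++ with a wrapper that refuses to return after a domain
error. This is a sketch only: the name strict_sqrt is an assumption, and
an exception stands in for entry to the debugger. Because not every
implementation sets errno for mathematical errors, the wrapper also
tests the argument directly.</p>

```cpp
#include <cerrno>
#include <cmath>
#include <stdexcept>

// A sqrt that fails loudly instead of setting errno and carrying on.
double strict_sqrt(double x) {
    errno = 0;
    const double r = std::sqrt(x);
    if (errno == EDOM || x < 0.0)
        throw std::domain_error("sqrt: negative argument");
    return r;
}
```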
<p>It is possible for users to set up (hopefully well tested) libraries
to operate in this way. This may be very valuable for general purpose
class libraries.</p>
<h2>C++ CLASSES</h2>
<p>In general, C++ classes contain data which the user should never
alter. Pointers to the virtual function table, and pointers to virtual
base classes are set by the system only. Other objects, such as
reference member variables, should not change their values after
initialisation. Currently the CHECK mechanism does not address these
issues, except that it checks that the virtual function pointer points
to a valid virtual function table before using it.</p>
<p>In principle, the SAI entry for such a class could be split into
several entries covering each user-accessible region, or some other
scheme to restrict access to system parts of the object could be
implemented.</p>
<h2>DIFFICULT PROGRAMS</h2>
<p>By overloading the <span
 style="font-style: italic; font-weight: bold;">new</span> operator, or
by the use of casts, the C++ programmer can take over the allocation of
storage for his data structures. Typically, he may allocate storage
from a single large region of memory obtained from the system. Since
this large storage region will appear in the SAI, such programs will
work in CHECK mode; however, the checks will be much less effective.
Because of this, a macro (__CHECK__) is set to inform the user that
CHECK mode is in operation. The user may then revert to the standard
storage allocators in CHECK mode, or even manipulate the SAI directly
via system function calls. Since in general it is important that a
program executes in the same way in CHECK mode as it will in normal
use, wise users will use this facility sparingly!</p>
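<p>One way of using the macro is sketched below. In a normal build the
class takes its storage from a private arena; when __CHECK__ is defined
the overloads are omitted, so each object comes from the standard
allocator and receives its own SAI entry. The Arena and Node classes
are hypothetical and the arena is deliberately simplistic (storage is
never reclaimed).</p>

```cpp
#include <cstddef>
#include <new>

// Hypothetical pool: hands out slices of one large region.
class Arena {
public:
    void* allocate(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;  // keep every slice suitably aligned
        if (n > sizeof(storage_) - used_) throw std::bad_alloc();
        void* p = storage_ + used_;
        used_ += n;
        return p;
    }

private:
    alignas(std::max_align_t) unsigned char storage_[4096];
    std::size_t used_ = 0;
};

inline Arena& arena() {
    static Arena a;
    return a;
}

struct Node {
    int value;
    Node* next;

#ifndef __CHECK__
    // Normal build: private allocation from the arena.
    static void* operator new(std::size_t n) { return arena().allocate(n); }
    static void operator delete(void*) {}  // arena storage is never handed back
#endif
    // CHECK build: no overloads, so the standard allocator is used.
};
```

<p>The price is the one noted above: the program no longer allocates
storage in the same way in CHECK mode as in normal use.</p>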
<p>SCC can handle inline 32-bit Intel assembler. At first sight this
poses serious problems in CHECK mode. In particular, automatic objects
that are moved to the heap (see above) require an extra level of
indirection in order to access them. Assembler instructions which
reference such variables directly are therefore faulted in CHECK mode.
For example:</p>
<pre>void foo(int&amp;);<br>int k;      //Automatic variable k<br>foo(k);     //This takes the address of 'k' <br>            //and forces SCC to allocate it <br>            //on the heap in CHECK mode<br>asm{<br>    mov   eax,k; //Illegal in CHECK mode<br>}</pre>
<p>A pseudo instruction to load the address of a variable (LAD) is
supplied in order to write assembler code that will work in CHECK mode.
For example:</p>
<pre>    lad   eax,k;</pre>
<p>will load the address of <span
 style="font-style: italic; font-weight: bold;">k</span> into register
EAX. This assembles as an LEA instruction, which loads the address of
its operand, in normal mode, or if <span
 style="font-style: italic; font-weight: bold;">k</span> does not
require allocation on the heap, and as a MOV instruction accessing the
pointer to <span style="font-style: italic; font-weight: bold;">k</span>
if <span style="font-style: italic; font-weight: bold;">k</span> is
allocated on the heap.</p>
<p>Pseudo instructions are also supplied to check the contents of a
register to ensure that they point to a suitable SAI entry. These
pseudo instructions assemble into code which preserves all registers
over the check. In non-CHECK mode these pseudo instructions generate no
code.</p>
<p>CHECK mode does not, of course, solve the problems caused by
incorrect assembler code, which may still corrupt the system.</p>
<h2>CONSTRAINTS ON FEASIBLE CHECKING SCHEMES</h2>
<p>In designing the above checking mechanism considerable care was
taken to avoid constructions which would have had unfortunate
side-effects. In particular, it was decided that:</p>
<ol>
  <li>Any scheme which increased the size of pointers was likely to
break code which assumed the size of pointers to be 4-bytes.</li>
  <li>Since pointers may be cast to 4-byte (default) integers in normal
mode, it was essential that
this was possible in CHECK mode. (The standard allows implementations
to fail such conversions if there are insufficient bits in the integer
to hold a pointer, but real programs impose stronger constraints on
implementations).</li>
  <li>Structures, classes, and unions should retain the same size when
compiled in CHECK mode.</li>
  <li>Because unions can contain pointers overlaid on other objects, it
was decided that any scheme that tracked the contents of pointer
variables (as opposed to the data to which they could point) was
impractical.</li>
</ol>
<p>Because of the above constraints it was decided that any attempt to
detect uninitialised data using additional bits was impractical.
Furthermore, it was decided that any attempt to validate a pointer
reference using the past history of the pointer was impractical. For
example:</p>
<pre>int k=100;<br>char* p=new(char[10]);<br>*(p+k)=0;</pre>
<p>This program is clearly wrong (since the programmer can have no
knowledge about the memory location he is altering), and will fault in
CHECK mode unless the pointer <span
 style="font-style: italic; font-weight: bold;">p</span>+<span
 style="font-style: italic; font-weight: bold;">k</span> happens to
point to a read-write pointer target. Only by tracking the history of
the pointer <span style="font-style: italic; font-weight: bold;">p</span>
would it be possible to fault this program in all cases.</p>
<h2>CHECKING FUNCTION INTERFACES</h2>
<p>As with FORTRAN 77, traditional C programs which do not use function
prototypes can become corrupt if a function is called with the wrong
number or types of arguments, or if the return type does not match.
FTN77 provides a comprehensive check to diagnose such problems at
run-time in FORTRAN code. However, since SCC can enforce the use of
function prototypes with C code, and these are in any case mandatory
with C++ code, we have made the decision not to provide an independent
run-time check for argument mismatches.</p>
<h2>ARITHMETIC CHECKS</h2>
<p>Floating point errors, such as overflow, can be detected by the
hardware without special action by the compiler. In order to detect
integer overflow, the compiler would have to plant extra instructions
to check the condition codes. Unfortunately, many C/C++ programmers
rely on the twos-complement implementation of signed integer
arithmetic, and perform operations which are strictly illegal. For
example, many programmers would be surprised if the left shift operator
('<span style="font-weight: bold;">&lt;&lt;</span>') caused an overflow
when it forced a change of sign of a signed integer. Also, many
programmers mix signed and unsigned data in ways which technically
cause overflow. Since integer overflow is a much less serious problem
in a 32 bit environment, it was decided not to attempt to detect this
problem.</p>
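<p>For comparison, the kind of test that would be needed on every
signed addition is sketched below. A compiler would inspect the
processor's overflow flag after the add instruction; since portable C++
has no access to the condition codes, this sketch performs the addition
in a wider type instead. The function name is an assumption.</p>

```cpp
#include <cstdint>
#include <limits>

// Adds two 32-bit signed integers. Returns false, leaving result
// untouched, if the true sum does not fit in 32 bits.
bool checked_add(std::int32_t a, std::int32_t b, std::int32_t& result) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max())
        return false;
    result = static_cast<std::int32_t>(wide);
    return true;
}
```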
<h2>PERFORMANCE</h2>
<p>The performance consequences of using CHECK mode vary widely from
program to program. Typically, we find that speed is reduced by a factor of
10. CHECK mode programs also use much more memory. Since it
is anticipated that CHECK mode will only be used as a debugging tool,
this loss of performance should be acceptable except for those programs
with a real-time component. In principle, some of the overhead of
constantly checking accesses to the SAI could be compiled away to
improve performance in CHECK mode. However, since some sources of
program corruption - such as incorrect assembler code - may invalidate
the assumptions which the compiler might use to make such
optimisations, it is not clear whether this would be worthwhile.</p>
<h2>CONCLUSION</h2>
<p>It is a sobering fact that almost every large FORTRAN or C program
contains at least a few undetected instances of memory corruption.
These problems can be almost eliminated at debug-time by the use of
suitable compilation techniques.</p>
<h2>ACKNOWLEDGEMENTS</h2>
<p>I would like to acknowledge the many helpful discussions with my
colleagues Ewan Cunningham, Tim Bartle, Mark Stevens, Robert Chafer,
Keng Low, Dinesh Patel, and Tony Webster, which have contributed to
this paper and to the software which it describes.</p>
<h2>REFERENCES</h2>
<a name="1"></a>1. B. Stroustrup, &quot;The Evolution of C++: 1985 to
1989&quot;, Computing Systems, Vol 2(3), 191-250 (1989).<br>
<a name="2"></a>2. PC-lint Reference Manual, Gimpel Software.<br>
<a name="3"></a>3. Nu-Mega Bounds Checker, Nu-Mega Technologies Inc.<br>
<a name="4"></a>4. Salford C/386 Reference Manual, Salford Software.<br>
<a name="5"></a>5. D.W. Flater, Y. Yesha, and E.K. Park, &quot;Extensions to
the C Programming Language for enhanced fault detection&quot;, Software
Practice and Experience, Vol 23(6), 617-628 (1993).<br>
<a name="6"></a>6. J.L. Steffen, &quot;Adding Run-time checking to the
Portable C Compiler&quot;,
Software Practice and Experience, Vol 22(4), 305-316 (1992).<br>
</p>
</div>
</channel>
</rss>
