Overload Journal #109 - June 2012

Title: Valgrind Part 2 – Basic memcheck

Author: Martin Moene

Date: Fri, 01 June 2012 18:22:10 +01:00

Summary: Learning how to use our tools well is a vital skill. Paul Floyd shows us how to check for memory problems.

Body: 

In the first part of this series I explained what Valgrind is. In this article, I’ll start explaining how to use it. Memcheck is the best known of the Valgrind tools. It is a runtime memory checker that validates your use of heap memory and (to a lesser extent) stack memory.

Memcheck detects the following kinds of errors:

  1. Illegal read/write
  2. Use of uninitialized memory
  3. Invalid system call parameters
  4. Illegal frees
  5. Source/destination overlap
  6. Memory leaks

Some other memory checking tools have snappier TLA names for the errors that they detect. Some people would like to be able to prioritize the types of errors. I'd say that, in general, all of these errors can cause either incorrect behaviour of an application or crashes. Illegal reads/writes and illegal frees (the first and fourth items on the list) are generally the most likely causes of crashes, but don't take that as advice to neglect the other four types.

General advice

Don't overdo the options. The default options are good for most situations, and some of the options add significantly to the already high overhead. If you discover a fault and the default output is not enough for you to pin down the error, then consider adding more options. Personally, I use memcheck in two ways. Firstly, I run it in automated regression tests each weekend, and all of the results get distilled into a single summary. Secondly, I use it 'interactively' in a shell, and in this mode I tend to turn up the options.

All of the examples that follow are trivial. In real-world defects, the location of the fault, the declaration, the allocation, the initialization and the free may all be far apart.

Illegal read/write errors

Illegal read/write errors occur when your program reads or writes an address that it has no right to access, for instance memory beyond the end of a heap block or memory that has already been freed.

The example in Listing 1 shows reading beyond the end of an array.

// abrw.cpp
#include <iostream>

void f(int *p2)
{
  int i1 = p2[10]; // read beyond the end of p1
  std::cout << "Hello\n";
}

int main()
{
  int *p1 = new int[10];
  f(p1);
  delete [] p1;
}
			
Listing 1

Compiling this and running it under memcheck will generate the output shown in Figure 1.

==85258== Invalid read of size 4
==85258== at 0x400AA4: f(int*) (abrw.cpp:5)
==85258== by 0x400B5E: main (abrw.cpp:12)
==85258== Address 0x1c90068 is 0 bytes after a block of size 40 alloc'd
==85258== at 0x1006BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==85258== by 0x400B51: main (abrw.cpp:11)
			
Figure 1

If the array had been of long long, or of long on a 64-bit platform, then it would have been an Invalid read of size 8.
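Part of the problem in Listing 1 is that a raw pointer such as p2 carries no length, so f() has no way of knowing that index 10 is out of range. A minimal sketch of one fix follows, assuming you are free to change f()'s signature; readElement() is a hypothetical helper, not part of any library. The element count travels with the pointer and the index is checked before the read:

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper: copies p[index] into out only if index is one of
// the count elements that p points to; otherwise it returns false
// instead of reading out of bounds.
bool readElement(const int *p, std::size_t count, std::size_t index, int &out)
{
  if (p == nullptr || index >= count)
  {
    return false; // reading p[index] here would be an illegal read
  }
  out = p[index];
  return true;
}

// Listing 1's f(), now told how many elements p2 points to.
void f(const int *p2, std::size_t count)
{
  int i1 = 0;
  if (readElement(p2, count, 10, i1))
  {
    std::cout << "Element 10 is " << i1 << '\n';
  }
  else
  {
    std::cout << "Index 10 is out of range\n";
  }
}
```

With this version, f(p1, 10) prints a message instead of reading past the end of the block, so memcheck has nothing to report.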

Use of uninitialized memory

You'll get this sort of error if you read memory before assigning any value to it, for instance if you malloc an array and then read an element from it without first writing to it. Valgrind also propagates the initialization state through assignments, and only triggers an error when the outcome of the execution could be affected by the uninitialized state of the memory (for example, a conditional jump that depends on it). This means that harmless errors do not generate any messages (good news), but it also means that the site where memcheck reports the error could be far from where the uninitialized memory was allocated.

Listing 2 shows an example of this. I’ve deliberately made the error propagate through three variables in function f() to illustrate that no error is generated until the if() condition is reached.

// uninit.cpp
#include <iostream>

void f(long *p2)
{
  long l1 = p2[10]; // read beyond end of p1
  long l2 = l1;     // propagates
  long l3 = l2;     // propagate again
  if (l3)           // uninitialized read
  {
    std::cout << "Hello\n";
  }
}

int main()
{
  long *p1 = new long[10];
  f(p1);
  delete [] p1;
}
			
Listing 2

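This will result in the output shown in Figure 2. Note, though, that p2[10] is one element past the end of the 10-element array, so what memcheck actually reports here is an invalid read at line 5. Memcheck treats a value loaded from an invalid address as initialised, to avoid a cascade of follow-on errors, so no uninitialised-value error is reported at the if(). Reading an element that is within bounds but has never been assigned, such as p2[9], would instead produce a "Conditional jump or move depends on uninitialised value(s)" error at the if().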

==93289== Invalid read of size 8
==93289== at 0x400AA4: f(long*) (uninit.cpp:5)
==93289== by 0x400B7E: main (uninit.cpp:17)
==93289== Address 0x1c90090 is 0 bytes after a block of size 80 alloc'd
==93289== at 0x1006BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==93289== by 0x400B71: main (uninit.cpp:16)
			
Figure 2
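Where an array needs a defined starting value, one way to avoid this class of error is to value-initialize it when it is allocated. The sketch below is a minimal illustration with hypothetical helper names: the () form of array new and a std::vector constructed with a size both zero-fill their elements, so f() reads 0 rather than an indeterminate value.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// new long[n]() value-initializes the array, so every element starts
// at 0 instead of holding an indeterminate value.
long *makeZeroedArray(std::size_t n)
{
  return new long[n]();
}

// std::vector<long>(n) also value-initializes its n elements to 0,
// and releases its memory automatically.
std::vector<long> makeZeroedVector(std::size_t n)
{
  return std::vector<long>(n);
}

// Assumes p2 points to at least 10 elements.
void f(const long *p2)
{
  long l1 = p2[9];  // in bounds, and now initialized (to 0)
  long l2 = l1;
  long l3 = l2;
  if (l3)           // no longer depends on an uninitialised value
  {
    std::cout << "Hello\n";
  }
}
```

Zero-filling only helps if 0 is a meaningful value for your program; otherwise it just hides the logic error from memcheck. Memory from makeZeroedArray() must still be released with delete [].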

If the error is in a stack variable rather than in a heap variable, you get a bit less information (see Listing 3).

// uninit2.cpp
#include <iostream>

void f(long l)
{
  long lb = l;
  long lc = lb;
  long ld = lc;
  if (ld)
  {
    std::cout << "Hello\n";
  }
}

int main()
{
  long la; // uninitialized local scalar
  f(la);
}
			
Listing 3

This gives just the output in Figure 3a.

==4164== Conditional jump or move depends on uninitialised value(s)
==4164==    at 0x4009E9: f(long) (uninit2.cpp:8)
==4164==    by 0x400A10: main (uninit2.cpp:17)
			
Figure 3a

Use --memcheck:track-origins=yes to get more information, but be aware that this will increase the Valgrind overhead. Adding this option gives the output in Figure 3b.

==4455== Conditional jump or move depends on uninitialised value(s)
==4455==    at 0x4009E9: f(long) (uninit2.cpp:8)
==4455==    by 0x400A10: main (uninit2.cpp:17)
==4455==  Uninitialised value was created by a stack allocation
==4455==    at 0x400A00: main (uninit2.cpp:14)
			
Figure 3b

OK, so it narrows the search down to main(), but it doesn't tell us the name of the variable or the line where it is declared (the line number given for the stack allocation is where main() starts, not where the problem is).

Invalid system call parameters

Listing 4 passes memory that is not fully initialized to std::fwrite.

// syscall.cpp
#include <cstdio>

const std::size_t intArraySize = 3;

int main()
{
   std::FILE *f = std::fopen("output.dat", "w");
   if (f)
   {
      int *intArray = new int[intArraySize];
      std::size_t bytesWritten = 0U;
      intArray[0] = 1;
      // intArray[1] not initialized
      intArray[2] = 3;
      bytesWritten = std::fwrite(intArray,
         sizeof(int), intArraySize, f);
      // omit check
      std::fclose(f);
      delete [] intArray;
   }
}
			
Listing 4

This will generate the output shown in Figure 4a.

==468== Syscall param write(buf) points to uninitialised byte(s)
==468==    at 0x148C82: write$NOCANCEL (in /usr/lib/libSystem.B.dylib)
==468==    by 0x148BFC: _swrite (in /usr/lib/libSystem.B.dylib)
==468==    by 0x148B41: __sflush (in /usr/lib/libSystem.B.dylib)
==468==    by 0x14859A: fclose (in /usr/lib/libSystem.B.dylib)
==468==    by 0x100000EB6: main (syscall.cpp:16)
==468==  Address 0x100004134 is 4 bytes inside a block of size 4,096 alloc'd
==468==    at 0xD6D9: malloc (vg_replace_malloc.c:266)
==468==    by 0x1489ED: __smakebuf (in /usr/lib/libSystem.B.dylib)
==468==    by 0x148959: __swsetup (in /usr/lib/libSystem.B.dylib)
==468==    by 0x10ABC8: __sfvwrite (in /usr/lib/libSystem.B.dylib)
==468==    by 0x15C3C4: fwrite (in /usr/lib/libSystem.B.dylib)
==468==    by 0x100000EA9: main (syscall.cpp:14)
			
Figure 4a

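Look carefully at the log in Figure 4a and you will see that the error is reported when the file is closed, not when std::fwrite is called, and that the address in question lies inside a 4,096-byte block allocated by the stdio library rather than inside intArray. This is because the output is buffered: std::fwrite copies the data into the stream's buffer, and the write() system call only happens when that buffer is flushed. This can be quite pernicious. If I add a call to std::setvbuf(f, 0, _IONBF, 0); after the std::fopen, then I get the log shown in Figure 4b.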

==534== Syscall param write(buf) points to uninitialised byte(s)
==534==    at 0x148C82: write$NOCANCEL (in /usr/lib/libSystem.B.dylib)
==534==    by 0x148BFC: _swrite (in /usr/lib/libSystem.B.dylib)
==534==    by 0x10AC16: __sfvwrite (in /usr/lib/libSystem.B.dylib)
==534==    by 0x15C3C4: fwrite (in /usr/lib/libSystem.B.dylib)
==534==    by 0x100000E97: main (syscall.cpp:15)
==534==  Address 0x1000040e4 is 4 bytes inside a block of size 12 alloc'd
==534==    at 0xD6D9: malloc (vg_replace_malloc.c:266)
==534==    by 0x64F04: operator new(unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==534==    by 0x64F96: operator new[](unsigned long) (in /usr/lib/libstdc++.6.0.9.dylib)
==534==    by 0x100000E5E: main (syscall.cpp:11)
			
Figure 4b

With an unbuffered stream, you see the error immediately rather than when the buffer is flushed.
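A minimal sketch of a corrected Listing 4 follows, assuming the data can live in a std::vector; writeInts() is a hypothetical name. Constructing the vector with a size zero-initializes every element, so no uninitialised bytes can reach write(), and the function also performs the return-value check that Listing 4 omits:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes every element of values to f.
// Returns true only if all elements were written and flushed.
bool writeInts(std::FILE *f, const std::vector<int> &values)
{
  if (f == nullptr)
  {
    return false;
  }
  if (values.empty())
  {
    return std::fflush(f) == 0; // nothing to write
  }
  // fwrite returns the number of complete elements written.
  const std::size_t written =
    std::fwrite(values.data(), sizeof(int), values.size(), f);
  // fflush pushes the buffered bytes through write() now, so an error
  // shows up here rather than at fclose.
  return written == values.size() && std::fflush(f) == 0;
}
```

Calling it with a std::vector<int> v(intArraySize), after setting v[0] and v[2], writes v[1] as 0 instead of passing uninitialised bytes to the system call.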

Illegal frees

An example of this is freeing stack memory (Listing 5). This one is a bit of a no-brainer: the compiler complains about the code, and I get a nice core dump if I run the application.

// ifree.cpp
void func()
{
  int stackArray[10];
  delete stackArray; // not even array delete
}

int main()
{
  func();
}
			
Listing 5

The corresponding output is in Figure 5.

==72595== Invalid free() / delete / delete[]
==72595==    at 0x1004DDC: operator delete(void*) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==72595==    by 0x400680: func() (ifree.cpp:4)
==72595==    by 0x400698: main (ifree.cpp:9)
==72595==  Address 0x7ff000240 is on thread 1's stack
			
Figure 5

Let’s try a somewhat more likely error, using the wrong delete (see Listing 6).

// ifree2.cpp
void func()
{
  int *heapArray = new int[10];
  delete heapArray; // scalar delete of memory from array new[]
}

int main()
{
  func();
}
			
Listing 6

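The corresponding output is in Figure 6. Here, memcheck correctly identified that there was a mismatched delete, but it doesn't go as far as saying that the memory was allocated with array new but deleted with scalar delete. You can work that out from the two call stacks, though: the block was allocated by operator new[] and released by operator delete.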

==72950== Mismatched free() / delete / delete []
==72950==    at 0x1004DDC: operator delete(void*) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==72950==    by 0x4006DE: func() (ifree2.cpp:4)
==72950==    by 0x4006F8: main (ifree2.cpp:9)
==72950==  Address 0x1c8f040 is 0 bytes inside a block of size 40 alloc'd
==72950==    at 0x1005BB7: operator new[](unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==72950==    by 0x4006D1: func() (ifree2.cpp:3)
==72950==    by 0x4006F8: main (ifree2.cpp:9)
			
Figure 6
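Rather than relying on matching every new[] with delete[] by hand, you can give the array an owner that always uses the right form. In this minimal sketch (sumOfSquares() is a hypothetical function), std::unique_ptr<int[]> calls delete[] when it goes out of scope; a std::vector<int> would do equally well. For the stack array in Listing 5 the fix is simpler still: don't delete it at all.

```cpp
#include <cstddef>
#include <memory>

// std::unique_ptr<int[]> always releases its array with delete[],
// so allocation and deallocation forms cannot get mismatched.
int sumOfSquares(std::size_t n)
{
  std::unique_ptr<int[]> heapArray(new int[n]);
  for (std::size_t i = 0; i < n; ++i)
  {
    heapArray[i] = static_cast<int>(i * i);
  }
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    total += heapArray[i];
  }
  return total;
} // heapArray's destructor calls delete[] here
```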

Source/destination overlap

The usual example of this is a std::strcpy where the source and destination point within the same char array (Listing 7).

// overlap.cpp
#include <cstdio>
#include <cstring>
#include <iostream>

int main()
{
   char *str = new char[100];
   std::sprintf(str, "Hello, world!");
   std::strcpy(str, str+2);
   std::cout << "str " << str << "\n";
   delete [] str;
}
			
Listing 7

Valgrind’s output is shown in Figure 7.

==74324== Source and destination overlap in strcpy(0x1c90040, 0x1c90042)
==74324==    at 0x1009A61: strcpy (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==74324==    by 0x400BE9: main (overlap.cpp:8)
			
Figure 7

The standard solution to this sort of problem is to use std::memmove instead of std::strcpy or std::memcpy.
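std::memmove is specified to copy correctly even when the two ranges overlap. For Listing 7, a minimal sketch of the fix (dropPrefix() is a hypothetical helper) moves the tail of the string down, including its terminating null character:

```cpp
#include <cstddef>
#include <cstring>

// Removes the first n characters of the C string s in place.
// Source (s + n) and destination (s) overlap, so std::memmove is used;
// std::strcpy or std::memcpy would have undefined behaviour here.
void dropPrefix(char *s, std::size_t n)
{
  const std::size_t len = std::strlen(s);
  if (n > len)
  {
    n = len; // never read past the terminator
  }
  // Move the tail, including its terminating '\0' (hence the + 1).
  std::memmove(s, s + n, len - n + 1);
}
```

dropPrefix(str, 2) leaves str holding "llo, world!" without any overlap error.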

Memory leaks

This is the most involved of the memcheck error types. Memcheck can detect three different types of 'leak'. First, there is the definite leak, where the pointer has gone out of scope (or been overwritten) and nothing points to the memory any more. Next there are possible leaks, where there are no longer any pointers to the start of the allocated memory, but there are still pointers to somewhere within it. Finally there is still-in-use memory, where both the memory and a pointer to it still exist when the program ends.

If you use a memory manager (e.g., a pool allocator), then this can complicate leak detection. For instance, if your application has a pool allocator that news blocks of 100MBytes, uses an overloaded operator new that uses this pool, optionally does some overloaded deletes, and then when it terminates deletes all of the pool blocks, memcheck won’t be able to detect any leaks, even though your application may be leaking your pool memory in the sense that it wasn’t deleted and made available for reuse before the pool was deleted. Furthermore, if you are using an allocator that allocates blocks that are handled as {length:memory[:guard]}, so that the pointer obtained by new is adjusted after setting the length, then you’re likely to get possible leaks detected rather than definite leaks.
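To see why the second kind of allocator produces possible rather than definite leaks, here is a minimal sketch of a {length:memory} allocator without the optional guard; lengthAlloc(), lengthOf() and lengthFree() are hypothetical names. The caller gets a pointer that has been advanced past the length header, so it points into the middle of the block that std::malloc returned. If the caller then leaks that block, the only pointer memcheck can find is an interior one, which is why it is likely to be reported as possibly lost:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Block layout: [ Header (holds the length) ][ user memory ... ]
// max_align_t in the union keeps the user memory suitably aligned.
union Header
{
  std::size_t length;
  std::max_align_t align;
};

void *lengthAlloc(std::size_t length)
{
  void *raw = std::malloc(sizeof(Header) + length);
  if (raw == nullptr)
  {
    throw std::bad_alloc();
  }
  Header *h = static_cast<Header *>(raw);
  h->length = length;
  return h + 1; // adjusted pointer: points just past the header
}

std::size_t lengthOf(const void *p)
{
  return (static_cast<const Header *>(p) - 1)->length;
}

void lengthFree(void *p)
{
  if (p != nullptr)
  {
    std::free(static_cast<Header *>(p) - 1); // undo the adjustment
  }
}
```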

There are two things that you can do in this case. One is to have a special build, where you compile with a macro like -DDEFAULT_NEW which disables the memory allocator and uses the standard allocators. Obviously having two sets of code is not ideal, and this will be a maintenance overhead. The alternative is to include the valgrind.h header and use the Valgrind MEMPOOL macros. More on that in a later article.
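The special-build option might look something like the following minimal sketch. Pool, widgetPool() and Widget are hypothetical names, and a real pool allocator would be far more elaborate; the point is that one macro selects between the pooled and the standard allocation paths:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bump-pointer pool: allocations are carved out of one
// std::malloc'd arena, which is only freed when the pool is destroyed.
class Pool
{
public:
  explicit Pool(std::size_t bytes)
    : arena_(static_cast<char *>(std::malloc(bytes))), size_(bytes), used_(0)
  {
    if (arena_ == nullptr)
    {
      throw std::bad_alloc();
    }
  }
  ~Pool() { std::free(arena_); }
  Pool(const Pool &) = delete;
  Pool &operator=(const Pool &) = delete;

  void *allocate(std::size_t n)
  {
    const std::size_t align = alignof(std::max_align_t);
    if (n == 0) n = 1;                 // distinct address per request
    if (n > size_) throw std::bad_alloc();
    n = (n + align - 1) / align * align; // keep every block aligned
    if (n > size_ - used_) throw std::bad_alloc();
    void *p = arena_ + used_;
    used_ += n;
    return p;
  }
  void deallocate(void *) noexcept {} // a bump pool never reuses blocks
  std::size_t bytesUsed() const { return used_; }

private:
  char *arena_;
  std::size_t size_;
  std::size_t used_;
};

Pool &widgetPool()
{
  static Pool pool(1024 * 1024);
  return pool;
}

struct Widget
{
  int value = 0;
#ifndef DEFAULT_NEW
  // Normal build: Widgets come from the pool.
  static void *operator new(std::size_t n) { return widgetPool().allocate(n); }
  static void operator delete(void *p) noexcept { widgetPool().deallocate(p); }
#endif
  // With -DDEFAULT_NEW, the global operator new/delete are used instead.
};
```

In the normal build memcheck only sees the single arena allocation, which the pool frees when it is destroyed, so a Widget that is never deleted goes unreported. Built with -DDEFAULT_NEW, each Widget comes from the global operator new, and memcheck can report any Widget that leaks.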

Listing 8 shows a very short example of a definite leak.

// leak.cpp
int main()
{
   int *leak = new int(42);
}
			
Listing 8

Valgrind’s output for this is in Figure 8.

==76314== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==76314==    at 0x1005F79: operator new(unsigned long) (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==76314==    by 0x400681: main (leak.cpp:3)
			
Figure 8
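The usual remedy for a definite leak like this is to give the allocation an owner that releases it automatically. A minimal sketch, assuming C++14 for std::make_unique (answer() is a hypothetical function):

```cpp
#include <memory>

// Listing 8 without the leak: std::make_unique allocates the int and
// hands ownership to a std::unique_ptr, whose destructor deletes it
// when noLeak goes out of scope.
int answer()
{
  std::unique_ptr<int> noLeak = std::make_unique<int>(42);
  return *noLeak;
}
```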

Suppressing errors

Memcheck will use a default suppression file that was generated on the machine where Valgrind was built. This will suppress ‘well known’ (and hopefully harmless) errors in libc and X11. You can also use user-defined suppression files with the option:

--memcheck:suppressions=<suppression file>

This can be used more than once. I would advise that you do this only for harmless errors or errors in third party libraries that you can’t fix. As a rule, you’re better off fixing your errors than hiding them in a suppression file.

You can use --memcheck:gen-suppressions=all to write suppression stacks into the output log; they look like this:

{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   fun:_Znwm
   fun:main
}

The opening and closing braces delimit the suppression. The first line is the suppression's name, which effectively serves as a comment. I would recommend that you change it to something unique. If you use valgrind -v, then in the summary Valgrind will list all of the suppressions that it used, along with their names. This lets you see which of your suppressions are actually being used, so that you can clean out your suppression files from time to time.

The second line gives the type of error.

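The remaining lines, from the third onwards, are the callstack, innermost frame first. Each line has one of the following forms:

  1. fun:<function name>, using the mangled name for C++ functions (as in fun:_Znwm above)
  2. obj:<path of an object file, such as an executable or shared library>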

You can use the * wildcard to make suppressions more generic. For instance, if you want to use the same suppression files on both 32-bit and 64-bit Linux, then instead of having a separate suppression for each platform, one with /opt/mypkg/lib and the other with /opt/mypkg/lib64, you could have just one suppression with /opt/mypkg/lib*.

You may want to reduce the amount of callstack that appears in a suppression. This can reduce the number of suppressions that you need (which is fine as long as they all cover the same issue). Don't overdo it, though: you don't want to suppress genuine errors.

Errors that memcheck does not detect

Last but not least, there are some types of memory error that memcheck does not detect.

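Reading or writing beyond arrays that are global or on the stack, for instance:

  int x[10]; // local, global or static
  x[10] = 1;

Memcheck knows which memory is addressable, but not where one stack or global array ends and the next variable begins, so x[10] is usually just another valid address as far as it is concerned.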

Try using exp-sgcheck for this sort of error.
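Where you control the code, a bounds-checked accessor can catch this class of error at the point where it happens. This is an alternative to memcheck rather than a feature of it; in the minimal sketch below (trySet() is a hypothetical name), std::array::at() checks the index at run time and throws std::out_of_range, whereas operator[] does no checking:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Writes value to x[index] if index is valid; returns false for an
// out-of-range index instead of silently overwriting the stack.
bool trySet(std::array<int, 10> &x, std::size_t index, int value)
{
  try
  {
    x.at(index) = value; // at() throws std::out_of_range if index >= 10
    return true;
  }
  catch (const std::out_of_range &)
  {
    return false;
  }
}
```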

Now that we’ve covered the basics of memcheck, in the next article we’ll look at more advanced techniques.
