Journal Articles

Overload Journal #59 - Feb 2004 + Programming Topics
Browse in : All > Journals > Overload > 59 (7)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: C++ as a Safer C

Author: Administrator

Date: 02 February 2004 21:55:34 +00:00 or Mon, 02 February 2004 21:55:34 +00:00

Summary: 

Body: 

There are many features in C++ that can be used to enhance the quality of code written with classic C design even if no object oriented techniques are used. This article describes a technique to protect against value overflow and out-of-bounds access of arrays.

This article started with a discussion about how C projects could use features in C++ to improve the quality of the code without having to do any major redesign.

Bounded Integral Types

The built-in integral types in C and C++ are very crude. They map directly to what can be represented in hardware as bytes and words with or without signs. There is no way to say that a number can only have values in the range 1 to 100. The best you can do is to use an unsigned char which typically has a value range from 0 to 255, but this does not provide any checking for overflow.

It is easy to create an integral type that does the range checking as Pascal and Ada do. The implementation of BoundedInt in listing 1 shows how this can be done with C++ templates. It takes three parameters. The first two specify the inclusive range of allowed values. The third parameter specifies the underlying type to be used and uses a default type given by the BoundedIntTraits class.

#include <cassert>

template <int Lower, int Upper,
          typename INT=typename
          BoundedIntTraits<Lower,Upper>::Type>

class BoundedInt {
public:
  // Default constructor
  BoundedInt()
#ifndef NDEBUG
    : m_initialised(false)
#endif
  {}
  // Conversion constructor
  BoundedInt(int i)
    : m_i(static_cast<INT>(i))
#ifndef NDEBUG
    , m_initialised(true)
#endif
  {
    // Check input value
    assert((Lower<=i) && (i<=Upper));
  }
  // Conversion back to a builtin type
  operator INT() {
  assert(m_initialised);
  return m_i;
}

// Assignment operators
  BoundedInt & operator+=(int rhs) {
    assert(m_initialised);
    // Check for overflow
    assert(m_i/2 + rhs/2 + (m_i&rhs&1)
      <= Upper/2);
    assert(Lower/2
      <= m_i/2 + rhs/2 - ((m_i^rhs)&1));
    // Check result value
    assert((Lower<=m_i+rhs) && (m_i+rhs<=Upper));
    // Perform operation
    m_i += rhs;
    return *this; 
  }

// Increment and decrement operators.
  BoundedInt & operator++() {
    assert(m_initialised);
    // Check for overflow
    assert(m_i < Upper);
    // Perform operation
    ++m_i;
    return *this;
  }

// Other operators ...
private:
  INT m_i;
  #ifndef NDEBUG
  bool m_initialised;
  #endif
};

Listing 1: Definition of BoundedInt. Only the plus operator is shown here. The other arithmetic operators follow the same design.

The BoundedIntTraits class is used to find the smallest built-in type that can hold numbers of the specified range. It uses some meta-programming to figure out which type to use. The implementation of the BoundedIntTraits class is shown in listing 2.

#include <climits>
// Compile time assertion:
template <bool condition>
struct StaticAssert;
template <>
struct StaticAssert<true> {};

// Template for finding the smallest
// built-in type that can hold a given
// value range, based on a set of
// conditions.
template< bool sign, bool negbyte,
          bool negshort, bool negint,
          bool sbyte, bool ubyte,
          bool sshort, bool ushort,
          bool sint>
  struct BoundedIntType;

template<>
struct BoundedIntType< true, true, true,
                       true, true, true,
                       true, true, true> {
  typedef signed char Type;
};

template< bool negbyte, bool sbyte,
          bool ubyte>
struct BoundedIntType< true, negbyte,
                       true, true,
                       sbyte, ubyte,
                       true, true,
                       true> {
  typedef signed short Type;
};

template<bool negbyte, bool negshort,
         bool sbyte, bool ubyte,
         bool sshort, bool ushort>
struct BoundedIntType< true, negbyte,
                       negshort, true,
                       sbyte, ubyte, sshort,
                       ushort, true> {
  typedef signed int Type;
};

template <bool sbyte>
struct BoundedIntType< false, true, true,
                       true, sbyte, true,
                       true, true,
                       true> {
  typedef unsigned char Type;
};

template< bool sbyte, bool ubyte,
bool sshort>
struct BoundedIntType< false, true, true,
                       true, sbyte, ubyte,
                       sshort, true,
                       true> {
  typedef unsigned short Type;
};

template< bool sbyte, bool ubyte,
          bool sshort, bool ushort,
          bool sint>
struct BoundedIntType< false, true, true,
                       true, sbyte, ubyte,
                       sshort, ushort,
                       sint> {
  typedef unsigned int Type;
};

// The traits template provides value
// range information to the
// BoundedIntType to get the smallest
// possible type.
template <int Lower, int Upper>
struct BoundedIntTraits {
  StaticAssert<(Lower <= Upper)> check;
  typedef typename
    BoundedIntType<Lower < 0,
    Lower >= CHAR_MIN,
    Lower >= SHRT_MIN,
    Lower >= INT_MIN,
    Upper <= CHAR_MAX,
    Upper <= UCHAR_MAX,
    Upper <= SHRT_MAX,
    Upper <= USHRT_MAX,
    Upper <= INT_MAX>::Type Type;
};

Listing 2: Definition of BoundedIntTraits. The types long and unsigned long are not included to keep the listing shorter.

The checking is performed here by using the assert() macro. Note that this checking only happens in debug builds and not in the release builds to reduce the overhead for this checking. Using inlining and the assert() macro removes any overhead in optimised release builds. With a good optimiser the resulting code will be identical to when built-in types are used. Alternatives to assert() can of course be used such as throwing an exception or logging a message to a file.

The BoundedInt class is only designed to work with value ranges that fit in an int. To support wider ranges all methods that take an int as a parameter must have overloaded siblings that take a long, or even long long where supported.

The operator+=() member must check that the new value is within the valid range. It also has to check that there is no overflow during addition. The method of detecting overflow is complicated as there is no support for detecting overflow for built-in types in C and C++. The method here scales down all values to manageable sizes in order to do an overflow check. Because of the scaling down, it has to keep track of carry over data from the least significant bits to work properly in edge cases where the value range is close to the value range of the underlying type.

Other arithmetic assignment operators that BoundedInt should support are not shown here as they would take too much space. The design of these operators follows the design for the plus operator.

There are no binary arithmetic operators defined. When a BoundedInt object is used in a binary arithmetic operation, it will be converted to a built-in integral type before the operation. This means that there is no checking of the results of these operations, unless the result is assigned to a BoundedInt object. There is a pitfall here in that overflow cannot be checked for.

BoundedInt<-10, INT_MAX> a = 10;
a += INT_MAX;     // Overflow checked
a = a + INT_MAX;  // Overflow not checked

A default constructor is available in order to mimic the behaviour of built-in types. It does not initialise the value but maintains a flag to indicate that this object does not have a defined value. This flag is checked by member functions that access or modify the value. The m_initialised member flag is surrounded by conditional pre-processing directives to avoid overhead in release builds.

The copy constructor and copy assignment operators are not defined as the compiler generated versions are appropriate.

Below are some examples from an imaginary C project implementing a lift control with a single change to use BoundedInt:

typedef BoundedInt<-4, 17> FloorNumber;
FloorNumber liftPosition = 0;
const FloorNumber myOfficeFloor = 10;

/* go up */
++liftPosition;

/* go up fast */
liftPosition += 4;
printf("The lift is %d floors away.\n",
abs(liftPosition-myOfficeFloor));

BoundedInt objects can appear in any arbitrarily complex expression thanks to the conversion operator. Because the conversion operator is inlined the BoundedInt object will generate exactly the same code as when using a built-in type.

Bounded Arrays

A BoundedInt object can be used as a bounds checked index into arrays. Example:

const int SixPackSize = 6;
Bottle myBeers[SixPackSize];
BoundedInt<0, SixPackSize-1> ix;
for( ix = 0 ; ix < SixPackSize ; ++ix ) {
  drink(myBeers[ix]);
}

If ix for some reason is changed to an invalid value, the BoundedInt class will warn about this.

We can take this one step further by creating a class that only allows element access using numbers within the allowed range.

template <typename T, size_t Size>
class BoundedArray {
public:
  T& operator[](BoundedInt<0,
                Size-1> ix) {
    return m_data[ix];
  }
public:
  T m_data[Size];
};

Note that the member data is public to allow aggregate initialisation. See how this is used below. The member data can be made public without risk for misuse as the data is equally accessible through the index operator as with direct access.

Whenever an element is requested using an index of any builtin integral type, that index is converted to a BoundedInt which checks that its value is within the acceptable range.

This template takes two parameters, the type of the elements in the array and a non-type template parameter to indicate the size of the array. The simple example above will work as before with only a small change to the definition of myBeers.

BoundedArray<Bottle, SixPackSize>
  myBeers;

This array can be initialised in the same way as a built-in array:

BoundedArray<Bottle, SixPackSize>
  myBeers = { ... };

There is no overhead in release builds for this array class. The index operator is inlined and there is no indirect pointer access to the underlying array. Having the size as a template parameter may look like we are causing code bloat if several arrays of different sizes are used. Yes, there will be several instantiations but because all functions are inlined and optimised away there is no extra code that can multiply.

Bounded Pointers

In the same way as for using checked array indices we can create a smart pointer class that makes sure that it points to an element inside the array. It will have to know the base address of the array and the size to do the checking. This information is retrieved from the array class when a pointer is created.

The starting point is an example with built-in pointers:

Bottle* p = myBeers;
for( ; p->size != 0 ; ++p ) {
  drink(*p);
}

myBeers is an array where the last elements members are cleared as a termination condition. We replace the built-in pointer p with a smart pointer:

BoundedPointer<Bottle>
  p = myBeers;

The loop in the example above remains unchanged.

The definition of BoundedPointer is shown in listing 3. The array base address, array size and the initialised flag are kept as members only for debug builds to perform the runtime checks. To avoid this overhead in release builds the m_base, m_size and m_initialised members are surrounded with conditional preprocessing directives.

#include <cstddef>
#include <cassert>

template <typename T>
class BoundedPointer {
public:
  // Default constructor
  BoundedPointer()
#ifndef NDEBUG
    : m_initialised(false)
#endif
  {}
// Constructor from a built-in array
  template <size_t Size>
  BoundedPointer(T (&arr)[Size])
    : m_p(arr)
#ifndef NDEBUG
    , m_base(arr), m_size(Size)
    , m_initialised(true)
#endif
  {}
// Constructor from a user defined array
  BoundedPointer(const T* base, size_t size)
    : m_p(const_cast<T*>(base))
#ifndef NDEBUG
    , m_base(m_p)
    , m_size(size)
    , m_initialised(true)
#endif
  {}
// Constructor from null
  BoundedPointer(void * value)
    : m_p(static_cast<T *>(value))
#ifndef NDEBUG
    , m_base(m_p), m_size(1)
    , m_initialised(true)
#endif
  {}
// Dereference operators
  T & operator*() {
    assert(m_initialised);
    assert(m_p != 0);
    return *m_p;
  }
  T * operator->() {
    assert(m_initialised);
    assert(m_p != 0);
    return m_p;
  }
  T & operator[](size_t ix) {
    assert(m_initialised);
    assert(m_p != 0);
    assert(m_p + ix < m_base + m_size);
    return m_p[ix];
  }
// Pointer arithmetic operations
  ptrdiff_t operator-(BoundedPointer
                      const & rhs) {
    // Check validity of the pointers
    assert(m_initialised);
    assert(rhs.m_initialised);
    assert(m_p != 0);
    assert(rhs.m_p != 0);
    // Ensure both pointers point to same array
    assert(m_base == rhs.m_base);
    return m_p - rhs.m_p;
  }
  BoundedPointer & operator+=(ptrdiff_t rhs) {
    // Check validity of the pointer
    assert(m_initialised);
    assert(m_p != 0);
    m_p += rhs;
    assert(m_base <= m_p && m_p < m_base + m_size);
    return *this;
  }
  BoundedPointer & operator++() {
    // Check validity of the pointer
    assert(m_initialised);
    assert(m_p != 0);
    ++m_p;
    assert(m_p < m_base + m_size);
    return *this;
  }
// Other arithmetic operators ...
// Comparison operators
  bool operator==(BoundedPointer const & rhs) {
    // Check validity of the pointers
    assert(m_initialised);
    assert(rhs.m_initialised);
    assert(m_p != 0);
    assert(rhs.m_p != 0);
    // Make sure that both pointers point
    // to the same array
    assert(m_base == rhs.m_base);
    return m_p == rhs.m_p;
  }
// Other comparison operators ...
private:
  T * m_p;
#ifndef NDEBUG
  T * m_base;
  size_t m_size;
  bool m_initialised;
#endif
};
// Binary arithmetic operators
template <typename T>
inline BoundedPointer<T>
operator+(BoundedPointer<T> lhs, int rhs) {
  return lhs.operator+=(rhs);
}
template <typename T>
inline BoundedPointer<T> operator+(int lhs,
                                   BoundedPointer<T> rhs) {
  return rhs.operator+=(lhs);
}

Listing 3: Definition of BoundedPointer.

A BoundedPointer object can be constructed from built-in arrays and from user defined array types. The constructor for user defined array types takes two parameters (base address and size) and is intended to be called from conversion operators of those array classes. This conversion operator for BoundedArray looks like this:

template <typename T, size_t Size>
class BoundedArray {
public:
  ...
  operator BoundedPointer<T>() {
  return BoundedPointer<T>(m_data,
                           Size);
  }
};

There is also a constructor that takes a void* parameter to support assignment from NULL. A T* parameter cannot be used as it would conflict with the constructor for built-in arrays.

The BoundedPointer class supports all the operations that can be used with built-in pointers. There are checks for incrementing and decrementing the pointer to make sure that it does not point outside its array. As with BoundedInt there are checks to see that the pointer is initialised when it is used.

All methods are inlined to avoid any overhead in release builds.

Usage

The classes described here are designed to do the bounds checking during unit and system testing when compiled in debug mode. It is important to run as many test cases as possible that exercise all boundary conditions.

In release builds, all you have to do is make sure that the NDEBUG macro is defined, inlining is enabled and the optimise level is as high as possible. Then your code will be as efficient as if built-in types were used.

The BoundedIntTraits in listing 2 hides the chosen underlying integral type. If the ranges change in the future, there is no need to manually change the underlying type required for the wider range.

Extensions

This article describes the design of a class that wraps an array and adds bounds checking functionality. There are many more possible classes that can be used in this framework for different purposes. Examples include a class that manages dynamically allocated arrays.

A possible extension to the checked pointer is to keep track of whether the array still exists. If the array goes out of scope or is de-allocated the pointer shall be set to an invalid state. This is straight-forward to implement but is outside the scope of this article.

This article does not discuss checked iterators for STL containers as the article was originally intended to motivate C users to adopt C++ to improve their lives. For STL there are already implementations that check validity of the iterators.

Portability

Although the code in this article has been tested with several C++ compilers there are some difficulties using some existing compilers.

If your compiler does not support partial template specialisations you cannot use the traits class BoundedIntTraits. You can avoid the BoundedIntTraits class by removing it from the template parameter list of BoundedInt and replace it with int.

You will miss the feature where the underlying type of BoundedInt is automatically chosen from the specified range and it will be int if a type is not specified.

Conclusion

With the strategies shown in this article it is possible to catch various out of bounds conditions during the testing phase at no cost to the released code.

An additional benefit is that the bounds given to BoundedInt and the array types document their valid ranges well.

Related Reading

Safe and efficient data types in C++ by Nicolas Burrus

http://www.lrde.epita.fr/dload/20020925-Seminar/burrus0902_datatypes_report.pdf

Describes classes for compile time type safety when using different integral types. It defines safe operations for a set of integral types. The integral types used here are only bounded by the number of bits used in the internal representation. The description of operations and integral promotion is interesting and can be applied to the classes in this article.

Contains some helpful classes for determining types of integers given required number of bits. Also contains other helpful classes that can be useful in implementing a portable bounded integer and pointer library.

Boost array class in the container library

http://www.boost.org/libs/array/array.html

A constant size array class. The design goal for this class is to follow the STL principles.

Bounds checking pointers for GCC.

http://gcc.gnu.org/projects/bp/main.html

Additions to GCC to add bounds checking to the generated code.

An implementation of STL that performs various run-time checks on iterators.

CheckedInt: A Policy-Based Range-Checked Integer by Hubert Matthews

Overload issue 58, December 2003

Describes how policy classes can be used to select behaviour when a given range is exceeded.

Notes: 

More fields may be available via dynamicdata ..