Journal Articles

Overload Journal #46 - Dec 2001 + Programming Topics
Browse in : All > Journals > Overload > 46 (5)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Template Metaprogramming

Author: Administrator

Date: 26 December 2001 16:46:08 +00:00 or Wed, 26 December 2001 16:46:08 +00:00

Summary: 

Body: 

make your compiler work for you

What is Template Metaprogramming? The prefix meta- is usually used to indicate an additional level of abstraction: metadata is information about data. Similarly, metaprogramming is writing programs about programs. The emerging technique of template metaprogramming takes advantage of C++ templates to write programs that manipulate themselves or other programs at compile time.

Template metaprogramming is both a curiosity and a powerful optimisation method. The curiosity is that a standard C++ compiler is a Turing-complete interpreter for a subset of C++. In theory, any computable problem can be solved at compile time without ever executing compiled code; in practice achievable results are limited by inefficiencies of the system and details of the compiler[1]. Metaprogramming can facilitate increased performance by computing performance-critical results at compile time or using compile-time logic to choose the best algorithm for the job.

I became interested in the subject of template metaprogramming after reading two recently published books: Andrei Alexandrescu's Modern C++ Design [Alexandrescu], and Czarnecki and Eisenecker's Generative Programming [Czarnecki-]. I highly recommend both these books; they will remind you of the richness of the C++ language.

The basic tools of metaprogramming consist of constructions that should be familiar to most C++ programmers: compile-time constants, typedefs, template specialization, and recursive templates. These are enough to provide both a looping construction (recursive templates) and a conditional (template specialization), which are, in turn, enough for a Turing-complete language.

The mechanics of metaprogramming may require a readjustment from the regular C++ programmer, since the C++ metaprogramming sub-language resembles a dynamically-typed functional language. However, with a good grasp of recursion, a few hints from functional programming languages, and a will to learn, the area can be quickly understood.

Metafunctions

One of the basic metaphors of metaprogramming is what I call template-is-function. Template-is-function is characterized by a template struct with an enum or typedef as a return value. Czarnecki and Eisenecker call such structs "metafunctions" [Czarnecki-]. An example metafunction to compute the square of N is:

template<int n>
struct Sqr {enum { RET = n*n }; };

And it is used like this:

std::cout << Sqr<10>::RET << std::endl;

Note that the example above and those that follow use the "enum hack" to define integer constants. This is a workaround for compilers which do not support the standard C++ feature of initialization of static constants inside the class definition, as shown below[2]:

//alternative implementation
template<int n>
struct Sqr {static const int RET = n*n; };

Several parts of metafunctions are analogous to those of ordinary functions. Here is a mapping from the concepts of ordinary functions to those of metafunctions:

function -> templated type (struct or class)
function arguments -> template arguments
function return(s) -> public enums/static constants/typedefs

There are several things to note here. First, it is often convenient to use structs rather than classes to avoid typing the public: declaration. However, if our metafunction has something to hide (intermediate implementation details), we can use a class and encapsulate those details. Second, it is possible for a metafunction to have multiple return values. I use Czarnecki and Eisenecker's convention of naming a single return value ::RET (for return). Another popular name is ::value, which is used by Andrei Alexandrescu in Modern C++ Design [Alexandrescu], as well as the boost libraries [Boost]. Regardless of the particular convention, it is often important to have consistently named returns, as we will see.

Here is another metafunction that computes N factorial:

template<int n> 
struct Factorial {enum {RET = n * Factorial<n-1>::RET };  };

By itself, this would result in infinite recursion, so we add a base case through template specialization:

template<>
struct Factorial<0> {enum {RET = 1 };  };

This example demonstrates two central elements of metaprogramming: looping using recursive templates, and conditionals using template specialization. As an exercise, you might write a metafunction to compute the nth member of the Fibonacci series.

So far this should be pretty comfortable territory. Now prepare yourself for a dark journey to the land of the angle brackets as we explore compile-time data structures.

Metaobjects[3]

To simulate collections and other objects at compile time we will take a hint from a functional language, in this case, Lisp. The basic data structure in Lisp is called a cons, which is more or less the equivalent of std::pair. A cons has two members, car, which is the head, and cdr, which is the tail[4]. I have followed Czarnecki and Eisenecker by adopting the more descriptive Head and Tail. We can form lists, trees, and other useful data structures by creating conses which themselves contain conses. We terminate these data structures with a special empty value, Nil. Our metaprogramming list basics are:

struct Nil {
  typedef Nil Head;
  typedef Nil Tail;
};
template<class Head_, class Tail_=Nil>
struct Cons {
  typedef Head_ Head;
  typedef Tail_ Tail;
};

To enable us to hold integers in conses, we'll define:

template <int n> 
struct Int {enum {RET = n }; };

Now we can represent the list 3, 1, 2, 4 as:

Cons<Int<3>, Cons<Int<1>, Cons<Int<2>, Cons<Int<4>, Nil> > > >

See what I mean about angle brackets?

Simulating cons in C++ metaprogramming demonstrates the metaphor I call "template-is-object." Multiple typedefs or enums in a struct work like the members of a C++ object. Here is the mapping from the concepts of ordinary objects to those of metaobjects:

object -> templated type (struct or class)
constructor arguments -> template arguments
methods/member variables -> enums/static constants/typedefs

The template-is-object metaphor works somewhat less well than template-is-function because in metaprogramming there is no way to modify member variables. Rather than true variables, functional programming has symbolic names for values. If a new value is needed, a new name must be introduced [Czarnecki-].

There are many interesting metafunctions that could be written to operate on lists made up of conses, e.g.,

//metafunction to compute the length of a list made up of conses
template<class List>
struct Length {enum {RET = 1 + Length<typename List::Tail>::RET}; };
template<>
struct Length<Nil> {enum {RET = 0 }; };
//metafunction to append an object to the end of a list
template<class Elt, class List>
struct Append1 { 
typedef Cons<typename List::Head, Append1<Elt, typename List::Tail>::RET> RET;
};
template<class Elt> struct Append1<Elt, Nil> { typedef Elt RET; }

A quick glance through an introductory Lisp book shows the functions member, nth, and union as interesting candidates for metafunction-writing exercises [Graham]. If nothing else, your excursion into metaprogramming will teach you how to use the typename keyword[5].

How Many Queens are Too Many?

Knowing that the best way to learn is by doing, I set out to solve an example problem with metaprogramming, N queens. For those who are not familiar with the problem, the task is to find a placement of N queens on an N x N chessboard so that none of them is attacked by the other queens. This is a generalization of the 8-queens problem, which uses a standard chessboard.

The first task was to develop an ordinary solution to the problem. After some tinkering, I came up with:

//represent a queen by its row and column
struct Queen {
  Queen(int r, int c) : row(r), col(c) { }
  int row;
  int col;
};

//determine whether newQueen would attack any of queens check for horizontal
// (rowdiff==0), vertical (coldiff==0), and diagonal attacks (rowdiff==coldiff)
bool hasConflicts(Queen newQueen, std::list<Queen> queens){
  for (std::list<Queen>::iterator ii=queens.begin(); ii!=queens.end(); ++ii) {
    int rowdiff = abs(ii-row - newQueen.row);
    int coldiff = abs(ii-col - newQueen.col);
    if (rowdiff==0 || coldiff==0 || rowdiff==coldiff)
      return true;
  }
  return false;
}

//display a solution (a list of queens) as pairs of coordinates separated by semicolons
void printSolution(std::list<Queen> queens){
  for(std::list<Queen>::iterator ii=queens.begin(); ii!=queens.end(); ++ii) {
    std::cout << "(" << ii-row << "," << ii-col << ");";
  }
  std::cout << std::endl;
}

//place a Queen on the "board" by adding to the list queens
//Recursively places the next queen and prints a solution if it exists 
void placeQueen(int N, int row, int col, std::list<Queen> queens) {
//try the next column for additional solutions
  if (col  N-1)
    placeQueen(N, row, col+1, queens);
//if there is no conflict, attempt another queen in the next row
  if (!hasConflicts(Queen(row, col), queens)) {
    queens.push_back(Queen(row, col));
//if we have not yet reached a full solution, place another
//queen; else, print the solution
    if (row  N-1)
      placeQueen(N, row+1, 0, queens);
    else
      printSolution(queens);
  }
}

//reads the problem size from the command line and prints any solutions
int main(int argc, char* argv[]) {
  int N = 8;
  if (argc > 1)
    N = atoi(argv[1]);
  placeQueen(N, 0, 0, std::list<Queen>());
}

This prints the solutions in a not-so-user-friendly format to stdout. A good test is the number of solutions (for 8 queens it should be 92). We can check by piping the output into the Unix "wc" utility, which counts lines when given the -l option.

jwalker@avakkai $a.out 8 | wc -l
  92

Looks good. Now to translate it to a metaprogram. The Queen struct becomes:

template<int r, int c>
struct Queen {
  enum { row = r };
  enum { col = c };
};

To implement HasConflicts, we will need an abs equivalent:

template<int> n
struct Abs {enum { RET = n < 0 ? -n : n }; };

We can replace the for loop in hasConflicts with recursion:

template<class Queen, class QueenList>
class HasConflicts {
  typedef typename QueenList::Head Head;
  enum { coldiff = Abs<Queen::col - Head::col>::RET };
  enum { rowdiff = Abs<Queen::row - Head::row>::RET };
public:
  enum {RET = coldiff==0 || rowdiff==0 || coldiff == rowdiff ?
    true : HasConflicts<Queen, typename QueenList::Tail>::RET
  };
};
template<class Queen>
class HasConflicts<Queen, Nil> {
public:
  enum { RET = false };
};

Notice that the implementation details are hidden by making them private.

A compile-time if statement will make implementing placeQueen much easier. Czarnecki and Eisenecker provide this implementation using partial specialization (as well as another which does not require partial specialization) [Czarnecki-]:

template<bool condition, class Then, class Else>
struct IF {typedef Then RET;};
template<class Then, class Else>
struct IF<false, Then, Else> {typedef Else RET; };

As Czarnecki and Eisenecker explain, when using this compile-time if statement, it is important to move as much evaluation outside the template arguments as possible. Otherwise, infinite template recursion could result. That is, instead of:

typedef typename IF<n==0, F<n>::RET, G<n> >::RET::RET RET;

The evaluation of ::RET in the Then or Else parts should be delayed:

typedef typename IF<n==0, F<n>, G<n> >::RET::RET RET;

This explains the need for a consistent naming scheme for metafunction returns. If F<> and G<> used different names for their return values, it would not be possible to move the evaluation outside IF<>. Of course, there will be occasions when it is inconvenient to have the same name for return values. Consider a case where we have received a list in the template parameter OldList and wish to conditionally append a value to it:

//won't compile
typedef typename IF<n==0, OldList, Append1<Int<n>, OldList >::RET::RET RET;

Here the problem is that there is no typedef RET in OldList (which is just a cons). To solve this problem, we can use an extra level of indirection with a "template adapter." In the case of cons, this should simply make the structure available as the typedef RET. We will name the metafunction Identity, after the Lisp function which performs the same operation.

template<class Obj> struct Identity {typedef Obj RET; };

Now the example above could correctly be written as:

typedef typename IF<n==0, Identity<OldList>, Append1<Int<n>, OldList >::RET::RET RET;

Now let's get back to N Queens. Rather than immediately printing the solutions, we will accumulate them in a list which is passed in as a template argument of PlaceQueen<> and out as the typedef RET:

// translation of dynamic placeQueen function. SolList is passed the list of existing
// solutions since we cannot print them immediately when found
template<int N, int col=0, int row=0, class QueenList=Nil,
class SolList=Nil>
class PlaceQueen {
  typedef Cons<Queen<row, col>, QueenList> NewQueenList;
//we add any solutions found by placing in the next column as the temporary TempSol
  typedef typename IF<col < N - 1,
                        PlaceQueen<N, col+1, row, QueenList, SolList>,
                        Identity<SolList> >::RET::RET TempSol;
public:
  typedef typename IF< HasConflicts<Queen<row, col>, QueenList>::RET,
                   Identity<Identity<TempSol> >,
                     IF<row < N - 1, PlaceQueen<N, 0, row+1, NewQueenList, TempSol>,
                      Identity<Cons<NewQueenList, TempSol> >
                 >
  >::RET::RET::RET RET;
};

This is enough to test our algorithm. We can find out the result of the compiler's computation by asking for a nonexistent typedef, e.g., foo, in the result. This will force the compiler (gcc, at least), to output an error message containing the type name:

const int NUMQUEENS=4;
int main() {
  PlaceQueen<NUMQUEENS>::RET::foo bar;
}

Compiling, we get:

jwalker@avakkai $ g++ NQueensStatic.cpp -ftemplate-depth-100 -w
NQueensStatic.cpp: In function 'int main()':
NQueensStatic.cpp:85: 'foo' is not a member of type
 Cons<Cons<Queen<3,2>,Cons<Queen<2,0>,Cons<Queen<1,3>,Cons<Queen<0,1>,Nil> > > >,
Cons<Cons<Queen<3,1>,Cons<Queen<2,3>,Cons<Queen<1,0>,Cons<Queen<0,2>,Nil> > > >,Nil> > '
NQueensStatic.cpp:85: parse error before ';'

The error message contains our solutions!

This is interesting, but not incredibly useful. What we would like to have is a way to use the information from compile-time computations at runtime. We can now use what Czarnecki and Eisenecker call "execute-loops" to statically generate code. This Foreach loop was inspired by the EWHILE, EDO, and EFOR examples in Generative Programming [Czarnecki-].

//compare two types for equality
template<class lhs, class rhs> struct EqualType {enum {RET = false }; };
template<class lhs> struct EqualTypelhs, lhs {enum { RET = true };  };
struct EmptyStatement { static void exec() { } };
//loop over a cons-based list, executing Statement::exec() for each list member
template<class Statement>
struct Foreach {
  static void exec() {
    Statement::exec();
    IF<EqualType<typename Statement::NextList, Nil>::RET,
      EmptyStatement,
      Foreach<typename Statement::Next>
    >::RET::exec();
  }
};

We combine this code with a custom statement to loop over each solution, and then over each queen within the solution in turn.

//statement for Foreach which prints a list of queens in 
// the same format as the dynamic solution
  template<class QueenList>
  struct PrintQueens   {
    typedef typename QueenList::Head Queen;
    static void exec() {
      std::cout << "(" << Queen::row << "," << Queen::col << ");";
    }
    typedef typename QueenList::Tail NextList;
    typedef PrintQueens<NextList> Next;
  };
//statement for Foreach which prints a list of solutions one per line
  template<class SolutionList>
  struct PrintSolutions {
    typedef typename SolutionList::Head Solution;
    static void exec() {
      Foreach<PrintQueens<Solution> >::exec();
      std::cout << std::endl;
    }
    typedef typename SolutionList::Tail NextList;
    typedef PrintSolutions<NextList> Next;
  };
  const int NUMQUEENS=4;
  int main() {
    typedef PlaceQueen<NUMQUEENS>::RET Solutions;
    Foreach<PrintSolutions<Solutions> >::exec();
  }

Let's try it:

jwalker@avakkai $ g++ NQueensStatic.cpp -ftemplate-depth-100 -w
jwalker@avakkai $ a.out
(3,2);(2,0);(1,3);(0,1);
(3,1);(2,3);(1,0);(0,2);

That's what we wanted. We have computed the solution at compile-time and printed it at runtime. A similar technique could be used to convert the compile-time solution into a run-time data structure for further analysis.

Now for my dirty secret. As you may have guessed based on the cautious NUMQUEENS=4 statement above, compile-time computation is not terribly efficient. Here is a comparison of compile and execution times for the static and dynamic solutions, varying the number of queens from 4 to 8 (tests were run using gcc 2.95.3 on a Sun Enterprise 420R with quad 450Mhz processors and 2GB of RAM).3

<colgroup> <col> <col> <col> <col> <col> <col></colgroup> <tbody> </tbody>
4 5 6 7 8
Static Metaprogram
compile time 1.067s 4.655s 31.76s 696.8s N/A
compiler peak VM usage 6.6MB 22.2MB 58.2MB 982MB N/A
execution time 0.009s 0.008s 0.009s 0.010s N/A
Dynamic Program
compile time 1.019s 1.019s 1.019s 1.019s 1.019s
execution time 0.009s 0.012s 0.020s 0.059s 0.246s

In fact, I cannot even compile the 8-queens solution because the machine runs out of virtual memory after exhausting all 2GB6! While it is somewhat depressing that this canonical instance of N Queens cannot be solved, I am quite satisfied by the wealth of knowledge I learned constructing this example. As a similar exercise, it might be fun to solve the Towers of Hanoi with static metaprogramming.

Luckily, there are many useful techniques that can be performed without taxing the compiler so heavily as this toy example. Real-world uses of metaprogramming include customisation using template traits (see Andrei Alexandrescu's Loki library [Loki] for excellent examples of traits) and high-performance numeric computing with expression templates (for which Blitz++ [Blitz] is gaining fame).

References

[Alexandrescu] Andrei Alexandrescu: Modern C++ Design: Generic Programming and Design Patterns Applied, Addison-Wesley, 2001, ISBN 0-201-70431-5

[Czarnecki-] Krzysztof Czarnecki and Ulrich W. Eisenecker: Generative Programming: Methods, Tools, and Applications, Addison-Wesley, 2000, ISBN 0-201-30977-7

[Boost] "Boost C++ Libraries", http://www.boost.org

[Graham] Paul Graham: ANSI Common Lisp, Prentice Hall, 1996, ISBN 0-13-370875-6

[Williams] Anthony Williams: "Introduction to C++ Templates", Overload 45, October 2001

[Blitz] "Blitz++ Home Page", http://www.oonumerics.org/blitz/



[1] Although in practice, the implementation-dependent maximum depth for template instantiations means that this Turing machine may have a very short tape. The C++ Standard only requires a maximum depth of 17; anything else is an extension. I used gcc 2.95.3 to compile the examples in this article. Gcc has a -ftemplate-depth-xxx switch allowing the user to explicitly set the maximum depth for template instantiations.

[2] Strictly speaking, the C++ standard requires that a definition be provided for a static constant member if it is used in the program. However, the intent (and probable result using existing compilers) seems to be that the definition should only be required if the static constant is used in a way requiring storage in memory, e.g., taking the address. Additionally, curiously enough, the enum version compiles over twice as fast as the static const int version on gcc 2.95.3.

[3] Metaobjects is probably a misuse of the meta prefix. More accurately, these are metaprogramming objects.

[4] "The names car and cdr derive from the internal representation of lists in the first Lisp implementation: car stood for 'contents of the address part of the register' and cdr stood for 'contents of the decrement part of the register'" [Graham].

[5] The "typename" keyword must be used to qualify Dependent Names. Anthony Williams has written an informative article on C++ templates which unfortunately arrived in my mailbox too late to help me with these details [Williams].

Notes: 

More fields may be available via dynamicdata ..