ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinDo birds in their little nests always agree?

Overload Journal #4 - Feb 1994 + Programming Topics   Author: Kevlin Henney

A complaint often raised by Pascal programmers moving to C or C++ is the lack of nested functions in these languages. When pushed there appears to be no need for them, just a want. Francis rattled the cage of this topic in the last copy of Overload [see Uses of Classes with only Private and Protected Constructors].

Functions in C exist at file scope, and in C++ they may also exist in class scope. Should we appease Pascal programmers moving to C and C++ by allowing functions to be defined in local scope, or should we encourage them to write C and C++?

There are many interpretations of what function nesting means and possibly not enough consideration of the consequences. I propose to take a look at the issues involved in adding another definition scope for functions, and what this means for programming styles and program extensibility.

The Algol approach

Why would you expect to nest a function inside another function? A number of ALGOL derived languages, notably Pascal, offer nested procedures and functions that can operate on local variables. In C and C++ this would look something like

void DoSomething() {
double X, Y;
double Radius() {
return sqrt(X * X + Y * Y) ;
}
double Theta() {
return atan2(Y, X);
}
}

Convenient as this facility may look, the function definition is now longer and therefore less readable than using functions defined at file level — a situation that is often far worse in Pascal because of inconsistent indentation practices. In this example two otherwise general purpose functions have restricted availability in exchange for empty argument lists. A couple of important questions to consider:

  1. What happens when a pointer to a local function is returned
  2. What about recursive local functions?

While C has never been a bondage and discipline language, imposing a specific view of good practice on its users, there is the unfortunate possibility of returning a pointer to a function that uses auto variables that will have fallen out of scope:

void (*ReturnFunc()) ()
{
int LocalVar;
void NestedFunc()
{
++LocalVar;
}
return NestedFunc;
}

This is clearly unsafe, but so is returning a pointer to an auto variable and that is not illegal in C. The next question raised is a little more interesting. Consider the following function:

void DoSomething()
{
int Array[ArraySize];
int Sum(size_t Arg)
{
return Arg-- > 0
?  Array[Arg] + Sum(Arg) : 0;
};
}

References to auto variables outside the current block may be implemented either by using a fixed offset to reach them down the stack, or the largest possible activation record for a function is placed on the stack and the variables are accessed as if they were local to the current block. This only works for truly nested blocks that do not interact with one another except their parents. The calling environment for Sum changes, and hence the offset to DoSomething's local data is not constant, yet it must still access the same copy of Array. There are a number of methods for implementing this slightly more complex arrangement, one of which is to pass the nested function a pointer to the appropriate activation record on the stack to access the local data. The eagle eyed OOPers among you may recognise the approach: the this pointer. But pointers to local functions implemented like this would present a huge hole in the type system: their prototypes are now incorrect as they take a silent extra argument.

Other approaches based around a secondary stack of thunks could be used. To ensure correct deallocation the thunks would have to be bound to the lifetime of the function. This would restrict the meaningful use of nested function pointers to the lifetime of the defining function.

Storage class

The scope of a nested function is well defined, but what about its lifetime? A nested function, as defined above, refers to stack data and cannot be used meaningfully outside its scope, which implies that the storage class of a nested function is  auto.  However,  they clearly have
internal linkage, which is only possible for static entities. Clearly a few rules are going to have to change to accommodate nested functions.

Given this, I think it is safe to say that ALGOL style nested procedures have no place in either C or C++. A simpler approach is to acknowledge that nested function definitions have static storage class and may only refer to other non-auto names in scope. For most uses nested functions would simply be another scoping mechanism, but at least function pointers no longer present a problem.

There is another issue that is relevant regardless of storage class: the goto. I am no fan of the goto, but it is there and its demands on the language must be met. C has three basic kinds of scope: local, function and file. C++ adds class scope to this list. Only labels have function scope — unless you wish to introduce Pascal-like label declarations (no thanks). Does this mean that non-local goto's would be possible with nested functions in C and C++?

void PathologicalLoop() 
{
void Iterate()
{
InnerLabel:
goto OuterLabel;
} OuterLabel:
goto InnerLabel;
}

A particularly unsavoury example — especially the jump to the inner loop. Undoubtedly an extension of the rules defining goto destinations and what function scope really means are required, with the likely prohibition of jumps into a nested function. With all due respect to jumps out of functions, C++ already has an exception handling mechanism that is considerably more powerful and certainly more elegant.

Use and reuse

We have not yet asked why we would want to have nested functions. All the objections have focused on the technical consequences of introducing them into C and C++. The astute may have noticed nested functions operating on local data bears a striking similarity to certain capabilities offered in C++ but not in C. For nested functions that may only operate on static data this is a much less useful feature. In fact, unless you can contrive an example of a non-reentrant function that does not behave like an object, this feature borders on being useless.

This is part of the problem with nested functions: in languages that support them people often use nested functions where they actually require a class instance that models a state machine. There are already a number of functions masquerading as objects, and one of the most obvious examples is strtok. The use of nested functions in an OOP language is thus suspect. Even the GNU compiler, which takes some extraordinary liberties in extending both C and C++, only permits nested functions in C, recognising either their irrelevance or their extra complexity in C++.

What about the role of a function as a name space container for other functions? Granted that you may not wish to broadcast a function's existence to every entity in name space, but if we are talking about methods declare them private, otherwise they should be declared as static within a file. Not only are functions not data objects, they are not packages either. In the Modula sense of the word, functions make quite poor modules. Without an export interface they are totally closed, and thus impervious to later design decisions requiring a change in the interface.

Reuse is implicitly discouraged by nesting — abstractions that are worth making once are probably worth using again. I am always surprised when I see structs, enums and typedefs that are defined in local scope. These mechanisms hide details that would otherwise reduce portability or encourage hard coding. They abstract the interface to functions and the communication between program units. It is almost inevitable, according to Murphy's Law, that something hidden in this way will be required at a later date in another function within the same file. This may happen if the functionality of the original module is extended and so more functions require access to the type abstractions, or if the original function gets too long and is broken up. Nested functions would exacerbate rather than relieve this problem.

For data types this attitude has grown, I believe, from the top-down view that structures are a simple repository for data and functions are the real stuff of programs. 00 turns this on its head, and it is only because it was deemed that the only difference between structs and classes would be in default member access that local classes in C++ are a possibility. Even these are not exactly encouraged in C++ where everything must be defined in the declaration. James Coplien, in his excellent book Advanced C++ Programming Styles and Idioms, relegates local classes to an appendix where he uses them to emulate ALGOL style block structured programming. This is perhaps the least useful and least convincing section of the book. As an aside, the ISO committee responsible for looking at OO extensions to Pascal has simply disallowed local classes.

The pollution of name space is certainly an issue that should not be ignored, and where a function must absolutely and definitely be used by only one function classes should be used to handle this role for the moment — they make much better modules than functions. It is also significantly easier to make them extensible. Consider the following three solutions to this problem:

  1. void  PublicFunc()
    {
    void PrivateFunc()
    {
    }
    }
  2. class Scope
    {
    friend void PublicFunc();
    static void PrivateFunc();
    };
  3. class Scope
    {
    public:
    static void PublicFunc();
    private:
    static void PrivateFunc();
    };

PrivateFunc is declared for use only by PublicFunc. What if requirements change and one other function requires the use of PrivateFunc? The first example requires hacking out PrivateFunc and making it static within a file in the best traditions of cut and paste programming; the second adds another prototype to the friend list, utilising a kind of selected client export idiom; and the third only requires another public declaration, having recognised the coupling issue by grouping the functions together in the same name space. Which you prefer is clearly a matter of taste, but my own preference is not for number 1.

Conclusion

The only language that I have ever used where procedural nesting made sense was in occam where procedural units could either be executed in sequence as normal procedures or in parallel as processes. Here the idea of contained functionality made a lot more sense. Even with processes the generality and reusability problem reared its ugly head. The problems created by jumps, recursion and pointers were easily solved: these features do not exist in occam.

C++ already has the ability to make functor objects that act like functions, but these are potentially more versatile and flexible inasmuch as they may be created for a purpose at runtime. This is a style of programming explored more fully in Coplien's book.

Nesting of types within a class is a different case to function nesting, and not just because of declaration versus definition. This meets the criteria for communication and abstraction that I mentioned earlier on. Either the types are used by clients of the class when interacting with its instances, or the types are used between the private methods of the class. Readability and implementability are still important and anything other than trivial nested classes should be hived off into a separate friend class so that the original type is now represented by a class cluster [see Uses of Classes with only Private and Protected Constructors by Francis in the last issue].

Exceptions to rules have a nasty habit of creeping into language definitions and taking root. A great many would have to be introduced to both C and C++ to make any sense of nested functions. This might keep some language lawyers happy, but I believe the case against them is watertight.

Overload Journal #4 - Feb 1994 + Programming Topics