Journal Articles

Overload Journal #42 - Apr 2001 + Programming Topics

Title: C++ Idioms: First Thoughts

Author: Administrator

Date: 26 April 2001 17:46:05 +01:00 or Thu, 26 April 2001 17:46:05 +01:00

One of the major problems with developing C++ skills is that there is very little material available for those who have just completed their introduction to the language. There are numerous books that purport to introduce you to C++, and a very few do a good job of it. There are quite a few books that will help develop the skills of those who have thoroughly mastered the language. There are also a handful of books that highlight particular details. However, I do not know of any that focus on the modern idioms of the language. My intention in this article is to cover a number of things that I think are useful and to invite readers both to critique my ideas and to add material of their own.

Experts among the readership will probably be familiar with many of the items in this article, and may have ideas as to how they may be better developed. If you are among those, please at least skim this article because I will start with some very simple things but I will, hopefully, progress to more complicated ones.

Largely Lexical Idioms

The ideas I will present in this section are almost entirely to do with how you present your code. I do not mean such trivia as white-space conventions and where you place your braces; I hope the following will be a little more significant.

1) Prefer 'Need to Know Ordering'

When you have options as to how you write code, think carefully about what the user needs to know. For example, the most important thing for most users of a class is the public interface. Therefore make sure that it is easily and quickly visible on the screen. Organise class definitions so that the public interface comes first and the private interface last. For this reason, if for no other, question in-class definitions of member functions. If you decide that a member function should be inline, define it after the class definition. Actually, seriously consider placing inline definitions in their own file that can be included into the header file when the code is stable enough to pay the price of forcing such a dependency on the user of the class.
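As a minimal sketch of this layout, assuming a hypothetical Date class (the class and its member names are illustrative only, not taken from any library): the public interface comes first, the data comes last, and the inline definitions follow the class definition.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
class Date {
public:                        // what most readers need to know comes first
    Date(int year, int month, int day);
    int year() const;
    int month() const;
    int day() const;
private:                       // implementation detail comes last
    int year_;
    int month_;
    int day_;
};

// Inline definitions placed after the class definition; they could
// equally live in a separate file included at the end of the header.
inline Date::Date(int year, int month, int day)
    : year_(year), month_(month), day_(day) {}
inline int Date::year() const { return year_; }
inline int Date::month() const { return month_; }
inline int Date::day() const { return day_; }

// Usage: a reader of the class never has to scroll past the data members.
inline void date_example() {
    Date d(2001, 4, 26);
    assert(d.month() == 4);
}
```

Moving the inline bodies out of the class costs a few extra lines, but the class definition itself then reads as a summary of the interface.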

The principle of 'need to know ordering' should permeate your coding style. Do not dump information onto your fellow programmer. Some time ago Jon Jagger suggested that file scope comments should be organised on this basis. When a user looks at a header file they are most likely to want to know about the class definitions and function declarations in the file, and are much less likely to want to know the development history. In general, file scope comments should be later rather than earlier in a file. By all means start with a comment that claims copyright and gives a release date but that would normally be sufficient.

2) Be Consistent

When presented with choices that only apply sometimes, be consistent.

a) Placement of const

const normally qualifies the type directly to its left. There is a special case when there is no type to the left, in which case it qualifies the type name directly to its right. Note that I carefully chose 'type name' in that last sentence; if you use a typedef then a leading const is equivalent to a const immediately after the typedef name. The following should clarify that:

typedef int * int_ptr;
const int_ptr * ipp; 

that last declaration is equivalent to:

int_ptr const * ipp;

which in turn is equivalent to:

int * const * ipp;

Note that that is not equivalent to 'const int * * ipp;'
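The equivalences above can be checked by the compiler. The following is a small sketch, assuming a compiler that provides std::is_same and static_assert (both arrived with C++11, so this is later than the dialect used elsewhere in this article); the function name is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

typedef int * int_ptr;

// The first two assertions compare spellings of the same type.
static_assert(std::is_same<const int_ptr *, int_ptr const *>::value,
              "a leading const applies to the typedef name as a whole");
static_assert(std::is_same<int_ptr const *, int * const *>::value,
              "it is the pointer that is const, not the int");
// Textually substituting the typedef gives a different type.
static_assert(!std::is_same<const int_ptr *, const int * *>::value,
              "const int * * is pointer to pointer to const int");

inline void const_placement_example() {
    int value = 1;
    int * const p = &value;     // const pointer to plain int
    const int_ptr * ipp = &p;   // pointer to const pointer to plain int
    **ipp = 2;                  // allowed: the int itself is not const
    assert(value == 2);
}
```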

b) struct or class

We know that either keyword will do in the context of declaring/defining a user-defined type. Furthermore, in view of my earlier advice (re need-to-know ordering), you will write exactly (word for word) the same definition regardless of which choice you make. My advice would be to consistently use 'class' unless you are electing to have public data. In that case, always use 'struct.' By the way, never have protected data. Either data should be strictly part of the implementation and inaccessible outside the context of that class or it should be public because you have carefully considered the issue and determined that public data is appropriate. A common case of this is when you provide a private nested type. In such circumstances it makes little sense to grant friendship to the enclosing class when making the data public has the same effect.

While I am on this subject, note that, strictly according to the current standard, it is not possible for an enclosing class to grant friendship to a nested class. The most likely fix to this problem, and one that is highly popular among the Standards Committee, is to automatically grant access to a nested scope/class; my next suggestion is made in that context. If you want to keep two blocks of data and functions apart from each other, place each in its own nested class and grant friendship to the enclosing class if appropriate to your design.

c) class and typename

In the context of template type parameters these two keywords are equivalent. That does not mean that we should use them that way. Some template type parameters can be built-in types (for example, the type of the object that a vector contains), other template type parameters must be a user defined type (for example, the STL priority_queue adaptor's second parameter). Either consistently use just one of the keywords everywhere, or use typename when built-in types are possibilities but use class when they are not. Do not randomly choose according to your current mood.

d) Writing a for-loop

Strictly speaking this one is not a lexical convention because the alternatives actually have different semantics. What I am thinking of is how you should write a for-statement. Those of us who first learnt C (particularly those with the wisdom to have read and digested 'C Traps and Pitfalls' by Andy Koenig) will be used to this idiom:

for (iter=0; iter<max; iter++) { ... }

That makes perfectly good sense in C where iter is almost certainly some type of integer. It still works fine in C++ when iter is an integer type. However, in C++, iter is highly likely to be some kind of user-defined iterator. This leads to two changes.

We replace the '<' with '!='. That change is essential because there is no guarantee that incrementing an iterator will result in a value that is greater than the previous one. Puzzled? Think about how lists work. As we iterate through a list we get the address of the next node, but there is no reason to suppose that this is physically after the previous one; indeed, by the time we have sorted our list, it probably isn't.

The second thing we do, for efficiency rather than correctness this time, is to get into the habit of using pre-increment/decrement whenever we can. The pre versions are always at least as efficient as the post versions, but in the case of user defined types they are usually much faster (indeed, originally C++ did not support user-defined post increment/decrement).

So an experienced C++ programmer instinctively writes:

for (iter=0; iter!=max; ++iter) { ... }

Actually, the most common form for iteration looks like:

for (iterator iter=x.begin(); iter!=x.end(); ++iter) { ... }

and remarkably often that is rewritten as:

for_each(x.begin(), x.end(), dosomething);

which focuses the attention on the fact that you do the same thing to every member of a container. Breaking old habits and using newer idioms often makes your code clearer, so that it needs fewer comments. Fewer comments means that the ones you write are more likely to be read.
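To make the two forms concrete, here is a minimal sketch using std::list<int>. The function object add_to and the two summing functions are hypothetical names introduced for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Hypothetical function object for use with for_each: adds each value
// it is given to a running total owned by the caller.
struct add_to {
    explicit add_to(int & total) : total_(&total) {}
    void operator()(int value) const { *total_ += value; }
private:
    int * total_;
};

inline int sum_with_loop(std::list<int> const & x) {
    int total = 0;
    // != rather than <: list iterators cannot be compared with <.
    // ++iter rather than iter++: no temporary copy of the iterator.
    for (std::list<int>::const_iterator iter = x.begin(); iter != x.end(); ++iter) {
        total += *iter;
    }
    return total;
}

inline int sum_with_for_each(std::list<int> const & x) {
    int total = 0;
    std::for_each(x.begin(), x.end(), add_to(total));
    return total;
}

inline void loop_example() {
    std::list<int> x;
    x.push_back(3);
    x.push_back(1);
    x.push_back(2);
    assert(sum_with_loop(x) == sum_with_for_each(x));
}
```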

e) In General

I wonder how many other cases you can come up with. Send them to me and I will collate them for publication. However, in general, given choice, have a consistent policy.

3) Choose Names in Context

Identifiers are often difficult to choose. To my mind it is important to avoid duplication of information. For example, in the context of a Date class, is it necessary to call a member function getmonth or printdate rather than month or print? The scope in which an identifier is declared is important. If you do not use that information when choosing a name, you will be providing over-long names that make your code harder to follow.

4) Understand Lexical Conventions

Using only uppercase letters in an identifier is sometimes called 'shouting.' Just like actual shouting it adds nothing to the process of understanding (indeed shouting often makes it harder to understand) but it is supposed to stand out as a warning. We shout to demand attention, we write in all uppercase to demand attention. There is no need to do this just because something is a constant (enumeration value or const qualified value). However there is every reason to do so when we want to warn the reader that what they see may not be what they get. In this context it makes perfect sense to use all uppercase (or more accurately 'no lowercase') identifiers for the pre-processor exactly because they will be substituted by something else before the tokeniser sees your code.

Remember that some idioms are just an unspoken agreement to do things the same way; the advantage is purely that and nothing more. In other cases idioms serve a deeper purpose, one that is intended to keep us out of trouble. One of the things that profoundly annoy me about so many books and courses for novices is that the authors/presenters do not understand this. Most idioms are not just issues of personal taste; they keep you from trouble and make your code more accessible to those reading it.

Mainly Semantic

Now let me explore some of those things that have a deeper impact on your code. There are two major tools that every competent programmer should be familiar with: delegation and proxying. One of the major advantages of C++ is that it supports both these things very well.

1) Forwarding Functions

I remain profoundly puzzled as to why these are not better known and more widely used. I hope that by the time you have finished reading this you will be asking that C++ completes its support by allowing a constructor to forward to another constructor for the same type.

a) Delegation v Default Arguments

There are two things about default arguments that make them unattractive. The first is that you can only default from the right hand end of the parameter list. (Be careful, because declarations including defaults can be combined; I will say no more about this because I think that doing any such thing is nauseating.) Surely a method that allows you to provide a default for any combination of arguments would be preferable? The second issue is that declarations of the same function in different translation units can have different default arguments. For example the following is perfectly legal:

File1.cpp
void foo (int val = 0) { cout << val; }
File2.cpp
void foo (int val = 1);

To see just how bad this might be consider:

File3.h
extern inline void bar() { foo(); }

Now include that header file after the declaration of foo in each of File1.cpp and File2.cpp. You now have a breach of the one definition rule. Yes, I realise that this code is simplistic but my purpose is to provide a minimalist example of the potential for error. Oh, and the reason for that extern qualification is to make it clear that bar() has external linkage as opposed to the older convention that inline functions had internal linkage.

Now let me return to the theme of eliminating default arguments from your code. Every time you use a default argument you could use delegation via a forwarding function instead. So I can replace:

void foo (int val = 0) { cout << val; }

with

void foo (int val ) { cout << val; }
inline void foo() { return foo(0); }

Indeed the syntax of the language was actually modified to allow such functions to work even when the return type is void. That change was mainly to support forwarding functions in the case of a template, where the return type might be a template type parameter, but it works in the general case.

Note that this method of replacing defaults is more powerful than what it replaces. We can use a forwarding function with the same name to provide a 'default' value for any combination of parameters, as long as the resulting list of parameter types is unique. For example:

double foo2(int val, double d, std::string message);
double foo2 () 
  {return foo2(0, 0.0, "trivial"); }
double foo2 (int val)
  {return foo2(val, 0.0, "int given"); }
double foo2 (double d)
  {return foo2(0, d, "double given"); }
double foo2 (std::string message)
  {return foo2(0, 0.0, message); }
double foo2 (int val, double d)
  {return foo2(val, d, "no message"); }
double foo2 (int val, std::string mess)
  {return foo2(val, 0.0, mess); }
double foo2 (double d, std::string mess)
  {return foo2(0, d, mess); }

Even without use of the inline keyword most modern compilers will inline such simple forwarding functions as a standard optimisation.

Are there any other advantages? Well yes: if the need arises you can take the address of any or all of the forwarding functions. Of course, when you do so the compiler will have to provide an instantiation and forgo the luxury of just using it inline (though that does not prevent it from doing so when it can).
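A compact sketch of both points, using a hypothetical function describe (not part of the examples above) and assuming a C++11 library for std::to_string: the forwarding overloads supply the 'defaults', and the address of any one of them can be taken.

```cpp
#include <cassert>
#include <string>

// Hypothetical primary function: all the real work happens here.
inline std::string describe(int count, std::string const & unit) {
    return std::to_string(count) + " " + unit;   // std::to_string is C++11
}

// Forwarding overloads: each supplies the arguments the caller left out.
inline std::string describe(int count) { return describe(count, "items"); }
inline std::string describe(std::string const & unit) { return describe(1, unit); }
inline std::string describe() { return describe(1, "items"); }

inline void forwarding_example() {
    assert(describe(3) == "3 items");
    assert(describe(std::string("box")) == "1 box");
    // Each overload is an ordinary function, so its address can be taken;
    // the type of the pointer selects which overload is meant.
    std::string (*one_arg)(int) = &describe;
    assert(one_arg(2) == "2 items");
}
```

Note that a default argument offers nothing comparable: there is no separate one-argument function whose address could be taken.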

Another advantage is when we want to compute a 'default' parameter from arguments that have been provided. The calculation can be done in the body of the forwarding function, thus avoiding any risk that we will enter the realms of undefined behaviour because the arguments for a function call can be evaluated in any order. For example:

double foo3(int val, int val2 = val*2);

might seem reasonable, but it will not compile: the language does not allow a parameter to be used in a default argument, precisely because there is no requirement for arguments to be evaluated in left to right sequence. However, we do not even need to consider the problem if we use a forwarding function, because we can fully determine the order of evaluation:

double foo3(int, int);
double foo3(int val)
  { return foo3(val, val*2); }

The crux of this is that forwarding functions can do everything that default arguments can achieve and more. So why use a lesser technique when a more general one is hardly more difficult - yes, you do have to do a little more typing.

There is an exception to this and that is default arguments for constructors; there is no way of forwarding from one constructor to another. I find that less than convenient (there really is no reason for constructors to be a special case). Therefore I am gathering opinions on a proposal for an extension to C++ to provide syntax for forwarding from one constructor to another. In simple terms, I am proposing that where there is a sole entry in the constructor-initialiser list for a type, and that entry is itself a constructor for that same type, the construction is delegated to that constructor.

Example: (NOT VALID currently)

class mytype {
  int val;
  long int anval;
  double dval;
public:
  mytype(int, long, double);
  mytype():mytype(0, 1L, 0.0) { /* the body of this ctor is run immediately on return from the delegated ctor */ }
  // etc.
};
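For readers with a later compiler: C++11 adopted syntax of this shape under the name delegating constructors. The following is a minimal sketch assuming such a compiler. The accessors and the extra one-argument constructor are hypothetical additions, there only so that the result can be observed.

```cpp
#include <cassert>

class mytype {
public:
    mytype(int v, long a, double d) : val(v), anval(a), dval(d) {}
    // Delegates all initialisation to the three-argument constructor;
    // this body would run after the target constructor has returned.
    mytype() : mytype(0, 1L, 0.0) {}
    // A second forwarding constructor supplying two of the three values.
    explicit mytype(int v) : mytype(v, 1L, 0.0) {}

    int value() const { return val; }
    long another() const { return anval; }
    double real() const { return dval; }
private:
    int val;
    long int anval;
    double dval;
};

inline void delegation_example() {
    mytype defaulted;
    assert(defaulted.value() == 0);
    assert(defaulted.another() == 1L);
}
```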

This is only one of a number of extensions that I would like to propose for the next C++ standard. If you are interested, I would also like to see consideration of const constructors and a method for inheriting both the implementation and the interface of a base class without allowing the implicit conversion from derived to base.

2) Issues of const correctness

There are a number of places where we have to give careful consideration to the impact of const qualification on our code. There are several idioms that help with this; once you know them and apply them automatically you will find that your clients have fewer surprises.

a) Overloading to retain correct const qualification

Functions that take an input parameter, do something with it that does not mutate it and then return it are particularly vulnerable to adding unwanted const qualification. There are several such functions in the Standard C Library (which, you will remember, is also part of the Standard C++ Library), but rather than pick on one let me give you a general function of that form.

Consider a function that takes a null terminated array of char as input, and determines where the next word begins. It returns a pointer to that position. So its declaration might be:

char * nextword( char * );

The problem with this is that you cannot use it on an array of const char. So your next shot is:

char * nextword( char const * );

Unfortunately that does not work, because the returned pointer is pointing into the array that was passed as const qualified. Now you must try:

char const * nextword( char const * );

That works fine until the day dawns when you want to modify the returned sub-array. For example:

void capitalise(char * data){
  while (*data) {
    data[0] = toupper(data[0]);
    data = nextword(data);  // ERROR
  }
}

You see the problem? nextword() has added a const qualifier and so its return value cannot be held in a plain char* variable. In this particular case we can safely use a const_cast to solve the problem (well we could apart from that dratted deprecated conversion from string literal to char*, but that is another problem that I will deal with in a moment):

void capitalise(char * data){
  while (*data) {
    data[0] = toupper(data[0]);
    data = const_cast<char *>(nextword(data));  // OK
  }
}

But we really do not want to force this kind of code on our clients when we can do it for free for them. A simple forwarding function solves this problem at compile time with no runtime overhead. We just add the following function:

inline char * nextword( char * arr){
  return const_cast<char *>(nextword(static_cast<char const *>(arr)));
}

The static_cast forces the selection of the const version of nextword(), the const_cast strips off the const qualification, and can do so safely because that qualification was just an artefact of calling nextword(). There is no runtime overhead because the compiler needs to do nothing, our code just prevents the accidental acquisition of a const qualification.

While you do not need to write this kind of code often, you should be familiar with how to do it when necessary.
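Putting the pieces together, here is one possible complete version. The article does not say how nextword() decides where a word begins, so the body below is an assumption: it skips the rest of the current word and then any white space.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Assumed behaviour for this sketch: skip the rest of the current word,
// then any white space, and return a pointer to what follows (the
// terminating null if there is no further word).
inline char const * nextword(char const * text) {
    while (*text && !std::isspace(static_cast<unsigned char>(*text))) ++text;
    while (*text && std::isspace(static_cast<unsigned char>(*text))) ++text;
    return text;
}

// Forwarding overload: preserves the caller's lack of const qualification.
inline char * nextword(char * text) {
    return const_cast<char *>(nextword(static_cast<char const *>(text)));
}

inline void capitalise(char * data) {
    while (*data) {
        data[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(data[0])));
        data = nextword(data);   // selects the char * overload; no cast needed
    }
}

inline void capitalise_example() {
    char text[] = "first thoughts";
    capitalise(text);
    assert(std::strcmp(text, "First Thoughts") == 0);
}
```

The casts to unsigned char are there because passing a negative char value to the <cctype> functions is undefined.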

b) Handling string literals

For many years C++ suffered from the problem that string literals were syntactically arrays of char, but semantically arrays of const char. There was a great deal of code that included lines such as:

char * mess = "string literal";

Very late in the day, a paper written by Kevlin Henney finally changed the type of a string literal to array of const char. However it was necessary to minimise damage to existing code. For that reason the C++ Standard provides a special conversion from string literal to char*. It is only available when it is necessary to convert an actual string literal to a char*. For example:

char * mess = "string literal";  // OK
char const * mess1 = "another string literal";
char * mess2 = mess1; // ERROR, mess1 is not a string literal

Now there is nothing we can do to prevent:

capitalise(mess);

because mess is of type char*.

capitalise(mess1);

will fail because capitalise requires a plain char*. Unfortunately:

capitalise("this is a problem")

compiles even though the result is undefined behaviour. There is a simple fix to this problem, just declare an appropriate overload of capitalise:

void capitalise(char const *);

Be careful that you do not define it. Now string literals will try to use this overload and the linker will generate an error. The failure is later than ideal, but earlier than a segmentation fault at runtime.

Whenever you write a function that takes a parameter of type char*, consider overloading it for parameters of type char const *, but do not define those overloads.
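Where a C++11 compiler is available, a deleted overload is an alternative to the declared-but-undefined overload: it moves the failure from link time to compile time. This swaps in a later language feature rather than showing the technique described in the text. A minimal sketch, with a hypothetical function shout standing in for capitalise:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Modifies its argument in place, so it must never be given a literal.
inline void shout(char * text) {
    for (; *text; ++text) {
        *text = static_cast<char>(std::toupper(static_cast<unsigned char>(*text)));
    }
}

// Deleted overload (C++11): string literals and other char const *
// arguments select this one, and the compiler rejects the call.
void shout(char const *) = delete;

inline void shout_example() {
    char buffer[] = "mutable";
    shout(buffer);                     // fine: an array of plain char
    assert(std::strcmp(buffer, "MUTABLE") == 0);
    // shout("literal");               // would not compile
}
```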

c) Overloading const and non-const member functions

There are two problems here. The first one is that the compiler selects on the basis of the qualification of the object using the function. This often causes surprises but I am going to leave the exploration of this till another time.

The second problem is when you write something such as:

class problem {
public:
  void foo(int);
  void foo(char) const;
// rest of class
};
int main(){
  problem p;
  p.foo('a');
  return 0;
}

Now, I know such code is silly, but the point I want to make is that it also will not compile (with a conforming compiler). Many people are surprised by this because they feel that clearly void foo(char) const should be selected in preference to void foo(int). They are wrong. The language rules require that the selected function from an overload set must be at least as good a match on each parameter, and better on at least one. The problem is that, as a member function, foo has two parameters in each case. Even though the conversion from plain to const is about as trivial as you can get, it is still not a pure identity (though one part of the C++ Standard actually describes it as an identity conversion). Now, if you look back at that code you will see that void foo(int) is a better match for the implicit parameter while void foo(char) const is a better match on the explicit parameter. A conforming compiler has no choice but to declare 'ambiguity'.

If you overload member functions with different explicit parameters, you will almost certainly need to provide non-const versions of any that are const member functions. The non-const version can usually delegate to the const version.
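A minimal sketch of that fix, using a hypothetical class named fixed. The integer return values are an addition for the illustration, so that the selected overload can be observed.

```cpp
#include <cassert>

class fixed {
public:
    int foo(int) { return 1; }          // non-const, takes int
    int foo(char) const { return 2; }   // const, takes char
    // Added non-const overload: an exact match on both the implicit and
    // the explicit parameter, so the ambiguity disappears. It simply
    // delegates to the const version.
    int foo(char c) { return static_cast<fixed const *>(this)->foo(c); }
};

inline void overload_example() {
    fixed p;
    assert(p.foo('a') == 2);   // now unambiguous
    assert(p.foo(1) == 1);
}
```

Removing the third overload reproduces the ambiguity described above for the call p.foo('a').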

d) The subscript operator

The subscript operator is a very special case. The problem is that its use will usually expose a detail of the implementation. The const version:

sometype const & operator[](keytype) const;

is not a great problem. You should always remember that the reference might be invalidated by some subsequent call of another member function, but this is a general rule. The only way that I know to protect against that kind of abuse is to return by value; the language does not provide support for restricting the behaviour of reference types (if you know different, please write an article on the subject).

On the other hand, we can do something to handle potential problems generated by the non-const version. We want to prevent the information leaking out into the program as a whole. Basically there are two cases to deal with. The first is just that if the object is non-const it will use the non-const operator[]. That is fine if we want to use it as an rvalue (i.e. just as a pure value), because if that is all we wanted we could delegate to the const version. However, users will also want to use the return as an lvalue; in other words, they will want to assign to it. Somehow we need to capture and control assignment to a subscripted object. I need to support both:

sometype s;
s = container[n];

and

container[n] = s;

It is the second that causes the problem. Fortunately there is a perfectly good idiom to solve this problem in a way that provides just enough access for the assignment. The following is the merest sketch:

class container {
public:
  class ref {
  public:
    ref(sometype *);
    void operator= (sometype const &);
    operator sometype const & ();
  private:
    sometype * data_ptr;
  };
  ref & operator[](keytype); 
... // other details
};

Now when you use the subscript operator on a non-const instance of container you will get a writable reference to a container::ref back. This gives us the chance to grab and manage the operator=() and make it do the right thing. Anyone who tries to hang on to that by having some variable of type container::ref (or a pointer or reference to same) will find that there is very little they can do with it. Well that is too restrictive, because that would get in the way of using the subscript operator as an rvalue. This is the reason for that conversion operator in container::ref. Note that that conversion operator has another interesting feature in that it blocks any further implicit conversions with user defined conversion operators for sometype, as well as any use of converting constructors that take a sometype argument.
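One way the sketch might be filled in is shown below. It makes several assumptions that are not in the text: the container is a hypothetical counted_ints holding int in a std::vector, the 'management' done by operator=() is merely counting writes, and the proxy is returned by value rather than by reference so that no proxy object has to be stored in the container. It also relies on a nested class having access to the members of its enclosing class, which later standards grant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container that counts every write made through operator[].
class counted_ints {
public:
    class ref {
    public:
        ref(counted_ints & owner, std::size_t index)
            : owner_(&owner), index_(index) {}
        // Assignment is captured here, so the container controls it.
        void operator=(int const & value) {
            owner_->data_[index_] = value;
            ++owner_->writes_;
        }
        // Conversion that allows use as a pure value.
        operator int const & () const { return owner_->data_[index_]; }
    private:
        counted_ints * owner_;
        std::size_t index_;
    };

    explicit counted_ints(std::size_t size) : data_(size), writes_(0) {}
    ref operator[](std::size_t index) { return ref(*this, index); }
    int const & operator[](std::size_t index) const { return data_[index]; }
    std::size_t writes() const { return writes_; }
private:
    std::vector<int> data_;
    std::size_t writes_;
};

inline void proxy_example() {
    counted_ints c(3);
    c[1] = 42;            // goes through ref::operator=
    int s = c[1];         // goes through the conversion operator
    assert(s == 42);
    assert(c.writes() == 1);
}
```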

Would it be useful to have the operator[]() const return a container::ref const &? I am not sure and I have run out of time.

Conclusion

The purpose of this article is primarily to start the ball rolling. There are many idioms, both techniques and useful customs, that expert C++ programmers use and assume are obvious. We need to record these for the benefit of the multitudes of good and aspiring programmers who have not come across them. Delve away into how you write code and ask yourself just how obvious your idioms are. Now write them up, in the first instance to share them with others, but secondly to expose them to the critical eye of others. The important thing is not just that I can teach you something but that you can teach me. Until I tell you what I know or believe, you can neither learn from me nor correct my mistakes.

If you find yourself doing something because you trust my authority, you are definitely wrong. That last sentence applies whoever you are and whoever I am. Under pressure of time, trust me, but when you have more time, challenge my thinking - you owe it to both of us.
