Programming Topics + CVu Journal Vol 30, #3 - July 2018
Browse in : All > Topics > Programming
All > Journals > CVu > 303
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Everyday Coding Habits for Safety and Simplicity

Author: Bob Schmidt

Date: 05 July 2018 16:54:46 +01:00 or Thu, 05 July 2018 16:54:46 +01:00

Summary: Arun Saha has some simple advice for forming good habits in C++.

Body: 

A habit is a choice that we deliberately make at some point, and then stop thinking about, but continue doing, often every day [1].

A lot of times, minor differences in code can lead to significantly better code. In this article, let us go together through a few such cases which we encounter in our everyday coding life. None of these is revolutionary (in fact, you may have seen them before); rather this is a collection of suggestions how small changes – a few characters in most cases – can make big differences. [The examples in this article are in C++; a lot of them apply in C as well.]

Always initialize automatic variables

Do not define uninitialized variables. For example, don’t do this:

  int result;

This defines the variable but it is not initialized to any known value. (It will have an indeterminate value.) I have seen a lot of bugs arise for this reason alone. Often such bugs stay hidden for a long time and express themselves one fine morning. For example, this may happen when the compiler or the compiler version is changed.

Instead, initialize it to some known value:

  int result = 0;

The exact value to be initialized is not important; choose something that is appropriate to the context.

Note that this is different from accessing (reading) uninitialized variables (which can be detected by the compiler warning option -Wuninitialized). Here, the recommendation is to never leave a defined variable uninitialized, even if it is not being accessed.

As you might have realized, this applies only to automatic (or local) variables of the primitive types like char, int, double with all possible signedness and width, float, and bool. If T is a type with sane default constructor (such as std::string), then

  T result;

is not a problem.

One of the reasons for this behavior is that the primitive types do not have constructors. Despite that, a compiler may choose to emit an instruction for ‘zero’ initialization. However, with C++ and C compilers – together with their users – being very performance sensitive, such (initialization) instructions are not usually emitted since they are not required by the standard.

[Would it make sense for compilers to provide a knob such as -fzero-initialize-primitives?]

Prefer defining variables only when needed

Don’t define a variable too early only to assign it later. For example, don’t do this:

  int result = 0;
  // many
  // intermediate
  // lines
  result = ComputeSomething(input);

Rather, define and initialize it together at the point it is used.

  int result = ComputeSomething(input);

If the variable is not expected to be modified thereafter, then mark it constant as:

  int const result = ComputeSomething(input);

In fact, try organizing the flow of logic and code such that it is a series of const variables. It not only prevents inadvertent modification of the variable, it also makes it easy for a reader to follow the flow; it will be one less thing to track in her working memory. It is considered that the number of objects that a human can hold in working memory is 7 ± 2 [2]. I try to be safe and be on the lower side!

Prefer input arguments to be immutable

The safety argument applies to function arguments too. Prefer making the function arguments const as much as possible. If the method is just reading the data without modifying, then the data must be const. So, instead of

  void PrintInorderTraversal(TreeNode * root);

prefer

  // Declaration in .h
  void PrintInorderTraversal(
    TreeNode const * root);

I recommend going one step further by marking the input variable const. This can be done in the implementation file without putting it in the header file. This const prevents the local variable pointing to the data from accidentally getting modified and starting to point to different data.

  // Definition in .cc
  void PrintInorderTraversal(
    TreeNode const * const root);

The recommendation applies equally well for pass-by-reference arguments. There, instead of

  bool IsValidIpAddress(std::string& input);

prefer

  bool IsValidIpAddress(std::string const& input);

Accessor member functions must be const

In C++, member functions can be const when they are not modifying the object. Prefer making the member functions const whenever possible. For example, if a method, such as Print(), is not modifying the object, then instead of

  struct Foo {
    <stuff>
    void Print();
    <more stuff>
  };

prefer writing it as

  struct Foo {
    <stuff>
    void Print() const;  // Note: 'const'

    <more stuff>
  };

As you know, if a member variable is marked mutable, then even a const qualified method can modify it. A mutable variable is appropriate only in some specific circumstances (e.g. a mutex, a hit counter); don’t overuse it.

Prefer default member initialization

Member variables of primitive types are initialized in the constructor. Usually, the constructor is in a different file and there is potential for the initialization to be missed. This tends to happen less during initial creation but more during later enhancements and maintenance. To prevent that, do not defer the initialization until the constructor; instead, prefer to initialize them inline in the class body.

Consider the following snippet.

  // .h file
  struct Foo {
    explicit Foo(int64_t);
    <stuff>
    int64_t fooId_; // Initialization deferred to 

                    // ctor Potential to be missed
  }

  // .cc file
  Foo::Foo(int64_t fooId) : fooId_{fooId} {}

For safety, I recommend writing it as the following using the default member initialization feature [3].

  struct Foo {
    <stuff>
    int64_t fooId_{0}; // Known initialization if

                       // missed in .cc constructor
  }
  Foo::Foo(int64_t fooId) : fooId_{fooId} {}

Except for a few types like bit-fields and non-length-specified arrays, this can be used for most of the data types. Note that, if a member variable has a default member initializer and also appears in the member initialization list in a constructor, the default member initializer is ignored.

You may have noticed that the concept here is similar to an item above except that the variable type is different: automatic variable vs. member variable. In both cases, the motivation is to ensure that a variable is always initialized to a known value. Instances of such missing initializations may be detected by g++ with the warning option -Weffc++ [4].

Prefer immutable member variables if you can

For class members which are initialized at construction and never change during the lifetime of the object, prefer marking them const. Consider the following example where the object carries the time instant when it was constructed.

  using TimePoint = std::chrono::time_point
    <std::chrono::system_clock>;
  inline TimePoint
  GetTimePointNow() {
    return std::chrono::system_clock::now();
  }

Instead of the following

  // .h file
  struct Foo {
    <stuff>
    int64_t   fooId_{0};
    TimePoint  creationInstant_;
  }
  // .cc file
  Foo::Foo(int64_t fooId) :
    fooId_{fooId},
    creationInstant_{GetTimePointNow()} {}

prefer doing

  // .h file
  struct Foo {
    <stuff>
    int64_t const   fooId_{0};
    TimePoint const creationInstant_; // Note const

  }

The motivation is similar to marking automatic variables const; safety from accidental modification and reduced burden of information tracking.

Prefer APIs to provide the result as a return value

Don’t design an API to update a reference or a pointer to the result, let it return the result. For example, instead of doing

  int result = 0;
  ComputeResult(n, result);   // pass by reference

or

  int result = 0;
  ComputeResult(n, &result);  // pass by pointer

prefer doing

  int const result = ComputeSomething(n);

This style allows the returned value to be saved in a const variable and to be used in the subsequent computation. This certainly applies to objects that are trivially copyable, such as the primitive types. For objects that are not trivially copyable, it can still be applicable thanks to:

  1. Return Value Optimization (RVO)
  2. Copy Elision
  3. Move semantics

Note however that the rules of the above are quite involved and, in certain situations, they may not kick in. So, if you want to play safe and avoid it for non-trivially copyable types, that’s understandable.

Prefer for-loops over while-loops

Instead of writing as a while loop, prefer writing loops as for loops whenever possible. This is not just a bias towards one keyword over another; this enables easier comprehension of loop invariants. For example, instead of the following

  int i = 0;
  while (i < n) {
    <possibly multiple lines of loop body>
    i++;
  }

prefer writing it as

  for (int i = 0; i < n: ++i) {
    <loop body>
  }

In addition, this style has another advantage: the loop variable (here, i) is not ‘leaked’ to the outer scope unnecessarily.

The recommendation goes for linked list traversal too. Here, instead of

  ListNode const * curr = head;
  while (curr) {
    <possibly multiple lines of loop body>
    curr = curr->next;
  }

prefer writing it as

  for (ListNode const * curr = head; curr; 
    curr = curr->next) {
    <loop body>
  }

As if the above is not enough, the above applies to traversing from both ends. So, instead of

  void Reverse(T * arr, int64_t const n) {
  int64_t left = 0;
    int64_t right = n - 1;
    while (left < right) {
      swap(arr[left], arr[right]);
      left++;
      right--;
    }
  }

prefer writing it as

  void Reverse(T * arr, int64_t const n) {
    for (int64_t left = 0, right = n - 1;
    left < right; ++left, --right) {
      swap(arr[left], arr[right]);
    }
  }

Note that although it is a by product, saving the number of lines is not the objective.

Prefer combining boolean conditions into a well-named variable

If the program logic requires evaluating multiple conditions, then instead of clamping them together in the conditional of an if statement, for example, instead of

  if (someCondition(input) &&
    anotherCondition(input) &&
    !yetAnotherCondition(input)) {
      <body>

prefer

  bool const input_valid = someCondition(input) &&
    anotherCondition(input) &&
    !yetAnotherCondition(input);
  if (input_valid) {
    <body>

This has two benefits: the evaluated condition gets a human-readable name and it becomes easier to view what the condition evaluated to (i.e. true or false) using a log message or a debugger.

Prefer separate asserts for separate conditions

If you use asserts in your source code to validate assumptions, then do not combine multiple conditions into a single assert statement.

  void Copy(char * dst, char const * src, 
    size_t const length) {
      assert(dst && src); // 2 conditions combined

    <continued>

If you do so and the assertion fails, then the generated message would mention that the combined condition has failed (along with some meta information like filename and line number). The person who has to debug the problem (it could be you at a future time) would not be clear exactly why the assertion failed. Instead, break the single assertion into multiple simpler assertions.

  void Copy(char * dst, char const * src, 
    size_t const length) {
      assert(dst);
      assert(src);
      <continued>

Now, if the assertion fails, it would be obvious what exactly has failed!

Summary

The table below provides an at-a-glance view of the above recommendations.

Avoid doing Prefer doing
1
int result;
int result = 0;
2
int result = 0;int
result = ComputeSomething(input);
int const result =
ComputeSomething(input);
3
void PrintInorderTraversal(
   TreeNode * root);
bool IsValidIpAddress(
   std::string& input);
void PrintInorderTraversal(
   TreeNode const * const root);
bool IsValidIpAddress(
   std::string const& input);
4
void Foo::Print();
void Foo::Print() const;
5
int64_t fooId_;
int64_t fooId_{0};
6
TimePoint creationInstant_;
TimePoint creationInstant_;
7
ComputeResult(n, &result);
int const result = ComputeSomething(n);
8
int i = 0;
while (i < n) {
  <loop body>
  i++;
}
for (int i = 0; i < n: ++i) {
  <loop body>
}
9
if (someCondition(input) &&
  anotherCondition(input) &&
  !yetAnotherCondition(input)) {
bool const input_valid =
   someCondition(input) &&
   anotherCondition(input) &&
   !yetAnotherCondition(input);

if (input_valid) {
10
assert(dst && src);
assert(dst);
assert(src);

Software engineering is an eternal fight with complexity. The recommendations here are based on my experience as a software engineer. Over time, I have found them useful in constructing durable code. Unless your situation precludes, I recommend trying them out. Once they become a habit [1], you would cease thinking about them and focus on other important aspects of your code.

References

[1] http://charlesduhigg.com/how-habits-work/

[2] https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two

[3] https://en.cppreference.com/w/cpp/language/data_members

[4] https://stackoverflow.com/questions/2099692/easy-way-find-uninitialized-member-variables

Notes: 

More fields may be available via dynamicdata ..