CVu Journal Vol 30, #3 - July 2018 + Design of applications and programs

Title: Write Less Code!

Author: Bob Schmidt

Date: Sun, 08 July 2018 17:21:01 +01:00

Summary: Pete Goodliffe helps us avoid unnecessary lines of code.

Body:

A well-used minimum suffices for everything.
~Jules Verne, Around the World in Eighty Days

It’s sad, but it’s true: in our modern world there’s just too much code.

I can cope with the fact that my car engine is controlled by a computer. There’s obviously software cooking the food in my microwave. And it wouldn’t surprise me if my genetically modified cucumbers had an embedded microcontroller in them. That’s all fine; it’s not what I’m obsessing about. I’m worried about all of the unnecessary code out there.

There’s simply too much unnecessary code kicking around. Like weeds, these evil lines of code clog up our precious bytes of storage, obfuscate our revision control histories, stubbornly get in the way of our development, and use up precious code space, choking the good code around them.

Why is there so much unnecessary code?

Some people like the sound of their own voice. You’ve met them; you just can’t shut them up. They’re the kind of people you don’t want to get stuck with at parties. Yada yada yada. Other people like their own code too much. They like it so much they write reams of it:

  { yada->yada.yada(); }

Or perhaps they’re the programmers with misguided managers who judge progress by how many thousands of lines of code have been written a day.

Writing lots of code does not mean that you’ve written lots of software. Indeed, some code can actually negatively affect the amount of software you have – it gets in the way, causes faults, and reduces the quality of the user experience. The programming equivalent of antimatter.

Less code can mean more software.

Some of my best software improvement work has been by removing code. I fondly remember one time when I lopped thousands of lines of code out of a sprawling system, and replaced them with a mere 10 lines of code. What a wonderfully smug feeling of satisfaction. I suggest you try it some time.

Why should we care?

So why is this phenomenon bad, rather than merely annoying?

There are many reasons why unnecessary code is the root of all evil. Here are a few bullet points:

Writing a fresh line of code is the birth of a little life form. It will need to be lovingly nurtured into a useful and profitable member of software society before you can release a product using it.
Over the life of your software system, that line of code needs maintenance. Each line of code costs a little. The more code you write, the higher the cost. The longer a line of code lives, the higher its cost. Clearly, unnecessary code needs to meet a timely demise before it bankrupts us.
More code means there is more to read and more to understand – it makes our programs harder to comprehend. Unnecessary code can mask the purpose of a function, or hide small but important differences in otherwise similar code.
The more code there is, the more work is required to make modifications – the program is harder to modify.
Code harbours bugs. The more code you have, the more places there are for bugs to hide.
Duplicated code is particularly pernicious; you can fix a bug in one copy of the code and, unbeknown to you, still have another 32 identical little bugs kicking around elsewhere.

Unnecessary code is nefarious. It comes in many guises: unused components, dead code, pointless comments, unnecessary verbosity, and so on. Let’s look at some of these in detail.

Flappy logic

A simple and common class of pointless code is the unnecessary use of conditional statements and tautological logic constructs. Flappy logic is the sign of a flappy mind. Or, at least, of a poor understanding of logic constructs. For example:

  if (expression)
    return true;
  else
    return false;

can more simply, and directly, be written:

  return expression;

This is not only more compact, it is easier to read, and therefore easier to understand. It looks more like an English sentence, which greatly aids human readers. And do you know what? The compiler doesn’t mind one bit.

Similarly, the verbose expression:

  if (something == true)
  {
    // ...
  }

would read much better as:

  if (something)

Now, these examples are clearly simplistic. In the wild we see much more elaborate constructs created; never underestimate the ability of a programmer to complicate the simple. Real-world code is riddled with things like Listing 1

bool should_we_pick_bananas()
{
  if (gorilla_is_hungry())
  {
    if (bananas_are_ripe())
    {
      return true;
    }
    else
    {
      return false;
    }
  }
  else
  {
    return false;
  }
}

Listing 1

which reduces neatly to the one-liner:

  return gorilla_is_hungry() && bananas_are_ripe();

Cut through the waffle and say things clearly, but succinctly. Don’t feel ashamed to know how your language works. It’s not dirty, and you won’t grow hairy palms. Knowing, and exploiting, the order in which expressions are evaluated saves a lot of unnecessary logic in conditional expressions. For example:

  if ( a
    || (!a && b) )
  {
    // what a complicated expression!
  }

can simply be written:

  if (a || b)
  {
    // isn't that better?
    // didn't hurt, did it?
  }

Express code clearly and succinctly. Avoid unnecessarily long-winded statements.
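
The a || b simplification shown earlier in this section is a Boolean identity, so it can be checked exhaustively rather than taken on trust. Here is a minimal sketch in C++ that compares the long-winded and the simplified conditions for every combination of inputs. The function names are hypothetical, invented only for this illustration.

```cpp
// Hypothetical helpers: the long-winded and the simplified condition.
bool long_winded(bool a, bool b) { return a || (!a && b); }
bool simplified(bool a, bool b)  { return a || b; }

// Returns true if both forms agree for all four input combinations.
bool forms_agree()
{
  for (int a = 0; a < 2; ++a)
  {
    for (int b = 0; b < 2; ++b)
    {
      if (long_winded(a != 0, b != 0) != simplified(a != 0, b != 0))
        return false;
    }
  }
  return true;
}
```

With only two Boolean inputs there are just four cases to try, so a check like this is a cheap way to gain confidence before simplifying a condition in real code.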

Duplication

Unnecessary code duplication is evil. We mostly see this crime perpetrated through the application of cut-and-paste programming: when a lazy programmer chooses not to factor repeated code sections into a common function, but physically copies it from one place to another in their editor. Sloppy. The sin is compounded when the code is pasted with minor changes.

When you duplicate code, you hide the repeated structure, and you copy all of the existing bugs. Even if you repair one instance of the code, there will be a queue of identical bugs ready to bite you another day. Refactor duplicated code sections into a single function. If there are similar code sections with slight differences, capture the differences in one function with a configuration parameter.

Do not copy code sections. Factor them into a common function. Use parameters to express any differences.
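
As a minimal sketch of that advice, suppose two pasted functions differ only in the discount they apply. Everything here is an assumption made for illustration: the names, the prices in whole pence, and the discount percentages are all hypothetical.

```cpp
// Before: two pasted copies that differ in a single constant.
long price_for_member_pasted(long pence) { return pence - pence * 20 / 100; }
long price_for_guest_pasted(long pence)  { return pence - pence * 5 / 100; }

// After: one function; the difference is expressed as a parameter.
long discounted_price(long pence, long discount_percent)
{
  return pence - pence * discount_percent / 100;
}

// The two original entry points become thin wrappers (or disappear entirely).
long price_for_member(long pence) { return discounted_price(pence, 20); }
long price_for_guest(long pence)  { return discounted_price(pence, 5); }
```

A bug in the arithmetic now needs fixing in one place only, and the one thing that really differs between the two cases is visible at a glance.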

This is commonly known as the DRY principle: Don’t Repeat Yourself! We aim for ‘DRY’ code, without unnecessary redundancy. However, be aware that factoring similar code into a shared function introduces tight coupling between those sections of code. They both now rely on a shared interface; if you change that interface, both sections of code must be adjusted. In many situations this is perfectly appropriate; however, it’s not always a desirable outcome, and can cause more problems in the long run than the duplication – so DRY your code responsibly!

Not all code duplication is malicious or the fault of lazy programmers. Duplication can happen by accident too, by someone reinventing a wheel that they didn’t know existed. Or it can happen by constructing a new function when a perfectly acceptable third-party library already exists. This is bad because the existent library is far more likely to be correct and debugged already. Using common libraries saves you effort, and shields you from a world of potential faults.
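
To illustrate the reinvented wheel, here is a small sketch that uses the C++ standard library in place of a third-party one. The function names are hypothetical; std::max_element is the real standard algorithm from <algorithm>. Both versions assume the vector is not empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hand-rolled: a wheel reinvented. Assumes values is not empty.
int largest_by_hand(const std::vector<int> &values)
{
  int largest = values[0];
  for (std::size_t n = 1; n < values.size(); ++n)
  {
    if (values[n] > largest)
      largest = values[n];
  }
  return largest;
}

// Library version: std::max_element returns an iterator to the largest
// element. Also assumes values is not empty.
int largest(const std::vector<int> &values)
{
  return *std::max_element(values.begin(), values.end());
}
```

The second version has less code to read, and the loop it replaces is one fewer place for an off-by-one error to hide.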

There are also microcode-level duplication patterns. For example:

  if (foo) do_something();
  if (foo) do_something_else();
  if (foo) do_more();

could all be neatly wrapped in a single if statement. Multiple loops can usually be reduced to a single loop. For example, the code in Listing 2

for (int a = 0; a < MAX; ++a)
{
  // do something
}
// make hot buttered toast
for (int a = 0; a < MAX; ++a)
{
  // do something else
}

Listing 2

probably boils down to:

  for (int a = 0; a < MAX; ++a)
  {
    // do something
    // do something else
  }
  // make hot buttered toast

if the making of hot buttered toast doesn’t depend on either loop. Not only is this simpler to read and understand, it’s likely to perform better, too, because only one loop needs to be run. Also consider redundant duplicated conditionals:

  if (foo)
  {
    if (foo && some_other_reason)
    {
      // the 2nd check for foo was redundant
    }
  }

You probably wouldn’t write that on purpose, but after a bit of maintenance work a lot of code ends up with sloppy structure like that.

If you spot duplication, remove it.

I was recently trying to debug a device driver that was structured with two main processing loops. Upon inspection, these loops were almost entirely identical, with some minor differences for the type of data they were processing. This fact was not immediately obvious because each loop was 300 lines (of very dense C code) long! It was tortuous and hard to follow. Each loop had seen a different set of bugfixes, and consequently the code was flaky and unpredictable. A little effort to factor the two loops into a single version halved the problem space immediately; I could then concentrate on one place to find and fix faults.

Dead code

If you don’t maintain it, your code can rot. And it can also die. Dead code is code that is never run, that can never be reached. That has no life. Tell your code to get a life, or get lost.

Listings 3 and 4 both contain dead code sections that aren’t immediately obvious if you quickly glance over them.

if (size == 0)
{
  // ... 20 lines of malarkey ...
  for (int n = 0; n < size; ++n)
  {
    // this code will never run
  }
  // ... 20 more lines of shenanigans ...
}

Listing 3

#include <string.h>

void loop(char *str)
{
  size_t length = strlen(str);
  if (length == 0) return;
  for (size_t n = 0; n < length; n++)
  {
    if (str[n] == '\0')
    {
      // this code will never run
    }
  }
  if (length) return;
  // neither will this code
}

Listing 4

Other manifestations of dead code include:

Functions that are never called
Variables that are written but never read
Parameters passed to an internal method that are never used
Enums, structs, classes, or interfaces that are never used
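
Here is a small sketch of several of these manifestations, with the same code after a clean-up. All of the names are hypothetical. The ‘before’ version compiles, but GCC and Clang can often report this kind of thing when warning options such as -Wall and -Wextra are switched on.

```cpp
#include <cstddef>
#include <cstring>

// Before: hypothetical code carrying three kinds of dead weight.
static int never_called() { return 42; }   // a function that is never called

int total_length_before(const char *const *words, std::size_t count,
                        bool verbose)       // a parameter that is never used
{
  int last = 0;                             // written below, but never read
  int total = 0;
  for (std::size_t n = 0; n < count; ++n)
  {
    last = static_cast<int>(std::strlen(words[n]));
    total += static_cast<int>(std::strlen(words[n]));
  }
  return total;
}

// After: the same behaviour with the dead code removed.
int total_length(const char *const *words, std::size_t count)
{
  int total = 0;
  for (std::size_t n = 0; n < count; ++n)
  {
    total += static_cast<int>(std::strlen(words[n]));
  }
  return total;
}
```

Both functions return the same result; the second simply has less for the reader to wade through.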

Comments

Sadly, the world is riddled with awful code comments. You can’t turn around in an editor without tripping over a few of them. It doesn’t help that many corporate coding standards are a pile of rot, mandating the inclusion of millions of brain-dead comments.

Good code does not need reams of comments to prop it up, or to explain how it works. Careful choice of variable, function, and class names, and good structure should make your code entirely clear. Duplicating all of that information in a set of comments is unnecessary redundancy. And like any other form of duplication, it is also dangerous; it’s far too easy to change one without changing the other.

Stupid, redundant comments range from the classic example of byte wastage:

  ++i; // increment i

to more subtle examples, where an algorithm is described just above it in the code:

  // loop over all items, and add them up
  int total = 0;
  for (int n = 0; n < MAX; n++)
  {
    total += items[n];
  }

Very few algorithms when expressed in code are complex enough to justify that level of exposition. (But some are – learn the difference!) If an algorithm does need commentary, it may be better supplied by factoring the logic into a new, well-named function.
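
For the summing loop above, that factoring might look something like the following sketch. The name sum_of and its signature are assumptions made for this illustration.

```cpp
#include <cstddef>

// The function name now says what the loop does, so the call site
// needs no comment to describe it.
int sum_of(const int *items, std::size_t count)
{
  int total = 0;
  for (std::size_t n = 0; n < count; ++n)
  {
    total += items[n];
  }
  return total;
}
```

The calling code then shrinks to a single line, int total = sum_of(items, MAX);, which reads as clearly as the comment it replaces.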

Make sure that every comment adds value to the code. The code itself says what and how. A comment should explain why – but only if it’s not already clear.

It’s also common to enter a crufty codebase and see ‘old’ code that has been surgically removed by commenting it out. Don’t do this; it’s the sign of someone who wasn’t brave enough to perform the surgical extraction completely, or who didn’t really understand what they were doing and thought that they might have to graft the code back in later. Remove code completely. You can always get it back afterwards from your source control system.

Do not remove code by commenting it out. It confuses the reader and gets in the way.

Don’t write comments describing what the code used to do; it doesn’t matter anymore. Don’t put comments at the end of code blocks or scopes; the code structure makes that clear. And don’t write gratuitous ASCII art.

Verbosity

A lot of code is needlessly chatty. At the simplest end of the verbosity spectrum (which ranges from infra-redundant to ultra-voluble) is code like this:

  bool is_valid(const char *str)
  {
    if (str)
      return strcmp(str, "VALID") == 0;
    else
      return false;
  }

It is quite wordy, and so it’s relatively hard to see what the intent is. It can easily be rewritten:

  bool is_valid(const char *str)
  {
    return str && strcmp(str, "VALID") == 0;
  }

Don’t be afraid of the ternary operator if your language provides one; it really helps to reduce code clutter. Replace this kind of monstrosity:

  public String getPath(URL url) {
    if (url == null) {
      return null;
    }
    else {
      return url.getPath();
    }
  }

with:

  public String getPath(URL url) {
    return url == null ? null : url.getPath();
  }

C-style declarations (where all variables are declared at the top of a block, and used much, much later on) are now officially passé (unless you’re still forced to use officially defunct compiler technology). The world has moved on, and so should your code. Avoid writing this:

  int a;
  // ... 20 lines of C code ...
  a = foo();
  // what type was an "a" again?

Move variable declarations and definitions together, to reduce the effort required to understand the code, and reduce potential errors from uninitialised variables. In fact, sometimes these variables are pointless anyway. For example:

  bool a;
  int b;
  a = fn1();
  b = fn2();
  if (a)
    foo(10, b);
  else
    foo(5, b);

can easily become the less verbose (and, arguably, clearer):

  foo(fn1() ? 10 : 5, fn2());

Bad design

Of course, unnecessary code is not just the product of low-level code mistakes or bad maintenance. It can be caused by higher-level design flaws.

Bad design may introduce many unnecessary communication paths between components – lots of extra data marshalling code for no apparent reason. The further data flows, the more likely it is to get corrupted en route.

Over time, code components become redundant, or can mutate from their original use to something quite different, leaving large sections of unused code. When this happens, don’t be afraid to clear away all of the deadwood. Replace the old component with a simpler one that does all that is required.

Your design should consider whether off-the-shelf libraries already exist that solve your programming problems. Using these libraries will remove the need to write a whole load of unnecessary code. As a bonus, popular libraries will likely be robust, extensible, and well used.

Whitespace

Don’t panic! I’m not going to attack whitespace (that is, spaces, tabs, and newlines). Whitespace is a good thing – do not be afraid to use it. Like a well-placed pause when reciting a poem, sensible use of whitespace helps to frame our code.

Use of whitespace is not usually misleading or unnecessary. But you can have too much of a good thing, and 20 newlines between functions probably is too much.

Consider, too, the use of parentheses to group logic constructs. Sometimes brackets help to clarify the logic even when they are not necessary to defeat operator precedence. Sometimes they are unnecessary and get in the way.
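
A short sketch of both cases follows. The predicates and their names are hypothetical; the only language fact relied upon is that && binds more tightly than || in C and C++.

```cpp
// && binds tighter than ||, so these two functions compute the same thing.
// The brackets in the second only spell the grouping out for the reader.
bool may_enter_bare(bool has_ticket, bool is_adult, bool is_staff)
{
  return has_ticket && is_adult || is_staff;
}

bool may_enter(bool has_ticket, bool is_adult, bool is_staff)
{
  return (has_ticket && is_adult) || is_staff;
}

// Here the brackets add nothing, and just get in the way.
bool is_positive_noisy(int n) { return ((n) > (0)); }
bool is_positive(int n)       { return n > 0; }
```

In the first pair the brackets save the reader from having to recall the precedence table; in the second they are merely noise.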

So what do we do?

To be fair, often such a buildup of code cruft isn’t intentional. Few people set out to write deliberately laborious, duplicated, pointless code. (But there are some lazy programmers who continually take the low road rather than invest extra time to write great code.) Most frequently, we end up with these code problems as the legacy of code that has been maintained, extended, worked with, and debugged by many people over a large period of time.

So what do we do about it? We must take responsibility. Don’t write unnecessary code, and when you work on ‘legacy’ code, watch out for the warning signs. It’s time to get militant. Reclaim our whitespace. Reduce the clutter. Spring clean. Redress the balance.

Pigs live in their own filth. Programmers needn’t. Clean up after yourself. As you work on a piece of code, remove all of the unnecessary code that you encounter.

This is an example of how to follow Robert Martin’s advice and honour ‘the Boy Scout Rule’ in the coding world: Always leave the campground cleaner than you found it. [1]

Every day, leave your code a little better than it was. Remove redundancy and duplication as you find it.

But take heed of this simple rule: make ‘tidying up’ changes separately from other functional changes. This will ensure that it’s clear in your source control system what’s happened. Gratuitous structural changes mixed in with functional modifications are hard to follow. And if there is a bug then it’s harder to work out whether it was due to your new functionality, or because of the structural improvement.

Conclusion

Software functionality does not correlate with the number of lines of code, or with the number of components in a system. More lines of code do not necessarily mean more software.

So if you don’t need it, don’t write it. Write less code, and find something more fun to do instead.

Questions

Do you naturally write succinct logical expressions? Are your succinct expressions so terse as to be incomprehensible?
Does the C-language-family’s ternary operator (e.g., condition ? true_value : false_value) make expressions more or less readable? Why?
We should avoid cut-and-paste coding. How different does a section of code have to be before it is justifiable to not factor into a common function?
How can you spot and remove dead code?
Some coding standards mandate that every function is documented with specially formatted code comments. Is this useful? Or is it an unnecessary burden, introducing a load of worthless extra comments?

Reference

[1] Robert C. Martin (2008) Clean Code: A Handbook of Agile Software Craftsmanship, Upper Saddle River, NJ: Prentice Hall.
