Title: I Like Whitespace
Author: Martin Moene
Date: 04 February 2015 17:09:46 +00:00
Summary: Programming style can cause endless arguments. Bob Schmidt shares why he thinks whitespace matters.
Body:
This article contains sarcasm and (possibly failed) attempts at humor. My sarcasm font is broken, so you’re just going to have to recognize those sections without any help from me.
I know that few things generate more heat and less light than discussions about programming style. Where do the braces go; how many spaces to indent; in C++, where should the const keyword be placed? We all have opinions on all of these subjects, and more. Most aspects of programming style are matters of taste, and taste is subjective. My style is, of course, the only correct one.
I use a lot of whitespace when writing code. From reviewing and maintaining a lot of code written by others I know that this is not a common practice. I think it is an issue of weak thumbs and pinky fingers. Later I’ll suggest exercises for strengthening these digits so that inserting additional whitespace (in the form of spaces, carriage returns and (temporary) tabs) will become easier.
Motivation
Back in the mid-1980s I had the misfortune of working on a system that used RATFOR, a language created by Brian Kernighan that is a pre-compiler for FORTRAN 66 [Kernighan76]. RATFOR provides modern programming structure syntax to FORTRAN 66, ostensibly making it easier to write good code than straight FORTRAN.
A code beautifier was part of the RATFOR software tool set. Running a source file through the beautifier generated a hard-copy listing of the source, with indentations based on the source syntax. It was an OK tool for the time, although it didn’t always represent the true structure of the code correctly.
Unfortunately, the presence of the beautifier gave some of the original programmers on the project ‘permission’ to not structure their source code. Open a source file on a programmer’s terminal, and every line started in the far left-hand column. (See Listing 1.)
IF(A<B) THEN
[
IF(B<C) THEN
[
CALL X(A)
]
ELSE
[
CALL Y(B)
]
ENDIF
]
ELSE
[
CALL Z(C)
]
ENDIF
Listing 1
This was perhaps the ugliest code I have ever had the misfortune to work on. I would completely format the entire source file prior to trying to fix or enhance the code. (I have not yet recovered from the trauma, in spite of all the software conference ‘therapy’ I’ve attended in the intervening years.)
It was working on this ugly RATFOR code that first got me really thinking about coding style. When I first started programming, back in the dark ages of 1981, any resemblance my code had to a style was strictly an accident. I like to think my coding style has evolved a lot since then. Today I prefer a style that showcases the code and its structure, so I don't have to go looking for it.
Spaces
Iwouldn’treadalineoftextinanarticleifitwaswrittenlikethis. So why should I have to read a line of code formatted like this:
for(int i=0;i<someMaxValue;++i)
I believe it is so much easier to read if it is written more like this:
for ( int i = 0;  i < some_max_value;  ++ i )
Note the extra spaces: on either side of the opening parenthesis; on both sides of the assignment and less than operators; and multiple spaces after the semicolons. The multiple spaces after the semicolons make each of the sections of the for statement stand out.
For more complex cases, separate the three sections of a for statement onto separate lines (see Listing 2).
for ( int i = 0, j = 0;
      i < some_max_value && j < some_other_max_value;
      ++ i, ++ j )
Listing 2
Long iterator type names can lead to long for statements; the multi-line version of the for statement is particularly useful there. (Note that the new auto keyword should help alleviate some of those problems.)
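The point about long iterator type names can be sketched as follows. This is only an illustration: the container type and the function names are assumptions invented for the example, not anything from the code base described in the article.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Pre-C++11 spelling (hypothetical function): the full iterator type name
// pushes the for statement onto three lines.
std::size_t count_values_explicit ( std::map< std::string, std::vector< int > > const& table )
{
    std::size_t total = 0;

    for ( std::map< std::string, std::vector< int > >::const_iterator entry = table.begin ();
          entry != table.end ();
          ++ entry )
    {
        total += entry->second.size ();
    }

    return total;
}

// C++11 spelling (hypothetical function): auto deduces the same
// const_iterator type, so the statement fits on one line again.
std::size_t count_values_auto ( std::map< std::string, std::vector< int > > const& table )
{
    std::size_t total = 0;

    for ( auto entry = table.begin ();  entry != table.end ();  ++ entry )
    {
        total += entry->second.size ();
    }

    return total;
}
```

Both functions walk the same map and return the same total; only the spelling of the iterator type differs.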
I see a lot of code where an array index is itself an index into an array – sometimes to a third level. (See first line in Listing 3.)
array1[array2[array3[index3].index2].index1]
x = array1[ array2[ array3[ index3 ].index2 ].index1 ]
Listing 3
I find this difficult to parse mentally. When writing array subscripts, I put a space after the opening bracket and one before the closing bracket (see second line in Listing 3).
Admittedly, it is clearer to pull the nested indices out into their own variables, but I still like to put spaces around the index:
index2 = array3[ index3 ].index2;
index1 = array2[ index2 ].index1;
x = array1[ index1 ];
For complex Boolean expressions I tend to use extra spaces to separate sub-expressions:
if ( ( x < 0 )  ||  ( y > 0 ) )
At times I’ll use extra spaces instead of parentheses:
if ( x < 0  ||  y > 0 )
I find either of the previous versions easier to parse than either of these options:
if((x<0)||(y>0))
or
if(x<0||y>0)
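Dropping the parentheses is safe here because, in C and C++, the relational operators bind more tightly than ||. A minimal sketch of that equivalence, checked at compile time; the two function names are hypothetical and exist only for this illustration:

```cpp
// Hypothetical helpers; the names are illustrative and not from the article.
constexpr bool with_parentheses ( int x, int y )
{
    return ( ( x < 0 )  ||  ( y > 0 ) );
}

constexpr bool with_spaces_only ( int x, int y )
{
    return ( x < 0  ||  y > 0 );
}

// < and > are evaluated before ||, so both spellings agree for every
// combination of signs.
static_assert ( with_parentheses ( -1, -1 ) == with_spaces_only ( -1, -1 ), "only x < 0 holds" );
static_assert ( with_parentheses (  1,  1 ) == with_spaces_only (  1,  1 ), "only y > 0 holds" );
static_assert ( with_parentheses ( -1,  1 ) == with_spaces_only ( -1,  1 ), "both hold" );
static_assert ( with_parentheses (  1, -1 ) == with_spaces_only (  1, -1 ), "neither holds" );
```

The choice between the two forms is therefore purely about readability; the compiler sees the same expression.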
If you continue reading you will notice that I also put a space after a function name and before the opening parenthesis of the parameter list. This is a holdover from my FORTRAN days (which weren’t that long ago) when I used that space to differentiate between a function name (with the space) and an array name (without the space).
X = SINE ( ANGLE )
Y = ARRAY( ELEMENT )
Blank lines
Like spaces, I use blank lines liberally in my code. I use a blank line to separate variable declarations from other statements in a function, particularly in C where the declarations typically occur at the beginning of a function, or the beginning of a block. I also use a blank line to visually separate different steps in the code, such as the setup of a function call and the call itself.
message.header = TYPE;
message.body   = contents;

status = send_message ( &message );
A line with nothing but a curly brace is a blank line for this purpose.
if ( something )
{
    do_something_else ();
}
Putting the brace on the line after the if statement provides visual separation for the then clause.
Alignment
I find it easier to read code when certain aspects of the code are aligned. Consider variable definitions:
int x=0;
short longer_name=1;
char short_name[2]={0,0};
I find this style easier to follow:
int   x               = 0;
short longer_name     = 1;
char  short_name[ 2 ] = { 0, 0 };
Aligning types, names, equal signs, and initial values makes each section of a definition stand out. The example here is trivial, but recently I’ve been working on some legacy C code with functions that have 20 or more definitions at the beginning of each function. My eyes started to water when I had to find a particular entry in the list. (It’s fixed, now.)
When making multiple assignments, I like to align the equals sign (see Listing 4).
structure.element_the_first = 1;
structure.second_element    = 2;
structure.third_element     = function_that_returns_the_needed_value ();
Listing 4
Complex Boolean expressions are a place where spacing and alignment can come into play at the same time.
Who can look at this mess and easily tell what’s going on?
if(((x==0)&&(x==1))||((y==3)&&((z==5)||(z==6))))
Using a combination of spacing and alignment makes it a little easier to parse mentally:
if ( ( ( x == 0 ) && ( x == 1 ) ) ||
     ( ( y == 3 ) && ( ( z == 5 ) || ( z == 6 ) ) ) )
Camel case
I don’t like writing or reading names in camel case. iWouldn’tReadTextIfItWasWrittenLikeThis, so why should I have to read variable and function names written this way? We can’t embed spaces in names (unless you’re still writing in FORTRAN, which deletes whitespace prior to parsing), so the best we can do is use the underscore. some_max_value is easier to read than SomeMaxValue, and the longer and more descriptive the name the easier it is.
Braces and indentation
I used to think that three-space indentation was just about perfect. One or two spaces aren’t enough visually, and four or more pushed code way too far to the right, particularly on old 80-character wide TTY terminals.
if ( x == 0 )
   {
   process_x_when_0 ();
   }
else
   {
   process_x_when_not_0 ();
   }
Note that using three space indentation, and indenting the braces, causes the if and else blocks to line up under the opening parenthesis of the if statement (assuming you put a space between the if and the opening parenthesis). I found this visually appealing, and I wrote a lot of code that way.
I’ve since adopted a four space indentation, with the braces lined up with the if and else keywords, for one major reason – it is the common indentation used by my major customer (they have no standards), and it is easier in this case to adapt than be the odd man out. Plus, taking the advice of Bob Martin, I don’t write functions with large, heavily indented if-then-else trees anymore [Martin09]. (But note that Bob is against the aligning rules I use, so he’s not right about everything.)
if ( x == 0 )
{
    process_x_when_0 ();
}
else
{
    process_x_when_not_0 ();
}
I really don’t like this style:
if ( x == 0 ) {
    process_x_when_0 ();
} else {
    process_x_when_not_0 ();
}
I like putting my opening and closing braces on separate lines, for two reasons: they are always easy to find, and the line with nothing on it but the brace provides a line of whitespace around the statements that make up the then and else clauses.
MISRA standards [MISRA13] require that braces be used around if and else clauses even if there is only one statement in the clause (as in each of the three previous examples). I try not to write code that looks like this:
if ( x == 0 )
    process_x_when_0 ();
else
    process_x_when_not_0 ();
The standard is meant to eliminate problems introduced in maintenance:
if ( x == 0 )
    process_x_when_0 ();
else
    process_x_when_not_0 ();
    process_something_else ();   // ALWAYS EXECUTED
Or with nested if statements (I have seen an error of this type just recently):
if ( x == 0 )
    if ( y == 0 )
        process_x_and_y_are_0 ();
else
    process_x_is_not_0 ();
This only works the way you want it to with properly applied braces:
if ( x == 0 )
{
    if ( y == 0 )
    {
        process_x_and_y_are_0 ();
    }
}
else
{
    process_x_is_not_0 ();
}
I admit that I do, sometimes, omit the braces, based on the context of the code. I’m working on forcing myself to always include them.
Editor's note: The two listings in the example above have been corrected relative to the versions in the printed magazine.
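The nested if problem described above can be made concrete with a small sketch. The two functions below are hypothetical stand-ins that return a label for the branch taken instead of calling the process_ functions from the article; the first is deliberately indented in the misleading way. Recent GCC and Clang versions typically warn about the unbraced form, which is itself a hint that the layout does not match what the compiler sees.

```cpp
#include <string>

// Hypothetical stand-in: indented as in the misleading example, so the else
// looks as if it belongs to the outer if.
std::string without_braces ( int x, int y )
{
    std::string taken = "nothing";

    if ( x == 0 )
        if ( y == 0 )
            taken = "x_and_y_are_0";
    else
        taken = "x_is_not_0";   // actually runs when x == 0 and y != 0

    return taken;
}

// Hypothetical stand-in: the braces attach the else to the outer if.
std::string with_braces ( int x, int y )
{
    std::string taken = "nothing";

    if ( x == 0 )
    {
        if ( y == 0 )
        {
            taken = "x_and_y_are_0";
        }
    }
    else
    {
        taken = "x_is_not_0";
    }

    return taken;
}
```

Because an else always pairs with the nearest unmatched if, without_braces ( 0, 1 ) reports "x_is_not_0" even though x is 0, and without_braces ( 1, 0 ) reports "nothing". The braced version gives "nothing" and "x_is_not_0" respectively, which is what the indentation promised.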
Tabs
If you don’t want to over-work your thumbs hitting the space bar, by all means use the tab key, but please, set your editor to replace tabs with spaces. Two things frustrate me when it comes to the tab character remaining in code. I have to figure out what tab setting was used to begin with (seconds of my life I will never get back), and if I’m editing a line and delete a tab I now have to re-add spaces I may not have wanted to delete.
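Why the original tab setting matters can be shown with a rough sketch of what an editor does when it replaces tabs with spaces. The function name expand_tabs and its interface are assumptions made for this illustration; it handles a single line of plain ASCII text with no embedded newlines, and is not taken from any particular editor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace each tab in one line of text with enough
// spaces to reach the next tab stop.
std::string expand_tabs ( std::string const& line, std::size_t tab_width )
{
    assert ( tab_width > 0 );

    std::string expanded;

    for ( std::size_t i = 0;  i < line.size ();  ++ i )
    {
        if ( line[ i ] == '\t' )
        {
            // Pad to the next multiple of tab_width; always at least one space.
            std::size_t const padding = tab_width - ( expanded.size () % tab_width );

            expanded.append ( padding, ' ' );
        }
        else
        {
            expanded += line[ i ];
        }
    }

    return expanded;
}
```

With a tab width of 4, the line "ab\tc" becomes "ab", two spaces, and "c"; with a tab width of 8 the same line gets six spaces. Code that was aligned under one setting is therefore misaligned under the other, which is why the tab setting has to be rediscovered when tabs are left in the file.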
const
Where, exactly, should const be placed in a definition or declaration?
The most common placement of const is (was?) as in Listing 5.
const int        i;     // i is a const int
      int* const p1;    // p1 is a const ptr to int
const int*       p2;    // p2 is a ptr to a const int
const int* const p3;    // p3 is a const ptr to a const int
Listing 5
Dan Saks is an advocate of placing const in a way that makes it easy to read the declaration, when read from right to left [Saks88] (see Listing 6).
int const        i;     // i is a const int
int*       const p1;    // p1 is a const ptr to int
int const*       p2;    // p2 is a ptr to a const int
int const* const p3;    // p3 is a const ptr to a const int
Listing 6
In a conversation with Dan, I mentioned that I was using the other style, because in general I didn’t find it difficult to parse mentally. Dan explained to me that I wasn’t the person I should be writing it for, and I’ve used his style ever since. (I can be persuaded.)
So what does this have to do with whitespace? As you can see, in either style I like to line up the const keywords in a block of declarations. I think it’s an important part of a declaration, and the alignment makes the const (or lack of it) pop out. (And line up the comments, too; it makes them easier to read.) I treat the volatile keyword the same way.
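The two placements in Listings 5 and 6 name exactly the same types; only the reading order changes. A short compile-time sketch using the standard type_traits header makes that checkable. The assertion messages are illustrative only.

```cpp
#include <type_traits>

// The Listing 5 and Listing 6 spellings denote identical types.
static_assert ( std::is_same< const int,        int const        >::value, "const int" );
static_assert ( std::is_same< const int*,       int const*       >::value, "ptr to const int" );
static_assert ( std::is_same< const int* const, int const* const >::value, "const ptr to const int" );

// Moving const across the * does change the type.
static_assert ( ! std::is_same< int const*, int* const >::value, "different types" );

// Reading int const* right to left: a pointer (not const) to a const int.
static_assert ( ! std::is_const< int const* >::value,                             "pointer is not const" );
static_assert (   std::is_const< std::remove_pointer< int const* >::type >::value, "pointee is const" );

// Reading int* const right to left: a const pointer to a (non-const) int.
static_assert (   std::is_const< int* const >::value,                             "pointer is const" );
static_assert ( ! std::is_const< std::remove_pointer< int* const >::type >::value, "pointee is not const" );
```

Since the compiler treats both spellings identically, the choice is about the human reader, which is the point Dan Saks makes.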
Function parameters
I write function prototypes like this:
int function_name ( int        first_parameter,
                    double     second_parameter,
                    CLASS_NAME third_parameter );
When calling a function I will format it in a similar way. Having each parameter on a separate line makes them easy to distinguish from one another, and it is easy to add a comment to a parameter (which I find particularly useful when calling one of Microsoft’s multi-parameter SDK functions):
int result = function_name ( first_parameter,
                             second_parameter,
                             class_parameter );
There are some exceptions. For functions with only two parameters, I tend to put them on the same line as the function name; two parameters are not hard to pick out on one line (Listing 7).
int result = function_name ( first_parameter, second_parameter );
Listing 7
I do the same thing for standard library and STL functions and member functions, regardless of the number of parameters, because the parameter lists for these functions tend to be well known.
In his presentation ‘Seven Ineffective Coding Habits of Many Programmers’, Kevlin Henney [Henney14] talks about “unsustainable spacing” (approximately minute 24), and rejects certain styles of aligning code as unmaintainable. He makes the point that maintaining certain styles (mine being one) is “doomed to failure” unless the code never changes. He uses as an example changing the name of a function.
Changing the name of the function breaks the alignment (see Listing 8).
int not_aligned_like_this ( int    first_parameter,
                            String second_parameter )
int new_function_name ( int first_parameter,
                            short second_parameter )
Listing 8
Kevlin suggests the style in Listing 9, or something similar, because performing a refactoring such as changing the name doesn’t break the alignment.
int original_function_name (
    int   first_parameter,
    short second_parameter )
int new_function_name (
    int   first_parameter,
    short second_parameter )
Listing 9
I see Kevlin’s point, but I’m not … wait, hold the presses.
I have had a rough time with this section on function parameter placement. As I said in my opening note, this is nothing but an opinion piece, and my goal was to document the way I do things, give an explanation as to why I do it that way, and leave it up to you, dear reader, to ignore what I have to say and do it the way you want. That’s fine; I don’t really expect to sway many opinions. (Maybe one person? Anybody? <crickets>)
The problem occurred when I tried to pick a rhetorical fight with Kevlin with regard to placement and alignment of function parameters. I really don’t like the style Kevlin advocates in his presentation. There is something about starting the parameter list on the line after the function call I find aesthetically unpleasing. I tried five or six times to justify using my style, and not his, but I kept running into this wall.
With all due respect to Kevlin, just how often is a function name changed over a large code base? I don’t work on any code base that easily allows that type of change. Changing a widely used function name isn’t something to be taken lightly, and if you do it you should be prepared to do it correctly. (OK, Bob, but what are you going to do when you start working on a code base that does allow those types of changes, and actively encourages them?)
I see Kevlin’s point, but using the shortcomings of our current tool set to advocate use of such an ugly style is wrong. (Compromise isn’t a dirty word, Bob.)
And on it went.
In his presentation Kevlin asks a rhetorical question, “Why do you choose an approach that is difficult to maintain?” I do it because I feel it is the right thing to do. I truly believe if we’re not actively improving our code we are passively making it worse. I consider it part of doing a good job, and nobody said it would be easy.
Then Kevlin followed up with “Do your colleagues also do what you do?” Sadly, the answer is no. And that is where I kept getting tripped up in my counter-argument.
So, after all of the false starts I realized that I do see Kevlin’s point, and most of my justifications boiled down to “this is the way I have always done it, I like it that way, and I don’t want to change”. Not exactly the end to the rhetorical fight I envisioned when I started this section.
Now the question is, will I actually change the way I write function prototypes and calls? The answer is, I don’t know. I will try to come up with a style that satisfies my need for readability and consistency, and Kevlin’s need for sustainability. It's the aesthetic issue that is going to be the problem. (Cue cognitive disconnect in five, four, three…)
Conclusion
Most of us, if not all of us, spend a lot more time reading code than writing it. My chosen style isn’t more efficient on the writing end; on the contrary, I would say that it takes a bit more time. It is not easier to write code in this style – I would say that it requires a bit more discipline. I believe the payoff occurs when the code is read, by myself or by others.
Thirty years ago, when hard drives were expensive, programmers were cheap, and programmer’s terminals had 23 lines and 80 columns, it might have made sense to use the fewest number of characters possible to implement code. Here we are in 2015, where terabyte drives are almost trivially cheap, programmers are expensive, and we are no longer using TTYs. Making use of whitespace to make code easier to read for ourselves and our colleagues makes sense – to me, at least.
Acknowledgements
As always, my thanks to Fran and the reviewers. You read this and published it anyway. And a special thank you to Kevlin Henney, who didn’t volunteer for the rhetorical fight, but easily won it, anyway.
References
[Henney14] ‘Seven Ineffective Coding Habits of Many Programmers’, Henney, Kevlin, NDC 2014 http://vimeo.com/97329157
[Kernighan76] Software Tools, Kernighan, Brian and Plauger, P.J., Addison-Wesley Professional, 1976
[Martin09] Clean Code, Martin, Robert C., Prentice Hall, 2009
[MISRA13] ‘Guidelines for the use of the C language in critical systems’, MISRA, http://www.misra-c.com/
[Saks88] ‘Placing const in Declarations’, Saks, Dan, Embedded Systems Programming, June 1988
Exercises
‘Pinky Finger Workout’, typingweb, https://www.typingweb.com/tutor/lesson/index/id/357/
‘Thumb Exercises: Active Motion’, Northwestern Memorial Hospital, http://www.nmh.org/ccurl/275/700/thumb-exercises-active.pdf