CVu Journal Vol 14, #5 - Oct 2002. Column: Professionalism in Programming.

Title: Professionalism in Programming #16

Author: Administrator

Date: Sun, 06 October 2002 13:15:55 +01:00

Summary:

What's in a name?

Body:

"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean - neither more nor less." [Lewis Carroll]

Ancient civilisations knew that to name something was to have power over it. This was more than simply a claim of possession. Some believed this so strongly that they would never give their own name to a stranger, for fear that the stranger could use it to do them harm.

Names mean an awful lot. A name is fundamental to our concept of identity, and we see examples throughout history. Biblical accounts set before 2000 BC already give places meaningful names and name children to reflect the circumstances of their birth. It's still the convention for women to change their surname when they marry, although the fact that some choose not to shows how much meaning they attach to their name. Why would anyone change their name by deed poll if it meant nothing to them?

A name not only promotes identity, it also implies behaviour. Obviously a name doesn't entirely dictate what an object does, but it goes a long way towards defining how you interact with that thing, and how the outside world interprets it. This is borne out even more clearly by the fact that we're never limited to one name per `object'. I'm known by different monikers in different contexts: the name my wife calls me^[1], the name my daughter knows me by, the nickname I'm known by in chat-rooms, and so on. These names describe a relationship and interaction with me, and a role I fulfil.

A name marks something out as a distinct entity. It elevates it from an ethereal concept to a well-defined reality. Before someone put a name to electricity, no one would have understood what it was, although people had some vague idea of its effects from lightning, or from the way rubbed amber attracts light objects. Once named, it became identifiable as a distinct force and consequently easier to reason about. Basque culture holds that naming something proves its existence: Izena duen guzia omen da, that which has a name exists [Kurlansky].

Today naming has become a multi-million pound business, used (with varying degrees of success) by everyone from small firms to the largest multinational corporations to launch, re-brand and publicise products. Ever catchier names are coined to build awareness of products and services.

So names clearly are of immense import.

As programmers, we wield this enormous power over the constructs we create when we name them. A badly named entity can be more than just inconvenient; it can be plain misleading and even downright dangerous. Consider this deliberately simplistic C++ example:

#include <cstdlib>

void checkForContinue(bool shallWeContinue) {
    if (shallWeContinue) std::exit(0);
}

The parameter name is clearly a lie, or at least its sense is the opposite of what you'd expect. A caller who passes true, expecting the program to carry on, will find that it halts instead: a reasonably dire result from a single misnamed parameter.
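As a minimal sketch of the two possible repairs: either rename so the names tell the truth about the behaviour, or keep the name and correct the logic to match it. The function name exitIfFinished and its parameter finished are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Option 1: keep the behaviour, change the names so they tell the truth.
void exitIfFinished(bool finished) {
    if (finished) std::exit(0);
}

// Option 2: keep the name, fix the logic so it matches the name.
void checkForContinue(bool shallWeContinue) {
    if (!shallWeContinue) std::exit(0);
}
```

Either way, a call such as checkForContinue(true) now does what it says.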

Why should we name well?

Clearly we need to consider the names we give things carefully. A name creates a channel of understanding, control and mastery. Appropriate naming means that `to know a name is to know the object'. And the opposite holds too: a poor name leads you to misunderstand the object.

As in the real world, names can be both useful and limiting at the same time. People tend to stick to their initial perceptions of a concept, despite the proverb about judging books by covers. Therefore it's important to convey the right first impression through careful naming.

Apparently the brain's short-term memory can hold only about seven pieces of information at once (although I'm pretty sure I've got a couple of defective slots in my head reducing the overall capacity). It's hard enough to hold all the information about a program in your head as it is; we should not make the task even harder with complex naming schemes or obscure references.

What do we name?

Let's spend some time thinking, as programmers, about what we name and how we name it. First: what? At a minimum, the things we name when writing code are:

variables,
functions,
types (classes, enums, structs, typedefs),
macros, and
source files.

This list is by no means exhaustive: there are other higher-level entities we'll put meaningful names to, such as states of a state machine, parts of messaging protocols, database elements, application executables, and so on. These five are enough to start with.

What shall I call you?

So: how do we name? The naming convention for each of these classes of item should follow the coding standard we're working to, if one exists. However, whilst a standard may mandate certain conventions, it cannot by itself guide the choice of an appropriate name for each and every variable.

Compilers impose very few rules on how we name things. Modern languages have case-sensitive names, don't allow `white space' (spaces, tabs, newlines), and allow just alphanumeric characters and a few particular symbols (commonly at least an underscore); an identifier may not begin with a digit. These days there are no appreciable limits on identifier length^[2]. Without jumping through a considerable number of hoops we're usually limited to the ASCII character set, so non-English speakers are at a disadvantage. The language standards also reserve certain names: C reserves external identifiers beginning with str followed by a lower case letter, both C and C++ reserve global identifiers beginning with an underscore, and C++ reserves anything in namespace std. As practitioners, we must be aware of these kinds of restriction so we can write robust and correct code.
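A short C++ illustration of these rules, using hypothetical variable names. The reserved forms are left commented out: a compiler may well accept them, but they trespass on names set aside for the implementation.

```cpp
#include <cassert>

// Names are case sensitive: these are two distinct variables
// (legal, though having both in real code would only confuse readers).
int apples = 1;
int Apples = 2;

// Letters, digits and underscores are fine, provided the name does not
// start with a digit.
int num_apples_2 = 3;

// Reserved names; avoid them even if your compiler accepts them:
// int _count;   // global name beginning with an underscore
// int _Total;   // underscore followed by an upper case letter
// int __total;  // double underscore, reserved anywhere in C++
// int strange;  // in C: external name starting "str" + lower case letter
```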

Avoid jokey names like blah or wibble. They creep in easily and, whilst amusing at first, just create confusion later on. Names like these usually belong to quick temporary hacks that outlive their expected use. Name all things well first time, all the time. Obviously, being professional also means that you don't use expletives in names.

For each of the above listed sets of items, the following sections present some considerations for good naming.

Naming variables

If a variable didn't consist of electricity it would be the sort of thing you could hold in your hand. It is very much the programming equivalent of a physical object, and a name that reflects this will usually be a noun. For example, some variable names in a GUI application might be ok_button and main_window.

If not a noun, it will usually be a `nounised' verb, e.g. count. Numeric variables' names describe the interpretation of the value, e.g. num_apples. As we saw earlier, a boolean variable might be named after a condition, which is natural considering its value will be either true or false.

Since variables are the fundamental data you work with, you must give them clear, very descriptive names. It doesn't matter if they are long, if that's what it takes to make their meanings unambiguous: `a' is not a realistic replacement for `num_apples'.

However, there is a case for short (even single-letter) variable names: loop counters. They make reasonable sense in small loops, where a name like `loop_counter' adds nothing and rapidly becomes tedious.
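A minimal sketch contrasting the two cases: a descriptive name for a value that lives through the whole function, and a one-letter counter confined to a tiny loop. The function countApplesInRow and its parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: total the apples on a row of trees.
int countApplesInRow(const std::vector<int>& applesPerTree) {
    int numApples = 0;  // used throughout the function, so it gets a full name
    for (std::size_t i = 0; i < applesPerTree.size(); ++i) {
        numApples += applesPerTree[i];  // i lives for two lines; a longer name adds nothing
    }
    return numApples;
}
```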

When working with OO languages there are a number of conventions you may adopt to `adorn' member variables, showing that they are members rather than ordinary local variables or (evil) global variables. This is a mild form of Hungarian notation (see later section). Whilst not strictly necessary, some programmers find it a useful practice. In C++ the common forms are to prefix member variable names with an underscore, to suffix them with an underscore, or to prefix them with `m_'. The first method is frowned upon because it sails close to the wind: global identifiers beginning with an underscore are reserved, as is any identifier beginning with an underscore followed by an upper case letter. Besides, a leading or trailing underscore makes the name pretty unnatural to read.

Of course, this kind of member naming convention won't have any impact on a class' public API because all your member variables are private anyway (aren't they?).
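A sketch of the `m_' convention in a hypothetical Basket class. One practical benefit shows up in the constructor: the parameter can take the natural name capacity without clashing with the member.

```cpp
#include <cassert>

class Basket {
public:
    // No clash between the parameter "capacity" and the member m_capacity.
    explicit Basket(int capacity) : m_capacity(capacity), m_numApples(0) {}

    // Returns false, and adds nothing, if the basket is already full.
    bool addApple() {
        if (m_numApples == m_capacity) return false;
        ++m_numApples;
        return true;
    }

    int numApples() const { return m_numApples; }

private:
    // Private members, so the adornment never appears in the public API.
    int m_capacity;
    int m_numApples;
};
```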

The French language has two forms of the word `you': tu and vous, depending on how familiar you are with a person. Similarly, the name we know a variable by may depend on the context we need it in. For example, a function parameter may be named differently in the function's public declaration (in a .h file) and in its implementation (in a .c file).
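A small sketch of the idea, using a hypothetical function. C and C++ require only that the parameter types of a declaration and its definition agree; the names are free to differ, so each can suit its audience.

```cpp
#include <cassert>

// Declaration, as it might appear in a header: names chosen for the caller.
int applesRemaining(int applesOnTree, int applesPicked);

// Definition, as it might appear in the implementation file: terser names
// that suit the function body. Only the types have to match.
int applesRemaining(int total, int picked) {
    return picked >= total ? 0 : total - picked;
}
```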

Some people feel it necessary to adorn pointer variables with something like a `_ptr' suffix, and similarly reference variables with `_ref'. This is another subtle incursion of Hungarian notation, and is redundant: the fact that the variable is a pointer is implicit in its type. If your function is so large that this adornment seems a useful aide-mémoire, then your function is probably too long!

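Another commonly seen variable naming practice is to use an acronym of the type name as a concise `meaningful' name. For example, you might declare a variable like this: SomeTypeWithMeaningfulNaming stwmn(10); The acronym saves a little typing, but it tells the reader nothing the type doesn't already say, and everyone who reads it has to decode it.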

No matter what your method of variable naming, it is helpful to prefer a convention that distinguishes type names from variable names. Commonly, type names have an upper case initial letter and variables a lower case one. When using this convention it's not uncommon to see variables declared like this: Window window;
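A sketch of the upper/lower case convention, using a hypothetical Window class and titleOf function. It also illustrates the earlier point about pointer adornment: the parameter is called window, not window_ptr, because its type already says it is a pointer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Type names start with an upper case letter...
class Window {
public:
    explicit Window(std::string title) : m_title(std::move(title)) {}
    const std::string& title() const { return m_title; }
private:
    std::string m_title;
};

// ...variables and parameters with a lower case one. No _ptr suffix:
// the type const Window* already says this is a pointer.
std::string titleOf(const Window* window) {
    return window ? window->title() : std::string();
}
```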

Naming functions

If you hold a variable in your hand, the function is what you do with it - you don't just want to hold it forever. Since a function is clearly an action, its name will logically be, or will include, a verb to indicate this. A function name that was just a noun would not be clear: for example, what does `apples()' do? Does it return a number of apples, does it convert something into apples, or does it make apples out of thin air?

Meaningful function names avoid the words be, do or perform. These are a classic trap for students first trying to consciously include verbs in their function names (this function does XXX...). Such words are just noise and add no value to the function name.

A function should always be named from the viewpoint of the user, hiding all the internal implementation details neatly away (that's the point of a function: it's a level of abstraction). Who cares if behind the scenes it stores an element in a map, makes calls over a network, builds a new computer and installs a word processor on it, or whatever? If, as far as the user can see, the function counts apples, it should be called countApples().
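A minimal sketch, assuming a hypothetical Orchard class that happens to keep its counts in a std::map. The caller sees countApples(), named from the user's point of view, rather than something like sumMapValues() that would leak the implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

class Orchard {
public:
    // Adds a tree of the given variety bearing numApples apples.
    void addTree(const std::string& variety, int numApples) {
        m_applesByVariety[variety] += numApples;
    }

    // Named for what the user gets, not for how it is computed.
    int countApples() const {
        int total = 0;
        for (const auto& entry : m_applesByVariety) total += entry.second;
        return total;
    }

private:
    std::map<std::string, int> m_applesByVariety;  // an implementation detail
};
```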

When we write functions they should be well documented (either in a specification or using some literate programming method). However, this is no excuse for not making the function name a clear statement of what the function does. Its name is part of its `contract'. For example, what does void a() do? It could be anything.

The detail that must be included in a name depends on the context in which it is defined. For example, if a function that returns the number of apples in a tree is defined in a C++ class Tree then it needn't be called numApplesInTree(). Its full name is already an unambiguous description: Tree::numApples(). This context information works similarly for namespaces^[3].
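A sketch of the Tree example, with a hypothetical garden namespace added to show the same principle one level up: the fully qualified name garden::Tree::numApples() carries all the context a reader needs.

```cpp
#include <cassert>

namespace garden {

class Tree {
public:
    explicit Tree(int numApples) : m_numApples(numApples) {}

    // Tree:: already says "in tree", so numApplesInTree() would repeat it.
    int numApples() const { return m_numApples; }

private:
    int m_numApples;
};

}  // namespace garden
```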

One final set of functions deserves consideration: "getters" and "setters". Some classes naturally act as collections of variables that behave like `properties'. Each property needs a member function to read its value, and one to set it; some languages have built-in support for this kind of property. Whilst some argue that the existence of such get/set methods indicates a weak design, we nonetheless see a lot of classes written with this kind of API. There are a number of conventions for naming these member functions; they include (for some property called foo of type Foo):

const Foo &getFoo() const;
void setFoo(const Foo &foo);

and:

const Foo &foo() const;
void setFoo(const Foo &foo);

or perhaps,

const Foo &foo() const;
void foo(const Foo &foo);

Your choice may vary, or again be dictated by your coding standard. This is one case where, personally, I would bend the `a function name always contains a verb' rule and go for the second option, since it reads the most naturally in code.
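Here is the second option applied to a hypothetical Person class with a std::string property called name. The getter is const and returns a const reference; the setter is non-const, since it changes the object.

```cpp
#include <cassert>
#include <string>

class Person {
public:
    // Read: no verb, so it reads like a noun at the call site.
    const std::string& name() const { return m_name; }

    // Write: the verb makes the change explicit.
    void setName(const std::string& name) { m_name = name; }

private:
    std::string m_name;
};
```

At the call site this reads as person.setName("Humpty Dumpty") followed by if (person.name() == ...), which is the natural phrasing the second option aims for.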

Naming types

The sort of types we can create depends on the language we're using. In C we can define structs, unions and enums, plus typedefs, which are synonyms for other type names. A typedef exists to provide an easier, more convenient name for an existing type, so it stands to reason that it should be clearly named. Even if it's only a local typedef in a function body it should still have a descriptive name.

Java, C++, and other OO languages are profoundly based on the creation of new types (classes). Just as correct names for variables and (member) functions are vital to the readability of the code, good type names are paramount.

There aren't many obvious naming heuristics for classes, though. A class may describe some stateful data object, in which case its name will probably be a noun. It may be a function object or a class implementing some virtual callback interface; here the name will probably be a verb, perhaps including the name of some recognised design pattern [Gamma]. If the class is a bit of a mash of both, it's probably hard to name and potentially badly designed.
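A small sketch of the two kinds of class, with hypothetical names: Apple holds state, so it gets a noun; CompareByWeight is a function object, so its name says what it does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stateful data object: a noun.
struct Apple {
    int weightGrams;
};

// Function object: named for the action it performs.
struct CompareByWeight {
    bool operator()(const Apple& lhs, const Apple& rhs) const {
        return lhs.weightGrams < rhs.weightGrams;
    }
};

// Precondition: apples is not empty.
Apple lightest(const std::vector<Apple>& apples) {
    return *std::min_element(apples.begin(), apples.end(), CompareByWeight());
}
```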

We saw a few words to avoid in function names; there are similar cases here. When putting a name to a class you should almost always avoid the words class or object, which in type names are usually redundant noise. For example, DataObject is a bad name: the class may very well contain data, but it's obviously going to be used to create objects, and that doesn't need restating in the type's name. The class name should describe the class of data and not the actual object. That's a subtle distinction, but an important one.

A bad class name can seriously confuse programmers. As an illustration, I've worked on an application which contained a state machine implementation. For some historical reason the base class of each state was called Window. Working out exactly what was going on was very odd (and not helped by a distinct lack of documentation). To add insult to injury, the base class of a command pattern was called Strategy when it wasn't actually implementing the strategy pattern. Suffice to say it took me a little while to get my head around the code. Better naming would have made its logic far easier to follow.

Capitalisation Conventions

Naming conventions are a source of about as many programmer fist-fights as the Eternal Holy Editor Wars (no one seems to have noticed that vim won years ago :-) (Clearly a typo: you meant emacs, of course. - ed). Most languages prohibit us from using white space and punctuation in our identifiers, so we adopt a convention for splitting up multiple words. There are a number of common ways of doing so which you'll see in modern code.

camelCase

Used extensively by the Java language libraries and in many C++ codebases, KDE for example. It is so called because the capitals resemble a camel's humps, and was probably first used in Smalltalk in the early 1970s.

ProperCase

This is a close relative of camelCase; the only difference is that the first letter is also capitalised. Often the two conventions are used together. For example, in Java, class names are written in proper case, and variables and methods in camel case.

using_underscores

Proponents of this style include the implementers of the C++ standard library (look at all the names in the std namespace) and the GNU project.
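The three styles side by side, in a hypothetical FruitBowl example. They are mixed here purely for comparison; a real codebase should pick one convention per kind of name and stick to it.

```cpp
#include <cassert>

// ProperCase for the type, as in Java class names.
class FruitBowl {
public:
    // camelCase for member functions and variables (Java, KDE style).
    void addApples(int numApples) { m_appleCount += numApples; }
    int appleCount() const { return m_appleCount; }

private:
    int m_appleCount = 0;
};

// using_underscores, in the style of the C++ standard library.
int apples_in_bowls(const FruitBowl& first_bowl, const FruitBowl& second_bowl) {
    return first_bowl.appleCount() + second_bowl.appleCount();
}
```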

There are, of course, more forms. How many can you come up with off the top of your head? You can start by mixing proper case with underscores. There are other related naming considerations, too, like:

How many vowels do you drop to make an identifier easy to type? Too many and it becomes unreadable.
Do you require that any verb must come first in multi-word function names?
Do you adorn member variable names, and if so, how?

Naming macros

Macros are the walnut-cracking sledgehammers of the C/C++ world. They are a basic text search/replace tool that doesn't respect scope or visibility. They're tactless. However, there are some walnuts that just won't crack without them.

Since they have such drastic effects there is a well-established tradition for naming macros in a maximally obvious way, using CAPITAL LETTERS. Follow this without fail. And don't make any other name entirely capitalised. This makes macros stand out like a sore thumb, which is basically what they are.
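A minimal sketch of the convention, with hypothetical names. The function-like macro also shows why macros deserve a warning label: the argument is pasted in as text, so it needs defensive parentheses.

```cpp
#include <cassert>

// Macros are plain text substitution, so their names shout in capitals...
#define MAX_APPLES_PER_BOX 24

// ...and without the inner parentheses, SQUARE(1 + 2) would expand to
// 1 + 2 * 1 + 2, which is 5, not 9.
#define SQUARE(x) ((x) * (x))

// Everything else stays out of capitals, constants included, so that an
// all-capitals name always means "macro".
constexpr int maxPearsPerBox = 12;
```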

Naming files

Did you think of files when we talked about naming things? The names of your source files can have a real effect on the ease of coding. Obviously what you call a source file, be it a header or an implementation file, depends on what goes in it. In C and C++ there aren't actually any restrictions on file names, but calling headers "something.h" is such a universal convention that it would be like sticking pins in your eyes not to. We already feel some pain from the lack of rigid definition, though: different people call C++ implementation files different things, and .C, .cc, .cpp, .cxx and .c++ are all common file suffixes. Your choice will usually depend on your compiler, personal preferences, and/or coding standard. I have even worked on platforms that didn't support file extensions and defined file types by the name of the enclosing directory (with appropriate massaging for standard header file includes). That was reasonably evil!

Moving past the question of what suffix to give your files, exactly how should you name them? To make naming easy and obvious, a file should usually contain one conceptual unit. Putting any more in that one file is asking for trouble in the long run. Split your code into as many files as you sensibly can: not only will it make them easier to name, it should also reduce coupling, since you won't #include one big monolithic header file whose smallest change in one dusty corner forces many dependent recompilations. If you have a file defining the interface for a widget it should be called "widget.h" (not "widget_interface.h", "widget_decls.h", or any other variation).

Once you have files that can be appropriately named, convention says you should pair each foo.h with a matching foo.cpp that implements whatever foo.h declares.

Now, there are other insidious issues when naming files. You need to sort out the capitalisation. Some filing systems (naming no names^[4]) can't get this right, ignoring case when looking up file names. When porting code to platforms where case is important your code won't compile unless you've observed capitalisation carefully. Perhaps the easiest method of avoiding this sort of issue is to mandate all lowercase filenames. If you don't, be careful. For the same reason, if your filing system considers that "foo.h" and "Foo.h" are different files, don't exploit it. Make sure that your filenames differ by more than just case. If you mix languages in a single project don't create foo.c and foo.cpp - it's messy; which file is used to create foo.o?

Older filing systems limited the number of characters you could use in a filename, which made naming much messier. Unless you have to port code across to such an archaic system this kind of limitation can be safely ignored.

Try to ensure that each header file you create has a distinct name, even if they're all spread across different directories. This makes it easier to reason about which header file you are actually including when you #include "foo.h". If there were two different files with the same name, a newcomer to the codebase would be confused, and this becomes more of an issue as the codebase grows. One valid way to deal with this is to include some path information in the logical filename: for example, you may include "library_one/version.h" and "library_two/version.h" without too much panic.

As an illustration of how file naming affects ease of coding, I worked on a particular project where the majority of the filenames matched the class names exactly; for example, the class Daffodil was defined in Daffodil.h (names have been changed to protect the guilty). To make things more interesting, every now and again a file was named in a slightly different manner, usually abbreviated, so ProxyObject would be held in proxyobj.h. That made finding the right file to include more complicated and time consuming than it needed to be. On top of this, not all of the Daffodil class implementation was necessarily in Daffodil.cpp: some of it might have been in a shared FlowerStuff.cpp and perhaps also in Yoghurt.cpp, for no adequately explained reason. As you can imagine, this made finding particular bits of code a nightmare.

A rose by any other name

That's a pretty large set of considerations for naming bits of code. What are the overall principles to pull out? Perhaps the most important is that you should ensure consistency in all of your naming: not just within your own work, but also with company-wide conventions. This goes right down to the typography of a name, and its capitalisation. For example, I have no confidence in the quality of a class interface if it looks like this:

class foo : public Bar {
  public:
    void doTheFirstThing();
    void DoTheSecondThing();
    void do_the_third_thing();
};

When you get a lot of people working `together' on the same lump of code it's very easy to end up in this state, about as internally consistent as a random number generator. It's often a symptom of worse problems: the programmers probably aren't respecting the fundamental design of the code they're simultaneously working on. This is where mandated coding standards and central design documents are a big advantage.
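For contrast, a sketch of the same interface with one convention applied throughout (ProperCase for types, camelCase for member functions). The stub Bar base class and the stepsDone() counter are hypothetical additions so that the example compiles and does something observable.

```cpp
#include <cassert>

// Hypothetical stub base class, standing in for the real Bar.
class Bar {
public:
    virtual ~Bar() = default;
};

// One convention throughout: the interface reads as a single voice.
class Foo : public Bar {
public:
    void doTheFirstThing()  { ++m_stepsDone; }
    void doTheSecondThing() { ++m_stepsDone; }
    void doTheThirdThing()  { ++m_stepsDone; }
    int stepsDone() const   { return m_stepsDone; }

private:
    int m_stepsDone = 0;
};
```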

With consistent naming we get code that is intuitive, and therefore easier to work with, extend and maintain. In the long run it's much cheaper to manage. Whilst the C++ standard library is widely held up as a model of good practice, it also contains some classic examples of inconsistent and inappropriate naming: vector::empty(), for instance, reads like a command to empty the container but merely asks whether it is empty. This shows that, no matter how good your codebase, you'll probably have to live with some bad naming.

There is power in a name: power that allows us to be more expressive than a language's syntax alone might allow. Think about how you can use similar names to group things together, or how you can imply which of a function's parameters are input or output.
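One way to imply parameter direction, sketched with a hypothetical splitWords function: an in/out prefix, which the language does not enforce, reinforced by const on the input. This is just one possible convention; your coding standard may prefer another.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The "in"/"out" prefixes are a naming convention only; const on inText is
// what the compiler actually checks. Splits on single spaces, skipping
// empty words.
void splitWords(const std::string& inText, std::vector<std::string>& outWords) {
    outWords.clear();
    std::string word;
    for (char c : inText) {
        if (c == ' ') {
            if (!word.empty()) {
                outWords.push_back(word);
                word.clear();
            }
        } else {
            word += c;
        }
    }
    if (!word.empty()) outWords.push_back(word);
}
```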

Hungarian notation

Bunfight! Nothing in the programmer's realm of naming raises hackles and provokes heated discussion quite like a mention of Hungarian notation. A few readers won't know what this practice is. Since it's such a controversial issue, in describing it I'll be careful not to make any judgement calls; it's not really the place of such an article.

Hungarian notation is the downright evil, obstructive and complex practice of encoding information about a variable's or a function's type in its name, in the misguided belief that this will make the code more readable and more maintainable. It sprang from Microsoft in the 1980s, and it's particularly interesting to note that these days large parts of Microsoft itself ignore this abominable convention. It's widely used in their public Win32 APIs and the MFC library, which is almost certainly the main reason for its popularity.

It's so called because it was pioneered by a Hungarian programmer, Charles Simonyi. It's also called that because variable names written using it look like they may as well have been written in Hungarian: non-Windows programmers will be confused by surreal names like lpszFile, rdParam and hwndItem dotted around every piece of code.

There are many subtly different and not-quite-compatible dialects of Hungarian notation, which doesn't help matters. Also, in some situations the same prefix can mean different things. These are some common Hungarian encoding prefixes, not including any magic Microsoft typedef codes:

p	pointer to ... (lp means `long' pointer, an old architectural issue - if you don't know, don't ask)
r	reference to ...
k	constant ...
rg	array of ...
b	boolean (`bool` or some C typedef)
c	`char`
si	`short int`
i	`int`
li	`long int`
d	`double`
ld	`long double`
sz	zero terminated char string (note: not p)
s	`struct`
C	`class` (you can define your own class abbreviations too)

Table 1.

Hungarian notation was relatively unbearable in C (not to mention unnecessary once the language became more strongly typed), and can become rapidly nauseating in C++, since it doesn't scale up to the many new types you can define. If you really want to confuse a maintenance programmer, use Hungarian notation and then, a few months later, change the types of all the variables without search-and-replacing every single variable name (naturally, that would take too long). Joking aside, this is not an uncommon problem with this naming scheme.
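A minimal sketch of that maintenance hazard, using hypothetical variable names: the type was changed, but the name was not.

```cpp
#include <cassert>
#include <type_traits>

// Hungarian style: the "i" prefix promises an int...
int iTotalApples = 0;

// ...but suppose a later change made this a double without renaming it.
// Nothing checks the prefix, so the name now lies about the type.
double iAverageWeight = 0.0;

// The compiler always knows the real type; the prefix is just a comment
// that has gone stale.
static_assert(std::is_same<decltype(iAverageWeight), double>::value,
              "the type is double, whatever the prefix claims");

// Without the prefix there is nothing to fall out of date.
double averageWeight = 0.0;
```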

Unless you are forced to use it, Hungarian notation is best left well alone. Naturally, thousands of readers will now write in and argue against such a neutral and diplomatic viewpoint (please do!).

Conclusion

Our ancient ancestors knew it and good programmers know it: it's crucial to name things well. Good names serve more than just an aesthetic purpose; they convey information about the structure of code. They are an essential tool to aid comprehensibility and maintainability. Bad names have the potential to mislead. There is power in a name, and an experienced professional programmer understands the balance of concerns involved when naming any part of their code.

This all comes back to the main reason we write code in high-level languages: to communicate. Our communication is to an audience of code-readers, that is other programmers, rather than to the compiler.

I GOT PEELED OFF^[5]

References

[Carroll] Lewis Carroll (1832-1898). Through the Looking Glass.

[Kurlansky] Mark Kurlansky. The Basque History of the World. Jonathan Cape. ISBN: 0-224-06055-4.

[Gamma] Gamma, et al. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. ISBN: 0-201-63361-2.

^[1] This will alter depending on whether she's in a good or bad mood with me at the time!

^[2] Be aware that older versions of C (C89/C90) only guaranteed that the first six characters of an external identifier were significant, and case was not necessarily significant either. You need to understand exactly which platforms your code targets when you write it.

^[3] In fact, this is a reasonably universal principle that applies to most named items. For example, enumerator names declared at class scope can be less elaborate than a similar definition at global scope, since the class name supplies the context.

^[4] Painfully obvious pun.

^[5] Inappropriate naming courtesy of the Internet Anagram Server.
