Journal Articles

CVu Journal Vol 8, #1 - Feb 1996 + Programming Topics


Title: Software Engineers Toolbox

Author: Administrator

Date: 03 February 1996 13:15:26 +00:00

Body:

What's In a Name?

When I wrote my last column (Test Yourself), I did my best to ensure that the answers were totally accurate. If only I had been a bit more careful with the questions! As Francis correctly pointed out in his sidebars, I was a bit careless with my prototypes and returns. The trouble is, I was concentrating so carefully on the bits that were intended to be wrong, I took insufficient care with the rest. Fortunately, none of this invalidated the answers, but it does show that you can't be too careful. If it had been 'proper' code, of course, either Lint or the compiler would have picked me up on all of these problems. I think I shall get into the habit of linting even such 'trivial' examples before publishing them.

There are a couple of areas of C programming which tend to be overlooked in the text books. Perhaps they just aren't glamorous enough. One such subject is the way we name things in our code. I don't just mean the general rule for what is a valid identifier, I mean guidelines for how to construct meaningful, consistent and helpful identifiers - a naming convention.

Some people don't worry too much about naming conventions. They have certain, vague preferences for how they choose and format their identifiers, but they make no effort to create a formal, or at least conscious, set of rules. I don't want to overstate the importance of having a good formal naming convention, but I do believe that it is important to have one. Good identifier names do help programmers to write better programs, but the real payoff comes in the maintenance phase.

If you do any amount of programming, you will inevitably get involved in code maintenance, which for most programs is 60%-90% of the total programming effort. The biggest problem with code maintenance is that you often have to spend considerable time and effort trying to work out how the code works before you can start to modify it. Even code you wrote yourself can be impenetrable after a twelve-month hiatus, so it is well worth spending a bit of effort to avoid problems such as poor formatting and misleading identifiers. (Poor formatting can at least be fixed fairly easily with tools such as indent, but bad or meaningless identifiers are a curse forever, as those who have to maintain Unix kernels will probably affirm.) A sensible and consistent naming convention is the best defence.

What constitutes a valid identifier in C is defined by the standard as any combination of upper and lower-case letters, digits and the underscore, provided the first character is a non-digit. Of course, defining what is a valid name is easy; defining what is a meaningful name is somewhat more difficult. The standard allows any combination of these characters, such as _X_35Wxc_7, whether it is meaningful or not. We put meaning into names by using combinations of words with accepted and unambiguous meanings.
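The spelling rule is simple enough to sketch as a small check. The function name is_valid_identifier is made up for this illustration, and the sketch deliberately ignores keywords and reserved names, which are separate rules.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if name is spelled like a C identifier: letters, digits
 * and underscores only, with a non-digit first character.
 * Keywords and reserved names are NOT checked here. */
int is_valid_identifier(const char *name)
{
    const char *char_ptr;

    if (name == NULL || *name == '\0')
        return 0;
    if (!(isalpha((unsigned char)*name) || *name == '_'))
        return 0;
    for (char_ptr = name + 1; *char_ptr != '\0'; char_ptr++) {
        if (!(isalnum((unsigned char)*char_ptr) || *char_ptr == '_'))
            return 0;
    }
    return 1;
}
```

Note that _X_35Wxc_7 passes this check just as readily as hat_size does, which is exactly the point: validity says nothing about meaning.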

The minimum length of a name (obviously) is one, but the maximum length is implementation-dependent. ISO C guarantees at least thirty-one significant characters for macro names or identifiers with internal linkage, but only six (case insignificant) characters for identifiers with external linkage. This latter restriction was provided for compatibility with some old, but important (at the time) linkers. In practice, the six character restriction is obsolete and should be ignored unless you know there really is a problem. (It is likely to be dropped from the next revision of the standard.) Many implementations allow names much longer than thirty-one characters, but still only treat the first thirty-one as significant. Some allow more significant characters, but frankly, I think thirty-one characters is already too long. I start to worry if I see too many identifiers longer than twenty characters or so. Expressions with long identifiers rapidly disappear off the right side of the screen or have to be split across multiple lines, both of which tend to make code less readable. Of course, if you think an identifier genuinely needs more characters, by all means use them, but be conservative.
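To see what a significance limit means in practice, here is a rough sketch of how a tool might test whether two names would be treated as the same. The function names_collide and its parameters are hypothetical; the limit and case handling are passed in rather than taken from any particular compiler or linker.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if a translator that examines only the first
 * `significant` characters (optionally ignoring case) could not
 * tell the two names apart. */
int names_collide(const char *name_a, const char *name_b,
                  size_t significant, int ignore_case)
{
    size_t pos;

    for (pos = 0; pos < significant; pos++) {
        unsigned char ch_a = (unsigned char)name_a[pos];
        unsigned char ch_b = (unsigned char)name_b[pos];

        if (ignore_case) {
            ch_a = (unsigned char)toupper(ch_a);
            ch_b = (unsigned char)toupper(ch_b);
        }
        if (ch_a != ch_b)
            return 0;
        if (ch_a == '\0')
            return 1;   /* both names ended and matched throughout */
    }
    return 1;           /* identical over the whole significant prefix */
}
```

With a limit of six and case ignored, two external names that share their first six letters collide; with a limit of thirty-one and case respected, they stay distinct as soon as they differ anywhere in that prefix.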

Every identifier has a punctuation style: that is, the way it appears, as opposed to what it says. A punctuation style is defined by its use of underscores and capitals. There are a huge number of possibilities, but 95% of C code is probably covered by the following seven styles.

1. HAT_SIZE
2. HATSIZE
3. hat_size
4. Hat_Size
5. HatSize
6. hatSize
7. hatsize

The main purpose of punctuation should be to make the name easier to read and for that reason, I don't like styles 2 and 7. totalannualinterest is far too difficult to decipher. That still leaves us, however, with five good alternatives. You could choose just one style and use that for all names, but as we have several possibilities, we can use them to help differentiate between the various uses of identifiers (variables, functions, tags, typedefs, etc.). When reading code, it is often useful to know the sort of language element a name represents: is this a function or a macro? By using different styles consistently, we can provide some of this information. (We will see other ways shortly.) There are very few de facto standards for C naming conventions, but one which is almost universal is to use all uppercase for macro names. I would recommend that you keep to that and use style one or two for macros. The only other advice I am going to give is, don't go overboard. It isn't necessary, or desirable, to have different styles for everything. The most important point is to decide what styles you will use for each type of identifier and to use those styles consistently.
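As an illustration only, here is one possible assignment of styles to kinds of identifier: style 1 for macros, style 5 for typedefs, style 6 for functions and style 3 for variables. The particular mapping and all the names in it are assumptions made for this sketch, not a recommendation.

```c
#include <stddef.h>

#define MAX_HAT_SIZE 64                          /* style 1: object macro   */
#define CLAMP_HAT_SIZE(size) \
    ((size) > MAX_HAT_SIZE ? MAX_HAT_SIZE : (size))  /* style 1: function macro */

typedef int HatSize;                             /* style 5: typedef        */

/* Style 6 for the function name, style 3 for parameters and locals.
 * Returns the largest size in the array after clamping, or 0 if empty. */
HatSize largestHatSize(const HatSize *hat_sizes, size_t hat_count)
{
    HatSize largest_size = 0;
    size_t hat_index;

    for (hat_index = 0; hat_index < hat_count; hat_index++) {
        HatSize clamped_size = CLAMP_HAT_SIZE(hat_sizes[hat_index]);

        if (clamped_size > largest_size)
            largest_size = clamped_size;
    }
    return largest_size;
}
```

A reader who knows the convention can tell at a glance that CLAMP_HAT_SIZE is a macro (and so may evaluate its argument twice) while largestHatSize is a real function.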

Some time ago, I did a quick survey (on the accu.general mailing list - thanks folks) of the styles that people prefer (I forgot to include style 7) for five common classes of identifier. The results are listed in Table 1 (rating out of 5):

Table 1. Preferred punctuation style by class of identifier (rating out of 5)

Style	Vars	Funcs	Object Macros	Function Macros	typedefs
`HATSIZE`	1.0	1.1	2.8	2.6	1.6
`HAT_SIZE`	1.1	1.1	4.2	3.5	2.2
`hat_size`	4.4	4.4	1.5	2.2	2.9
`Hat_Size`	2.0	2.2	1.1	1.6	2.0
`HatSize`	2.6	2.7	1.5	1.7	3.4
`hatSize`	3.7	3.7	2.6	2.7	2.3

This is probably a pretty good reflection of the general use of these styles in C programming. Some people derided the HatSize style as 'Pascal influence'. Personally, I quite like it, partially because it is more compact than using underscore separators, but there isn't a lot in it.

Now comes the task of putting meaning into the names by carefully choosing words to put into them. C identifiers range from single letters through single words to longer compounds of two, three, or more words. Single letter names can be acceptable in some cases, but not as often as many programmers seem to think. X and Y may be appropriate for positions in a Cartesian co-ordinate system, but not as temporary variables. It is widespread practice to use i, j, k, etc. as simple loop counters. Unfortunately, many programmers use these when more descriptive names would be helpful, such as when they are also being used as array indices. My general advice is to use single letter identifiers very sparingly.

There is always a balance between making an identifier as meaningful as possible and keeping to a reasonable length. To get maximum meaning in minimum length every part of the name must earn its keep, so the first rule is, no weasel-words! Weasel-words are those whose meaning is so vague or general that they add nothing to the meaning of the identifier. The common offender is Process... for function names. Every function is doing some sort of processing. We want to know exactly what that process is, so call it CalculateVolume(), not ProcessMeasurements(). Other weasel-words are variable names like num or flag. Very few variables are so bereft of meaning that you couldn't give them a more meaningful name than these.
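A small before-and-after makes the difference concrete. The parameter names and the box-volume calculation are assumptions chosen for this sketch; only the two function names come from the text.

```c
/* Weasel-word version (shown only as a comment): the names say nothing.
 *
 *     double ProcessMeasurements(double num1, double num2, double num3);
 */

/* Every word earns its keep: the name says what is calculated and the
 * parameters say what each measurement is. */
double CalculateVolume(double length, double width, double height)
{
    return length * width * height;
}
```

A call such as CalculateVolume(2.0, 3.0, 4.0) can be read without looking up the function body, which is the whole purpose of the exercise.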

The other way to keep the length manageable is to use abbreviations. Care needs to be taken, however, if meaning is not to be lost. Abbreviations must be consistent. There should be a list of acceptable, common abbreviations and some control over project-specific ones. If there is a standard abbreviation, then use it unless there is a very good reason not to. Don't use different abbreviations for the same word, or use abbreviations sporadically. For example, if your standard abbreviation for 'number' is 'num', don't mix variable names such as NumHats, NmBlocks and NumberOfShirts. Be very careful that your abbreviations aren't ambiguous, either. Does wt stand for weight, white or something else? Defining standard abbreviations helps to resolve such problems.

Where there is no commonly agreed abbreviation for a word, it is often possible to create one by dropping some or all of the vowels, or by choosing those letters which contribute most to the pronunciation of the word (i.e. drop any silent, or almost silent, letters). As with all abbreviations, care is needed to avoid ambiguity. If you are including measurement units in a name, use the scientifically accepted abbreviations, including the correct case (e.g. power_kW, height_mm).
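A short sketch of both rules together: one agreed abbreviation ('num') used everywhere, and unit suffixes in their scientific spelling and case. The struct, its members and the two functions are hypothetical names invented for the example.

```c
/* 'num' is the single agreed abbreviation for 'number' in every member. */
struct Order {
    int num_hats;
    int num_blocks;
    int num_shirts;
};

/* Returns the total number of items in an order. */
int TotalNumItems(const struct Order *order_ptr)
{
    return order_ptr->num_hats + order_ptr->num_blocks + order_ptr->num_shirts;
}

/* Unit suffixes keep their correct case: W for watts, kW for kilowatts. */
double ConvertWattsToKilowatts(double power_W)
{
    double power_kW = power_W / 1000.0;

    return power_kW;
}
```

Because the unit is part of the name, an expression that adds a power_W value to a power_kW value looks wrong on sight.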

Spellings should be consistent across all identifiers. Avoid having EyeColor, but HairColour. Also avoid names which only differ by a single character. It is far too easy for a typing error to introduce a subtle bug otherwise. Nor should you use names which have identical spelling and rely on case-sensitivity to separate them. Don't have a variable called max_temp and a macro called MAX_TEMP. Beware, too, of spellings which may confuse 0 (zero) with O (uppercase-O), 1 (one) with l (lowercase-L) or 2 (two) with Z (uppercase-Z). (The similarity is much worse in certain fonts.) One example from my experience is a variable for serial line zero, called sl0 (es-el-zero), which many people thought was called 'es-ten'.

Also, don't forget to consider the way a name will sound. Although code is a written medium, we often have to discuss it in meetings such as code reviews. Try not to create names which look different, but sound similar.

As well as deciding which words to put in the name, you should also think about the order in which you place them. Some writers say it is better to put the most important word first. Although I would generally agree with this, I balance this against the need for names to be comfortable to read. Putting the most important word first can sometimes result in an awkward-sounding name. As with so many other factors, the most important thing is to be consistent. If you decide to write Max... rather than ...Max, then use that style consistently. Don't use MaxTemp in one place and HeightMax in another.

Although I said above that every part of a name should earn its keep, I do sometimes allow a little redundancy to creep in. I do this mainly to make searching for names easier. Let me give you an example. Code I worked on recently had a function called Intercom(). There was also a large family of related functions called IntercomHold(), IntercomCancel(), etc. The problem was that a search for Intercom matched dozens of instances that I wasn't really looking for. I could get around this to a certain extent by clever grep patterns, but not entirely. (Maintenance programmers tend to grep a lot!) If the function had been called IntercomRqst(), say, it would have made life a bit easier. Since it is usually shorter names that suffer from this problem, adding a few redundant characters isn't too big a problem.

One aspect of naming that I haven't talked about yet is the use of affixes. Affixes can be used to provide additional information about the purpose or nature of an identifier in addition to the pure meaning. (If it adds to the meaning, I consider it to be part of the base name, not an affix.) We have already looked at doing this using different punctuation styles. Affixes can be used as an alternative or a supplement to that method. The three primary uses for affixes are to include type information, to associate an identifier with a particular functional unit and to indicate what language element an identifier represents.

The most common example of including type information is the notorious Hungarian notation, which is loved and hated in roughly equal measure. (I am only mentioning the possibilities, not discussing the merits!) Hungarian notation prefixes names with letters which indicate the type of a variable, or the return type of a function. Thus an integer with a base name of WallHeight becomes iWallHeight. A similar thing may be done with suffixes, though this is traditionally much more limited. The canonical example is using a suffix _ptr to indicate that the name refers to a pointer, e.g. output_file_ptr is a pointer to the output file.
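For readers who have not met these affixes, a minimal sketch follows. iWallHeight comes from the text; wall_height_ptr, iGetHeight and the value 2400 are invented for the illustration, and real Hungarian notation has a much larger prefix vocabulary than the single i shown here.

```c
#include <stddef.h>

int iWallHeight = 2400;                /* prefix i: the variable is an int   */
int *wall_height_ptr = &iWallHeight;   /* suffix _ptr: the name is a pointer */

/* Prefix i on a function name: the function returns an int.
 * Returns the pointed-to height, or 0 if given a null pointer. */
int iGetHeight(const int *height_ptr)
{
    return (height_ptr != NULL) ? *height_ptr : 0;
}
```

Whether the extra letters help or merely add clutter is the argument the article deliberately leaves alone.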

Affixes are also used to indicate logical groupings. Well designed code usually has some sort of modular structure. For example, all the database access functions will be in a single module. By prefixing all database functions with, say, db, we can see that dbGetName(), dbGetNumber() and dbPutDetails() are all related functions. This is traditionally done with prefixes rather than suffixes.
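A toy module shows the idea. The three function names are the ones mentioned above, but their signatures, the fixed-size in-memory table and the DB_ macros are all assumptions made so that the sketch is self-contained.

```c
#include <stddef.h>
#include <string.h>

#define DB_MAX_RECORDS 4
#define DB_NAME_LENGTH 16

typedef struct {
    char name[DB_NAME_LENGTH];
    long number;
} dbRecord;

static dbRecord db_records[DB_MAX_RECORDS];
static size_t db_record_count = 0;

/* Stores a record; returns its index, or -1 if the name is null
 * or the table is full. Over-long names are truncated. */
int dbPutDetails(const char *name, long number)
{
    dbRecord *record_ptr;

    if (name == NULL || db_record_count >= DB_MAX_RECORDS)
        return -1;
    record_ptr = &db_records[db_record_count];
    strncpy(record_ptr->name, name, DB_NAME_LENGTH - 1);
    record_ptr->name[DB_NAME_LENGTH - 1] = '\0';
    record_ptr->number = number;
    return (int)db_record_count++;
}

/* Returns the stored name, or NULL if the index is out of range. */
const char *dbGetName(size_t record_index)
{
    return (record_index < db_record_count) ? db_records[record_index].name : NULL;
}

/* Returns the stored number, or -1 if the index is out of range. */
long dbGetNumber(size_t record_index)
{
    return (record_index < db_record_count) ? db_records[record_index].number : -1L;
}
```

The prefix also makes searching easier: a grep for db followed by an uppercase letter finds every call into the module.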

The third common use of affixes is to indicate what sort of language element the identifier is. The most common application is probably using a suffix of _t to indicate user defined types. This is the method used in the standard for types such as size_t and wchar_t. This is not a bad practice to follow, except that POSIX reserves names ending in _t. Not a problem if you aren't programming for POSIX, of course. If you are, or you want to be maximally portable, there is nothing to stop you defining your own suffix, such as _ty. Other uses in this category include suffixes to identify struct or union tags.
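Here is how such suffixes might look in use. The _ty suffix is the one suggested above; the _tag suffix for the struct tag, and every other name in the sketch, is an assumption made for the example.

```c
#include <stddef.h>

/* _tag marks a struct tag; _ty marks a typedef name, chosen instead
 * of _t to stay clear of names that POSIX reserves. */
struct hat_tag {
    int size_mm;
};
typedef struct hat_tag hat_ty;

/* Counts the hats whose size is at least min_size_mm. */
size_t CountLargeHats(const hat_ty *hats, size_t hat_count, int min_size_mm)
{
    size_t large_count = 0;
    size_t hat_index;

    for (hat_index = 0; hat_index < hat_count; hat_index++) {
        if (hats[hat_index].size_mm >= min_size_mm)
            large_count++;
    }
    return large_count;
}
```

The standard's own size_t still appears, of course; the convention only governs the names you define yourself.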

The last point I will mention is reserved identifiers. The C standard (and the Draft C++ Standard) reserves quite a lot of possible names. I am not going to give a full list here, but if you want a rough and ready rule, don't use any identifier appearing in a standard library header and never start an identifier with an underscore. (For C, it is best to steer clear of C++ keywords, too.) That still leaves a few holes, but it is a jolly sight better than nothing. If you are really keen, you might also avoid identifiers reserved by related standards such as POSIX.

This article is necessarily only a brief summary of the issues involved. If you would like more guidance on how to set up a good naming convention, a couple of good books are "C Elements of Style", by Steve Oualline and "C-Style: Standards and Guidelines" by David Straker. The most important thing is not so much what you put in your naming convention, but that you have one and use it consistently.

I don't pretend that using a good naming convention will eliminate all your bugs or make your coffee taste better, but it can make your job considerably more comfortable.
