Programming Topics + CVu Journal Vol 12, #4 - Jul 2000


Title: Reading C & C++ Variable Declarations

Author: Administrator

Date: 03 July 2000 13:15:38 +01:00


I was never formally taught the C language. In college in 1979 the main teaching language was Pascal. I learnt C by looking over the shoulders of more knowledgeable colleagues in my first job. By studying their program listings it was fairly obvious that int i; declared that i was an integer, that char *cp; declared a pointer to a character and that char buf[100]; declared an array of 100 characters. I soon came to understand that char *argv[] was an array of pointers to characters. But that was about as far as I got. I found more complex declarations puzzling. I was reminded of this recently while reading an interview with Bjarne Stroustrup [Stroustrup2000]:

[Interviewer:] In another interview, you defined the C declarator syntax as an experiment that failed. However, this syntactic construct has been around for 27 years and perhaps more; why do you consider it problematic (except for its cumbersome syntax)?

[Stroustrup's reply:] I don't consider it problematic except for its cumbersome syntax. It is good and necessary to be able to express ideas such as "p is a pointer to an array of 10 elements that are pointers to functions taking two integer arguments and returning a bool." However,
bool (*(*p)[10])(int,int); 
is not an obvious way of saying that. In real life, I'd have to use a typedef to get it right:
typedef bool (*Comparison)(int,int); 
Comparison (*p)[10]; 
…

I find it somewhat reassuring that Stroustrup finds declarations difficult to get right. I, too, use his technique of breaking complex declarations down into more manageable stages using typedefs. I think his example is a good one: I can grasp the declaration of p using typedefs much more quickly than I can understand the one-line declaration. And I could not have understood the latter at all before I learnt the "Right-Left" rule, more on which in a moment.
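As a quick check that the two forms really do say the same thing, the following sketch asks the compiler to compare them. It assumes a C++11 compiler; the names p_direct and p_staged are mine, chosen only so that both declarations can sit in one file.

```cpp
#include <type_traits>

// The one-line declaration from the interview (renamed for this sketch).
bool (*(*p_direct)[10])(int, int);

// The same type built up in stages with a typedef.
typedef bool (*Comparison)(int, int);
Comparison (*p_staged)[10];

// Compilation fails here if the two declarations do not name the same type.
static_assert(std::is_same<decltype(p_direct), decltype(p_staged)>::value,
              "one-line and typedef declarations should agree");
```

If the file compiles, the typedef version is a faithful restatement of the one-liner.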

Back to Basics

"Declarations specify the interpretation given to each identifier…"

If you really want to understand C declarations, I suggest the best place to look is section A8 of The C Programming Language [KandR1988], from which the above quote is taken. There you will find a dozen pages describing the syntax and meaning of declarations. For a concise summary I would turn to section 4.9.1 of The C++ Programming Language [Stroustrup1997], from which the following extract is taken:

A declaration consists of four parts: an optional "specifier," a base type, a declarator, and an optional initializer. Except for function and namespace definitions, a declaration is terminated by a semicolon. For example:
  char* kings[] = 
      {"Antigonus", "Seleucus", "Ptolemy"};

Here, the base type is char, the declarator is *kings[], and the initializer is ={…}.

A specifier is an initial keyword, such as virtual and extern, that specifies some non-type attribute of what is being declared.

A declarator is composed of a name and optionally some declarator operators. The most common declarator operators are:

*	pointer	prefix
*const	constant pointer	prefix
&	reference	prefix
[]	array	postfix
()	function	postfix

Their use would be simple if they were all either prefix or postfix. However, *, [], and () were designed to mirror their use in expressions. Thus, * is prefix and [] and () are postfix. The postfix declarator operators bind tighter than the prefix ones. Consequently, *kings[] is a vector of pointers to something, and we have to use parentheses to express types such as "pointer to function."

So to turn Stroustrup's kings declarator example from a vector (array) of pointers to something into a pointer to a vector of something we would use parentheses like so: (*kings)[]. It is this mixture of pre- and postfix operators where the Right-Left rule can help.
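The difference the parentheses make can be seen in a small sketch. The names line, names and line_ptr are hypothetical, and I have given the arrays explicit sizes so that the compiler can check the types (C++11 assumed).

```cpp
#include <type_traits>

char line[8] = "Ptolemy";        // array of 8 char (7 letters plus the terminator)

char* names[3] = {};             // no parentheses: array of 3 pointers to char
char (*line_ptr)[8] = &line;     // parentheses: pointer to array of 8 char

static_assert(std::is_same<decltype(names), char*[3]>::value,
              "names is an array of pointers");
static_assert(std::is_same<decltype(line_ptr), char (*)[8]>::value,
              "line_ptr is a pointer to an array");

// Dereferencing line_ptr gives back the whole array, not a single char.
static_assert(sizeof(*line_ptr) == 8, "*line_ptr is the 8-char array");
```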

The Right-Left Rule

The rule is described in various places on the web, which is where I first came across it. But I do not know who first coined the term. The Right-Left rule for reading declarations is:

1. Start with the identifier. Say, "identifier is."
2. Go right, interpreting the operators you find according to the table below. If you encounter a right parenthesis, or there are no more operators, go left from the identifier.
3. When going left, interpret the operators according to the table below. If you encounter the base type, just say it. If you encounter a left parenthesis, or there are no more operators, go right from where you stopped going right last time.

Repeat rules 2 and 3 until there are no more operators to interpret.

*	"pointer to"
&	"reference to"
[]	"array of"
[n]	"array of n"
()	"function returning"
(arg)	"function taking arg and returning"

Using this rule to work through the example Stroustrup quotes in the above interview:

bool (*(*p)[10])(int, int);

Rule 1, locate the identifier: "p is"

bool (*(*p)[10])(int, int);

Rule 2, go right, encounter right parenthesis, go left.

bool (*(*p)[10])(int, int);

Rule 3, go left, encounter *, look up in table: "p is pointer to"

bool (*(*p)[10])(int, int);

Rule 3, go left, encounter left parenthesis, go right.

bool (*(*p)[10])(int, int);

Rule 2, go right, encounter [10], look up in table: "p is pointer to array of 10"

bool (*(*p)[10])(int, int);

Rule 2, go right, encounter right parenthesis, go left.

bool (*(*p)[10])(int, int);

Rule 3, go left, encounter *, look up in table: "p is pointer to array of 10 pointers to"

bool (*(*p)[10])(int, int);

Rule 3, go left, encounter left parenthesis, go right.

bool (*(*p)[10])(int, int);

Rule 2, go right, encounter (int, int), look up in table: "p is pointer to array of 10 pointers to function taking args (int, int) and returning"

bool (*(*p)[10])(int, int);

Rule 2, go right, no more operators, go left.

bool (*(*p)[10])(int,int);

Rule 3, go left, and encounter base type: "p is pointer to array of 10 pointers to function taking args (int, int) and returning bool"
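One way to convince yourself of this reading is to use p the way the English says. The sketch below is only an illustration: less_than, greater_than, table and check_p are hypothetical names of my own, and nullptr assumes C++11.

```cpp
#include <cassert>

bool less_than(int a, int b) { return a < b; }
bool greater_than(int a, int b) { return a > b; }

// table is array of 10 pointers to function taking (int, int) and returning bool.
// Only the first two slots are filled; the rest are null.
bool (*table[10])(int, int) = {less_than, greater_than};

// p is pointer to array of 10 pointers to function taking (int, int) and returning bool.
bool (*(*p)[10])(int, int) = &table;

void check_p() {
    // Undo the reading one step at a time:
    // *p is the array, (*p)[0] is a function pointer, (*p)[0](1, 2) calls it.
    assert((*p)[0](1, 2));         // less_than(1, 2)
    assert(!(*p)[1](1, 2));        // greater_than(1, 2)
    assert((*p)[2] == nullptr);    // unfilled slot
}
```

Each operator applied to p in an expression peels off one phrase of the English description, which is what "designed to mirror their use in expressions" means in practice.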

As you can see, this simple mechanical process has produced a fairly clear description of the given declaration. It is worth noting that being able to turn a declaration into English with the Right-Left rule does not mean the declaration is legal C. For example, the Right-Left rule will merrily read

int (*fn)()[7];

as "fn is a pointer to a function returning array of 7 ints," which is not permitted in C because a function cannot return an array.
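A function can, however, return a pointer to an array, and the Right-Left rule reads that declaration just as happily. The following sketch shows the legal form; seven, get_seven and check_fn are hypothetical names used only for illustration.

```cpp
#include <cassert>

int seven[7] = {1, 2, 3, 4, 5, 6, 7};

// get_seven is function returning pointer to array of 7 int.
int (*get_seven())[7] { return &seven; }

// fn is pointer to function returning pointer to array of 7 int.
int (*(*fn)())[7] = get_seven;

void check_fn() {
    // fn() yields the pointer, * yields the array, [6] picks the last element.
    assert((*fn())[6] == 7);
    assert(fn() == &seven);
}
```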

Not everyone will find this mechanical approach to reading declarations necessary or useful. But personally I do. Some years ago I saw a program that, given a C declaration, would spit out the equivalent English description. I thought this was quite magic at the time, but now I can see it would not be too hard to encode the Right-Left rule.
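To give an idea of how little code is involved, here is a minimal sketch of the rule in C++. It is not the program I saw, and it makes several simplifying assumptions that I should state loudly: the base type is a single word at the start of the text, the declaration has a name, qualifiers such as const are not handled, and no attempt is made to reject illegal declarations. The function names (describe, matching_paren, is_ident_char, check_examples) are my own.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can appear in an identifier.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Index of the ')' closing the '(' at position open, or s.size() if there is none.
static std::size_t matching_paren(const std::string& s, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '(') ++depth;
        else if (s[i] == ')' && --depth == 0) return i;
    }
    return s.size();
}

// Turns one simple declaration into English using the Right-Left rule.
std::string describe(std::string s) {
    while (!s.empty() && (s.back() == ';' || s.back() == ' ')) s.pop_back();
    const std::size_t n = s.size();

    // Assumption: the base type is the first word.
    std::size_t base_end = 0;
    while (base_end < n && is_ident_char(s[base_end])) ++base_end;

    // Rule 1: the identifier is the next word after the base type.
    std::size_t left = base_end;           // one past the next character to the left
    while (left < n && !is_ident_char(s[left])) ++left;
    std::size_t right = left;              // next character to the right
    while (right < n && is_ident_char(s[right])) ++right;
    std::string out = s.substr(left, right - left) + " is ";

    for (;;) {
        // Rule 2: go right, reading [..] and (..) until a ')' or the end.
        for (;;) {
            while (right < n && s[right] == ' ') ++right;
            if (right < n && s[right] == '[') {
                const std::size_t close = s.find(']', right);
                if (close == std::string::npos) return "malformed declaration";
                const std::string size = s.substr(right + 1, close - right - 1);
                out += size.empty() ? std::string("array of ")
                                    : "array of " + size + " ";
                right = close + 1;
            } else if (right < n && s[right] == '(') {
                const std::size_t close = matching_paren(s, right);
                if (close == n) return "malformed declaration";
                const std::string args = s.substr(right + 1, close - right - 1);
                out += args.empty() ? std::string("function returning ")
                                    : "function taking (" + args + ") and returning ";
                right = close + 1;
            } else break;
        }
        // Rule 3: go left, reading * and & until a '(' or the base type.
        for (;;) {
            while (left > base_end && s[left - 1] == ' ') --left;
            if (left > base_end && s[left - 1] == '*') out += "pointer to ";
            else if (left > base_end && s[left - 1] == '&') out += "reference to ";
            else break;
            --left;
        }
        // Step over a matching pair of grouping parentheses and repeat.
        if (left > base_end && s[left - 1] == '(' && right < n && s[right] == ')') {
            --left;
            ++right;
        } else break;
    }
    return out + s.substr(0, base_end);    // finally, say the base type
}

void check_examples() {
    assert(describe("char *argv[];") == "argv is array of pointer to char");
    assert(describe("bool (*(*p)[10])(int, int);") ==
           "p is pointer to array of 10 pointer to function taking (int, int) and returning bool");
    // The rule reads illegal declarations too.
    assert(describe("int (*fn)()[7];") ==
           "fn is pointer to function returning array of 7 int");
}
```

The sketch keeps two positions, one moving left and one moving right from the identifier, which is all the state the rule needs. A real tool would want a proper tokenizer and support for multi-word base types and qualifiers.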

References

[KandR1988] Brian Kernighan & Dennis Ritchie, The C Programming Language, Second Edition, Prentice Hall, 1988.

[Stroustrup1997] Bjarne Stroustrup, The C++ Programming Language, Third Edition, Addison Wesley, 1997.

[Stroustrup2000] Interview in Visual C Developers Journal, reproduced at: http://www.devx.com/upload/free/features/vcdj/2000/05may00/ens0500/ens0500-1.asp
