Journal Articles

CVu Journal Vol 11, #5 - Aug 1999 + Programming Topics

Title: Syntax v Semantics Part 1

Author: Administrator

Date: Tue, 03 August 1999 13:15:33 +01:00

Body: 

A variety of skills and understandings make up the qualifications for being a good programmer. Clearly, programmers must have a sound knowledge of the syntax of the language they are using. This is relatively easy to learn, and even though experienced programmers sometimes make mistakes when using the more obscure parts of a large language, it is easy to detect such errors and learn from them. Only the worst authors get issues of syntax wrong.

The only serious problem is when a programmer's code is syntactically correct but not what was intended. This leads to the issue of semantics.

In simple terms, the semantics of an expression is its meaning: what it does, or the behaviour that results from executing a piece of code. Many, if not most, authors get the semantics wrong at least some of the time. What makes this particularly problematic is that code can behave as you expect even when the language allows it to behave otherwise. Confused? Well, look at the following simple statements:

int i = 0;
i += i++;
i += ++i;

These statements are syntactically correct in several languages (C, C++ and Java, to name but three). However, we need to understand their semantics before we use them; otherwise we will sometimes be seriously surprised.

int i = 0;

I think that the first statement, the definition of i as a variable of type int initialised to zero, allows for no semantic options. However, you should note that the use of '=' does not make this any kind of assignment. While one of the syntactic options (the only one in C) for initialising a variable at the point of creation looks like an assignment, the semantics (behaviour) are subtly different. For example, we can initialise variables in ways that we cannot assign to them:

int array[] = {0,1,2};

Most languages with C-style syntax do not allow assignment to arrays or between arrays even though we can use brace initialisation when we define an array. (By the way, I would be interested to hear of languages where this is not true, i.e. that allow assignment to an array or that do not allow initialisation of an array at the point of definition.)
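As a minimal sketch of that difference in C: the helper below (sum_after_copy is a hypothetical name, not from the article) initialises one array with braces and then has to fall back on memcpy to fill a second one, because neither form of array assignment compiles.

```c
#include <string.h>

/* Hypothetical helper, for illustration only. */
int sum_after_copy(void)
{
    int source[] = {0, 1, 2};   /* initialisation: a brace list is allowed */
    int target[3];              /* definition with no initialiser */

    /* target = source;            does not compile: arrays are not assignable */
    /* target = {0, 1, 2};         does not compile: braces only initialise    */

    /* Copy the element storage instead of assigning the array. */
    memcpy(target, source, sizeof target);

    return target[0] + target[1] + target[2];
}
```

Calling sum_after_copy() should give the sum of the copied elements, which shows that the copy, unlike the commented-out assignments, is accepted by the compiler.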

I am not going to elaborate on the subtleties of Java definitions except to warn you that their semantics are quite different depending on whether the variable is of fundamental type, of derived (array) type or of user-defined type.

While I am writing about definitions of variables, I should warn you to distinguish clearly between declarations and definitions. The syntactic similarities can lead you into assuming that the results are semantically equivalent. Whether 'int i' is a definition (creates storage to hold the value) or a declaration (simply makes the name 'i' available in the current scope) is a matter of context. For example:

int i;          /* definition */
struct X {
  int j;          /* declaration */
};
int foo (int k);  /* declaration */
int foo (int m) {  /* definition */
  return ++m;
}
int main(){
  int n;        /* definition */
  n=0;
  return n += foo(n);
}

Note that the three cases marked as definitions have different semantics. i is at global scope and so has static storage duration. This means that it will be zero initialised at load (execution) time. m is a parameter and so will be initialised by the value provided in a call to foo (e.g. foo(n) in main()). n is a local variable and so is left uninitialised; a value must be written (assigned) to it before any attempt is made to read it.
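A short sketch in C may make these three behaviours concrete. The names read_global and call_foo are hypothetical helpers added for illustration; i and foo follow the listing above.

```c
int i;                  /* file scope: static storage duration, starts at zero */

int foo(int m)          /* m is initialised from the argument at each call */
{
    return ++m;         /* modifies only the local copy */
}

/* Hypothetical helper: reads the global without ever having written it. */
int read_global(void)
{
    return i;
}

/* Hypothetical helper: shows that foo() leaves the caller's variable alone. */
int call_foo(void)
{
    int n;              /* automatic storage: indeterminate until written */
    int result;

    n = 5;              /* must be written before it is read */
    result = foo(n);    /* result is 6, n is still 5 */
    return n * 10 + result;
}
```

Here read_global() yields 0 because of the zero initialisation of i, and call_foo() yields 56: the 5 is the unchanged n and the 6 is the value foo() computed from its own copy.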

k is interesting because it is a pure declaration that can never be defined. Any names used in declarations of parameters at prototype scope have no semantics; like comments, they are ignored by the compiler. [Actually this is not quite true, because it is syntactically allowed to declare a type name in this scope, though the result is completely unusable. Get in the habit of declaring type names early even if you are not going to define them at that time.]

What about j? In C, 'j' can never be uttered unless preceded by the name of an instance of an X. In C++ and Java this is no longer true, as any member function (OK, there aren't any in the above code snippet) can use member variables by their unqualified names.
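For the C half of that statement, a small sketch (use_member is a hypothetical function, and the pointer form is included as the usual companion to direct member access):

```c
struct X {
    int j;              /* declares a member; no storage until an X is defined */
};

/* Hypothetical helper, for illustration only. */
int use_member(void)
{
    struct X x = {0};   /* defining x creates the storage for x.j */
    struct X *p = &x;

    x.j = 4;            /* j named through an instance */
    p->j += 3;          /* or through a pointer to an instance */
    /* j = 1;              does not compile: there is no object called j here */

    return x.j;
}
```

Both writes go to the same member, so use_member() returns 7.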

I bet you never thought that there was so much to understand about something as simple as declaring and defining variables. If you are moving between languages, make doubly certain that you understand any changes in the semantics of such simple things. Before I go on, let me underline one area where the changed semantics between C and C++ can cause considerable irritation. In C, that last statement, 'return n += foo(n);', has a single meaning: it passes the value of n to foo(), adds the result to n, and returns the new value of n, which is handed to exit().

In C++ the same will happen in the above code, but other things can happen, which means that, if you care, you must look at the full context. If foo() takes a reference parameter, n can be changed by the call to foo(), so the value of n on the right of += may not be the same as the value on the left. Another subtle difference, which I am not going to detail here, arises if n had been qualified as volatile.

i += i++;

There is no problem with the syntax of this statement. You might expect the result to be to add i to itself and then increment it. In other words you expect the final value stored in i to be (2*i + 1). I chose this code because no matter how you interpret the syntax you should finish with the same answer. This leads naïve programmers to be happy that what they have written is a clever way of achieving their intentions.

Unfortunately neither C nor C++ places a strict enough requirement on when or how the process of writing to storage shall take place. The above line of code requires that i be updated twice. Once by adding the initial value to the current value and once by incrementing the current value. C/C++ places no requirement on the order in which those updates take place. Remember that modern CPUs perform much of their arithmetic in registers and you can see that one possibility is that one register holds the result of i+i and another the result of incrementing i. Now the final outcome would depend on the order in which the registers were written back to storage. In neither case will the result be what you expected.

Actually C/C++ do not even require that the updates shall happen sequentially. If they happen in parallel (which is entirely allowed by these languages) you can finish with a real mess. It is because of this possibility that both C and C++ place a caveat of 'undefined behaviour' on such code.

Java is more specific in its requirements and so avoids this problem. However, there is a cost in efficiency. I think I can claim without fear of contradiction that Java can never match the best optimised C/C++, because it must always strictly adhere to the order of evaluation and updating required by the language specification. This is a normal trade-off of safety for speed. There is a simple code guideline to avoid problems of this kind:

Never write code that requires two or more 'writes' in a single statement.

When you understand what that means and why it is a good guideline you will be ready to make exceptions to it.
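One way to apply the guideline to 'i += i++;', sketched in C on the assumption that the intent really was "add i to itself and then increment" (double_then_increment is a hypothetical name):

```c
/* Hypothetical helper: one write to i per statement, so the order is fixed. */
int double_then_increment(int i)
{
    i += i;     /* first write: i becomes twice its original value */
    ++i;        /* second write, in a statement of its own */
    return i;
}
```

Each statement ends before the next begins, so the compiler has no freedom to reorder or overlap the two updates, and the result is the (2*i + 1) that the single-statement version only appeared to promise.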

i += ++i;

Well I am leaving this one to you as an exercise. What I would like you to do now is to stop and think very carefully about what you have read. Now go and write (yes I really mean it, go and write) a coherent explanation of all the possible outcomes from executing that statement. Only the most experienced of readers will already have a complete understanding of what is wrong with such code. If you can complete the analysis in more than one language do so. Now email your work to me.

I will be profoundly surprised if I get even half a dozen responses, and even more surprised if the majority have covered all the issues even with the help of the above.
