CVu Journal Vol 1, #3 - Feb 1988 + Programming Topics

Title: The ANSI Standard For C

Author: Martin Moene

Date: Mon, 28 June 2010 08:58:00 +01:00

Summary: A summary of the proposed ANSI extensions to the C Programming Language, by Steven W. Palmer

Body: 

Introduction

Since its introduction in 1972 by Dennis Ritchie at Bell Laboratories, C has matured with each new implementation. A number of features designed to provide the language with stricter type checking, and to cater for the increasing sophistication of the underlying hardware, have evolved at different companies in different forms. The proposed ANSI standard attempts to bring together the more popular and versatile extensions with a totally revised reference manual which removes many of the ambiguities of the earlier one. This article looks briefly at each of the new ANSI extensions.

Trigraphs

To cater for systems supporting only the ISO 646-1983 character set, which provides only a subset of the full ASCII character set, ANSI allows the unsupported characters to be represented by trigraphs. A trigraph is two consecutive question marks followed by a third character that selects the replacement.

The proposed trigraphs are shown below:

Trigraph   Character
??(        [
??)        ]
??-        ~
??<        {
??>        }
??!        |
??'        ^
??=        #
??/        \

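The new escape code \? has been provided to prevent strings that resemble trigraphs from being expanded. Because trigraphs are replaced before escape codes are interpreted, the escape must separate the two question marks. For example

puts("The trigraph of [ is ?\?(");

Without the escape character, the ??( would be converted to [.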
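As a rough sketch of the escape in use, the function below (its name is invented for this illustration) returns the three characters that spell the trigraph for [ without ever writing them back to back in the source. It should behave the same whether or not the compiler translates trigraphs.

```c
/* Returns the characters ? ? ( as a string.
   Written back to back they would form the trigraph for [ on compilers
   that translate trigraphs, so the second question mark is escaped. */
const char *left_bracket_trigraph(void)
{
    return "?\?(";
}
```

Passing the result to puts prints the trigraph spelling itself rather than the bracket.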

Reserved Words

The keywords asm, fortran and entry are no longer reserved (although asm is preserved in the C++ reference manual, and may later be reintroduced into C). The keywords const, volatile and signed have been introduced. const is a type modifier that indicates that the identifier is read-only (i.e. it may not be assigned a value after initialisation). volatile specifies that the identifier is subject to external changes, so accesses to it must not be optimised away. signed is already present in most compilers, and complements unsigned.

Escape Codes

The standard set of character codes is extended with \a for the BELL character, and \v for the VERTICAL TAB character. Numeric escape codes now support hexadecimal codes in the format \xnn or \Xnn where nn is a two-digit hexadecimal number. Both of these extensions are already implemented in most modern C compilers.

The use of an escape character followed by a newline to indicate line continuance was originally restricted to macro definitions and strings. ANSI now allows it to apply to any line of C source code.
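A small sketch of the new escape codes follows; the function names are made up for the example and an ASCII system is assumed. One point worth noting is that, in the standard as eventually published, a hexadecimal escape absorbs every hexadecimal digit that follows it, so the literal is split where the escape should end.

```c
/* '\a' is the alert (BELL) character and '\v' the vertical tab. */
int is_new_control_escape(char c)
{
    return c == '\a' || c == '\v';
}

/* Assumes ASCII, where \x41 is the letter A.
   The literal is split in two so that 'B' is not read as a further
   hexadecimal digit of the escape. */
const char *hex_escape_demo(void)
{
    return "\x41" "BC";
}
```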

Strings

Strings may be concatenated by following the terminating quote of one string with the initial quote of another, with only whitespace in between. For example

puts("This is a very long string that has to be split \
over two lines to fit");

can now be written

puts("This is a very long string that has to "
"be split over two lines to fit");

The decision as to whether identical string constants share the same storage space has not been resolved, although it is still considered bad practice to modify string constants. Microsoft C V5.0 supports string concatenation.
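The following sketch checks that a string built from adjacent literals is the same as the single-line form. The names are invented for the example. Note that the space between the two halves has to be written inside one of the literals, since the compiler adds nothing when it joins them.

```c
#include <string.h>

/* Adjacent literals are joined by the compiler into one string. */
const char joined[] = "This is a very long string that has to "
                      "be split over two lines to fit";

/* Returns non-zero when the joined literal equals the single-line form. */
int joined_matches_single_line(void)
{
    return strcmp(joined,
        "This is a very long string that has to be split over two lines to fit") == 0;
}
```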

Preprocessor

The following preprocessor directives have been added: #error, #pragma and #elif. #if has been extended to allow the use of the 'defined' operator in preference to #ifdef. #error forces its argument to be written to the user console. This allows users to implement their own error checking at compile time. For example, if a program must not be run with a STACK size larger than 30000 bytes, then

#if STACK > 30000
#error "STACK too big - Truncated"
#undef STACK
#define STACK 30000
#endif

will detect this and inform the user. The actual behaviour of #error is still largely compiler dependent at the moment.
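The three additions can be combined, as in the sketch below. STACK, STACK_UNIT_BYTES and STACK_UNIT_WORDS are hypothetical build settings invented for this example, and the two-byte word is likewise an assumption. With most current compilers, reaching a #error line stops the translation.

```c
/* Hypothetical settings: supply defaults when the build gives none. */
#if !defined(STACK)
#define STACK 20000
#endif

#if !defined(STACK_UNIT_BYTES) && !defined(STACK_UNIT_WORDS)
#define STACK_UNIT_BYTES
#endif

/* #elif avoids nesting a second #if inside an #else. */
#if defined(STACK_UNIT_BYTES)
#define STACK_BYTES (STACK)
#elif defined(STACK_UNIT_WORDS)
#define STACK_BYTES ((STACK) * 2)   /* assumes two-byte words */
#else
#error "No stack unit selected"
#endif

#if STACK_BYTES > 30000
#error "STACK too big"
#endif

long stack_bytes(void)
{
    return STACK_BYTES;
}
```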

#pragma allows options to be passed to the compiler from within the source code. The choice of options is manufacturer dependent, and will vary between systems. As an example, Microsoft C V4.0 provides a single #pragma to toggle generation of stack checking code. #pragma check_stack+ causes the compiler to include code to check the stack on entry to subsequently declared functions to ensure that there is enough room for local variables. #pragma check_stack- switches off stack checking code.

The preprocessing rules have been rewritten to remove the ambiguities inherent in the old compilers. One notable change is that ANSI now allows the use of a macro name in its own definition. Where this occurs, the macro name will not be expanded again. This allows the use of a macro to replace or redefine an existing function or macro. For example

#define sqrt(x) (((x) < 0) ? 0 : sqrt(x))

Here, if the sqrt macro is passed a negative value, it yields 0; otherwise it calls the actual sqrt function.
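The same idea can be sketched without the maths library. Here isqrt is a hypothetical integer square root written only for this illustration; the macro of the same name guards it against negative arguments. As with any such macro, the argument is evaluated twice.

```c
/* Hypothetical integer square root; returns -1 for negative input. */
int isqrt(int n)
{
    int r = 0;

    if (n < 0)
        return -1;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* The inner isqrt is not expanded again, so it names the function. */
#define isqrt(n) (((n) < 0) ? 0 : isqrt(n))

int guarded_root(int n)
{
    return isqrt(n);    /* expands to the guarded form above */
}
```

Writing (isqrt)(n), with the name in parentheses, bypasses the macro and calls the function directly.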

Predefined Macros

ANSI specifies that the following macros are predefined:

__LINE__   The current source code line number.
__FILE__   The source code file name.
__DATE__   The date at the time the macro was translated, in the form "Mmm dd yyyy".
__TIME__   The time when the macro was translated, in the form "hh:mm:ss".
__STDC__   A predefined macro with a non-zero value.
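A brief sketch of how these macros might be used is given below. The function names are invented for the example; the strings returned depend on when and where the file is compiled.

```c
/* The date of translation, e.g. in the form "Mmm dd yyyy". */
const char *build_date(void)
{
    return __DATE__;
}

/* The time of translation, in the form "hh:mm:ss". */
const char *build_time(void)
{
    return __TIME__;
}

/* The name of this source file as the compiler saw it. */
const char *source_name(void)
{
    return __FILE__;
}

/* The number of the source line on which __LINE__ appears. */
int line_of_return(void)
{
    return __LINE__;
}
```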


Stringization

Stringization, and token-pasting described in the next section, are already supported by Microsoft C V5.0 and later versions. ANSI allows macro arguments to be converted to strings in the expansion by prefixing the argument name in the definition with the single # character. The argument is converted exactly as written; any macros it contains are not expanded first. For example

#define ASSERT(n,l) puts("Error " #n " in line " #l)
...
ASSERT(90, 100);

expands to

puts("Error " "90" " in line " "100");

which, after string concatenation, is equivalent to

puts("Error 90 in line 100");

Passing __LINE__ as the second argument would produce the text "__LINE__" rather than the line number.
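The # operator converts its argument exactly as written, without expanding any macros in it first. When the value of a macro is wanted instead, a common idiom is to pass the argument through a second macro so that it is expanded before # is applied. The sketch below shows both behaviours; all the names in it are invented for the example.

```c
#define STRINGIZE(x)        #x
#define STRINGIZE_VALUE(x)  STRINGIZE(x)

#define LIMIT 100   /* hypothetical macro, for illustration only */

/* # sees the token LIMIT itself, so the result is "LIMIT". */
const char *limit_name(void)
{
    return STRINGIZE(LIMIT);
}

/* LIMIT is expanded on the way through the outer macro, giving "100". */
const char *limit_value(void)
{
    return STRINGIZE_VALUE(LIMIT);
}
```

The same two-step form is what turns __LINE__ into its number when building a message string.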

Token-Pasting

Token-pasting is the merging of two separate tokens in the definition to create a single new token. Placing ## between two tokens forces the preprocessor to join them once the macro's arguments have been substituted. For example

#define gencode(x) callgen##x()

If the following is encountered in the source code

gencode(12)

it will be preprocessed to

callgen12()
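To make the example complete, the sketch below supplies two functions for the pasted names to refer to. The bodies of callgen7 and callgen12 are placeholders invented for this illustration.

```c
/* Placeholder generators; each simply returns its own number. */
int callgen7(void)
{
    return 7;
}

int callgen12(void)
{
    return 12;
}

#define gencode(x) callgen##x()

/* gencode(7) becomes callgen7() and gencode(12) becomes callgen12(). */
int run_generators(void)
{
    return gencode(7) + gencode(12);
}
```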

Numeric Constants

The single type modifier letter, 'L' or 'l', which was used to make an integer constant a long integer, is now complemented by 'U' and 'F'. 'U' can be used with any integral constant, and makes its type unsigned. It may be used in conjunction with the long-integer modifier letter.

'F' can only be used with floating point constants, and changes the type of the constant from double to float. Without it, a floating point literal is always assumed to be of type double by the compiler.

However, the proper use of function prototypes eliminates the need to use a type modifier at all. Where needed, a cast is more explicit.

The type, long float, is no longer supported. An additional type, long double, has been introduced. Its range is implementation defined, but should be the same as, or greater than, that of double.
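The effect of the suffixes can be observed with sizeof, as in this short sketch (the function names are invented for the example). The L suffix on a floating point constant, giving long double, is taken from the standard as eventually published.

```c
/* Returns non-zero if each constant has the type its suffix implies. */
int suffix_types_match(void)
{
    return sizeof(1.5)  == sizeof(double)
        && sizeof(1.5F) == sizeof(float)
        && sizeof(1.5L) == sizeof(long double)
        && sizeof(1L)   == sizeof(long)
        && sizeof(1UL)  == sizeof(unsigned long);
}

/* 0U - 1 is computed in unsigned arithmetic, so it wraps round to the
   largest unsigned value instead of becoming negative. */
int unsigned_wraps(void)
{
    return 0U - 1 > 0;
}
```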

Generic Type

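The type void * has been included as a generic pointer type which can be converted to a pointer to any other data type. For example

char *malloc(unsigned int);

has now been replaced by

void *malloc(unsigned int);

A void * has the size of an ordinary pointer, but the object it points to has no known type or size. It therefore cannot be dereferenced, and sizeof cannot be applied to what it points to, until it has been converted to another pointer type.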
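A sketch of a function that relies on the generic pointer is shown below. swap_objects is a name invented for this example. It uses the library declarations from stdlib.h and string.h, in which the standard as eventually published gives sizes the type size_t rather than unsigned int.

```c
#include <stdlib.h>
#include <string.h>

/* Swaps two objects of any type, given their size in bytes.
   Returns 0 on success, or -1 if temporary storage is unavailable. */
int swap_objects(void *a, void *b, size_t size)
{
    void *tmp = malloc(size);   /* no cast is needed on the result */

    if (tmp == NULL)
        return -1;
    memcpy(tmp, a, size);
    memcpy(a, b, size);
    memcpy(b, tmp, size);
    free(tmp);
    return 0;
}
```

Any object pointer can be passed without a cast, for example swap_objects(&x, &y, sizeof x) for two variables of the same type.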

Initialisation

Unions may be initialised. The initialiser is enclosed in braces, and its value must be of the same type as the first member of the union. For example

union {
   long Vid_IO_Ptr;
   char *VidPtr;
} VidMap = { 0xb8000000 };
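A self-contained version of the same idea is sketched here. The union tag, member names and value are invented for the example; the point is only that the braced initialiser sets the first member.

```c
union video_map {
    long  vid_io;    /* first member: the one the initialiser sets */
    char *vid_ptr;
};

/* Hypothetical value, chosen to fit comfortably in a long. */
union video_map vid_map = { 0x0B80L };

long vid_io_value(void)
{
    return vid_map.vid_io;
}
```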

Volatile

volatile is a new type qualifier that is really only of interest to the optimiser. It specifies that the object it declares is volatile; in other words, its value may be changed by an external event during program execution. For example

volatile int i;
...
set_vsync(&i);
wait_vsync(); /* Synchronise with first VSYNC */
i = 0;
while (i) {
   /* Do odd-jobs while waiting for vertical sync */
   scan_keyb();
   check_comm();
}

In the above code, taken from an actual program, the optimiser would detect that identifier i was not altered inside the loop body, and would optimise out (remove) the body of the loop! The use of volatile warns it that the identifier is modified by forces outside the program's control.
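The same situation can be sketched portably with a signal handler standing in for the external event. This is only an illustration under that assumption, not the program the extract above was taken from; the names are invented, and sig_atomic_t comes from signal.h in the standard as eventually published.

```c
#include <signal.h>

/* Set by the handler, polled by the main code. */
static volatile sig_atomic_t event_seen = 0;

static void on_event(int sig)
{
    (void)sig;
    event_seen = 1;
}

/* Does "odd jobs" until the flag is set. Returns the number of passes
   made, or -1 if the handler could not be installed. */
int wait_for_event(void)
{
    int passes = 0;

    if (signal(SIGINT, on_event) == SIG_ERR)
        return -1;

    while (!event_seen) {
        passes++;
        if (passes == 3)
            raise(SIGINT);   /* stands in for the external event */
    }
    return passes;
}
```

Because event_seen is volatile, the compiler must re-read it on every pass instead of assuming that nothing in the loop can change it.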

Constants

The introduction of the const keyword is hoped to reduce the use of #define to denote constant values in a program. The use of const in place of #define is preferred as it conveys far more information to the compiler about the programmer's intentions.

Including const in a declaration specifies that the identifier does not (and should not) change value during program execution. A constant must be initialised at declaration. Any attempt to subsequently change its value will generate a warning from the compiler.

Unfortunately, without proper protection, there is nothing to stop an external library function from modifying a constant when passed the address of that constant. For example

const int p = -1;
...
n = libfunc(&p);

If the library function, libfunc, modifies the value of p, then the problem will never be detected. There is no way, in C, to indicate that a function taking a pointer argument will modify the data to which the pointer points. Unless the programmer is careful, the above code will cause problems if the constant is placed in ROM or the hardware traps write access to constant storage.

I would have liked to have seen the use of volatile extended to function declarations to indicate that a function modifies external data through the use of a pointer. For example

int libfunc(volatile int *);
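For comparison, the standard as eventually published lets a prototype make the opposite promise: a parameter declared as a pointer to const says that the function only reads through it. With such prototypes in scope, passing the address of a const object to a function whose parameter lacks the qualifier draws a diagnostic. The sketch below uses invented names and is not the libfunc of the example above.

```c
/* The parameter type promises that sum_values only reads the array. */
int sum_values(const int *values, int count)
{
    int total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* No such promise here: the function writes through its argument. */
void clear_value(int *value)
{
    *value = 0;
}

int const_demo(void)
{
    const int p[3] = { 1, 2, 3 };

    /* clear_value(&p[0]); would be diagnosed: it discards the const */
    return sum_values(p, 3);
}
```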

Operators

The unary plus operator has now been incorporated into the C language with the same precedence as the unary minus. Other than this, the set of operators and their associated precedence have not been changed.
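A short sketch of the operator follows (the function name is invented). Unary plus leaves the value of its operand unchanged, although the usual integral promotions still apply to it.

```c
/* Returns non-zero if unary plus behaves as described. */
int unary_plus_demo(void)
{
    char c = 'A';
    int  n = -5;

    /* +n has the value of n; +c has type int after promotion. */
    return +n == n
        && -(+n) == 5
        && sizeof(+c) == sizeof(int);
}
```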

Run-Time Library

The standard C library has been enhanced with extra functions proposed by ANSI. Many of these functions are already available in many compilers, most notably Microsoft's, as the company has a very close association with the ANSI committee.

Conclusion

In these few pages, I have attempted to outline the major extensions provided by the proposed ANSI C Standard. At the time of writing, the standard has still not appeared, and is not expected to for many more months. However, looking at the range of extensions supplied with the new Microsoft C V5.0, it would probably be safe to assume that all these extensions will appear in the final standard with little or no change.

A more complete coverage of the proposals is given in the standard C reference manual for all serious C programmers:

C: A Reference Manual. Samuel P. Harbison and Guy L. Steele.
Prentice Hall Software Series. Second Edition.

The book costs over £20, but is very good value for money. Make sure that you get the second edition.
