Journal Articles

CVu Journal Vol 14, #5 - Oct 2002: Francis' Scribbles

Title: Francis' Scribbles

Author: Administrator

Date: Mon, 07 October 2002 13:15:55 +01:00

Compatibility Issues

One of the hottest topics in the C and C++ communities is the issue of compatibility between the two languages. There are two extreme views:

  • C should be a strict subset of C++

  • C has nothing to do with C++

Most of us are somewhere between those two. I think we need to spend some time considering the issues and our own positions. I am not going to spend much time on the technical arguments relating to the first of the above because Bjarne Stroustrup has a well-considered position that is being published in the C/C++ Users Journal. However, I think it fair to remind people why his position would tend towards that end of the spectrum.

Many languages are initially designed by a single person. Some then go on to become standardised. When that happens most language designers retire to the background, bite their lips and let others get on with changing their brainchildren. The only case I am aware of where this did not happen is C++. Bjarne Stroustrup continues to play a major part in its specification and evolution. He not only continues to have a major interest in the development of C++ but is a key player in it. I hope he will forgive me for saying that I think that means that he continues to have strong emotional as well as intellectual ties to it. He has a very definite view of what it is designed for and how it should evolve.

Right at the beginning Bjarne Stroustrup made a practical (political) decision to build C++ on top of C, even when that meant the design decisions were not those that would have been made in a greenfield development. There were excellent reasons why that decision was right. It took a language that was rapidly gaining popularity and was an essential tool for Unix development, and supplied a migration path to a much broader-based language. However, I am also convinced that Bjarne Stroustrup designed C++ to eventually replace C. That was part of his vision. But let me be absolutely clear: it was not part of the vision of those responsible for the ongoing design of C. Ritchie had passed the baton on to J11 and eventually WG14. Those who took Bjarne Stroustrup's view that C++ was to replace C migrated to J16 & WG21 in the period 1989 to 1992. Those who believed C had an independent place in the world stuck with J11 & WG14. That is the root of the trouble, because at some deep level both groups believed that they were the guardians of the true flame.

One consequence has been the tendency for some C experts (aficionados) to decry C++ as a deeply flawed design that failed to learn the lessons of the past. At the same time some C++ enthusiasts were convinced that C was an interesting historical relic that should be quietly laid to rest. Meanwhile the vast majority of ordinary practitioners were receiving conflicting messages. Often introductory books had whole sections on 'C++ as a better C', and far too many writers got away with describing C++ as a superset of C. None of these people were ill-intentioned, but the upshot is that we have a vast number of programmers who talk about something called C/C++ and believe that those who prevent C from being a subset of C++ are being obstructionist.

A major issue is that the two languages are close enough that it is advantageous to some to be able to write code that will compile correctly with both a C and a C++ compiler. One key group is those responsible for the Standard libraries for C and C++. They do not want to have to write separate versions of the common parts of these libraries for the two languages. But writing versions that compile correctly for both languages is hard work, and they can see that a unification of the languages would reduce their work.
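
To make that concrete, here is a minimal sketch of the familiar idiom such implementers currently rely on; the header name and function are invented purely for illustration:

/* mylib.h - a header meant to be included from both C and C++ code */
#ifdef __cplusplus
extern "C" {   /* give these declarations C linkage when compiled as C++ */
#endif

/* only constructs common to both languages may appear in here */
int mylib_count(char const *text);

#ifdef __cplusplus
}              /* end of extern "C" */
#endif

Every declaration in such a header has to sit in the intersection of the two languages, which is exactly the burden that tempts library implementers towards unification.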

Then there are many developers working in C++ who want to be able to use libraries that have been written in pure C. This kind of compatibility makes good sense, but even the common parts of the Standard libraries highlight serious issues with this approach. Look at strchr(). Its first parameter is a char const * in C, and it returns a char * based on that input parameter. The parameter has to be const qualified if it is to work with both mutable and immutable C-style strings. But the return type must not be const qualified if it is to be usable with mutable arrays of char. C has to take the 'trust the programmer' approach because it does not have a sane alternative. C++ fixes the problem by splitting strchr() into a pair of overloaded functions. But this is not cost free: not only does it require overloading, it also means that we have to forgo having a unique address for strchr(). In other words, in C++ we cannot simply pass strchr() around as a function pointer. Of course there are other solutions to designs where C would use a function pointer. My point is that even unifying the semantics of const for C and C++ results in a ripple effect, one that may not be acceptable to the community that uses C as its main development tool. And it is that community that J11 and WG14 are supposed to serve (how well they meet that obligation is a different issue). It is not the primary responsibility of those committees to consider the needs of programmers who want to write code that will compile identically in both C and C++. That does not mean that these committees have no responsibility to such people, which is why they attempt to remove gratuitous incompatibilities. However that is a far from easy task, as even among those responsible for C++ most would be reluctant to claim they understood all the implications of the language design. How then should we expect those who are C specialists to understand the fine detail of C++?
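
To make the strchr() example concrete, here is a small sketch of the difference as it appears to user code; it assumes a conforming C++ library implementation:

#include <cstring>

/* In C, <string.h> declares a single function:
     char *strchr(char const *s, int c);
   In C++, <cstring> instead declares a const-correct overload pair:
     char const *strchr(char const *s, int c);
     char       *strchr(char       *s, int c);  */

int main() {
  char text[] = "hello";
  char *p = std::strchr(text, 'l');            // non-const overload: result is writable
  if (p) *p = 'L';                             // fine, text is a mutable array
  char const *q = std::strchr("hello", 'l');   // const overload: result stays const
  // There is no longer a unique &strchr; a target type has to select
  // one of the two overloads before a function pointer can be formed:
  char *(*find)(char *, int) = std::strchr;
  (void)q; (void)find;
  return 0;
}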

Let me consider another aspect of unification, the spirit of compromise. It is my contention that that serves neither language. In a recent article in CUJ Bjarne Stroustrup reiterates a proposal to make the semantics of void* in C++ those that it has in C. In other words, remove the need for a cast to convert a void* into any other pointer type. In the spirit of compromise he is proposing that we weaken the C++ type system. From his perspective unification is more important than a design decision that has frequently been given as an example of the way in which C++ is better than C. Sorry, but I cannot buy that.
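
For anyone who has not met the difference in practice, here is a minimal sketch of what the proposal would change; malloc() is simply the most familiar source of a void*:

#include <stdlib.h>

int main() {
  /* In C the result of malloc() converts implicitly:
       int *p = malloc(10 * sizeof *p);
     A C++ compiler rejects that line; the conversion from void*
     must be spelt out, keeping the hole in the type system visible: */
  int *p = static_cast<int *>(malloc(10 * sizeof *p));
  free(p);
  return 0;
}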

Of course if you start from the premise that C and C++ should be unified because it is an unfortunate error of history that they are not then compromise to get unification makes sense. And once they are unified efforts can be made to recover the high ground. But I think history teaches us something different. The widespread adoption of C++ probably did depend on a substantial degree of compatibility, though I am not so certain when I look at more recent experiences (e.g. Java and C#). It seems that languages that meet genuine commercial needs survive, and those that do not die. Algol 68 is arguably a much better designed language than Fortran but the world of numerical methods came down on the side of Fortran. The decision had nothing to do with compatibility. And we could ask how compatibility helped C gain such a strong foothold.

There is no other case where compatibility with an existing language has been an issue. Indeed the concept of compatibility only makes sense if C++ was intended to be a replacement for C in the same way that Fortran 77 was intended to replace Fortran 66. But if that were the case then it would have been the task of J11 and WG14 to standardise C++.

Self-Compatibility

What almost every developer wants is to be able to use C++ libraries, and many want to be able to use C ones as well. But being able to do that imposes an entirely different set of requirements.

There are two major considerations. The first is that of an ABI, an application binary interface. This means that fundamental types and compatible user-defined types (i.e. ones that can be declared in both C and C++, the so-called POD types) have to have the same binary representation. But this isn't even necessarily true between object code produced by different C implementations on the same platform, and is completely unsupported when it comes to C++ (e.g. different implementations use different name-mangling algorithms). Of course these are accepted as being non-standards issues, correctly so in my opinion.
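
As an illustration (the type is invented), this is the kind of user-defined type that can cross the boundary, and exactly the kind whose size, alignment and field layout an ABI has to pin down:

/* geometry.h - declarable in both languages, so it must stay a POD:
   no constructors, no virtual functions, no base classes, no access
   specifiers, nothing a C compiler cannot see */
struct point2d {
  double x;
  double y;
};

For object files produced by different compilers to interoperate, they must all agree on the layout of point2d, and that agreement is precisely what a platform ABI provides.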

Without a per-platform agreement on such things as the size and layout of fundamental types there is no compatibility and we have to resort to code being shipped as source, even if it is just plain C. And now we start to get reasons why the user might want to compile the same source code with both a C and a C++ compiler. Actually he does not, he just wants to be able to compile his entire source code with his C++ compiler. Establish good ABI standards on each platform and that problem goes away. If I do not need to compile C source because I can use a library shipped as object code then I largely stop being concerned about source code compatibility between C and C++. I could argue that increasing compatibility between C and C++ will make things worse because it will reduce the pressure for ABIs.

The second problem is much nastier: that of fundamental types. The core design of C means that new types such as complex just about have to be fundamental types. Providing a fully functional complex type in C without language provision for operator overloading is impossible. Note that operator overloading leads to reference types. But introducing such features in C leads almost inevitably to C++.

On the other hand the natural way to introduce new types into C++ is via appropriate libraries. I think that the designers of C++ would be very loath to introduce a bundle of new fundamental types just for the purposes of compatibility with C. C++ provides powerful tools for making user defined types first class citizens. That is one of the strengths of C++ and it would be odd to revert to fundamental types whenever C decides that it needs a new, fully supported type. Of course we can argue the merits of C having a complex type, but I contend that that is up to the pure C community to determine.
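
The contrast is easy to see with complex itself: C99 adds it as a built-in type (double complex, from <complex.h>), while C++ provides it as an ordinary class template whose operators come from the library. A minimal C++ sketch:

#include <complex>
#include <iostream>

int main() {
  // std::complex is just a class template plus overloaded operators -
  // no new fundamental type, no change to the core language
  std::complex<double> z(3.0, 4.0);
  std::complex<double> w = z * z + 1.0;
  std::cout << w << std::endl;   // prints (-6,24)
  return 0;
}

Link-time compatibility between C99's built-in type and this library type is exactly the problem raised in the next paragraph.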

I do not know how we solve the problem of link time compatibility between a fundamental type in C and a user-defined equivalent in C++. However if we focus our efforts on link time compatibility we might solve the problem. I am certain that if we pursue the path of unification of C with C++ we will not have the energy and resources to deal with the real and immediate problems.

Food for Thought

Have a look and see how many new books for C novices have been written in the last decade. Yes, one consequence of stability was a marked reduction in publishing interest. Now look and see how many new books have been published for C++ novices in the last couple of years. Some argue that the visibility of such books indicates the health of a language. If that is true, C is nearly dead and C++ is definitely on the way out.

C++ programmers know this is not true, so why do they think it is true of C?

Just as a matter of interest, how many different compilers for pure C do you think there are on the market today? Of course most of those are not in the shrink-wrapped market because those that need them know where to go and will want to buy multiple licenses.

Next time someone asserts that C is dead, or that C++ is dying, ask for their evidence. And when they give it, check its relevance because mostly such statements are based on a pretty superficial perspective.

Problem 4

When doing code inspections you need to cultivate a suspicious mind. In that light consider the following simple function and comment on what you would check and what minimal changes you would require.

void foo() { 
  mytype* mt_ptr = new mytype; 
  bar(mt_ptr); 
  delete mt_ptr; 
} 

I wonder how quickly you realised that it was essential to look at the definition of bar. Once you get the idea, the longer you think about the disasters it can perpetrate the worse it gets. For example, which of the following function declarations is the greatest harbinger of doom:

something bar(mytype *); 
something bar(mytype const *); 
something bar(mytype * const); 
something bar(mytype const * const); 

something bar(mytype * &); 
something bar(mytype const * &); 
something bar(mytype * const &); 
something bar(mytype const * const &); 

The first instinct that many have is that the pass by value cases must be safe because they do not change the original. That instinct really needs quick modification. Not being able to change the original just might be the biggest disaster of all. Consider a pretty minimalist definition:

mytype * bar(mytype* mt_ptr) { 
  delete mt_ptr; 
  return 0; 
} 

Now you see it. Just because a function gets a copy of a pointer does not mean that it cannot damage the original because it can simply invalidate it. Of course the function is pure madness, particularly with that name, but there are lots of mad programmers out there. Oh, and in case you were wondering, it does not matter how many const qualifiers you apply to the parameter, you can still apply delete to it in Standard C++.
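
If that last claim sounds surprising, the following is accepted by a Standard C++ compiler (mytype here is just a placeholder class):

struct mytype { };   // placeholder

void bar(mytype const * const mt_ptr) {
  delete mt_ptr;     // legal: neither the pointer-to-const nor the top-level
                     // const on the parameter prevents the delete
}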

So what about the cases where the pointer is passed by reference? Well have a look at:

void bar(mytype * & mt_ptr_ref) { 
  delete mt_ptr_ref; 
  mt_ptr_ref = new mytype[1]; 
} 

See the problem this time? Exactly: after the call, mt_ptr in foo() points at memory allocated with new[], so the plain delete there is wrong; delete[] would have to be used instead.

The nasty thing about the problem that I am trying to highlight is that looking at the function declaration tells you very little of use. You have to look at the definition. Pointers are like scalpels; they are highly refined tools that experts can use constructively. However they are lethal in the hands of others. Because C++ is designed as a language to meet the refined purposes of library designers as well as the coarser needs of the application programmer we have to learn the danger signals:

  1. Raw delete/delete[]

    Any time you see a delete or delete[] in a free (i.e. non-member) function, be deeply suspicious. If you see one in a member function, have a look at the destructor for the type. We should only be destroying what we own. Sometimes we delete one thing to replace it with another, but that should only be done inside a type that owns the resource.

  2. Suspect raw pointer parameters

    There are many cases where these are fine, but you should have checked the quality of the programming that produced these functions.

  3. Do not use raw pointers for dynamic resources.

    I think that one is close to an absolute injunction. It is the task of a destructor to release resources. Dynamic resources should be owned by something that releases them in a destructor. Generally this ownership should be provided by some form of smart pointer.

  4. Do not mix arrays with single instances

    Sometimes we create single instances dynamically. auto_ptr is designed to handle the lifetime of such objects. It has slightly unusual semantics, with the result that it should always (well, almost always) be passed by reference and returned by value. Think carefully until you understand why the ownership semantics of auto_ptr lead to that coding guideline. There are other smart pointers that have more sophisticated semantics; however, they are almost all designed for single objects. When it comes to arrays we should distinguish between those whose size is fixed and those whose size can vary. Mostly, where the size is fixed we should consider a simple raw array. Its semantics will almost certainly meet our needs. If we need variable size, or there is some other reason why we need dynamic memory for our array, then our prime candidate should be a vector. We should not be calling new[] ourselves. The STL vector encapsulates a dynamic array so that all the handling is done for us. If we need the actual array (for example, to pass to a C function that takes a pointer to an array) we can extract the address of the internal array. That address remains valid until you do something that causes the vector to relocate its internal storage. I know that this is not guaranteed in the original C++ Standard, but it should have been, and a response to a defect report has corrected that omission. (A short sketch of guidelines 3 and 4 in code follows this list.)
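
Here is the promised sketch. It assumes a hypothetical C-style interface (c_api) purely for illustration, and uses the tools we have today, std::auto_ptr and std::vector:

#include <memory>
#include <vector>

struct mytype { };                 // placeholder type
void c_api(mytype *, int) { }      // stand-in for a C-style function taking an array

void better_foo() {
  // single dynamic object: owned by auto_ptr, released by its destructor,
  // so there is no raw delete to forget and no leak if something throws
  std::auto_ptr<mytype> one(new mytype);

  // dynamic array: owned by a vector rather than a raw new[]/delete[] pair
  std::vector<mytype> many(10);

  // when a C-style interface wants a pointer to the array, pass the address
  // of the first element; it remains valid until the vector reallocates
  c_api(&many[0], static_cast<int>(many.size()));
}

int main() {
  better_foo();
  return 0;
}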

In conclusion: most programmers should never use delete, and even fewer should use new[] and delete[]. Dynamic resources should always be owned, usually by some form of smart pointer or container. You should not be using delete on parameters, though delete applied to a member that is a pointer can sometimes be acceptable. Note that this is a stronger statement than the commonly made request that the language should not allow delete on a const pointer or a pointer to const.

As I have said before, cultivate a suspicious mind and learn to program responsibly. Just as you have to trust other programmers, make sure your own code merits the trust of others.

Problem 5

Consider the following brief program. What is the output and why?

#include <iostream>

struct base {
  virtual void report() {
    std::cout << "base" << std::endl;
  }
};

struct derived : public base {
  virtual void report() {
    std::cout << "derived" << std::endl;
  }
};

int main() { 
  base * x = new derived; 
  try { 
    throw *x; 
  }
  catch(base & br) { 
    br.report(); 
  }
  return 0; 
}
