Journal Articles

CVu Journal Vol 1, #3 - Feb 1988

Title: Defining New Data Types In C++

Author: Martin Moene

Date: 28 June 2010 08:54:00 +01:00

C++ is an object-oriented superset of the C programming language, developed by Bjarne Stroustrup. The intention of the language is to make large-scale program development more manageable by incorporating data abstraction, increased type checking and modularity, while retaining the low-level features of C that made it so popular. With few exceptions, any valid C program is also a valid C++ program.

In this article, I am going to demonstrate one of C++'s most widely used and powerful features: the ability to create new data types which are totally under the programmer's control. This encapsulates the whole philosophy of the C++ programming language, which is to increase the ease with which a large program can be built up out of individual modules crafted by different programmers.

Standard C comes complete with a number of built-in types: char, int, float, double and long. Each of these types has its own operators, and the decision as to how the operators are applied is made by the compiler when the program is compiled. For example, when adding two floating point numbers, the compiler will obviously generate different code to that used to add two integers.
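As a small sketch of this point, the two functions below use the same operator symbol on different built-in types. Division is used rather than addition only because the difference in behaviour is easier to see; the function names are made up for the illustration.

```cpp
// The same operator symbol compiles to different operations,
// chosen from the types of the operands.
int int_quotient(int a, int b)
{
   return a / b;   // integer division: the fraction is discarded
}

double real_quotient(double a, double b)
{
   return a / b;   // floating point division
}
```

With these definitions, int_quotient(7, 2) gives 3 while real_quotient(7.0, 2.0) gives 3.5.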

C++ provides the facility to generate new types and to specify how the normal range of operators is applied to them. For example, we can create a new type called intset, a data structure holding a set of integers. This new type can then be used in the same way as the built-in types. You can use the + operator to create the intersection of two sets, the * operator to create the union of two sets and the - operator to create the disjoint of two sets.

Because the + operator is already used to add two numbers, we will have to overload it to indicate that it will be used for another purpose. This is another aspect of C++: the facility to define the behaviour of existing operators for new types. The compiler differentiates between different uses of the operator by looking at the types of the operands.
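A minimal sketch of overloading, using a hypothetical two-field type called point that is not part of this article's intset, might look like this. The built-in meaning of + for integers is untouched; the new meaning applies only when both operands are points.

```cpp
// Hypothetical type, used only to illustrate overloading.
struct point {
   int x;
   int y;
};

// Overload + for two points.
point operator+(point a, point b)
{
   point r;
   r.x = a.x + b.x;
   r.y = a.y + b.y;
   return r;
}

int sum_demo()
{
   point p = { 1, 2 };
   point q = { 10, 20 };
   point s = p + q;        // operands are points: calls operator+(point, point)
   int   n = 1 + 2;        // operands are ints: built-in integer addition
   return s.x + s.y + n;   // 11 + 22 + 3
}
```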

Finally, we need to define how objects of the new type are created and destroyed. When a variable is declared locally in a block, it is automatically created on entry to the block and destroyed on exit. With the built-in types, this is performed by the compiler. C++ allows us to specify our own means, although most users stick to the conventional approach of allocating space for the object on creation and releasing it on exit.
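The timing of creation and destruction can be seen with a small sketch. The tracer class and the live_objects counter are hypothetical names invented for this illustration; they simply count how many objects exist at a given moment.

```cpp
// Hypothetical counter of objects currently alive.
static int live_objects = 0;

class tracer {
public:
   tracer()  { ++live_objects; }   // runs when the variable is created
   ~tracer() { --live_objects; }   // runs when the enclosing block exits
};

int count_inside_block()
{
   int seen;
   {
      tracer t;               // constructor called here
      seen = live_objects;    // one object exists inside the block
   }                          // destructor called here
   return seen;
}
```

After count_inside_block returns, live_objects is back to zero, because the destructor ran when the inner block was left.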

All of this is easier to follow with a worked example. I am going to create the new type, intset, which is simply a set of integers. The C++ code to do this is as follows:

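// Define class intset
class intset {
   int* set;    // storage for the elements, allocated by the constructor
   int size;    // maximum number of elements
   int pos;     // number of elements stored so far
public:
   intset(int);
   ~intset();
   int& operator[](int);
   void operator=(int);
   friend intset operator+(intset, intset);
   friend intset operator-(intset, intset);
};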

Note: The // symbol denotes that the rest of the line is a comment.

The class encapsulates all the data and functions required to manipulate type intset. All declarations before the public keyword are private to the class: they can be used only by the class's own member functions and by functions the class names as friends. In normal use, the user never needs access to private data and functions.

The public section contains, in order, the declarations of functions to create the new type, destroy it, access one member of the set (using the standard C subscript notation, which has been overloaded here), assign an element to a set, compute the intersection and compute the disjoint of two sets. As we develop the type, we can easily add extra features. For example, we can allow all the elements of one set to be copied to another by overloading the = operator again.

void operator=(intset);

The compiler automatically selects the appropriate method of applying the operator.

A variable of the new type is declared as follows, giving the maximum number of elements it can hold:

intset myset(4);

When a variable of the type is declared, the following function (the constructor) is called to create it in the current scope of the program.

intset::intset(int sz)
{
   size = sz;
   set = new int[sz];
   pos = 0;
}

The other function, ~intset, is a destructor. It is automatically called when the program exits from the scope in which the variable was created. Its purpose is to release the storage that the constructor allocated.

intset::~intset()
{
   delete[] set;   // storage obtained with new int[sz] is released with delete[]
}

new and delete are similar to the familiar C functions malloc and free. The argument to new is a type, optionally followed by an element count in square brackets, which is used to calculate the amount of space required. delete returns the allocated space to the free store. The equivalents in C are

set = (int *) malloc(sizeof(int) * sz);
free(set);
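The two allocation styles can be compared side by side in a short sketch. The function names are invented for the illustration, and the code assumes a current compiler, where array storage from new is released with delete[].

```cpp
#include <cstdlib>

// Sum 0..n-1 in a buffer obtained with new[].
int sum_with_new(int n)
{
   int* buf = new int[n];       // space for n ints; no sizeof or cast needed
   int total = 0;
   for (int i = 0; i < n; ++i) {
      buf[i] = i;
      total += buf[i];
   }
   delete[] buf;                // storage from new[] is released with delete[]
   return total;
}

// The same job in the C style.
int sum_with_malloc(int n)
{
   int* buf = (int*) std::malloc(sizeof(int) * n);
   if (buf == 0)
      return -1;                // malloc signals failure with a null pointer
   int total = 0;
   for (int i = 0; i < n; ++i) {
      buf[i] = i;
      total += buf[i];
   }
   std::free(buf);
   return total;
}
```

Storage obtained with new must be released with delete, and storage obtained with malloc with free; the two schemes should not be mixed on the same pointer.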

Access to the elements of the set is achieved through the following function, which is called whenever an element of an intset is accessed.

int& intset::operator[](int s)
{
   if (s < 0 || s >= size) {
      fprintf(stderr, "Set bounds out of range\n");   // needs <stdio.h>
      abort();                                        // needs <stdlib.h>
   }
   return set[s];
}

Accessing an element is done in the same way as access to an array: by using the subscript operator. However, unlike an array, which may have several dimensions, an intset is always one-dimensional.

Notice how we have added bounds checking to our new type. In other languages, such as Pascal, Ada or BASIC, the run-time system will report an error and stop if we attempt to access an element outside the declared range of an array. C does not provide this, which has led to some hard-to-find bugs. Often programs would write data all the way through memory until stopped by a reset or the memory management unit.
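The range test itself can be sketched in isolation. The checked_ints class below is a hypothetical fixed-size wrapper, not part of intset, and it differs from the article's operator[] in one respect: instead of stopping the program, it refuses the access and reports failure to the caller, which makes the behaviour easy to try out.

```cpp
// Hypothetical fixed-size wrapper that refuses out-of-range accesses.
class checked_ints {
   int data[4];
public:
   enum { size = 4 };

   checked_ints() { for (int i = 0; i < size; ++i) data[i] = 0; }

   // The same test that the article's operator[] performs.
   bool in_range(int s) const { return s >= 0 && s < size; }

   // Store v at index s; return false rather than write outside the array.
   bool store(int s, int v)
   {
      if (!in_range(s))
         return false;
      data[s] = v;
      return true;
   }

   // Read index s; out-of-range reads yield 0 in this sketch.
   int load(int s) const { return in_range(s) ? data[s] : 0; }
};
```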

Assigning elements to a set is best done like this:

intset myset(4);
myset = 12;

The number 12 then becomes a member of myset. There is no point in using the subscript operator, since the actual position of an element in a set is of no interest. The code to assign elements to a set is as follows

void intset::operator=(int element)
{
   if (pos == size) {
      fprintf(stderr, "Set overflow\n");   // needs <stdio.h>
      abort();                             // needs <stdlib.h>
   }
   set[pos++] = element;
}

Next, we overload the + operator so that it will compute the intersection of two set operands. The result of the computation is held in allocated memory, since we cannot be sure what is going to be done with it. The user may either assign or compare it with another set.

intset operator+(intset x, intset y)
{
   int xi = x.pos;
   int yi = y.pos;

   // Allocate enough space to hold the largest set
   intset result(xi > yi ? xi : yi);

   for (int jj=0; jj<xi; ++jj)
      for (int kk=0; kk<yi; ++kk)
         if (x[jj] == y[kk])
            result = x[jj];   // element assignment appends and advances pos
   return result;
}

This is not exactly the best way to do the job, but it works and can be easily understood.

Finally, we overload the = operator again so that it can be used to copy the elements of one set to another. Note the differences between the parameters of this function and the previous operator= one.

void intset::operator=(intset z)
{
   int u = z.pos;
   if (u > size) {
      fprintf(stderr, "This can't be happening!!\n");
      abort();
   }
   for (int jj=0; jj<u; ++jj)
      set[jj] = z[jj];
   pos = u;   // record how many elements were copied
}

The error message should never appear, because the intersection of two sets cannot be larger than the larger of the two. The check is simply included as a catch-all in case there is a bug in the function that handles the + operator.

Now we will put this all together in a program that uses the new type to create two sets, and build a third that consists of the intersection of the first two.

void set_example()
{
   intset a(4), b(4), c(4);   // three sets, each with room for four elements

   a = 12; a = 45; a = 90; a = 16;
   b = 17; b = 45; b = 16; b = 78;
   c = a + b;
}

When the program enters function set_example, the constructor is called three times to allocate space for three new sets. Then we enter four elements into each of the first two sets using the overloaded assignment operator. The following line computes the intersection of the two sets and stores the result in the third. Since the result of the intersection is a set rather than an integer, the compiler uses the second assignment function instead of the first.
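For readers who want to try the example with a current compiler, the pieces can be gathered into one sketch. This is one possible arrangement under stated assumptions, not the original listing: the copy constructor, the count accessor and the fail helper are additions, the operands of + are passed by reference, and set_example returns the size of the result so that it can be checked. The copy constructor is needed because the result of + is returned by value, and copies that shared one block of storage would each try to release it.

```cpp
#include <cstdio>
#include <cstdlib>

// Print a message and stop (an assumed helper standing in for abort("...")).
static void fail(const char* msg)
{
   std::fprintf(stderr, "%s\n", msg);
   std::abort();
}

class intset {
   int* set;    // element storage
   int  size;   // capacity
   int  pos;    // number of elements stored
public:
   intset(int sz) : set(new int[sz]), size(sz), pos(0) {}
   // Copy constructor (an addition): each copy gets its own storage.
   intset(const intset& other)
      : set(new int[other.size]), size(other.size), pos(other.pos)
   {
      for (int i = 0; i < pos; ++i) set[i] = other.set[i];
   }
   ~intset() { delete[] set; }

   int count() const { return pos; }   // hypothetical accessor, not in the article

   int& operator[](int s)
   {
      if (s < 0 || s >= size) fail("Set bounds out of range");
      return set[s];
   }
   // Add one element to the set.
   void operator=(int element)
   {
      if (pos == size) fail("Set overflow");
      set[pos++] = element;
   }
   // Copy the elements of another set into this one.
   void operator=(const intset& z)
   {
      if (z.pos > size) fail("Source set too large");
      for (int i = 0; i < z.pos; ++i) set[i] = z.set[i];
      pos = z.pos;
   }
   friend intset operator+(const intset& x, const intset& y);
};

// Intersection of x and y; assumes neither set contains duplicates.
intset operator+(const intset& x, const intset& y)
{
   intset result(x.pos > y.pos ? x.pos : y.pos);
   for (int j = 0; j < x.pos; ++j)
      for (int k = 0; k < y.pos; ++k)
         if (x.set[j] == y.set[k])
            result = x.set[j];   // element assignment appends to result
   return result;
}

int set_example()
{
   intset a(4), b(4), c(4);
   a = 12; a = 45; a = 90; a = 16;
   b = 17; b = 45; b = 16; b = 78;
   c = a + b;            // c now holds 45 and 16
   return c.count();
}
```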

This concludes my introduction to the C++ programming language. Unfortunately, at the time of writing, there are few C++ translators on the market. Zorland is rumoured to be bringing out a budget C++ compiler, as opposed to a translator. Such a compiler would translate the C++ source code directly into machine code, rather than into C source code. The resulting machine code will be far more efficient and compact. Hopefully, when this happens, we will see C++ begin to achieve the even greater recognition that it deserves.
