Journal Articles

CVu Journal Vol 12, #1 - Jan 2000 + Programming Topics

Title: Questions & Answers

Author: Administrator

Date: 05 January 2000 13:15:35 +00:00

Four members took the time to offer answers to one or more of the questions posed last time. I hope some others will be encouraged to join in next time. Feel free to add to those already answered; you will also find some new questions at the end of this column. I have batched the answers by question headings. In the interests of space I have not repeated the questions (the answers generally stand alone).

Q1. (string->enum mapping)

Answer from Colin Hersom

In both the methods suggested, the order of the strings in the array MyEnumStrings is required to be the same as the enums, so adding a new enum value doesn't just involve adding a new string, it requires that the new string is added in the right place - this is an implicit dependency between the order of the string array and the enum values. My advice in any situation where there is a dependency is to make it explicit rather than implicit. To make an explicit dependency involves creating some form of data structure that will hold a string and its corresponding enum value. In its simplest form this is a structure like:

struct EnumPair {
  MyEnum evalue;
  string svalue;
};

and you create an array of these things:

EnumPair pairs[] = {
  { eOne, "One"},
  { eTwo, "Two"},
  { eThree, "Three"},
  { eFour, "Four"},
  ...
};

There is now no requirement for the enum to be in any particular order, and the MyEnumStrings array has been replaced by the array of structures. To search it is simple:

MyEnum StringToEnum(string s) {
  // size_t avoids a signed/unsigned comparison with the sizeof expression
  for (size_t i = 0; i < sizeof(pairs)/sizeof(pairs[0]); i++)
    if (s == pairs[i].svalue) return pairs[i].evalue;
  return eInvalid;
}

Note that not only does this release the programmer from keeping the enum values consecutive, but it also allows different spellings of the strings to match the same enum, e.g. the pairs array could be expanded to something like:

EnumPair pairs[] = {
  { eOne, "One"},
  { eOne, "one"},
  { eOne, "1"},
  { eTwo, "Two"},
  { eTwo, "two"},
  { eTwo, "2"},
  ...
};

A search using a for-loop is fine for a small number of strings. If there are N elements in the array then, assuming a uniform distribution of strings over the possible values, the search time is approximately equal to N/2 times the time for a string comparison. For more than around ten strings, this is significantly more than it needs to be.

To improve the speed, we need something better than an array to represent the association. The usual solution here is a hash table; some versions of the STL do not have one, but you can use the "map" class provided instead. In either case, you can no longer initialise the association at compile time but must create it at run time, e.g. (sketch only):

// table is assumed to be a map<string, MyEnum> declared elsewhere
void CreateAssociation() {
  table["one"] = eOne;
  table["two"] = eTwo;
  table["three"] = eThree;
  table["four"] = eFour;
}

You can then use it like:

MyEnum StringToEnum(string s) {
  map<string, MyEnum>::const_iterator found = table.find(s);
  if (found != table.end())
    return (*found).second;
  return eInvalid;
}

Q2. (output float)

Answer from Colin Hersom

That seems to depend on your compiler library. Some (like GNU) have a method called form which takes a format string and arguments just like printf:

  cout.form("%16.8f", float_val);

Others don't and you may have to use good old sprintf:

  char buf[64];  /* a large float printed with %16.8f can need well over 20 characters */
  sprintf(buf, "%16.8f", float_val);
  cout << buf;

There was an article in Overload a few years ago about rolling your own formatting in C++.

Perhaps it is time to revisit that issue - in Overload - as the Standard C++ Library is now fixed and not quite what it was when that article was written.

Q3. What does const mean here?

Answer from Steve Love

Given the code snippet:

class Complex {
  double re, im;
  // ...
public:
  // ...
  Complex operator- () const {return Complex (-re, -im);}
  // ...               ^^^^^
};

The const refers to the function ( operator-() ) - it is a "const member function." This means it must not modify the state of the object: it cannot modify the members reached through the implicit this pointer passed to it, because this is a pointer to a const Complex (all non-static member functions have this as an invisible parameter). In such a case, picture the signature of the function as:

Complex operator- (const Complex *this) { /* ... */ }

Thus the compiler should complain if, in the function definition, we write:

re = -re;  // equivalent to this->re = -this->re;
im = -im;  // equivalent to this->im = -this->im;

This is somewhat simplistic in that there are ways of getting around this, and good reasons for doing so. First, the methods. One is to cast away the constness of this:

Complex operator-() const {
  const_cast<Complex *>(this)->re = -re;
  const_cast<Complex *>(this)->im = -im;
  return *this;
}

The other is to declare the member concerned mutable:

class Counting {
public:
  // ...
  bool operator< (int rhs) const {++count; return val < rhs; }
  // ...
private:
  int val;
  mutable int count;
};

It's a design decision, but altering the value of count doesn't really change the "public" state of the object. Allowing count to be mutable is preferable to throwing away type-safety by making operator<() non-const.

Answer from John Crickett

In brief: the const refers to the function, to roughly quote BS: "It indicates that the function does not modify the state of a Complex".

Long reply: const can be used for several things in C++:

variables:

  const char myChar = 'a';             // constant char variable
  char const myChar = 'a';             // exactly the same
  const char* pMyChar = &myChar;       // same again, except we access the char through a pointer
  char const* pMyChar = &myChar;       // same as above
  char aChar = 'a';                    // a char variable
  char* const pMyChar = &aChar;        // the pointer is constant this time, the char is not
  const char *const pMyChar = &aChar;  // constant pointer to a constant char

All the above can be used for variables, return types for functions, or parameter lists as appropriate. They are also common to C.

C++ adds the extra "idea" of Constant Member Functions, declared as in the example behind this question:

Complex operator-() const;

To roughly quote BS (again): "It indicates that the function does not modify the state of a Complex".

Q4. (dynamic 2D arrays)

Answer from Steve Love

The definition

double matrix [m][n];

requires 'n' to be a constant expression. More specifically, n's value must be known at compile time. The above declaration describes an array of m elements, each containing an array of n doubles.

The reason for n being required to be a constant is that it represents the number of elements in each array. In this way it defines the type of the elements - matrix is an array of double[n].

If the compiler were unable to determine the value of n, using the notation matrix[x][y] to find an element would be meaningless. According to Bjarne Stroustrup (C++ Programming Language, 3rd Ed.) the internal representation of a multidimensional array such as

int mda[3][5];

is:

[11][12][13][14][15][21][22][23][24][25][31][32][33][34][35]

I.e. one large structure, accessed as if it were a table with rows and columns. Thus, to find location [x][y] we must know the maximum possible value of y. Without it we're doomed. Hence, the second index to a multidimensional array is part of its type signature - omitting it is an error.

That's why

new double[m][n];

is correspondingly illegal - if n can be a variable value, it can never be used as part of its type. Consider passing a multi dimensional array to a function:

void multi (double[][]);

is illegal for the reasons specified above. However,

void multi (double[][5]);

is legal, but not for an array of type double[3][4] - it's essentially a type mismatch.

It still comes back to the fact the compiler must be able to determine the maximum value of the second index. Allowing

matrix = new double [x][y];

prevents that.

There are (at least) two ways of circumventing this restriction.

The first, and easiest, is to use the STL vector or valarray types, neither of which suffers from the representation problems of multidimensional arrays. Consider a vector of vectors of ints. The notation matrix [x][y] for subscripting works fine on my compiler (Borland Builder 4).

The second method, where STL vectors are not available, would be to create a matrix class and define the subscripting mechanism for yourself. This may have the added advantage of allowing the perhaps more intuitive subscript notation of matrix(3, 5).

I wonder if that is really more intuitive. Intuition is often based on experience. I see that as a function call and would much prefer to see subscripting defined in a way that is consistent with the built-in versions. This is not hard as the following shows.

Answer from Colin Hersom

Why do you want a dynamic 2-D array? See the answer in C Vu 11.6 that Francis gave to Bill Cartwright about 2-D arrays and Fortran for an explanation of the meaning of double indices.

If you really need to have contiguous space for an array, then the only way in C++ (or C) is to allocate it as a 1-D array and then use your own calculations to view it as a 2-D array, e.g.:

double *a = new double[m*n];

and create a function to calculate an index:

int index2D(int i, int j) {return i*n + j; }  // each row holds n elements

So you might access the array element a[i][j] by:

a[index2D(i, j)];

Note that whether you choose to have the first or second index incrementing faster is entirely up to you, rather than being defined by the language as in Fortran.

Apart from the use of new to allocate space, that solution is very "C" and not very "C++". You might need to have many 2D arrays of this form, and you might also like to use the normal array indexing to access it, in which case a C++ class is required, with an overload for the index operator:

class array2D {
public:
  array2D(int m, int n)
    : row_len(n), array(new int[m*n]){}
  ~array2D() { delete[] array; }
  int *operator[] (int i){return array + i*row_len;}
private:
  int row_len;  // size of the second dimension
  int *array;
};

The operator[] replaces (most of) the calculation made in index2D, and the row length (the size of the second dimension) is stored with the array, leading to fewer problems if you have many arrays with different dimensions.

You can allocate using a normal variable:

array2D array(m,n);

and index using:

array[i][j];

The array[i] portion uses the overload to index down the array, returning an int*, which is then subject to the second (non-overloaded) index operator. This gives you the freedom to change the implementation of the 2D array (say to an array of pointers to arrays) without having to find all the places where index2D was used (or abused).

Note that in the class above you must also provide (or prohibit) copy and default constructors and the assignment operator. You might consider reference counting and/or copy-on-write to reduce overheads.

You should also be looking at using vector<int> rather than int* for the internal array, so saving any direct call to new and the problems that raises. You need to be more careful about the operator[], since the "+" operator does not work on a vector as it does on a C array.

If you want to have 2-D arrays of things other than integers, then templating the class is required, which is straightforward.

Colin has given a brief outline. I would welcome an article on writing a fully templated version for publication in a future issue of Overload (it is clearly too advanced for C Vu).

Q5. (document break)

Answer from Steve Love

main () {
  vector<string>* doc;
  Break(doc);
  return 0;
}
void Break (const vector<string>* doc) {
  vector<string>::iterator i;
  char *line;
  for (i=doc->begin(); i != doc->end(); i++) {
    strcpy(line, (*i).c_str());
    while(*line != '\0') { 
      cout << *line << " "; 
      ++line; 
    }
    cout << endl;
  }
}

Assuming that the doc variable is allocated and assigned elsewhere, the obvious problem is that in the Break() function, line is never allocated any space. Consequently the call to strcpy() is writing to invalid memory, which is probably what is causing the core dumps. Allocating a buffer before the copy, like this:

  char *buf = new char [ (*i).length() + 1 ];  // +1 for the terminating '\0'
  strcpy (buf, (*i).c_str());
  line = buf;
//  while loop etc. (this advances line, so delete through buf)
  delete [] buf;

would fix the problem, or

void Break (const vector<string>* doc) {
  vector<string>::const_iterator i;
  for (i=doc->begin(); i != doc->end(); i++) {
    for (string::const_iterator p = (*i).begin(); p != (*i).end(); ++p) {
      cout << *p << " ";
    }
    cout << endl;
  }
}

gets rid of the line variable altogether.

As a final note, using the original code,

  cout << line << endl;

would have been unpredictable since line was invalid anyway. In this case, cout would have received whatever happened to be in line up to the first '\0' character. No space was allocated to line before strcpy(), which thus writes over whatever line is pointing at (somewhere on the stack, one assumes). If you're lucky (!) it works. If not, you get a core dump (or NULL pointer assignment on MSDOS, Access Violation on MSWindows, etc. - variations on a theme).

There is no reason to expect an uninitialised pointer to point into the stack, nor that it will be a null, which would be the only thing to trigger a 'null pointer assignment' message at the conclusion of execution. As it happens, programs executed in many IDEs have their stacks zeroed at start-up, and all zero bits is often used for a null pointer, but you have no right to expect either of those things.

Answer from Colin Hersom

Since you have vector<string>, you are clearly using C++, so why the appearance of char* to hold the line? The string class is provided so that C++ programmers don't fall into all the holes that C programmers used to (and still do). So let's solve the C problem first and then see how much safer it can be made in C++.

Consider the declaration:

char *line;

What does it mean? It says that line is a variable with enough space to hold a pointer to a "char". No more. You haven't even initialised it, so it points to somewhere random, which may or may not be a valid location for a char (either reading or writing). Now look at its first use:

strcpy(line, <something>);

Here you are copying <something> into the space pointed to by line. The amount of data copied is dependent entirely upon <something>, specifically the first zero byte encountered by strcpy. But you haven't pointed line anywhere, still less do you know how much space you are allowed to write to. So the fact that the program worked perfectly when you output the line as a whole was pure luck. Many compilers would have caused it to fall over; some operating systems would fall over themselves.

How do you fix that declaration? There are two ways, either by making line a suitable (fixed) sized array, or by allocating space on the fly. The fixed array looks like:

char fixed_line[1024];

This allocates 1024 bytes for you to play with. You have to be fairly certain that you aren't going to exceed 1024 bytes in a line, else you will overrun space again. Now the strcpy line is valid. However, incrementing the fixed_line pointer is not legal, so you have to use a separate variable, for which we can use line again:

strcpy(fixed_line, <something>);
char *line = fixed_line;
while(...)...

If you cannot be sure that you won't exceed 1024 characters (or whatever limit that you give for fixed_line), and it is rare that you can be absolutely sure, then you need to allocate space on the fly, based on the size of the strings that you have:

for (i=doc->begin();...) {
  char *dynamic_line = new char[i->length()+1];
  strcpy(dynamic_line, <something>);
  line = dynamic_line;
  while(...)...
  delete [] dynamic_line;
}

That really is quite a hassle - you have to remember to allocate the extra byte for the null and to delete the space once you have finished with it. You could make things faster, but even trickier to get right, by recording the amount of space you have allocated and only reallocate if you now need more.

Why did I need the statement "line = dynamic_line"? I already had a char* to hold the line, so why can't I use that directly? Suppose you have your while-loop as written, but used dynamic_line instead of line. At the end of the loop dynamic_line would point to the location containing the null character at the end of the string. If you then passed this to delete[], you would be attempting to delete something that you had not allocated. If you look at your original, you also made that mistake - line was incremented until it pointed to the end, and then you tried to copy the next line into that place.

So how do you do this in C++? Firstly, replace that char* with a string:

string line;

Now this variable can be set from another string:

line = *i;

which allows you to play around with line without altering the value in the doc vector. You don't need to dispose of the contents of line explicitly, since the compiler will do this when line is overwritten or goes out of scope.

Q6. (array return)

Answer from Colin Hersom

I would like to ask you what you are really trying to do. You could want to produce a copy of the array g, or you could want to have a pointer to g that you can use like an array - it won't be a copy because if you write to it then "g" will also change. The second option is easy:

void test1(){
  int *i_ptr = return_int();
}
int *return_int() { return g; }

So in test1, i_ptr is a pointer to an integer, which return_int sets to point to the first element of g (in this context g is equivalent to &g[0]), and so you can regard i_ptr as an array of integers. At some other stage in test1, "i_ptr" could be made to point to some other array.

The first option is harder, and is not possible in this functional style using C. These days C can return structures, but it cannot return arrays (because the size of the array to copy is not necessarily known at compile time). If you need to copy the array, then you must pass a pointer to the result as a parameter:

void test1(){
  int i[4];
  copy_array(i);
}
void copy_array(int *array) {
  int i;
  for (i=0; i<4; i++)
    array[i] = g[i];
}

This is rarely recommended, for the obvious reasons that you need to know the sizes of the arrays involved and ensure that the space really is available (e.g. you don't accidentally declare i as i[3], or increase the size of g to 5).

Answer from Steve Love

In short, the reason the code will not work is that arrays in C(++) cannot be copied by value. A C-style array, as in

  int x[10]

declares x to be an array of 10 ints. In fact, it is a pointer to an area of memory containing 10 ints, but in C++ this is a distinct type. "int x[10]" is not the same type as "int y[50]", even though they are both pointers to ints.

No. In both C and C++ these declarations have the same meaning: x has type array of 10 int and y has type array of 50 int. In both languages an array identifier converts to an rvalue of type pointer to int when the context requires it. You cannot assign to an rvalue.

Consequently, the expression

  x = y;

is invalid and trapped by the compiler's type checking mechanism (no, it is invalid because x is not a modifiable lvalue).

However, a variable defined as a pointer to an int can take the value resulting from an implicit conversion from an array of int:

  int x[10]; 
  int *p;
  p = x;

is valid; p now points to the first element of x (as does x). The reverse is not true, because there is no conversion from a pointer to an int to an array of int: (no, see above)

int x[10];
int *p;
x = p; // invalid

The function

int * return_int(){
  return g; // where g is "int g[4];"
}

is fine because the g is converted successfully to a pointer to int. However,

x = return_int();

is thrown out by the compiler because an array of ints is not an assignable value (lvalue).

So much for the theory. Copying C-style arrays requires a function. Remember strcpy() from the C standard library? The most common implementation for copying arrays I've seen is

void copy_array (int dest[], int src[], int size){
  for (int i = 0; i < size; dest[i] = src[i], ++i);
}
int x[10], y[10];
copy_array (x, y, 10);

strcpy() was a little different in that it copied arrays of char (C-style strings) which were null-terminated, so the length was implicitly known. The call above copies each element from array y into array x, also known as a deep copy.

I must confess that I strongly dislike for loops written like that one in copy_array. What is wrong with writing:

   for (int i = 0; i < size; ++i) dest[i] = src[i];  

You are also in good company in being confused by attempts to assign to an array identifier. In appropriate circumstances an array identifier decays to a pointer type either as an rvalue or as a non-modifiable lvalue, which substantially comes to the same thing.

Q7. (dates)

Answer from David Jepps

This problem could be tackled by using the % and / operators. E.g.

elapsed_minutes_since_1_1_1970 = elapsed_in_seconds/60;
remaining_seconds = elapsed_in_seconds%60;

You then work up through the chain to deal in hours, days, etc. Of course years are a bit trickier because of leap days and my algorithm would ignore leap-seconds.

Coding that algorithm is a bit too much like my day job, so I wondered if there was another way using the library functions in time.h. What follows will only work if time_t is implemented as elapsed seconds (as it is in Turbo C++), but as it is short it would be worth a try. I think it's general enough to deal with implementations where time_t counts seconds from some date before 1.1.70. I'm not sure what happens with negative time_t values when time_t counts seconds from some date after 1.1.70.

#include <time.h>
#include <assert.h>
struct tm Convert_from_elapsed(long elapsed_1970_in_seconds);
int main(void) {
  long elapsed_in_seconds = 0; /* Use any trial value here*/
  struct tm elapsed_as_tm =
      Convert_from_elapsed(elapsed_in_seconds);
  return 0;
}
struct tm Convert_from_elapsed( long elapsed_1970_in_seconds) {
  struct tm tm_start_1970;
  struct tm *p_tm_result;
  time_t time_start_1970;
  time_t time_start_system;
  const int one_second = 1;
  /* Using 0 failed in mktime() in Turbo C++ */
#ifdef __TURBOC__
   timezone = 0;
#endif
  tm_start_1970.tm_sec = one_second;
  tm_start_1970.tm_min = 0;
  tm_start_1970.tm_hour = 0;
  tm_start_1970.tm_mday = 1;
  tm_start_1970.tm_mon  = 0;
  tm_start_1970.tm_year = 70;
  tm_start_1970.tm_wday = 4;
  tm_start_1970.tm_yday = 0;
  tm_start_1970.tm_isdst = 0;
  time_start_1970 = mktime(&tm_start_1970);
  time_start_system = time_start_1970
     + (time_t)elapsed_1970_in_seconds - one_second;
  /* A weak check that casting to time_t above was OK */
  assert( difftime(time_start_system,time_start_1970) ==
    (double)(elapsed_1970_in_seconds - one_second ) );
  p_tm_result = gmtime(&time_start_system);
  return *p_tm_result;
}

The above has all been retyped by hand so I hope that there are not too many typos. It should work where time_t has been implemented as elapsed seconds beyond some system start date.

The idea is that you use the calculated struct tm to access the number of years etc elapsed.

Perhaps you could make the method more general by binary searching for the time_t implementation that represents the time elapsed in seconds?

Answer from Colin Hersom

The bane of the software engineer is the spec by example, especially when there is only one example. Without a proper spec you can only make guesses. With experience those guesses probably turn out right more often than not, but is it worth the bother? Who gets the blame when, a few months or years later, the program gives the wrong answer since none of the examples given demonstrated one obscure feature of the system?

So, find a specification by some means. What generated this number? Does that system have a spec? Do you know who wrote the system? If you are lucky, someone else answering this question might have seen this number before and suggest the spec. Or maybe not.

Curiously (or otherwise) spec by example seems to me (in my ignorance) to be a feature of the current vogue method called 'Extreme Programming'. Now would someone like to write an article on this for a future issue of Overload?

Questions

Q1. from Paul Rocca

Problem with odd length data packets.

I don't know if anyone can suggest a technique (or a set of libraries?) that can help me with this one. I need to read/write a bitstream pattern to/from a synchronous serial card. This would be easy apart from the fact that the data packets are 13 bits long. Every example program I have seen and all the serial cards I have looked at seem to be aimed at making transfers by bytes. Receiving doesn't seem to be as big a problem as I can pad it out with zeroes to two bytes, but I can't see any easy way of sending out 1.625 bytes. I thought about combining individual data packets until I got a whole number of bytes, but I need to be able to send packets out individually as they may be quite spaced out in time. Any help or suggestions much appreciated, also any pointers to resources that deal with synchronous serial comms would be gratefully received. The web seems very lacking in info for once.

Q2. from Edward Collier

What is wrong with?

At risk of displaying my ignorance to all, could someone please explain what is wrong with scanf() and fscanf()? I often see disparaging remarks about these (and other core functions) but without the benefit of an explanation, implying that, to quote Molesworth, "any fule kno".

Perhaps someone could produce a table showing which of the standard library functions are unreliable or unsafe, and, more to the point, why.

Now that question looks like a cue for a whole series. Something like "Using the Standard C Library Correctly". Don't let that frighten those willing to just answer Edward's question, but could the C experts consider a longer term project? Thanks.

Q3. from Rodrigo Canellas

static members of a template class

What is the best way to declare and initialise static data members in a template class, as in:

template <class T>
class X {
  static int i;
public:
  static int getI() {return i; }
};

Q4. Anon

Converting to all uppercase

I am using C++. What is the correct way to convert a string to all uppercase? Preferably by using the C++ Standard Library.

Q5. from Paul Rocca

A question of style, or is it?

A simple C style question, but I just can't get it to work. I would like to do something along the lines of creating a bitfield structure with one or two members as consts.

Well that is it for now. Get thinking and get writing.

Q6. from Paul Collings

Wrong Value in a const?

I have stumbled across something that is probably straightforward, but I do not understand it. Environment: BCB 3 Professional on WIN 95. Looking at local variables, I find that a const int does not initialise to what I would have expected!

       int var1 = 2;
const  int var2 = 3;

Watching local variables:

var1 2 (0x00000002)
var2 564396032 (0x21A40000)

var1 has the value 2 and var2 has the value 564,396,032. What does this mean?
