Title: Code Review - A Big Number Class

Author:

Date: 03 June 1999 13:15:31 +01:00 or Thu, 03 June 1999 13:15:31 +01:00

Summary:

Body:

Francis passed me a header file for a code critique. The original author is an experienced C programmer who has been slowly making a shift to a C++ style of coding. I am told that this was one of his earlier efforts. I think it is appropriate to keep the original author's name out of it. I am going to comment on all aspects of the code including coding style, however I will snip some places where the code is repetitive.

First note that though this is a header file it has no sentinel in the text passed to me. Because some readers are unfamiliar with such coding details I will comment on them. Every header file (actually there are rare exceptions to this rule) needs a mechanism to prevent it being multiply included into the same translation unit (the thing that the compiler itself sees, after pre-processing). Conventionally such sentinels are provided by using a conditional #define. So a header file mytype.h will start with:

#ifndef MYTYPE_H
#define MYTYPE_H

The algorithm for generating the sentinel name is to use the file name in all uppercase letters and replace any illegal characters (remember that identifiers can only contain letters, digits and underscores) for identifiers by underscores. The only time this convention comes unstuck is if you are silly enough to start a header file's name with a digit. The file will need to have #endif as its last entry.

// #DEFINES
#define MAX_STRING 1050
#define MAX_ALN         32700
#define WORDLENGTH    16      
#define LOGWORDLENGTH  4           
#define MAXNEGWORD     0x8000

I am not sure what the following is intended to do. If it was for some form of conditional compilation I would prefer to see it always defined (with values of 0 or 1) rather than commented out.

// #define CD_CHECK

I find the author's numerical comments curious because the whole point of using manifest values here (names instead of literal values) is that the values are irrlelvant. Much more important is the authors use of all uppercase identifiers. I know this remains popular but it is fundamentally flawed because it breaches a very strong coding convention: pre-processor identifiers should never include a lowercase letter, and all other identifiers should contain at least one lowercase letter. Sticking to this coding rule ensures that the pre-processor never tries to replace an identifier. I know that the Standard C Library breaks this convention but that is not an excuse for others to do so.

I know this code was written before exceptions and namespaces became generally available, however now I would expect this kind of code to be encapsulated in a namespace. It is definitely a mistake to have that first enum (signflag) hanging out in global space, even in times past it should have been tucked into the class definition to avoid polluting global space (those two identifiers 'plus' and 'minus' are far too obvious not to be replicated somewhere else. In addition defining the enum here means that it is out of context and should be commented. The same comments apply to the third enum.

// ENUMs
enum signflag { plus, minus };
enum errnum   { OK,  //  0
  NUMBER_TOO_BIG,  //  1
  SUBTRACT_WRONG_WAY,  //  2
  BAD_DIGIT,  //  3
  DIVISION_IMPOSSIBLE_CARRY,  //  4
  UNSUPPORTED_BASE,  //  5
  IMPOSSIBLE_CHAR,  //  6
  OUT_OF_MEMORY,  //  7
  DIVIDE_BY_INTEGER_ZERO,  //  8
  DIVIDE_BY_ALN_ZERO,  //  9
  IMPOSSIBLE_ALN_ALN_MULTIPLY,  // 10
  UNRECTIFIED_ALN_FOUND,  // 11
  SOMETHING_FISHY,  // 12
  CONSTRUCTING_ZERO_ARRAY  // 13
};
enum logic_rep { and_rep, or_rep, xor_rep };

If you are unfamiliar with various compilers you may be surprised by the author's derivation of aln (a very poor name, why not call it VeryLongInt. If you do not want to type that out everywhere include typedef VeryLongInt vli; towards the end of the file) from CObject. I guess the author wanted to be able to use some implementation provided containers. Now that we have the STL such compiler dependant mechanisms are unnecessary.

class aln :public CObject {
  private:
    // private data members
      signflag         m_sign;
      int              m_length;
      unsigned int*   m_digits;
      static int       m_count;   
        // remove once debugged   
      static int       glob_base; 
      // default base for input/output
      static errnum    glob_error;

Regular readers will know that I am not enamoured by the use of prefixes to provide scope information. The usage here seems totally bizarre. Why m_count but glob_base? In addition the comment on the latter seems completely unhelpful. If you need a comment, ensure it is meaningful. And why is m_digits a pointer to unsigned int? (also note that m_length is a plain int).

// private member functions
      void rectify();
      bool anew(unsigned);
      bool renew(unsigned);
      void halve();

My only problem with these functions is that some of their names seem less than descriptive. anew and renew apparently take an unsigned int and return a bool, but what do they do?

The following declarations seem to be very badly placed. By their very nature, friends cannot be private because they are not members of the class. Declaring them in the private zone of your class definition is certainly misleading. I will also comment on the individual blocks because there are lessons to be learnt here as well.

      friend int aln_compare(const aln&, const aln&);
      friend int aln_compare(const aln&, const int);

Compare is a symmetric operation, so either there should be just one version or there must be three. Actually I would favour one as a public member and renamed 'compare'. As it is a member it has all the required access and its name is in the class scope so needs no prefix (it doesn't anyway because the parameter types would provide overload resolution if the function was (as here) provided in global scope.

I have reorganised the following so that I can write about a single block of related functions at one time.

      friend errnum aln_add(const aln&, const aln&, aln&);
      friend errnum aln_add(const aln&, const int, aln&);
      friend errnum aln_add(aln&, const int);
// + operators
public:
    aln operator+(const aln&) const;
    aln operator+(const int)  const;
    friend aln operator+(const int, const aln&);
    aln operator+=(const aln&);
    aln operator+=(const int);

      friend errnum aln_subtract(const aln&, const aln&, aln&);
      friend errnum aln_subtract(const aln&, const int, aln&);
      friend errnum aln_subtract(aln&, const int);
// - operators
public:
    aln operator-() const;
    aln operator-(const aln&) const;
    aln operator-(const int)  const;
    friend aln operator-(const int, const aln&);
    aln operator-=(const aln&);
    aln operator-=(const int);

I think that the implications of the above is that the whole design was ill-thought out. As the data separates sign and magnitude (a perfectly respectable design for representing large integers) we need two private functions, one to find the difference between absolute magnitudes and one to find the sum. These should be private (possibly protected) member functions. The logical declarations are:

errnum sum(aln const &);
errnum difference(aln const &);

The errnum return value will be used to return a success/failure report. You could consider using exceptions for these, but I can live with this solution (indeed if you consider failure states to be normal you probably should not use exceptions at this low level of your implementation/design. Note that I have not declared these as const member functions because I intend that the aln instance calling them will be modified.

I will also need two public member functions to provide the basic operators:

aln& operator += (aln const&);
aln& operator -= (aln const&);

While I was only provided with a header file, let me look at the implementation of one of these operator functions:

aln& aln::operator += (aln const& rhs){
  errnum err=((m_sign == rhs.m_sign) ?
          sum(rhs) : difference(rhs);
  // process errnum
  return *this;
}

The other function is so similar that you might wander if you could just forward to this one. Actually you could but it would incur copying the rhs so that you could negate it. There are various other alternatives, but the basic principle is to separate out the actual evaluation from the operator itself.

Once you have the above member functions in place, you can provide namespace scope functions to provide

aln operator+(aln const & , aln const & ); 
aln operator-(aln const & , aln const & );

Again it may be instructive to examine the implementation of one of these:

aln operator+(aln const & lhs, aln const & rhs){
  aln temp(lhs);
  return temp += rhs;
}

Any half way respectable compiler will not burden you with a using a copy constructor for the return but will use its liberty to optimise away copy ctors to construct temp in the location used for the return-by-value temporary. There are other possibilities, such as operator constructors. Note that the operator functions should not process exceptions, that is the job of the calling code.

Once you have done all this, you need to do something similar for multiplication. Finally you will need to provide division. That is the one where you will have to work hardest as an implementor. However note that you should always provide an in-class compound assignment operator and then forward to that to provide the more general plain arithmetic operator. There is never any need to use friend functions to support these simple arithmetic operators. I know that you will find such abuse of friendship even from otherwise superb practitioners but that is no excuse to do so yourself.

< I have snipped the authors multiply and divide declarations >

      friend aln logic_op(const aln& a, const aln& b, const logic_rep oper);
      friend aln cpli(const aln& a);

I am far from sure what the code's author intended to do with the above two functions. Again I find the friend declarations deeply suspect (and note that these are in the private part of his class definition.)

    // wordshift operators
      aln wordshift_l(const int) const;
      aln wordshift_r(const int) const;

Without knowing exactly how the data is being represented internally (I suspect that it is not in some form of BCD despite the implications inherent in the identifier m_digits) I can only speculate on these two functions. However I do not think I would use functions with those declarations, because they necessarily include the construction of new instances. I think this is the wrong level for that.

  public:
      errnum read(const CString&);
      errnum read(const char* const);
      errnum write(char* const) const;
      errnum write(CString&) const;

I definitely feel these functions are poorly designed. We should expect read and write functions to work. So failures are definitely candidates for exceptions. Of course, I would also change to using the standard string type instead of an implementation dependant alternative. I would also clarify the purpose of those functions by naming them read_from and write_to.

    // debugging aids
      void slug() const;
      void slug(char*) const;   
      inline int length() { return m_length; };

I have recently moved towards the idea of having debugging support provided by a separate class that is declared as a friend of the main class. In that way no debugging overheads are incurred unless you need them. In addition, adding extra debug functions only requires recompilation of the debug class followed by relinking for test code. In other words reduce the dependencies so that production code is not affected by debug support.

// constructors & destructor
      aln();
      aln(const int);
      aln(const unsigned);
      aln(const long);
      aln(const unsigned long);
      aln(const aln&);
      aln(const char * const);  
      aln(const CString& c); 
     ~aln();

I have some reservations about the provision of all those constructors. What do you think?

      friend aln abs(const aln&);

I understand the thinking behind making this a namespace scope function (so that its use is consistent with that for the built-in integer types.), but I think it would be better done with a member function and a forwarding function. So in class I would provide a public inline function:

aln & abs(){ m_sign = plus; return *this)

and at namespace scope I would provide:

aln abs(const aln & item) {
  aln temp(item);
  return temp.abs();
}

Other operators

Note that I moved some operators earlier (add and subtract), and snipped some of the others (multiply and divide),

// = operators
aln operator=(const aln&);
aln operator=(const int);

I am very curious about the author's choice here. Almost everywhere else he supports many integer types. I think for consistency he should either support assignment from long and unsigned long (there really is no great value in directly supporting the smaller types, just let the standard arithmetic conversions do the work for you.) or he should support none of them and rely on the compiler to convert from the built-in types to aln by calling an appropriate constructor. The former is more efficient while the latter may avoid surprises. Note that with the versions he has declared he will get some very nasty surprises because long, unsigned long and unsigned int will all be converted to int (with warnings if the warning level is set high enough)

// bitshift operators
aln operator>>(const int) const;
aln operator<<(const int) const;
aln operator>>=(const int);
aln operator<<=(const int);          
// bitwise operators
aln operator&(const aln&) const;
aln operator&=(const aln&);

aln operator|(const aln&) const;
aln operator|=(const aln&);

aln operator^(const aln&) const;
aln operator^=(const aln&);
aln operator~() const;

The above seem reasonable until you realise that some are symmetrical operators (&, | and ^) which should work with promotions from built-in types. Those operators should be declared at namespace scope and forward to the matching compound assignment operator to do the work (as we did for addition). The shift operators are fine as member functions, though here also the actual work should be done by the matching compound assignment.

// ! operator
int operator!() const;
// comparison operators
int  operator<(const aln&) const;
int  operator<(const int) const;  
friend int  operator<(const int, const aln&);

int  operator>(const aln&) const;
int  operator>(const  int) const;
friend int  operator>(const  int, const aln&);

int  operator==(const aln&) const;
int  operator==(const  int) const;
friend int  operator==(const  int, const aln&);

int  operator!=(const aln&) const;
int  operator!=(const  int) const;
friend int  operator!=(const  int, const aln&);

int  operator<=(const aln&) const;
int  operator<=(const  int) const;
friend int  operator<=(const  int, const aln&);

int  operator>=(const aln&) const;
int  operator>=(const  int) const;
friend int  operator>=(const  int, const aln&);

I do not think that any of these operators should be provided this way. They are all essentially operators that need to account for conversions. Each one can be provided by a single namespace scope operator that uses a public member function that does the compare. In addition, logical operators should return a bool value.

For example, replace the last three declarations with a single (out of class)

bool operator >=(aln const & lhs, aln const & rhs);

and define it as:

bool operator >=(aln const & lhs, aln const & rhs){
return lhs.compare(rhs) >= 0;
}

You will almost certainly want to make these inline functions as they effectively do little more than call the compare function.

 // input/output
  friend istream& operator>>(istream&, aln&);
friend 
  ostream& operator<<(ostream&, const aln&);

Well if you have read anything that Francis and I have written on the subject you will know that we consider using friendship here as unnecessary and leading to bad habits. Provide the functionality with member functions writeTo(ostream &) and readFrom(istream &). Provide the operators with normal namespace functions that forward to the appropriate member function to do the work.

// increment and decrement
      aln operator++();      // prefix  
      aln operator++(int); // postfix 
      aln operator--();      // prefix  
      aln operator--(int); // postfix

Fine.

// cast operators
operator int();
operator unsigned();
operator long();
operator unsigned long();
operator CString&();

Providing these is almost certainly going to lead to some nasty surprises. That last one is particularly suspect. Before you provide a conversion operator (to give them their correct name) convince yourself that you really, really need it. Then only provide the smallest number you can manage with. While I might well provide a function that converted a large integer type to a string I would never (well almost never) do it via a conversion operator. Instead I would use a member function toString(). You really should avoid providing the compiler with ways to implicitly change one type into another.

static void clearerror();
      static void warnerror(errnum);   
      bool no_error() const;
      static void setbase(int);   
      int askbase();

I cannot comment on these because I do not understand their purpose.

};  // end of class aln.
aln power(const aln&, const aln&);

Curious, why is this function declared at namespace scope? Surely it should either be a member function or be granted friendship (ugh!).

Some Thoughts

As I looked at the writer's data I wondered why he had not provide an array of signed char to handle the data. This would make addition and subtraction relatively easy and multiplication would be easier (you would know that the result of multiplying two signed char values would fit into an int without overflow.)

For simplicity's sake consider the problem of adding two arrays (of signed char) of equal length that represent the magnitude of two large integers. (note that I am coding so that the first element of an array represents the least significant values)

bool addTo(signed char * lhs,
           signed char  const* rhs, length){
  for (int i=0; i!=length; ++i){
    lhs[i] += rhs[i];
    if (lhs[i]<0) {
      lhs[i+1]++;
      lhs[i] |=  0x7F;
    }
  }
}

The above code makes a considerable number of assumptions. Can you list them? Can you refine the idea so as to remove the bugs that result from these assumptions? How about removing the assumption that the arrays are of equal length?

Try it and send your results in for publication.

And finally, why should operator functions return by const value rather than by value?

Notes:

More fields may be available via dynamicdata ..