Overload Journal #102 - April 2011

Title: Benefits of Well Known Interfaces in Closed Source Code

Author: Martin Moene

Date: Wed, 06 April 2011 18:32:28 +01:00

Summary: Designing a good API is a significant challenge. Arun Saha suggests taking inspiration from outside.

Body: 

The availability of a high-quality data structure library is a necessary ingredient for the success and timely completion of any software project. It allows the programmers to focus on the problem domain rather than the solution domain. But what are the options if no such library is available and an in-house one has to be developed? Fortunately, all is not lost. The in-house library can be designed to use a standardized or well-known interface, which reduces much of the strategic design, tactical design, testing, learning, adaptation, and maintenance effort. This article focuses on two key aspects: interface design and functional testing.

Introduction

Consistent use of a library maintains uniformity, both syntactic and semantic, across a project. It is essential for the development and maintenance of any large or multi-programmer code base. In C++, the standard library specifies a set of data structures, also known as containers (for example, array, vector, list, map, set, unordered_map, unordered_set, bitset), and algorithms (for example, find, search, sort, partial_sort) that are usable with any suitable built-in or user-defined type [C++2011, relevant sections: 20, 23, 24, 25]. The availability of the standard library provides immense benefits to a project: the programmers can look beyond the repetitive structural and algorithmic issues and focus more on the issues of the problem domain. One of the earliest implementations of such a type-independent library was published by SGI and is known as the Standard Template Library.

Although these containers and algorithms are specified in the C++ standards (C++1998, C++2003, and upcoming C++0x), they are not part of the core C++ language; the library extends the language to provide some general components [Josuttis99].

There are multiple implementations of the C++ standard library available. Among them, SGI, GNU and STLport are open-source implementations, and Dinkumware is a commercial one. [Implementations]

However, there exist systems and environments, mostly embedded systems, where the C++ language is used without the standard library. One such example is ‘Embedded C++’ [EC++]; it is a subset of C++ which prohibits templates (among other things), and thereby a major part of the standard library, including the containers and the algorithms, is unavailable.

If a project wants to use the standard library and one of the open-source implementations is technically and legally suitable, then that implementation can simply be chosen – end of story.

However, in commercial software or a proprietary code base, using open-source software is frequently not an option. There are multiple reasons, and the following is a non-exhaustive list:

Thus, commercial houses have two major options for using a C++ data structure library: Option A (Purchase) a commercial implementation, or Option B (Develop) one in-house.

Our experience is with Option B (Develop), and in the remainder of this article we shall share two major lessons learned from that choice. One is the interface design and the other is comparative testing.

Interface design

The first and foremost item in developing a library is designing the interface. By interface, we mean all the public methods and attributes that are visible to the user code. While it is possible to design an interface in multiple ways, it is hard to produce the ‘right’ one. However, though the choice of Option B means developing an in-house implementation, fortunately there is still something that can be ‘borrowed’ from the C++ standard library. The interface!

For the interface of the to-be-developed library, our recommendation is to choose exactly the one specified in the C++ standard.

There are many reasons why.

It is the standard

API design is hard. A study of the obstacles faced by developers when learning APIs [Robillard09] notes:

APIs support code reuse, provide high-level abstractions that facilitate programming tasks, and help unify the programming experience (for example, by providing a uniform way to interact with list structures).

The interface of the C++ standard library is widely known; virtually every C++ programmer is aware of it. For example, to insert an item at the end of a list or vector, the de facto, the idiomatic, and the most natural way is to use the method push_back().
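As a small illustration of that uniformity, the sketch below appends values to a vector and to a list through exactly the same call. The helper names (append_sequence and the two wrappers) are hypothetical and exist only for this example; an in-house container offering the same push_back() could, under that assumption, be passed to the same helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Appends the values 1..count to any container that offers push_back().
// (Hypothetical helper, not part of any library.)
template <typename Container>
void append_sequence(Container& c, std::size_t count) {
    for (std::size_t i = 1; i <= count; ++i) {
        c.push_back(static_cast<typename Container::value_type>(i));
    }
    assert(c.size() >= count);  // at least 'count' elements are now present
}

// The same helper works unchanged for std::vector ...
inline std::size_t vector_size_after_append(std::size_t count) {
    std::vector<int> v;
    append_sequence(v, count);
    return v.size();
}

// ... and for std::list, because both expose push_back().
inline std::size_t list_size_after_append(std::size_t count) {
    std::list<int> l;
    append_sequence(l, count);
    return l.size();
}
```

Called from a main(), vector_size_after_append(3) and list_size_after_append(3) both report three elements; the user code never needs to know which container it was given.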

Since it is the standard, some other benefits include:

Lower barrier to entry

One of the costs (and often a barrier) of using a library is learning its interface. The aforementioned study warns that:

APIs have grown very large and diverse, which has prompted some to question their usability. It would be a pity if the difficulty of using APIs would nullify the productivity gains they offer.

It would be a bigger pity if programmers had to, on top of that, learn different APIs – for example the C++ standard library and potentially different in-house libraries at different organizations – for doing the same job, such as inserting an element into a list. The number of APIs that we are talking about here is large: dozens of classes, each with scores of methods, scores of algorithms, and a long list of idioms and good practices. There exists a significant amount of material – books, articles, tutorials, blogs, forums, newsgroups, mailing lists – on aspects of the C++ standard library; mastering it and becoming an effective user involves a substantial learning curve.

If the in-house library uses the same API as the C++ standard library, then the cost of training the programmers is eliminated (or at least drastically reduced) because they can simply continue to apply their pre-acquired knowledge (or learn from already existing materials). This applies equally well to C++-skilled programmers who are hired in the future. Conversely, if the in-house library is built with a different API, all that knowledge and mastery suddenly becomes useless.

Long term impact

Any software interface, standardized or otherwise, has long term implications. The implementation can be easily modified, but once it is published and the remaining code base starts using it, changing an interface is extremely hard. Choosing an already stable interface reduces such impacts.

Also, if for some reason the organization wants to switch from Option B (Develop) to Option A (Purchase) in the future, then the migration is much easier because all user code is already written against the same interface.
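One way such a switch could be kept to a single point of change is sketched below. It is only an illustration under stated assumptions: the macro USE_INHOUSE, the namespace alias ds, and the function sum_of_squares are hypothetical names, and the in-house branch assumes a header vector.hh providing inhouse::vector as in Listing 1. Without the macro, the sketch compiles against the standard library.

```cpp
#include <cassert>
#include <cstddef>

#ifdef USE_INHOUSE                 // hypothetical project-wide switch
  #include "vector.hh"             // assumed in-house header
  namespace ds = inhouse;          // ds::vector is inhouse::vector
#else
  #include <vector>
  namespace ds = std;              // ds::vector is std::vector
#endif

// User code written only against the standard interface; it does not
// change when the library behind the alias 'ds' is replaced.
inline unsigned long sum_of_squares(unsigned long n) {
    ds::vector<unsigned long> squares;
    for (unsigned long i = 1; i <= n; ++i) {
        squares.push_back(i * i);
    }
    assert(squares.size() == n);

    unsigned long total = 0;
    for (std::size_t i = 0; i < squares.size(); ++i) {
        total += squares[i];
    }
    return total;
}
```

Because the user code names only ds::vector and standard member functions, moving between implementations would amount to changing the build flag, provided both implementations honour the same interface.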

Rule of least surprise

The Art of Unix Programming [Raymond03] observes:

The easiest programs to use are those that demand the least new learning from the user – or, to put it another way, the easiest programs to use are those that most effectively connect to the user’s pre-existing knowledge.

So, following an existing standard is the most natural choice to make.

Testability

If the in-house library follows the same interface as the C++ standard library, then testing the correctness of the library is much easier. This important aspect is now explained in more detail.

Testing

The choice of interface specification is a good first step, but that itself is not sufficient. The crucial design invariant – the interface compatibility with the C++ standard library – has to be actively maintained. That leads to the following questions:

The solution that we found most useful is to develop a test suite for the library with the following strategy:

  1. Each unit of the library, for example a container, an iterator, an algorithm, or an allocator has its own unit test.
  2. Separate unit tests are independent and stand-alone C++ programs, all of which are run in a regression suite.
  3. Each unit test verifies the behaviour of a unit against the specification in the C++ standard.
  4. A unit test exercises each interface of the unit in all possible ways.

It is best to explain with examples. In the following, excerpts from the vector test code are shown.

Comparative testing

All the tests follow a common structure: at the beginning of the test code, a control is provided to run the test against either a reference standard, or the in-house code. Listing 1 shows the structure for vector.

// vector_test.cpp
typedef unsigned long int Type;
#ifdef STD_REF
  #include <vector>       // From standard library
  typedef std::vector< Type > TypeVector;
#else
  #include "vector.hh"    // From in-house library
  typedef inhouse::vector< Type > TypeVector;
#endif
typedef TypeVector::iterator TypeVectorIter;

#include <cassert>
#include <cstddef>        // size_t
#define UNIT_TEST assert

static const Type Values[] = {10, 20, 30, 40, 50,
   60, 70};
static const size_t ValuesLength =  
   sizeof( Values ) / sizeof( Values[ 0 ] );
int main() { 
  size_t valuesIndex = 0; 
  TypeVector vut;  // Vector Under Test 

  UNIT_TEST( vut.empty() ); 
  for( valuesIndex = 0; 
       valuesIndex < ValuesLength;
       ++valuesIndex ) {
    vut.push_back( Values[ valuesIndex ] );
  }

  UNIT_TEST( ! vut.empty() );
  UNIT_TEST( vut.size() == ValuesLength );
  UNIT_TEST( vut.front() == Values[ 0 ] );
  UNIT_TEST( vut.back() == 
     Values[ ValuesLength - 1 ] );
  valuesIndex = 0;
  for( TypeVectorIter it = vut.begin();
       it != vut.end();
       ++it, ++valuesIndex ) {
    UNIT_TEST( *it == Values[ valuesIndex ] );
    UNIT_TEST( *it == vut[ valuesIndex ] );
    UNIT_TEST( *it == vut.at( valuesIndex ) );
  }
  UNIT_TEST( valuesIndex == ValuesLength );
  UNIT_TEST( ! vut.empty() ); 
  vut.clear(); 
  UNIT_TEST( vut.empty() ); 
}
			
Listing 1

First, the listing defines the type of the elements that the vector holds. For simplicity in this example, we used the built-in type unsigned long int, although it could be any user-defined type (struct or class). When the macro STD_REF is defined, we run this unit test on a reference implementation of the standard library. Otherwise, we run this unit test on the in-house library. Observe that, in both configurations, we define a type named TypeVector. The remainder of the file vector_test.cpp runs all tests on TypeVector, without any knowledge of the source of the library code.

Thus we have a simple way of choosing one among many possible vector implementations and running the unit test on the chosen one. If the implementations conform to the C++ standard, then the unit test compiles with all of them and produces identical results when executed with each of them.
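Listing 1 selects the implementation with a macro at compile time. A possible variation, sketched here and not taken from the original test suite, is to write the checks as a function template so that several implementations can be exercised in one program. The name count_failures is hypothetical, and std::deque is used purely as a stand-in for a second implementation, since it happens to provide every member function the checks use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Runs the same checks on any container type V that follows the
// standard sequence interface; returns the number of failed checks.
template <typename V>
int count_failures() {
    typedef typename V::value_type T;
    const T values[] = {10, 20, 30};
    const std::size_t n = sizeof(values) / sizeof(values[0]);

    int failures = 0;
    V vut;  // container under test
    if (!vut.empty()) ++failures;

    for (std::size_t i = 0; i < n; ++i) {
        vut.push_back(values[i]);
    }
    if (vut.size() != n) ++failures;
    if (vut.front() != values[0]) ++failures;
    if (vut.back() != values[n - 1]) ++failures;

    for (std::size_t i = 0; i < n; ++i) {
        if (vut[i] != values[i]) ++failures;
        if (vut.at(i) != values[i]) ++failures;
    }

    vut.clear();
    if (!vut.empty()) ++failures;
    return failures;
}
```

Under these assumptions, count_failures< std::vector<unsigned long> >() and count_failures< std::deque<unsigned long> >() should both return zero; an in-house vector would be checked by instantiating the template with its type. The macro approach of Listing 1 keeps the two libraries in separate binaries, which may be preferable when their headers cannot coexist.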

Test construction

The next task of the unit testing strategy is creating the test cases. All the test cases are created as a sequence of two steps:

  1. Do some operation(s) on the unit (here, vector).
  2. Programmatically verify that the properties and contents of the data structure match the expected result(s).

The rest of Listing 1 shows an example of some simple test cases applied on vector, where programmatic verification is done using asserts.

It tests some methods of vector (empty(), push_back(), size(), front(), back(), at(), begin(), end(), clear(), operator[]) and the type vector::iterator.
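The two-step pattern extends naturally to methods that Listing 1 does not cover. The following sketch, which is an illustration and not part of the original test code, applies it to insert() and erase(). The helper holds_exactly and the function test_insert_then_erase are hypothetical names, and TypeVector is defined here as std::vector only so that the fragment is self-contained; in the real test it would come from the STD_REF switch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the TypeVector typedef of Listing 1.
typedef std::vector<unsigned long> TypeVector;

// Step 2 helper: true when vut holds exactly the n values in
// 'expected', in order, and its size()/empty() agree with n.
inline bool holds_exactly(const TypeVector& vut,
                          const unsigned long* expected,
                          std::size_t n) {
    if (vut.size() != n) return false;
    if (vut.empty() != (n == 0)) return false;
    std::size_t i = 0;
    for (TypeVector::const_iterator it = vut.begin();
         it != vut.end(); ++it, ++i) {
        if (*it != expected[i]) return false;
    }
    return true;
}

// One test case: insert in the middle, then erase the first element.
inline bool test_insert_then_erase() {
    TypeVector vut;
    vut.push_back(10UL);
    vut.push_back(30UL);

    vut.insert(vut.begin() + 1, 20UL);             // step 1: operate
    const unsigned long afterInsert[] = {10UL, 20UL, 30UL};
    if (!holds_exactly(vut, afterInsert, 3)) {     // step 2: verify
        return false;
    }

    vut.erase(vut.begin());                        // step 1: operate
    const unsigned long afterErase[] = {20UL, 30UL};
    return holds_exactly(vut, afterErase, 2);      // step 2: verify
}
```

A test program in the style of Listing 1 could then contain UNIT_TEST( test_insert_then_erase() ); keeping the verification in one helper makes each new test case a short sequence of operations plus an expected array.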

For each unit, the conformance and correctness testing consists of a few simple steps. The steps for compiling and running the vector test are as follows:

  1. CC := g++ -W -Wall -Werror -ansi -pedantic -std=c++0x
  2. CC -DSTD_REF -D_GLIBCXX_DEBUG vector_test.cpp -o ref_vector
  3. CC vector_test.cpp -o inhouse_vector
  4. ./ref_vector
  5. ./inhouse_vector

Things to note for these steps:

This example is rather simplistic; it uses only a few of the member functions available in vector. In reality, the vector class template has many more methods. To obtain basic confidence in the conformance and correctness of the in-house library, the unit test code tests each method in isolation. Then the methods are tested in different combinations and sequences.
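One possible way to exercise methods in varying sequences, offered here as a sketch and not as the approach used in the original suite, is to drive the vector with a reproducible pseudo-random series of operations and compare it after every step with a trivially simple model (a plain array plus a length). The function agrees_with_model, the capacity of 64, and the small linear congruential generator are all assumptions made for this example; TypeVector again stands in for the typedef of Listing 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the TypeVector typedef of Listing 1.
typedef std::vector<unsigned long> TypeVector;

// Applies 'steps' pseudo-random push_back()/pop_back() operations to
// the vector under test and to a plain-array model, and returns true
// only if both agree after every single step.
inline bool agrees_with_model(unsigned long seed, std::size_t steps) {
    const std::size_t Capacity = 64;     // upper bound for the model
    unsigned long model[Capacity];
    std::size_t modelSize = 0;

    TypeVector vut;
    unsigned long state = seed;

    for (std::size_t s = 0; s < steps; ++s) {
        // Small linear congruential generator: same seed, same sequence.
        state = state * 1103515245UL + 12345UL;
        const unsigned long value = (state >> 16) & 0x7fffUL;

        // Push when empty; pop when the model is full; otherwise push
        // roughly two times out of three.
        const bool push = (modelSize == 0) ||
                          (modelSize < Capacity && value % 3 != 0);
        if (push) {
            model[modelSize++] = value;
            vut.push_back(value);
        } else {
            --modelSize;
            vut.pop_back();
        }

        // Verify properties and contents against the model.
        if (vut.size() != modelSize) return false;
        if (vut.empty() != (modelSize == 0)) return false;
        for (std::size_t i = 0; i < modelSize; ++i) {
            if (vut[i] != model[i]) return false;
        }
    }
    return true;
}
```

Because the sequence depends only on the seed, a failing run can be repeated exactly, and the same seeds can be run against both the reference and the in-house build.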

Other experiences

Without risking any non-conformance to the standard interface, the implementation of the in-house library can offer some niceties which may or may not be available in other implementations. Here are two examples.

Some other general strategies:

Conclusion

Consistent interfaces make life easier, and software development is no exception. This article emphasizes that the interface provided by the C++ standard library, which sometimes goes unappreciated and overlooked, is very valuable by itself. While writing the in-house library, we realized numerous times that choosing to follow the standard interface was the most important design decision we made. Following the interface conventions of the C++ standard library has tremendously helped (non-library) programmers to understand and use the newly written in-house library easily. It brought the programmers to a common and consistent style, both syntactically and semantically. Overall, it has proven to be a great step in reducing software complexity in the organization’s code base.

References

[C++2011] ‘Working Draft, Standard for Programming Language C++’, 02 2011. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

[EC++] ‘The Embedded C++ specification’, 1999. http://www.caravan.net/ec2plus/spec.html

[Implementations] ‘Dinkumware C++ Standard Library’ (http://www.dinkumware.com/), ‘The GNU C++ Library Documentation’ (http://gcc.gnu.org/onlinedocs/libstdc++/), ‘SGI Standard Template Library Programmer’s Guide’, 1994 (http://www.sgi.com/tech/stl/), ‘STLport C++ Standard Library’ (http://www.stlport.org/)

[Josuttis99] N. M. Josuttis, The C++ Standard Library, A Tutorial and Reference. Addison-Wesley, 1999.

[Raymond03] E. S. Raymond, The Art of Unix Programming, 2003. http://catb.org/~esr/writings/taoup/html/ch01s06.html#id2878339

[Robillard09] M. P. Robillard, ‘What Makes APIs Hard to Learn? Answers from Developers’, IEEE Software, vol. 26, no. 6, 2009. http://www.cs.mcgill.ca/~martin/papers/software2009a.pdf

[TDD] ‘Test-driven development’, accessed 2011-March-10. http://en.wikipedia.org/wiki/Test-driven_development
