CVu Journal Vol 27, #4 - September 2015


Title: Anatomy of a CLI Program written in C++

Author: Martin Moene

Date: Wed, 09 September 2015 07:04:15 +01:00

Summary: Matthew Wilson dissects a small program to examine its gory details.


This article, the second in a series looking at software anatomy, examines the structure of a small C++ command-line interface (CLI) program in order to highlight what is boilerplate and what is application-specific logic. Based on that analysis, I will suggest a physical and logical delineation of program contents, which forms the basis of the design principles of a new library for assisting in the development of CLI programs in C and C++, to be discussed in detail in the next instalment.

In the first instalment of this series, ‘Anatomy of a CLI Program written in C’ [1], I considered in some depth the different aspects of structure and coupling of a simple but serious C program. The issues examined included: reading from/writing to input/output streams; failure handling; command-line argument parsing (including the standard flags --help and --version, and application-specific flags); and cross-platform compatibility.

The larger issues comprised:

In this second instalment I will consider these issues further, in the context of a small but serious C++ program, with the aim of defining a general CLI application structure that can be applied to CLI programs of all sizes.

Strictly speaking, some of the differences in sophistication and scope between the first instalment and this one do not directly reflect differences between the languages C and C++. Rather, they reflect the different levels of complexity that it’s worth considering when deciding in which language to implement a CLI application. I’ll come back to this, and point out some rather important differences, in the third instalment.

DADS separation

Before we start working on the example program, I want to revisit the classification issue. In the first instalment I argued that CLI program code written (or wizard-generated) by the programmer is one of:

To this list I now add a fourth:

In the examples of both instalments, the clearest example of declarative logic is the ‘aliases’ array that defines what command-line flags and options are understood by the program.
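To make the ‘declarative’ category concrete, here is a minimal, self-contained sketch of the same idea without CLASP. The flag_spec struct and find_flag() function are hypothetical stand-ins, not CLASP’s clasp::alias_t or its API: the table is pure data describing what the program understands, and a small generic routine interprets it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a flag specification record. It is NOT CLASP's
// clasp::alias_t; it merely illustrates the same declarative idea.
struct flag_spec
{
    char const* short_name; // may be NULL if there is no short form
    char const* long_name;
    char const* help;
};

// The declarative part: data describing the flags the program understands.
static flag_spec const flag_specs[] =
{
    { NULL, "--help",    "shows this help and terminates" },
    { NULL, "--version", "shows version information and terminates" },
    { NULL, NULL,        NULL } // terminator, in the spirit of CLASP_ALIAS_ARRAY_TERMINATOR
};

// Generic code that interprets the table; it need not change when a flag is added.
flag_spec const* find_flag(flag_spec const* specs, char const* arg)
{
    assert(NULL != specs);
    assert(NULL != arg);

    for(; NULL != specs->long_name; ++specs)
    {
        if( 0 == std::strcmp(arg, specs->long_name) ||
            (NULL != specs->short_name && 0 == std::strcmp(arg, specs->short_name)))
        {
            return specs;
        }
    }
    return NULL; // not a recognised flag
}
```

Adding a flag then means adding a row to the table, not writing more parsing code.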

Example program: pown

To avoid destroying too many trees in the production of this month’s issue, I’m going to try and keep the code listings as short as possible by focusing on a small program, albeit one with most real-world concerns; in the printed magazine, these are truncated dramatically, but they’ll be available in full online [5]. For the purposes of pedagogy, I ask you to imagine that we need to write a program to show the owner of one or more files on Windows; in reality this is a feature (/Q) of the built-in dir command.

The features/behaviours of such a program include:

  1. parse the command-line, either for the standard --help or --version flags, or for the path(s) of the file(s) whose owner(s) should be listed;
  2. properly handle the -- special flag. It’s very easy to reproduce the problem with naïve command-line argument handling: just create a file called --help or --version (or the name of any other flag/option), and then run the program in that directory with * (or *.* on Windows);
  3. expand wildcards on Windows, since its shell does not expand wildcards before program invocation;
  4. for each value specified on the command-line, attempt to determine the file’s owner and write it to the standard output stream; if none is specified, fail and prompt the user;
  5. provide contingent reports on all failures, including the program identity as a prefix (according to the UNIX de facto standard);
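The -- convention in item 2 can be sketched as follows. This is an illustrative, hand-rolled splitter, not CLASP’s behaviour or API (parsed_args and split_arguments() are hypothetical names), and it assumes that any argument starting with - (other than a lone -) is a flag. Once a lone -- is seen, every later argument is treated as a value, so a wildcard-matched file called --help is not misread.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type: flags seen, and values (e.g. paths) to act on.
struct parsed_args
{
    std::vector<std::string> flags;
    std::vector<std::string> values;
};

// Splits argv into flags and values, honouring the "--" end-of-flags marker.
parsed_args split_arguments(int argc, char const* const* argv)
{
    assert(argc >= 1);

    parsed_args result;
    bool        only_values = false;

    for(int i = 1; i < argc; ++i) // argv[0] is the program name
    {
        char const* const arg = argv[i];

        if(!only_values && 0 == std::strcmp(arg, "--"))
        {
            only_values = true; // "--" itself is consumed, not recorded
        }
        else if(!only_values && '-' == arg[0] && '\0' != arg[1])
        {
            result.flags.push_back(arg);
        }
        else
        {
            result.values.push_back(arg); // includes a lone "-"
        }
    }
    return result;
}
```

For example, pown --version -- --help file.txt yields one flag (--version) and two values (--help and file.txt).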

Non-functional behaviour includes:

pown.monolith

The first version of this program is written entirely in one file. Even such a simple program is remarkably large – over 230 non-blank lines – and a big part of its size is boilerplate. The program source has the following sections (parts of which are shown in much-truncated form in Listing 1; the full versions of this and all the other listings are available online):

// includes
#include <pantheios/pan.hpp>
#include <systemtools/clasp/main.hpp>
// . . . + Pantheios.Extras, STLSoft, etc.
#include <systemtools/clasp/implicit_link.h>
// . . . + 4 more impl-link
#include "MBldHdr.h"

// aliases
static clasp::alias_t const Aliases[] =
{
  . . . initialisers same as in Listing 4

// identity
#define TOOL_NAME  "pown"
int const toolVerMajor = __SYV_MAJOR;
// . . . + Minor, Revision, BuildNum
char const* const toolToolName = TOOL_NAME;
// . . . + summary, copyright, description
char const* const toolUsage = "USAGE: " TOOL_NAME " { --help | --version | <path-1> [ ... <path-N> ] }";
extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = "pown";

// pown
int pown(char const* path)
{
  . . . 87 lines, retrieve owner (domain
  . . . & acct), result->stdout; diagnostic
  . . . logging to trace flow & log failures.

// main/program entry
int program_main(clasp::arguments_t const* args)
{
  // process flags and options
  if(clasp::flag_specified(args, "--help")){
    . . . 17 lines to initialise CLASP
    . . . usage structure, invoke usage
    . . . and return EXIT_SUCCESS
  if(clasp::flag_specified(args, "--version")){
    . . . 9 lines similar to "--help"
  clasp::verify_all_flags_and_options_used(args);

  // process values
  . . . rest of main() same as in Listing 4
}

// main/boilerplate
. . . 3 x ExecuteAround (see text)
			
Listing 1

It’s pretty clear that boilerplate is eating space, not to mention effort. Furthermore, structuring source in such a manner is an imposition on programmer visibility (and, I would suggest, happiness).
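Listing 1 elides its three ExecuteAround [2] layers. As a rough illustration only, here is one such layer written by hand: a function that performs outer-scope exception handling around whatever ‘real’ main it is given. The names program_main_fn and invoke_with_exception_handling() are hypothetical; in the article’s programs this kind of work is delegated to Pantheios.Extras.Main and Pantheios.Extras.DiagUtil rather than to code like this.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical signature for the application-specific entry point.
typedef int (*program_main_fn)(int argc, char** argv);

// One "execute-around" layer: it wraps the call to pfn with set-up and
// tear-down work (here, only outer-scope exception handling), reporting any
// escaped exception in the "<program-name>: <details>" form.
int invoke_with_exception_handling(
    program_main_fn pfn
,   int             argc
,   char**          argv
,   char const*     program_name
,   std::FILE*      stm_cr
)
{
    assert(NULL != pfn);

    try
    {
        return pfn(argc, argv);
    }
    catch(std::bad_alloc const&)
    {
        std::fprintf(stm_cr, "%s: out of memory\n", program_name);
    }
    catch(std::exception const& x)
    {
        std::fprintf(stm_cr, "%s: %s\n", program_name, x.what());
    }
    catch(...)
    {
        std::fprintf(stm_cr, "%s: unexpected unknown failure\n", program_name);
    }
    return EXIT_FAILURE;
}
```

A real main() would then reduce to something like return invoke_with_exception_handling(program_main, argc, argv, "pown", stderr);, with the other layers (diagnostic-logging initialisation, memory-leak tracing) nested around it in the same fashion.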

Note the inclusion of the MBldHdr.h header and use of the symbols __SYV_MAJOR, __SYV_MINOR, etc. (whose names violate the standard’s reservation of identifiers, as they begin with two underscores). These are aspects of an extremely old, but still used, mechanism for controlling module versions by means of an external tool, and I include them only to show how such schemes can be used with the anatomical delineation proposed herein.

Separation of concerns – pown.alone

The first obvious step is to partition the file. This can be done, at least in part, by identifying which parts conform to which DADS classification. Let’s tackle all the identified sections (apart from the includes, which are a necessary evil of C and C++ programming):

Given these designations, the parts may now be separated physically according to the scheme I have been evolving over the last few years, as follows:

Salient fragments of all the above are presented in Listing 2. Note that, for now:

// pown.hpp:
extern "C"
int pown(char const* path);

// pown.cpp:
. . .
#include "pown.hpp"
#include "identity.hpp"
. . .
int pown(char const* path)
{
  . . .

// entry.cpp:
. . .
#include "pown.hpp"
#include "identity.hpp"
. . .
static
clasp::alias_t const Aliases[] =
{
  . . .
int program_main(clasp::arguments_t const* args)
{
  . . .
. . . // other "main"s, including main()

// identity.hpp:
#define TOOL_NAME     "pown"

extern int const toolVerMajor;
extern int const toolVerMinor;
. . .

// identity.cpp:
#include "identity.hpp"
#include "MBldHdr.h"

int const toolVerMajor = __SYV_MAJOR;
int const toolVerMinor = __SYV_MINOR;
. . .
char const* const toolToolName = TOOL_NAME;
char const* const toolSummary  = "Example project for Anatomies article series in CVu";
. . .

// diagnostics.cpp:
#include "identity.hpp"

extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = TOOL_NAME;

// implicit_link.cpp:
#include <systemtools/clasp/implicit_link.h>
// . . . + 4 more
			
Listing 2

‘Program Design is Library Design’ – pown.alone.decoupled

In the first instalment, I mentioned the importance I attach to being able, as much as is reasonable, to subject the guts of CLI programs to automated testing. As such, separating out the action logic into pown.[ch]pp is an important step. However, there’s still a problem. Consider the current definition of pown() (which is that from Listing 1 transplanted into its own source file, with requisite includes): it has three areas of undue coupling:

You may offer a fourth area of undue coupling: the use of Pantheios C++ API diagnostic statements. The rejoinder I would offer to that is an article in itself, so in this context I will simply observe that: diagnostic logging is important; it must reliably be available at all points during the lifetime of a program; it must be very efficient when enabled and have negligible runtime cost when not; it should be near impossible to write defective code using it; any non-standard (and they are all non-standard) diagnostic logging API will incur some coupling, however far one might wish to abstract it; and there is no (possibility of a) perfect solution. Though I couldn’t be more biased, I believe that Pantheios offers the best mix of features and, since it may be stubbed readily at both compile time and link time, I think its coupling is as benign as coupling can get.

To our three areas of undue coupling. The first two are basically the same thing: the output streams are hard-coded into the function, which restricts potential uses of the function. Even if we would always want those output streams in the wild, hard-coding makes automated testing more difficult. The answer is simple – to pass in the stream pointers as parameters to pown() – though the rationale may be less clear cut (see sidebar).
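The benefit of passing the streams in can be sketched as follows. Because retrieving a file’s owner needs Windows security APIs, the hypothetical probe_file() below stands in for pown() and merely reports whether a file can be opened; what matters is the signature, which lets a test hand it tmpfile() streams and inspect exactly what was written to each. stream_contents_equal() is likewise a hypothetical test helper.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical, portable stand-in for pown(): output and contingent-report
// streams are parameters rather than hard-coded stdout/stderr.
int probe_file(
    char const* path
,   char const* program_name
,   std::FILE*  stm_output
,   std::FILE*  stm_cr
)
{
    std::FILE* const f = std::fopen(path, "rb");

    if(NULL == f)
    {
        std::fprintf(stm_cr, "%s: cannot open '%s'\n", program_name, path);
        return EXIT_FAILURE;
    }
    std::fclose(f);
    std::fprintf(stm_output, "%s: ok\n", path);
    return EXIT_SUCCESS;
}

// Test helper: does everything written to stm (e.g. a tmpfile()) equal expected?
static bool stream_contents_equal(std::FILE* stm, char const* expected)
{
    char buf[512];

    std::rewind(stm); // positioning call required between writing and reading
    std::size_t const n = std::fread(buf, 1, sizeof(buf) - 1, stm);
    buf[n] = '\0';
    return 0 == std::strcmp(buf, expected);
}
```

In the wild, the caller simply passes stdout and stderr; in an automated test, it passes temporary files and checks their contents.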

That just leaves coupling to identity. Fairly obviously, coupling to any preprocessor symbol is not a great idea. (The main reason why TOOL_NAME is even a preprocessor symbol is to facilitate wide-string builds, which I'm not dealing with in this instalment; the other, minor, reason is that it can be used in composing string fragments, as seen in the definition of the literal string initialiser for toolUsage in Listing 1.) The fix here is just as simple as with the streams: a parameter to the function, as shown in Listing 3.

#include <stdio.h>
int
pown(
  char const* path
, char const* program_name
, FILE*       stm_output
, FILE*       stm_cr
);
			
Listing 3
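To illustrate the minor reason given above for TOOL_NAME being a preprocessor symbol, here is a minimal sketch reusing the toolUsage string from Listing 1: adjacent string literals are concatenated at compile time, so a macro can be spliced into a larger literal, whereas a char const* variable cannot. Nothing beyond those two names is taken from the pown sources.

```cpp
#include <cassert>
#include <cstring>

// As in Listing 1, the tool name is a preprocessor symbol...
#define TOOL_NAME "pown"

// ...so it can be spliced into other literals: adjacent string literals are
// concatenated by the compiler into a single literal.
char const* const toolUsage =
    "USAGE: " TOOL_NAME " { --help | --version | <path-1> [ ... <path-N> ] }";

// By contrast, given  char const* const toolName = "pown";  the expression
//   "USAGE: " toolName
// is ill-formed; composing it would need run-time string building.
```

This convenience is also why the macro leaks so easily into action logic, which is what passing the program name as a parameter avoids.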

Finally, though it’s not shown in this example, I believe it’s appropriate to place the action logic library components in a namespace, since it’s conceivable that the names may clash with those in an automated framework (less likely) or with those of other components when used in other program contexts (more likely). I’ll illustrate this clearly in the next instalment.

Summary

In these two articles I have considered some of the fundamental – important, but not very exciting – aspects of program structure in C and C++ CLI programs, and have outlined in this instalment a delineation scheme that is now sufficient for all CLI programs, even large ones for which multiple (implementation and/or header) files for action logic are required, and may be encapsulated into a framework and/or generated by a wizard. Program-generating wizards can follow the separation defined previously and can, in the same operation, generate automated test client programs that include the action logic header and implementation files.

There’s nothing inherent in the scheme that requires use of CLASP for command-line parsing and Pantheios for diagnostic logging (and Pantheios.Extras.Main and Pantheios.Extras.DiagUtil for handling initialisation, outer-scope exception-handling, and memory-leak tracing); you may substitute your own preferences to suit, and a well-written wizard would be able to allow you to select whatever base libraries you require.

In the next instalment I will introduce a new library, libCLImate, which is a flexible mini-framework for assisting with the boilerplate of any command-line program, and which may be used alone or in concert with program suite-specific libraries to almost completely eliminate all the boring parts of CLI programming in C or C++. Listing 4 is a version of the exemplar pown project’s entry.cpp using libCLImate alone; Listing 5 is the entry.cpp for the pown program in Synesis’ system tools program suite: as you can see, almost every line pertains to the specific program, rather than to any common boilerplate. (Having written this tool as an exemplar for this article, I realised that a few enhancements – adding some behaviour options, and splitting it into functions that elicit ownership (as strings) and that output to streams – would make pown()’s functionality a useful library in several tools, including a new, more powerful standalone pown.)

// includes
#include "pown.hpp"
#include "identity.hpp"
#include <libclimate/libclimate/main.hpp>
// . . . + 3 more

// aliases
extern "C"
clasp::alias_t const CLImate_Aliases[] =
{
  CLASP_FLAG(NULL, "--help",
    "shows this help and terminates"),
  CLASP_FLAG(NULL, "--version",
    "shows version information and terminates"),
  CLASP_ALIAS_ARRAY_TERMINATOR
};

// main / program entry
extern "C++"
int CLImate_program_main(clasp::arguments_t
  const* args)
{
  namespace sscli =
    ::SynesisSoftware::CommandLineInterface;
  if(clasp::flag_specified(args, "--help")) {
    return sscli::show_usage(args,
      CLImate_Aliases, stdout, toolVerMajor, 
        //... + 9 params
  }
  if(clasp::flag_specified(args, "--version")) {
    return sscli::show_version(args,
      CLImate_Aliases, stdout, //. . . + 5 params
  }
  clasp::verify_all_flags_and_options_used(args);
  // process values
  if(0 == args->numValues)
  {
    fprintf(stderr
    , "%s: no paths specified; use --help for usage\n"
    , TOOL_NAME
    );
    return EXIT_FAILURE;
  }
  for(size_t i = 0; i != args->numValues; ++i) {
    pown(args->values[i].value.ptr, TOOL_NAME,
      stdout, stderr);
  }
  return EXIT_SUCCESS;
}
			
Listing 4
// includes
#include "pown.hpp"
#include "identity.h"
#include <SynesisSoftware/SystemTools/program_identity_globals.h>
#include <SynesisSoftware/SystemTools/standard_argument_helpers.h>
#include <stlsoft/util/bits/count_functions.h>

using namespace
  ::SynesisSoftware::SystemTools::tools::pown;

// aliases
extern "C"
clasp::alias_t const CLImate_Aliases[] =
{
  // stock
  SS_SYSTOOLS_STD_FLAG_help(),
  SS_SYSTOOLS_STD_FLAG_version(),
  // logic (HELP => elided help-string)
  CLASP_BIT_FLAG("-a", "--show-account",
    POWN_F_SHOW_ACCOUNT, . . . HELP),
  CLASP_BIT_FLAG("-d", "--show-domain",
    POWN_F_SHOW_DOMAIN, . . . HELP),
  CLASP_BIT_FLAG("-r", "--show-file-rel-path",
    POWN_F_SHOW_FILE_REL_PATH, . . . HELP),
  CLASP_BIT_FLAG("-s", "--show-file-stem",
    POWN_F_SHOW_FILE_STEM, . . . HELP),
  CLASP_BIT_FLAG("-p", "--show-file-path",
   POWN_F_SHOW_FILE_PATH, . . . HELP),
  CLASP_ALIAS_ARRAY_TERMINATOR
};

// main
int tool_main_inner(clasp::arguments_t 
  const* args)
{
  // process flags & options
  int flags = 0;
  clasp_checkAllFlags(args, SSCLI_aliases,
    &flags);
  clasp::verify_all_options_used(args);
  // can specify at most one file-path flag
  if(stlsoft::count_bits(flags &
    POWN_F_SHOW_FILE_MASK_) > 1) {
    fprintf(stderr
    , "%s: cannot specify more than one file-path flag; use --help for usage\n"
    , systemtoolToolName
    );
    return EXIT_FAILURE;
  }
  // process values
  switch(args->numValues)
  {
    case 0:
      fprintf(stderr
      , "%s: no paths specified; use --help for usage\n"
      , systemtoolToolName
      );
      return EXIT_FAILURE;
    case 1:
      break;
    default:
      if(0 == (POWN_F_SHOW_FILE_MASK_ & flags)) {
        flags |= POWN_F_SHOW_FILE_REL_PATH;
      }
      break;
  }
  for(size_t i = 0; i != args->numValues; ++i) {
    char const* const path = 
      args->values[i].value.ptr;
    pown(path, flags, systemtoolToolName, stdout,
      stderr);
  }
  return EXIT_SUCCESS;
}
			
Listing 5

In the meantime, I plan to release wizards that generate CLI programs, starting with Visual Studio (2010–15), and possibly moving on to Xcode if I get time. Look out on the Synesis Software website in September for these resources, and feel free to make requests or lend a hand.

Use of FILE* vs ...

Throughout the refactoring and repackaging work undertaken on a swathe of CLI programs, one issue stands out, and it’s the one remaining substantive issue of debate/equivocation:

A: Should the program logic (as a library) issue contingent reports? [8]

There are three corollary issues if so:

  1. How should the program logic be provided the process identity?
  2. To where should contingent reports be written?
  3. In what form should contingent reports be written?

If the answer to A is ‘no’, then the library has to return to its caller sufficient information for the caller to provide a suitable contingent report. If we consider the pown() function, there are four substantive actions (for a Windows implementation): get the file’s security descriptor; get the security descriptor’s owner SID; look up the owner SID’s information (specifically account-name and domain-name); and, finally, print out the results in the desired format. Any of these four operations can fail, raising the question of how the caller might wish to represent such failures, and with what detail. Given that we would rarely (if ever) be satisfied with a mere Boolean success/fail in such circumstances, is it useful, to an end-user (of the command-line tool), to know why and what/where the failure occurred, as well as simply that it did? Is this usefulness greater or less when such code is being used (in the nature of one library amongst many) in a larger program?

Assuming we want to know what/where, in addition to why, the means of communicating this to a caller has to be considered (briefly): Is it a more complex return code (perhaps a composite of why and where)? Is it an exception? All such approaches are fraught with leaked coupling and/or loss of information. I’m not going to explore this aspect further, because it trespasses into the territory I was last exploring in Quality Matters some time ago (and intend to get back to very soon), and because in this context I’m interested in the affirmative option.
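For concreteness, the ‘composite of why and where’ option might look something like the following minimal sketch. The enum, struct and helper names are hypothetical; the four stages simply mirror the four actions listed above, and the error field assumes a Windows-style numeric code such as a GetLastError() value.

```cpp
#include <cassert>

// Hypothetical "where": one stage per substantive action of pown().
enum pown_stage
{
    POWN_STAGE_NONE = 0,                // success
    POWN_STAGE_GET_SECURITY_DESCRIPTOR,
    POWN_STAGE_GET_OWNER_SID,
    POWN_STAGE_LOOKUP_ACCOUNT,
    POWN_STAGE_PRINT
};

// Composite result: "where" (the stage) plus "why" (a platform error code).
struct pown_result
{
    pown_stage    stage;
    unsigned long error; // e.g. a Windows GetLastError() value; 0 means none
};

inline bool succeeded(pown_result const& r)
{
    return POWN_STAGE_NONE == r.stage;
}

inline pown_result make_failure(pown_stage stage, unsigned long error)
{
    assert(POWN_STAGE_NONE != stage);

    pown_result const r = { stage, error };
    return r;
}
```

Note how the coupling leaks: every caller that wants a meaningful message must now know about pown_stage and how to interpret error.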

One further thing worth considering in the ‘no’ alternative is how to handle warnings, by which I mean contingent reports provided to a/the user but not associated with failing conditions. One example might be a program’s action logic library that acts on multiple targets (e.g. by specifying a directory and search patterns), and is unable to act on one (or several) matches while still being able to perform its work on the rest of the targets. In such a case, one might expect the program to continue to determine and output (to the standard output stream) the owners of the other files, while emitting warnings (to the standard error stream) about those it could not process; but if the called library function does not have the ability to issue contingent reports, how is this to be expressed? Perhaps by populating a caller-supplied failed-target list? Whatever the case, such behaviour cannot be handled either by a return code or by an exception: the former can't provide enough information; the latter will cause the callee to terminate with its work part-done. In many cases, therefore, I feel that allowing action logic to issue contingent reports can be the pragmatic choice, however much it may smack of coupling (my personal least liked thing in programming, fwiw).
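The ‘caller-supplied failed-target list’ alternative can be sketched as follows. failed_target and process_targets() are hypothetical and, to stay portable, ‘processing’ a target just means opening it (a real pown would look up the owner instead). The point is that failures are collected and the loop carries on, which neither a return code nor an exception allows.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of a target the action logic could not process.
struct failed_target
{
    std::string path;
    std::string reason;
};

// Processes every target. Failures are appended to the caller-supplied list,
// rather than reported, and processing continues with the remaining targets.
// Returns the number of targets processed successfully.
std::size_t process_targets(
    std::vector<std::string> const& paths
,   std::vector<failed_target>*     failures
,   std::FILE*                      stm_output // may be NULL
)
{
    assert(NULL != failures);

    std::size_t num_ok = 0;

    for(std::size_t i = 0; i != paths.size(); ++i)
    {
        std::FILE* const f = std::fopen(paths[i].c_str(), "rb");

        if(NULL == f)
        {
            failed_target const ft = { paths[i], "cannot open" };
            failures->push_back(ft);
            continue; // keep going with the other targets
        }
        std::fclose(f);

        if(NULL != stm_output)
        {
            std::fprintf(stm_output, "%s\n", paths[i].c_str());
        }
        ++num_ok;
    }
    return num_ok;
}
```

The caller then decides how (and whether) to turn the collected failures into warnings, at the cost of knowing about failed_target.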

So, in the cases where the answer to A is ‘yes’, then we must consider the corollary questions outlined above. Again, coupling comes into it. The answer to question 1 is simple and, I think, uncontentious: simply pass in the process identity as a parameter (or as part of a chunk of information passed in a single parameter).

Question 2, concerning where contingent reports should be written, is a little more tricky. In the wild, the action logic is running inside CLI programs, which interact with their three streams – input, output, error – whether they be the streams of the console/terminal or, via piping/redirection, files and the inputs/outputs of other programs. Whatever the case, the program (and its action logic therein) works with what it believes to be its three standard streams, and I believe it is a valid choice to

What remains is to decide ‘Which?’ and ‘In what form?’. Both decisions are informed by the ‘program design is library design’ philosophy. The answer to ‘Which?’ is simple: I believe that, in order to support testability, the caller should supply a separate output stream (in the case where there is any output by the action logic) and contingent-report stream, as shown in Listing 3. Similarly, if there is a warning stream, that too should be separately specified. The answer to ‘In what form?’ is a bit more involved.

In C, the three standard streams are represented and used most commonly in the form of the C Streams library globals stdin, stdout, and stderr, each of which is of type FILE*. In C++, the received wisdom is that we should use the IOStreams library’s global instances std::cin, std::cout, and std::cerr (std::cin being of type std::istream, and std::cout and std::cerr of type std::ostream), and that they should be passed by reference, as std::istream& and std::ostream&.

Having documented previously [10, 11, 12] my legion reservations about IOStreams, I won’t bother to repeat its many flaws here. The main reason I choose to use FILE* forms of the streams in these circumstances is, again, in support of ‘program design is library design’: I believe it’s clear for a caller to use NULL (or nullptr, if you prefer) to specify no-stream, and equally easy for the callee to test against NULL. Secondarily, it results in lower coupling, and allows the callee to be implemented in C (or another language providing a C API) without changing the caller.
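As a minimal sketch of the NULL-as-no-stream convention (emit_owner() and its owner-TAB-path output format are hypothetical, not pown’s actual output):

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical action-logic fragment: the caller passes NULL for any stream
// it does not want written, and the callee simply tests for it.
int emit_owner(
    char const* path
,   char const* owner
,   std::FILE*  stm_output // NULL => caller wants no output
)
{
    assert(NULL != path && NULL != owner);

    if(NULL == stm_output)
    {
        return 0; // nothing written
    }
    return std::fprintf(stm_output, "%s\t%s\n", owner, path); // chars written
}
```

With std::ostream& there is no direct equivalent, since a reference cannot be null; the caller would need a pointer, or some kind of null-stream object, instead.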

Finally, to question 3. This is really a horses-for-courses issue, for which I have no broad answer at this time. In the case of CLI programs that (follow established UNIX behaviour to) issue a contingent report in the fashion of <process-identity>: <problem-details>, it is a simple matter, given that we’ve already accepted the caller-supply of process identity. In the next instalment I’ll look at a more sophisticated means of handling all this, in the form of a program suite-specific contingent reporting mechanism, which simplifies and improves the programming of each program’s action logic at the cost of coupling to the mechanism’s constructs.
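For the <process-identity>: <problem-details> form, a small variadic helper is enough. report_contingency() below is a hypothetical sketch, not part of any library mentioned in this article: it takes the caller-supplied identity and stream, honours the NULL-stream convention, and returns the number of characters written (0 for a NULL stream, -1 on a write error).

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical helper: writes "<process-identity>: <problem-details>\n"
// to the caller-supplied contingent-report stream.
int report_contingency(
    std::FILE*  stm_cr
,   char const* process_identity
,   char const* fmt
,   ...
)
{
    assert(NULL != process_identity && NULL != fmt);

    if(NULL == stm_cr)
    {
        return 0; // caller asked for no contingent reports
    }

    int const n1 = std::fprintf(stm_cr, "%s: ", process_identity);

    std::va_list args;
    va_start(args, fmt);
    int const n2 = std::vfprintf(stm_cr, fmt, args);
    va_end(args);

    int const n3 = (EOF != std::fputc('\n', stm_cr)) ? 1 : -1;

    if(n1 < 0 || n2 < 0 || n3 < 0)
    {
        return -1;
    }
    return n1 + n2 + n3;
}
```

For example, report_contingency(stderr, "pown", "could not open '%s'", "x.txt") writes pown: could not open 'x.txt' followed by a newline.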

Acknowledgements

Many thanks to the members of accu-general who volunteered suggestions for the name of libCLImate, and to Jonathan Wakely in particular, whose ghastly pun I will explain next time. Thanks too to the long-suffering editor, whose patience with my lateness is never taken for granted.

References

[1] ‘Anatomy of a CLI Program written in C’, Matthew Wilson, CVu September 2012.

[2] http://c2.com/cgi/wiki?ExecuteAroundMethod

[3] The Pragmatic Programmer, Dave Thomas and Andy Hunt, Addison-Wesley, 2000

[4] The Art of UNIX Programming, Eric Raymond, Addison-Wesley, 2003

[5] http://synesis.com.au/publishing/anatomies

[6] An Introduction to CLASP, part 1: C, Matthew Wilson, CVu January 2012; also http://sourceforge.net/projects/systemtools

[7] http://pantheios.org/

[8] Quality Matters #6: Exceptions for Practically-Unrecoverable Conditions, Matthew Wilson, Overload #98, August 2010

[9] C++ Coding Standards, Herb Sutter and Andrei Alexandrescu, Addison-Wesley, 2004

[10] An Introduction to FastFormat, part 1: The State of the Art, Matthew Wilson, Overload #89, February 2009

[11] An Introduction to FastFormat, part 2: Custom Argument and Sink Types, Matthew Wilson, Overload #90, April 2009

[12] An Introduction to FastFormat, part 3: Solving Real Problems, Quickly, Matthew Wilson, Overload #91, June 2009
