Title: Anatomy of a CLI Program written in C++
Author: Martin Moene
Date: Wed, 09 September 2015 07:04:15 +01:00
Summary: Matthew Wilson dissects a small program to examine its gory details.
This article, the second in a series looking at software anatomy, examines the structure of a small C++ command-line interface (CLI) program in order to highlight what is boilerplate and what is application-specific logic. Based on that analysis, a physical and logical delineation of program contents will be suggested. This delineation represents the basis of the design principles of a new library for assisting in the development of CLI programs in C and C++, which will be discussed in detail in the next instalment.
In the first instalment of this series, ‘Anatomy of a CLI Program written in C’ [1], I considered in some depth the different aspects of structure and coupling of a simple but serious C program. The issues examined included: reading from/writing to input/output streams; failure handling; command-line argument parsing (including the standard flags --help and --version, and application-specific flags); and cross-platform compatibility.
The larger issues comprised:
- application of the ExecuteAroundMethod pattern [2] to simplify and make more robust the initialisation of dependency libraries;
- specification of process identity according to DRY SPOT principles [3, 4];
- application of the principle of separation of concerns in the identification and classification of programmer-written CLI code into decision logic, action logic, and support logic. It is the simplification of the first two in, and the elimination of the third from, the task of the programmer that is the aim of this series; and
- decoupling of the action logic from the rest of the application code to facilitate the design philosophy of ‘program design is library design’.
In this second instalment I will consider these issues further, in the context of a small but serious C++ program. The aim is to define a general CLI application structure that can be applied to CLI programs of all sizes.
Strictly speaking, some of the differences in sophistication and scope between the first instalment and this one do not directly reflect differences between the languages C and C++. Rather, they reflect the different levels of complexity that it’s worth considering when deciding in which language to implement a CLI application. I’ll come back to this, and point out some rather important differences, in the third instalment.
DADS separation
Before we start working on the example program, I want to revisit the classification issue. In the first instalment I argued that CLI program code written (or wizard-generated) by the programmer is one of:
- Decision logic – the code that works out what needs to be done and which component(s) will do it;
- Action logic – the code that does the work deemed necessary by the decision-logic; and
- Support logic – all the other stuff, including command-line parsing, diagnostic logging, and so forth.
To this list I now add a fourth:
- Declarative logic – declarations that influence the nature and behaviour of the program, including specifying its identity and the commands to which it responds.
In both instalments, the clearest example of declarative logic is the ‘aliases’ array that defines which command-line flags and options the program understands.
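The following deliberately tiny C++ sketch makes the four DADS categories concrete. It assumes a toy program that understands only --help and --version. The flag_alias struct, the Aliases table, the ToolName constant, the "0.1" version text and the function names are all hypothetical, chosen for illustration. They are not CLASP's alias_t or any other real API. Support logic (parsing argv, initialising libraries) is assumed to live elsewhere, in a library or framework, and is not shown.

```cpp
#include <cstdio>
#include <cstring>

// Declarative logic: the program's identity and the flags it understands.
// (Hypothetical stand-in for a CLASP alias array; not the CLASP API.)
struct flag_alias
{
    char const* name;
    char const* help;
};

static char const       ToolName[] = "demo";
static flag_alias const Aliases[]  =
{
    { "--help",    "shows this help and terminates" },
    { "--version", "shows version information and terminates" },
};

// Action logic: does the work, and knows nothing about argv.
static int show_usage(std::FILE* stm)
{
    std::fprintf(stm, "USAGE: %s <flag>\n", ToolName);
    for (flag_alias const& alias : Aliases)
    {
        std::fprintf(stm, "  %-10s %s\n", alias.name, alias.help);
    }
    return 0;
}

static int show_version(std::FILE* stm)
{
    std::fprintf(stm, "%s 0.1\n", ToolName);
    return 0;
}

// Decision logic: works out what needs to be done, and which component
// does it; failures are reported with the program identity as a prefix.
int decide_and_act(char const* arg, std::FILE* stm_out, std::FILE* stm_err)
{
    if (0 == std::strcmp(arg, "--help"))
    {
        return show_usage(stm_out);
    }
    if (0 == std::strcmp(arg, "--version"))
    {
        return show_version(stm_out);
    }
    std::fprintf(stm_err, "%s: unrecognised argument '%s'; use --help for usage\n", ToolName, arg);
    return 1;
}
```

The usage text is generated from the Aliases table rather than restated, so adding a flag means editing only the declarative part.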
Example program: pown
To avoid destroying too many trees in the production of this month’s issue, I’m going to try and keep the code listings as short as possible by focusing on a small program, albeit one with most real-world concerns; in the printed magazine, these are truncated dramatically, but they’ll be available in full online [5]. For the purposes of pedagogy, I ask you to imagine that we need to write a program to show the owner of one or more files on Windows; in reality this is a feature (/Q) of the built-in dir command.
The features/behaviours of such a program include:
- parse the command-line, either for the standard --help or --version flags, or for the path(s) of the file(s) whose owner(s) should be listed;
- properly handle the -- special flag, which marks the end of flags and options so that every subsequent argument is treated as a value. It’s very easy to simulate the problem with naïve command-line argument handling: just create a file called --help or --version (or the name of any other flag/option), and then run the program in that directory with * (or *.* on Windows);
- expand wildcards on Windows, since its shell does not perform wildcard expansion before program invocation;
- for each value specified on the command-line, attempt to determine the owner and write it to the standard output stream; if none is specified, fail and prompt the user;
- provide contingent reports on all failures, including the program identity as a prefix (according to the UNIX de facto standard).
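A small C++ sketch of argument splitting illustrates the -- requirement. It assumes the common convention that an argument beginning with '-' is a flag until a lone -- is seen, after which every argument is a value. The names split_args and split_command_line() are hypothetical. CLASP's actual parsing is considerably richer (options with values, aliases, and so on) and is not reproduced here.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: flags and values, in command-line order.
struct split_args
{
    std::vector<std::string> flags;
    std::vector<std::string> values;
};

// Treats arguments beginning with '-' as flags until a lone "--" is seen;
// "--" itself is consumed, and every later argument is a value, even one
// that looks like a flag (such as a file named "--help" matched by "*").
split_args split_command_line(int argc, char const* const* argv)
{
    split_args result;
    bool       flags_ended = false;

    for (int i = 1; i < argc; ++i)
    {
        std::string const arg(argv[i]);

        if (!flags_ended && "--" == arg)
        {
            flags_ended = true;
        }
        else if (!flags_ended && arg.size() > 1 && '-' == arg[0])
        {
            result.flags.push_back(arg);
        }
        else
        {
            result.values.push_back(arg); // a lone "-" also lands here
        }
    }
    return result;
}
```

With this rule, pown -- * run in a directory that contains a file called --help treats that file as a path to be processed, rather than as a request for usage.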
Non-functional behaviour includes:
- use diagnostic logging;
- initialise diagnostic logging library before all other sub-systems (other than language runtime);
- initialise command-line parsing library before all other sub-systems (except diagnostic logging library and language runtime);
- define program identity and version information, and include them in output as required;
- do not violate DRY SPOT in program identity and version information.
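One way to satisfy DRY SPOT for identity is to spell the name out exactly once and derive everything else from it. The C++ sketch below follows the TOOL_NAME/toolUsage approach used in the listings later in this article. The version numbers and the version_string() helper are hypothetical, assumed for illustration.

```cpp
#include <string>

// The single point of truth for the program's name.
#define TOOL_NAME "pown"

// Hypothetical version numbers (in pown they come from MBldHdr.h).
int const         toolVerMajor = 0;
int const         toolVerMinor = 1;
char const* const toolToolName = TOOL_NAME;

// String-literal concatenation composes the name into other literals at
// compile time, so it is never spelled out a second time.
char const* const toolUsage =
    "USAGE: " TOOL_NAME " { --help | --version | <path-1> [ ... <path-N> ] }";

// Version text is derived from the same constants rather than restated.
std::string version_string()
{
    return std::string(toolToolName) + " "
         + std::to_string(toolVerMajor) + "."
         + std::to_string(toolVerMinor);
}
```

Renaming the program then means changing one #define, and the usage text, the version text and any diagnostic prefixes follow automatically.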
pown.monolith
The first version of this program is written entirely in one file. Even such a simple program is remarkably large – over 230 non-blank lines – and a big part of its size is boilerplate. The program source has the following sections, parts of which are shown in much-truncated form in Listing 1. The full versions of this and all other listings are available online.
- includes (18 lines): all required includes, including those required purely for boilerplate aspects, are present in their imposing glory;
- aliases (7 lines): as discussed in the previous instalment, the command-line parsing is handled by the CLASP library [6], which uses a global alias array constant to specify declaratively which flags and options the program recognises. In this first version, these are --help and --version;
- identity (11 lines): this section includes a pre-processor symbol (TOOL_NAME) and C/C++ global constants (incl. toolVerMajor, toolVerMinor, …, toolToolName, …) specifying identity and version, as used by the handlers of the --help and --version flags. It also includes a constant required by some of the simple stock front-ends provided with the Pantheios [7] diagnostic logging API library: PANTHEIOS_FE_PROCESS_IDENTITY;
- pown (94 lines): the file-owner elicitation and printing logic, in the form of the pown() function;
- main/program entry (56 lines): this is the application-specific program main entry point, program_main(). It checks for the --help and --version flags. If neither is present, it processes the given values, or, if none are specified, informs the user of his/her oversight;
- main/boilerplate (51 lines): truly the most boring of the lot, this is just the application of the ExecuteAroundMethod pattern [2] (actually, it should be ExecuteAroundFunction, since these are all free functions), as follows:
  - main() executes Pantheios.Extras.Main’s invoke() around main_memory_leak_trace_(), to initialise the Pantheios diagnostic logging library (and provide last-gasp outer-scope exception catching with contingent reporting and diagnostic logging [8]),
  - which executes Pantheios.Extras.DiagUtil’s invoke() around main_cmdline_(), to trace memory leaks,
  - which executes CLASP.Main’s invoke() around program_main(), to initialise the CLASP command-line parsing library.
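The three-layer ExecuteAroundFunction chain above can be sketched in a self-contained way. Each invoke_*() function here is a hypothetical stand-in for one of the real invoke() functions (Pantheios.Extras.Main, Pantheios.Extras.DiagUtil, CLASP.Main), whose actual signatures are not reproduced. The bodies only record events in g_trace so that the nesting order is visible. demo_main() stands in for main(), so that the sketch can be exercised without defining a program entry point.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>
#include <vector>

std::vector<std::string> g_trace; // records the order of events, for illustration

using main_fn = int (*)(int argc, char** argv);

// Records 'begin' on entry and 'end' on exit, even if an exception escapes.
struct trace_scope
{
    char const* end;
    trace_scope(char const* begin, char const* end_) : end(end_) { g_trace.push_back(begin); }
    ~trace_scope() { g_trace.push_back(end); }
};

// Outermost layer (cf. Pantheios.Extras.Main): "initialise logging", and
// give last-gasp reporting of any exception that escapes the inner layers.
int invoke_logging(int argc, char** argv, main_fn inner)
{
    trace_scope scope("logging:init", "logging:uninit");
    try
    {
        return inner(argc, argv);
    }
    catch (std::exception const& x)
    {
        std::fprintf(stderr, "demo: %s\n", x.what());
        return EXIT_FAILURE;
    }
}

// Middle layer (cf. Pantheios.Extras.DiagUtil): "trace memory leaks".
int invoke_leak_trace(int argc, char** argv, main_fn inner)
{
    trace_scope scope("leaks:begin", "leaks:end");
    return inner(argc, argv);
}

// Innermost layer (cf. CLASP.Main): "parse the command-line".
int invoke_cmdline(int argc, char** argv, main_fn inner)
{
    trace_scope scope("cmdline:parse", "cmdline:release");
    return inner(argc, argv);
}

// Application-specific entry point: the only part the programmer writes.
int program_main(int /* argc */, char** /* argv */)
{
    g_trace.push_back("program_main");
    return EXIT_SUCCESS;
}

// Free functions chaining the layers, mirroring the names in the text.
int main_cmdline_(int argc, char** argv)           { return invoke_cmdline(argc, argv, program_main); }
int main_memory_leak_trace_(int argc, char** argv) { return invoke_leak_trace(argc, argv, main_cmdline_); }
int demo_main(int argc, char** argv)               { return invoke_logging(argc, argv, main_memory_leak_trace_); }
```

Each setup/teardown pair is written once, in its layer, and the RAII guard ensures the teardown runs even if an inner layer throws. The programmer's own code is confined to program_main().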
// includes
#include <pantheios/pan.hpp>
#include <systemtools/clasp/main.hpp>
// . . . + Pantheios.Extras, STLSoft, etc.
#include <systemtools/clasp/implicit_link.h>
// . . . + 4 more impl-link
#include "MBldHdr.h"

// aliases
static clasp::alias_t const Aliases[] =
{
. . . initialisers same as in Listing 4

// identity
#define TOOL_NAME "pown"
int const toolVerMajor = __SYV_MAJOR;
// . . . + Minor, Revision, BuildNum
char const* const toolToolName = TOOL_NAME;
// . . . + summary, copyright, description
char const* const toolUsage = "USAGE: " TOOL_NAME " { --help | --version | <path-1> [ ... <path-N> ] }";
extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = "pown";

// pown
int pown(char const* path)
{
. . . 87 lines, retrieve owner (domain
. . . & acct), result->stdout; diagnostic
. . . logging to trace flow & log failures.

// main/program entry
int program_main(clasp::arguments_t const* args)
{
  // process flags and options
  if(clasp::flag_specified(args, "--help"))
  {
. . . 17 lines to initialise CLASP
. . . usage structure, invoke usage
. . . and return EXIT_SUCCESS
  if(clasp::flag_specified(args, "--version"))
  {
. . . 9 lines similar to "--help"

  clasp::verify_all_flags_and_options_used(args);

  // process values
. . . rest of main() same as in Listing 4
}

// main/boilerplate
. . . 3 x ExecuteAround (see text)
Listing 1
It’s pretty clear that boilerplate is eating space, not to mention effort. Furthermore, structuring source in such a manner is an imposition on programmer visibility (and, I would suggest, happiness).
Note the inclusion of the MBldHdr.h header and the use of the symbols __SYV_MAJOR, __SYV_MINOR, etc. Their names violate the standard’s reservation of symbols, as they contain runs of two or more underscores. These are aspects of an extremely old, but still used, mechanism for controlling module versions by an external tool. I include them only to show how such schemes can be used with the anatomical delineation proposed herein.
Separation of concerns – pown.alone
The first obvious step is to partition the file. This can be done, at least in part, by identifying which parts conform to the DADS classification. Let’s tackle all the identified sections (apart from includes, which are a necessary evil of C and C++ programming):
- The aliases section is declarative, and it is entirely about the behaviour of the (command-line) program; it has nothing (direct) to do with the owner-printing logic of pown().
- The identity section is declarative, and it is entirely about the identity of the (command-line) program; it has nothing to do with the owner-printing logic of pown().
- The pown section is action logic, and is the entirety of the program-as-library part of the application, in the form of the function pown().
- The main/program entry section is a mixture of two kinds of logic. The decision logic consists of the tests for the presence of the flags and of (one or more) values. The action logic consists of the loop over all present values and the execution of pown() with each.
- The main/boilerplate section is support logic, pure and simple.
Given these designations, the parts may now be separated physically according to the scheme I have been evolving over the last few years, as follows:
- The files pown.hpp and pown.cpp contain, respectively, the declaration and definition of the pown() function. pown.hpp is a self-contained header file [9].
- The file entry.cpp contains the aliases, main/program entry, and main/boilerplate sections, and (only) their requisite includes.
- The files identity.hpp and identity.cpp contain, respectively, the declarations and definitions of the global constants identifying the program (and the anachronistic MBldHdr.h + symbols).
- The file diagnostics.cpp contains only the definition of the global constant PANTHEIOS_FE_PROCESS_IDENTITY; in more complex programs / program suites, additional diagnostic constructs would reside within such a file. In this way, the actual kinds of diagnostic logging (and other facilities) are kept separate from all other code, allowing link-time decisions as to what kinds of facilities, and in what configurations, are employed. (Note that Pantheios is a diagnostic logging API library: its high-performance and 100% type-safe interface is designed and intended to be bolted atop the much richer logging libraries out there. Any such bolting-on would be confined to this compilation unit, nicely separate from the rest of the program.)
- In the file implicit_link.cpp all the implicit-link includes are made. This keeps this useful but non-portable compiler-specific facility separate from every other part of the program.
Salient fragments of all the above are presented in Listing 2. Note that, for now:
- pown.hpp is included in entry.cpp and pown.cpp; and
- identity.hpp is included in diagnostics.cpp, entry.cpp, identity.cpp, and pown.cpp.
// pown.hpp:
extern "C" int pown(char const* path);

// pown.cpp:
. . .
#include "pown.hpp"
#include "identity.hpp"
. . .
int pown(char const* path)
{
. . .

// entry.cpp:
. . .
#include "pown.hpp"
#include "identity.hpp"
. . .
static clasp::alias_t const Aliases[] =
{
. . .
int program_main(clasp::arguments_t const* args)
{
. . .
. . . // other "main"s, including main()

// identity.hpp:
#define TOOL_NAME "pown"
extern int const toolVerMajor;
extern int const toolVerMinor;
. . .

// identity.cpp:
#include "identity.hpp"
#include "MBldHdr.h"
int const toolVerMajor = __SYV_MAJOR;
int const toolVerMinor = __SYV_MINOR;
. . .
char const* const toolToolName = TOOL_NAME;
char const* const toolSummary = "Example project for Anatomies article series in CVu";
. . .

// diagnostics.cpp:
#include "identity.hpp"
extern "C" char const PANTHEIOS_FE_PROCESS_IDENTITY[] = TOOL_NAME;

// implicit_link.cpp:
#include <systemtools/clasp/implicit_link.h>
// . . . + 4 more
Listing 2
‘Program Design is Library Design’ – pown.alone.decoupled
In the first instalment, I mentioned the importance I attach to being able, as much as is reasonable, to subject the guts of CLI programs to automated testing. As such, separating out the action logic into pown.[ch]pp is an important step. However, there’s still a problem. Consider the current definition of pown() (which is that from Listing 1 transplanted into its own source file, with requisite includes): it has three areas of undue coupling:
- it writes its output to stdout;
- it issues its contingent reports to stderr;
- it #includes identity.hpp because it uses the (preprocessor) symbol TOOL_NAME in its contingent reporting and diagnostic logging statements.
You may offer a fourth area of undue coupling – use of Pantheios C++ API diagnostic statements. The rejoinder I would offer to that is an article in itself, so in this context I will simply make the following observations. Diagnostic logging is important, and it must reliably be available at all points during the lifetime of a program. It must be very efficient when enabled and have negligible runtime cost when not, and it should be near impossible to write defective code using it. Any non-standard (and they are all non-standard) diagnostic logging API will incur some coupling, however far one might wish to abstract it, and there is no (possibility of a) perfect solution. Though I couldn’t be more biased, I believe that Pantheios offers the best mix of features. Since it may be stubbed readily at both compile time and link time, I think it’s the least bad that coupling can get.
Back to our three areas of undue coupling. The first two are basically the same thing: the output streams are hard-coded into the function, which restricts its potential uses. Even if we would always want those output streams in the wild, hard-coding them makes automated testing more difficult. The answer is simple – pass the stream pointers as parameters to pown() – though the rationale may be less clear-cut (see sidebar).
That just leaves coupling to identity. Fairly obviously, coupling to any preprocessor symbol is not a great idea. (The main reason why TOOL_NAME is a preprocessor symbol at all is to facilitate wide-string builds, which I’m not dealing with in this instalment. The other, minor, reason is that it can be used in composing string fragments, as seen in the definition of the literal string initialiser for toolUsage in Listing 1.) The fix here is just as simple as with the streams: a parameter to the function, as shown in Listing 3.
#include <stdio.h>

int pown(
  char const* path
, char const* program_name
, FILE*       stm_output
, FILE*       stm_cr
);
Listing 3
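The sketch below shows why the parameters in Listing 3 help automated testing. It defines a function with that signature, but its body is illustrative only. The lookup_owner() helper is hypothetical and fakes an owner lookup; the real pown() queries Windows file security information, which is not reproduced here. read_all() is a small test helper that reads back what was written to a temporary stream created with std::tmpfile().

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the platform-specific owner lookup.
// Returns nullptr if the owner cannot be determined.
static char const* lookup_owner(char const* path)
{
    return (0 == std::strcmp(path, "missing.txt")) ? nullptr : "DOMAIN\\user";
}

// Action logic decoupled as in Listing 3: the caller supplies the program
// name and both streams, so nothing here is tied to stdout, stderr or
// identity.hpp.
int pown(char const* path, char const* program_name, std::FILE* stm_output, std::FILE* stm_cr)
{
    char const* const owner = lookup_owner(path);

    if (nullptr == owner)
    {
        std::fprintf(stm_cr, "%s: could not determine owner of '%s'\n", program_name, path);
        return EXIT_FAILURE;
    }

    std::fprintf(stm_output, "%s\t%s\n", path, owner);
    return EXIT_SUCCESS;
}

// Test helper: rewinds a stream opened for update and reads it all back.
std::string read_all(std::FILE* stm)
{
    std::string contents;

    std::rewind(stm);
    for (int ch; EOF != (ch = std::fgetc(stm)); )
    {
        contents += static_cast<char>(ch);
    }
    return contents;
}
```

In production, entry.cpp simply passes stdout and stderr, as Listing 4 does. A test client can instead pass temporary streams and check exactly what was written to each.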
Finally, though it’s not shown in this example, I believe it’s appropriate to place the action logic library components in a namespace. The names could conceivably clash with those in an automated test framework (less likely), or with those of other components when used in other program contexts (more likely). I’ll illustrate this clearly in the next instalment.
Summary
In these two articles I have considered some of the fundamental – important, but not very exciting – aspects of program structure in C and C++ CLI programs. In this instalment I have outlined a delineation scheme that is now sufficient for all CLI programs, even large ones for which multiple (implementation and/or header) files for action logic are required. The scheme may also be encapsulated into a framework and/or generated by a wizard. Program-generating wizards can follow the separation defined previously and can, in the same operation, generate automated test client programs that include the action logic header and implementation files.
There’s nothing inherent in the scheme that requires use of CLASP for command-line parsing and Pantheios for diagnostic logging (and Pantheios.Extras.Main and Pantheios.Extras.DiagUtil for handling initialisation, outer-scope exception-handling, and memory-leak tracing); you may substitute your own preferences to suit, and a well-written wizard would be able to allow you to select whatever base libraries you require.
In the next instalment I will introduce a new library, libCLImate. It is a flexible mini-framework for assisting with the boilerplate of any command-line program. It may be used alone, or in concert with program-suite-specific libraries, to eliminate almost all the boring parts of CLI programming in C or C++. Listing 4 is a version of the exemplar pown project’s entry.cpp using libCLImate alone. Listing 5 is the entry.cpp for the pown program in Synesis’ system tools program suite: as you can see, almost every line pertains to the specific program, rather than to any common boilerplate. (Having written this tool as an exemplar for this article, I realised that a few enhancements would make pown()’s functionality a useful library in several tools, including a new, more powerful standalone pown. These enhancements were adding some behaviour options and splitting it into functions that elicit ownership (as strings) and that output to streams.)
// includes
#include "pown.hpp"
#include "identity.hpp"
#include <libclimate/libclimate/main.hpp>
// . . . + 3 more

// aliases
extern "C" clasp::alias_t const CLImate_Aliases[] =
{
  CLASP_FLAG(NULL, "--help", "shows this help and terminates"),
  CLASP_FLAG(NULL, "--version", "shows version information and terminates"),

  CLASP_ALIAS_ARRAY_TERMINATOR
};

// main / program entry
extern "C++" int CLImate_program_main(clasp::arguments_t const* args)
{
  namespace sscli = ::SynesisSoftware::CommandLineInterface;

  if(clasp::flag_specified(args, "--help"))
  {
    return sscli::show_usage(args, CLImate_Aliases, stdout, toolVerMajor, //... + 9 params
  }
  if(clasp::flag_specified(args, "--version"))
  {
    return sscli::show_version(args, CLImate_Aliases, stdout, //. . . + 5 params
  }

  clasp::verify_all_flags_and_options_used(args);

  // process values
  if(0 == args->numValues)
  {
    fprintf(stderr
    , "%s: no paths specified; use --help for usage\n"
    , TOOL_NAME
    );

    return EXIT_FAILURE;
  }

  for(size_t i = 0; i != args->numValues; ++i)
  {
    pown(args->values[i].value.ptr, TOOL_NAME, stdout, stderr);
  }

  return EXIT_SUCCESS;
}
Listing 4
// includes
#include "pown.hpp"
#include "identity.h"
#include <SynesisSoftware/SystemTools/program_identity_globals.h>
#include <SynesisSoftware/SystemTools/standard_argument_helpers.h>
#include <stlsoft/util/bits/count_functions.h>

using namespace ::SynesisSoftware::SystemTools::tools::pown;

// aliases
extern "C" clasp::alias_t const CLImate_Aliases[] =
{
  // stock
  SS_SYSTOOLS_STD_FLAG_help(),
  SS_SYSTOOLS_STD_FLAG_version(),

  // logic (HELP => elided help-string)
  CLASP_BIT_FLAG("-a", "--show-account", POWN_F_SHOW_ACCOUNT, . . . HELP),
  CLASP_BIT_FLAG("-d", "--show-domain", POWN_F_SHOW_DOMAIN, . . . HELP),
  CLASP_BIT_FLAG("-r", "--show-file-rel-path", POWN_F_SHOW_FILE_REL_PATH, . . . HELP),
  CLASP_BIT_FLAG("-s", "--show-file-stem", POWN_F_SHOW_FILE_STEM, . . . HELP),
  CLASP_BIT_FLAG("-p", "--show-file-path", POWN_F_SHOW_FILE_PATH, . . . HELP),

  CLASP_ALIAS_ARRAY_TERMINATOR
};

// main
int tool_main_inner(clasp::arguments_t const* args)
{
  // process flags & options
  int flags = 0;

  clasp_checkAllFlags(args, SSCLI_aliases, &flags);

  clasp::verify_all_options_used(args);

  // can specify at most one file-path flag
  if(stlsoft::count_bits(flags & POWN_F_SHOW_FILE_MASK_) > 1)
  {
    fprintf(stderr
    , "%s: cannot specify more than one file-path flag; use --help for usage\n"
    , systemtoolToolName
    );

    return EXIT_FAILURE;
  }

  // process values
  switch(args->numValues)
  {
    case 0:
      fprintf(stderr
      , "%s: no paths specified; use --help for usage\n"
      , systemtoolToolName
      );
      return EXIT_FAILURE;
    case 1:
      break;
    default:
      if(0 == (POWN_F_SHOW_FILE_MASK_ & flags))
      {
        flags |= POWN_F_SHOW_FILE_REL_PATH;
      }
      break;
  }

  for(size_t i = 0; i != args->numValues; ++i)
  {
    char const* const path = args->values[i].value.ptr;

    pown(path, flags, systemtoolToolName, stdout, stderr);
  }

  return EXIT_SUCCESS;
}
Listing 5
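The "at most one file-path flag" rule and the multi-value default in Listing 5 are ordinary bit-mask logic. The sketch below reproduces that logic in isolation, using std::bitset to count bits in place of stlsoft::count_bits(). The SHOW_* names, their numeric values and resolve_flags() are hypothetical, modelled on (but not taken from) the POWN_F_* flags, whose definitions the listing does not show.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>

// Hypothetical flag values, modelled on the POWN_F_* names in Listing 5.
enum : int
{
    SHOW_ACCOUNT       = 0x0001,
    SHOW_DOMAIN        = 0x0002,
    SHOW_FILE_REL_PATH = 0x0010,
    SHOW_FILE_STEM     = 0x0020,
    SHOW_FILE_PATH     = 0x0040,
    SHOW_FILE_MASK     = SHOW_FILE_REL_PATH | SHOW_FILE_STEM | SHOW_FILE_PATH,
};

// Number of set bits in v (the role played by stlsoft::count_bits()).
inline std::size_t count_bits(unsigned v)
{
    return std::bitset<sizeof(unsigned) * CHAR_BIT>(v).count();
}

// Applies the flag rules from Listing 5: reject more than one file-path
// flag; with several values and no file-path flag, default to showing the
// relative path.
bool resolve_flags(int& flags, std::size_t numValues)
{
    if (count_bits(static_cast<unsigned>(flags & SHOW_FILE_MASK)) > 1)
    {
        return false;
    }
    if (numValues > 1 && 0 == (flags & SHOW_FILE_MASK))
    {
        flags |= SHOW_FILE_REL_PATH;
    }
    return true;
}
```

Keeping the mutually exclusive flags under one mask means the exclusivity check is a single bit count, however many file-path flags are added later.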
In the meantime, I plan to release wizards that generate CLI programs, starting with Visual Studio (2010–15), and possibly moving on to Xcode if I get time. Look out for these resources on the Synesis Software website in September, and feel free to make requests or lend a hand.
Sidebar: Use of FILE* vs ...
Acknowledgements
Many thanks to the members of accu-general who volunteered suggestions for the name of libCLImate, and to Jonathan Wakeley in particular, whose ghastly pun I will explain next time. Thanks too to the long-suffering editor whose patience with my lateness is never taken for granted.
References
[1] ‘Anatomy of a CLI Program written in C’, Matthew Wilson, CVu, September 2012.
[2] http://c2.com/cgi/wiki?ExecuteAroundMethod
[3] The Pragmatic Programmer, Dave Thomas and Andy Hunt, Addison-Wesley, 2000.
[4] The Art of UNIX Programming, Eric Raymond, Addison-Wesley, 2003.
[5] http://synesis.com.au/publishing/anatomies
[6] ‘An Introduction to CLASP, part 1: C’, Matthew Wilson, CVu, January 2012; also http://sourceforge.net/projects/systemtools
[8] ‘Quality Matters #6: Exceptions for Practically-Unrecoverable Conditions’, Matthew Wilson, Overload 98, August 2010.
[9] C++ Coding Standards, Herb Sutter and Andrei Alexandrescu, Addison-Wesley, 2004.
[10] ‘An Introduction to FastFormat, part 1: The State of the Art’, Matthew Wilson, Overload 89, February 2009.
[11] ‘An Introduction to FastFormat, part 2: Custom Argument and Sink Types’, Matthew Wilson, Overload 90, April 2009.
[12] ‘An Introduction to FastFormat, part 3: Solving Real Problems, Quickly’, Matthew Wilson, Overload 91, June 2009.