Title: Comment on “Problem 11”

Author:

Date: Sun, 01 February 2004 19:38:02 +00:00

Summary:

The first step in finding problems in the code is to identify the problem the code is trying to solve. The discussion in the C Vu article is basically about curiosities in the way the C++ standard library std::istream is defined, but I will make the perhaps unwarranted assumption that the real problem is not the use of std::istream but, more generally, how to write a read routine that can effectively and safely capture data from an input stream. In fact, as the first problem below illustrates, neither of these issues can be addressed effectively without the other.

Body:

The proposed improvement to the templated read function is that it starts an approach to handling different input conditions by having the user distinguish between two types of stream ending conditions, reading just an end-of-file and reading a carriage return along with end-of-file. (Do I have this right?)

This is a start, but it is only useful for illustrating idiosyncrasies of STL istreams. It still has problems with std::istream, and as a lesson in reading computer input it is deficient in the following ways:

The most basic problem here is that of "separation of concerns" and the need for separate routines that each perform one function and perform it well. This is particularly unfortunate here, since it is especially important to avoid tight coupling between system support routines (reading input) and client application routines (processing input).

This basic problem is manifest here in multiple ways:
- The client routine is expected to test multiple stream ending conditions, reported with different syntax and in two different domains; one in that of the input mechanism, one in that of the read routine.
- The test for a dummy value is clever, but it is, at best, an awkward and somewhat dubious general approach to detecting particular conditions (should we perhaps label this a hack?).
- Such approaches can easily lead to error prone code.
  
  As implemented here, the two conditions to test are redundant, since a dummy value has to be returned for end-of-file, whether a carriage return was present or not. Thus not only is the client code overly complex, but the strategy is faulty. Also, if the "dummy value" actually happens to be present in the input stream, it will indeed be treated as is any other value.
- Detecting different ending conditions is relevant to the input processing domain; processing different ending conditions is relevant to the client domain.
- Testing multiple conditions in multiple ways will not scale well when other conditions are considered. The example considers a special case but, with a slight extension, the read routine might easily be adapted to process, for instance, console output directed to a file, where end-of-line and possibly carriage return characters may separate data items.
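The dummy-value weakness can be seen in a minimal sketch. The function read_or_dummy below is my own hypothetical stand-in for a read routine of this kind, not the code from the C Vu article, and count_until_dummy is an equally hypothetical client loop.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a read routine that signals "no more data"
// by returning a caller-supplied dummy value.
template <typename T>
T read_or_dummy(std::istream& in, const T& dummy) {
    T value;
    if (in >> value) {
        return value;
    }
    return dummy;  // end-of-file and unreadable data look identical here
}

// Counts the items a client loop sees before it meets the dummy value.
int count_until_dummy(const std::string& text, int dummy) {
    std::istringstream in(text);
    int count = 0;
    while (read_or_dummy(in, dummy) != dummy) {
        ++count;
    }
    return count;
}
```

With the input "3 -1 7" and a dummy of -1, count_until_dummy returns 1: the routine hands back the legitimate -1 like any other value, the client cannot tell it from the end marker, and the 7 is never processed.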
The error handling is rigid with no flexibility for adaptation to either the application environment or the client needs.

The read routine throws an exception for stream errors; even worse, the routine buries its own private fgw::bad_input exception. The client routine, on the other hand, may well wish to continue processing after bad input, which may be either unreadable for the specified type (input stream domain failure) or invalid (as defined in the data, the read routine or the client processing domain).
The in.bad() condition is not tested, which is the one more deserving of an exception. Actually for a pre-standard library the fail bit may cover this case. But then, the read routine would throw a bad-data exception, when the error actually is failure to read the data, whether good or bad.
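To make the distinction concrete, here is a small sketch of how the standard state bits might be told apart after a failed extraction. The enum ReadResult and the function read_classified are hypothetical names of mine; only bad(), eof() and the stream's conversion to bool are standard std::istream facilities.

```cpp
#include <istream>
#include <sstream>

// Hypothetical client-facing categories; the names are illustrative only.
enum class ReadResult { Ok, EndOfInput, UnreadableData, SourceFailure };

// Tries to extract one T and maps the std::istream state bits to a category.
template <typename T>
ReadResult read_classified(std::istream& in, T& dest) {
    if (in >> dest) {
        return ReadResult::Ok;
    }
    if (in.bad()) {
        return ReadResult::SourceFailure;   // badbit: the stream itself is broken
    }
    if (in.eof()) {
        return ReadResult::EndOfInput;      // failbit with eofbit: nothing left to read
    }
    return ReadResult::UnreadableData;      // failbit only: the text is not a valid T
}
```

This is only a sketch: a token that is cut off by the end of input (a lone "-" when reading an int, say) sets both failbit and eofbit and would be reported here as EndOfInput, so a production routine would need a finer test.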
For beginners especially, the code fails to take a valuable opportunity to demonstrate basic and consistent mechanisms for preventing invalid data values from getting past the application external interfaces.
In any case, there needs to be consistent support for applying both general application-wide policies and client-routine-specific policies, for both error handling and error reporting. Developing those policies is another subject, but the basic interfaces are crucial and can be made reasonably simple.
The input data appears to be constructed twice, once in the read routine and once in the client routine, and probably with different constructors. Typically this may not actually be a problem, but this behavior can lead to subtle problems.
If, as suggested here, the client code needs to be abstracted from the details of std::istream error conditions, why have any dependency on std::istream? Perhaps, even more useful than templating the input data type, is abstracting the concept of an input source.
Names are critical. Here the routine does not read the input stream; it reads the next item in the input stream. Hence the routine could be called readNext.
A simple, but important, advantage of abstracting the input source type is that now the function of the routine is not merely readNext, but more generally getNext.

And we already have a powerful mechanism in C++ for getNext processing, in the form of iterators, which are applicable here.
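As a reminder of how little client code the standard iterator approach needs, here is a minimal sketch using std::istream_iterator and std::copy. The function name read_all_ints is mine; everything else is standard library.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads every whitespace-separated int from `text` using standard iterators.
std::vector<int> read_all_ints(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> values;
    std::copy(std::istream_iterator<int>(in),   // "begin": reads the next item on ++
              std::istream_iterator<int>(),     // "end": default-constructed sentinel
              std::back_inserter(values));
    return values;
}
```

Note that std::istream_iterator treats unreadable data and end-of-file alike: either one makes the iterator compare equal to the end iterator, so the reason for stopping still has to be recovered from the stream.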
The routine is at too low a level for many uses, forcing the client to devise one of many possible iteration constructs. In the face of multiple exit conditions, these are too often error-prone.
The routine can only read input of one data type. This is appropriate for "self-defining" streams, which, for instance, provide tokens to identify the next item in the stream. There are numerous other approaches to data type extension, probably well beyond the intent here, but the applicability and limitation should at least be noted.

Solution Steps

The issues above can be addressed systematically in a series of steps. These are not all meant for one lesson, but each is straightforward enough, even for beginners. They are all also invaluable in their own right for other problems. In fact, the process here goes far towards an objective of teaching programming based on principles and practices, rather than just belabouring syntax and semantics.

Provide a status variable parameter, which reports all conditions that the application may or may not want to consider. In its simplest form this is a string of bit flags, although supplementary data about the condition may be of interest also. A higher level might introduce predicates, such as status.isValid().
Rather than directly reporting the failure codes particular to a specific source, conditions need to be mapped to categories of concern to the client.

Here, some such conditions might include: invalid parameters (e.g., invalid port or URL), inaccessible input, un-initialized input (e.g., un-opened, un-connected port) or un-initializable input (e.g., open or connect failures), insufficient security permissions, source failure, source warning, unreadable data, special delimiter (carriage return, end-of-line, white-space, other), invalid data, along with provision for two or three additional conditions to be used for specific implementations.
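A status object of this kind might look like the following sketch. All names (Condition, kNoValueMask, Status and its members) are hypothetical, and the choice of which conditions make a value unusable is my own assumption.

```cpp
// Hypothetical condition flags; each enumerator is a distinct power of two
// so that any combination fits in one unsigned value.
enum Condition : unsigned {
    None           = 0,
    InvalidParam   = 1u << 0,
    Inaccessible   = 1u << 1,
    Uninitialized  = 1u << 2,
    NoPermission   = 1u << 3,
    SourceFailure  = 1u << 4,
    SourceWarning  = 1u << 5,
    UnreadableData = 1u << 6,
    Delimiter      = 1u << 7,
    InvalidData    = 1u << 8,
    Custom1        = 1u << 9,   // reserved for specific implementations
    Custom2        = 1u << 10
};

// Assumption: these are the conditions under which no usable value was delivered.
const unsigned kNoValueMask = InvalidParam | Inaccessible | Uninitialized |
                              NoPermission | SourceFailure | UnreadableData |
                              InvalidData;

class Status {
public:
    Status() : bits_(None) {}
    void set(unsigned conditions) { bits_ |= conditions; }
    void clear() { bits_ = None; }
    // True if any of the given conditions is present.
    bool has(unsigned conditions) const { return (bits_ & conditions) != 0; }
    bool isValid() const { return !has(kNoValueMask); }
    bool isWarning() const { return has(SourceWarning); }
    unsigned bits() const { return bits_; }
private:
    unsigned bits_;
};
```

The client then asks questions such as status.isValid() or status.has(Delimiter) without knowing which source-specific code produced the condition.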
Allow the interface to set the conditions to abort on, to return to the user, or to just skip over, and the conditions to be reported to the application environment in any case.
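One way to express such a policy is as a set of masks, one per action. The sketch below is an assumption of mine about how that could be arranged; Condition, Action, Policy, decide and should_report are all hypothetical names, and the flags are repeated in shortened form so that the sketch stands alone.

```cpp
// Hypothetical flags, shortened so that the sketch stands alone.
enum Condition : unsigned {
    UnreadableData = 1u << 0,
    InvalidData    = 1u << 1,
    Delimiter      = 1u << 2,
    SourceFailure  = 1u << 3
};

enum class Action { Deliver, Skip, ReturnToCaller, Abort };

struct Policy {
    unsigned abortOn;   // conditions that end processing (e.g. by throwing)
    unsigned returnOn;  // conditions handed back to the client
    unsigned skipOn;    // conditions silently passed over
    unsigned reportOn;  // conditions reported to the environment in any case
};

// Picks the most severe action requested for the conditions in `status`.
Action decide(unsigned status, const Policy& p) {
    if (status & p.abortOn)  return Action::Abort;
    if (status & p.returnOn) return Action::ReturnToCaller;
    if (status & p.skipOn)   return Action::Skip;
    return Action::Deliver;
}

// Reporting is decided independently of the action taken.
bool should_report(unsigned status, const Policy& p) {
    return (status & p.reportOn) != 0;
}
```

A client that wants to stop on source failures, see bad data itself, and ignore delimiters would set abortOn to SourceFailure, returnOn to UnreadableData | InvalidData, and skipOn to Delimiter.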
Parse all errors reported by the source.
Issues of memory management, references, pointers, multiple constructors - with possibly different behaviour, and data object copying, all rear their awesome heads here as elsewhere. Better, and simpler, is for the client routine to specify where the data is to go.
Use the convention of returning a null, or invalid end(), pointer, rather than attempting to define dummy values. Think of all the fun the C convention of terminating strings with \0 has caused.
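These two steps can be combined in a very small sketch: the client supplies the destination, and the routine returns a pointer to it or a null pointer. The names read_next and sum_all are hypothetical.

```cpp
#include <istream>
#include <sstream>

// Reads the next T into storage chosen by the caller.
// Returns a pointer to `dest` on success, or a null pointer if nothing was read,
// so no value of T has to be reserved as a marker.
template <typename T>
T* read_next(std::istream& in, T& dest) {
    if (in >> dest) {
        return &dest;
    }
    return nullptr;   // the caller should not rely on the contents of dest here
}

// Sums all readable ints; -1 is an ordinary value here, not a terminator.
int sum_all(std::istream& in) {
    int item = 0;
    int total = 0;
    while (read_next(in, item)) {
        total += item;
    }
    return total;
}
```

For the input "3 -1 7" sum_all returns 9, since every value, including -1, is delivered to the client.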
Use a template parameter for the input source type as well as the data type, and introduce template specialization to show std::istream handling. Parameterizing the input source type is important, since it is, or should be, an incidental focus of the application routine. In particular, consistent handling of all input sources is invaluable for an application and makes possible extensions to files, communication protocols, database interfaces, GUIs, and sequences in general.
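A sketch of this parameterization follows, using a traits template that is specialized per source type. SourceTraits and the std::deque source are hypothetical illustrations of mine; only the std::istream specialization corresponds to the case discussed in the article.

```cpp
#include <deque>
#include <istream>
#include <sstream>

// Primary template: left undefined, so unsupported source types fail to compile.
template <typename Source>
struct SourceTraits;

// Specialization showing std::istream handling.
template <>
struct SourceTraits<std::istream> {
    template <typename T>
    static bool get(std::istream& in, T& dest) {
        return static_cast<bool>(in >> dest);
    }
};

// A second, hypothetical in-memory source, to show the same client interface.
template <>
struct SourceTraits<std::deque<int> > {
    static bool get(std::deque<int>& q, int& dest) {
        if (q.empty()) return false;
        dest = q.front();
        q.pop_front();
        return true;
    }
};

// Client-facing routine: independent of where the data comes from.
template <typename Source, typename T>
bool getNext(Source& source, T& dest) {
    return SourceTraits<Source>::get(source, dest);
}
```

Because the specialization is for std::istream exactly, a derived stream such as std::istringstream has to be passed through a std::istream& for Source to be deduced as std::istream; a fuller solution would dispatch on the base class instead.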
Represent the source as a forward iterator parameter that wraps either the actual source or an existing iterator. It is useful to illustrate a complete templated iterator solution, but it is only necessary to develop details for the basic template components, and here only for std::istream. The rest can be left for reference to the standard definitions. On the client side, begin and end iterators, for-loops, and dereferencing idioms are simple and natural.
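The basic template components might be sketched as below, for std::istream only. NextIterator, InputRange and sum_items are hypothetical names. Since a stream can only be traversed once, this sketch strictly models a single-pass input iterator rather than a forward iterator.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Minimal single-pass iterator over a std::istream.
template <typename T>
class NextIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef T                       value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const T*                pointer;
    typedef const T&                reference;

    NextIterator() : in_(nullptr), value_() {}   // the end() iterator
    explicit NextIterator(std::istream& in) : in_(&in), value_() { advance(); }

    reference operator*() const { return value_; }
    pointer operator->() const { return &value_; }
    NextIterator& operator++() { advance(); return *this; }
    NextIterator operator++(int) { NextIterator old(*this); advance(); return old; }

    friend bool operator==(const NextIterator& a, const NextIterator& b) {
        return a.in_ == b.in_;
    }
    friend bool operator!=(const NextIterator& a, const NextIterator& b) {
        return !(a == b);
    }

private:
    void advance() {
        if (in_ && !(*in_ >> value_)) {
            in_ = nullptr;   // any failure turns this iterator into end()
        }
    }
    std::istream* in_;
    T value_;
};

// Range wrapper so that the client can write a range-based for loop.
template <typename T>
struct InputRange {
    std::istream& in;
    NextIterator<T> begin() const { return NextIterator<T>(in); }
    NextIterator<T> end() const { return NextIterator<T>(); }
};

// Client side: a plain loop with no stream state tests in sight.
int sum_items(std::istream& in) {
    int total = 0;
    InputRange<int> items = { in };
    for (int item : items) {
        total += item;
    }
    return total;
}
```

The client loop in sum_items contains no reference to end-of-file, fail bits or dummy values; all of that is confined to advance().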
A fundamental extension is for the template code to test both the input source parameter type and input data parameter type for isValid routines, and use these to check the input data values.
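One compile-time way to test a type for an isValid routine is overload selection with decltype, sketched below for the data type only (the same pattern would apply to the source type). The names check_valid, is_valid and Percentage are hypothetical.

```cpp
#include <string>

// Chosen when T has a member isValid() whose result converts to bool.
template <typename T>
auto check_valid(const T& value, int)
    -> decltype(static_cast<bool>(value.isValid())) {
    return value.isValid();
}

// Fallback for types without isValid(): nothing to check, so accept the value.
template <typename T>
bool check_valid(const T&, long) {
    return true;
}

// Entry point: the literal 0 prefers the `int` overload whenever it is viable.
template <typename T>
bool is_valid(const T& value) {
    return check_valid(value, 0);
}

// Hypothetical client data type that supplies its own invariant check.
struct Percentage {
    int value;
    bool isValid() const { return value >= 0 && value <= 100; }
};
```

Existing types such as int or std::string pass through unchanged, while a type like Percentage gets its check applied automatically by the read routine.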
Both the error status conditions and the exception flags are now better included in the iterator template class, rather than the function parameter list.
Have the template code also test the iterator parameter type for an onError interface and report errors to that interface.
Actually there are two parts (handling and reporting) to an onError routine and hence the possibility for two routines:
- The first maps the conditions from a particular environment into the more general client interface. It may also need to set a flag to indicate whether resuming input is possible, and to provide a mechanism for doing so.
- The second, which may be part of the input routine itself, passes information identifying the details to a common higher level application reporting mechanism, for appropriate logging and recording.
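The two parts might be separated as in the following sketch. Category, ErrorInfo, Reporter, map_stream_error and read_int are hypothetical names, and a std::function callback stands in for whatever reporting mechanism the application actually provides.

```cpp
#include <functional>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical client-level categories.
enum class Category { None, EndOfInput, UnreadableData, SourceFailure };

struct ErrorInfo {
    Category category;
    bool resumable;   // true if the source can sensibly be asked for more input
};

// Part one (handling): map std::istream state bits to a client category.
ErrorInfo map_stream_error(const std::ios& s) {
    if (s.bad())  return { Category::SourceFailure, false };
    if (s.eof())  return { Category::EndOfInput, false };
    if (s.fail()) return { Category::UnreadableData, true };
    return { Category::None, true };
}

// Part two (reporting): forward the details to an application-level sink.
typedef std::function<void(const ErrorInfo&, const std::string&)> Reporter;

// Reads one int; on failure maps the condition and reports where it happened.
bool read_int(std::istream& in, int& dest, const Reporter& report) {
    if (in >> dest) {
        return true;
    }
    ErrorInfo info = map_stream_error(in);
    if (report) {
        report(info, "read_int");
    }
    return false;
}
```

The mapping function knows about std::istream; the reporter knows about the application's logging; neither needs to know about the other.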
A small, but valuable generalization is to look for an input mapping routine in an interface borne by the iterator. This allows data types and values in the input domain to be directly transformed to data types and values in the client domain.
Similarly a filter routine can be used, if present, to bypass unneeded source data.
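Mapping and filtering can be illustrated with a small copy loop that takes both routines as parameters. In the full design they would be found through the iterator's interface; here, as a simplifying assumption, they are passed directly. The names copy_mapped, non_negative and to_metres are hypothetical.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Copies items from `in` to `out`, dropping those rejected by `keep`
// and converting the rest with `map`. Both callables come from the client.
template <typename In, typename OutIt, typename Filter, typename Map>
OutIt copy_mapped(std::istream& in, OutIt out, Filter keep, Map map) {
    In raw;
    while (in >> raw) {
        if (keep(raw)) {
            *out++ = map(raw);
        }
    }
    return out;
}

// Hypothetical example: the source delivers centimetres, the client wants
// metres, and negative readings are treated as extraneous input.
inline bool non_negative(int cm) { return cm >= 0; }
inline double to_metres(int cm) { return cm / 100.0; }
```

Called as copy_mapped<int>(in, std::back_inserter(metres), non_negative, to_metres) on the input "150 -1 25", this stores 1.5 and 0.25 and silently drops the -1.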
Illustrate support for to_string and from_string serialization routines, for use with operators << and >> for derived types.
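A minimal sketch of this idiom, for a hypothetical Point type written as "x,y": the stream operators are defined purely in terms of to_string and from_string, so a type only has to supply those two routines.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical value type with the textual form "x,y".
struct Point {
    int x;
    int y;
};

std::string to_string(const Point& p) {
    std::ostringstream out;
    out << p.x << ',' << p.y;
    return out.str();
}

// Returns false and leaves `p` untouched unless `text` is exactly "<int>,<int>".
bool from_string(const std::string& text, Point& p) {
    std::istringstream in(text);
    int x = 0;
    int y = 0;
    char comma = '\0';
    if ((in >> x >> comma >> y) && comma == ',' &&
        in.peek() == std::char_traits<char>::eof()) {
        p.x = x;
        p.y = y;
        return true;
    }
    return false;
}

std::ostream& operator<<(std::ostream& out, const Point& p) {
    return out << to_string(p);
}

// Reads one whitespace-delimited token and converts it with from_string.
std::istream& operator>>(std::istream& in, Point& p) {
    std::string token;
    if (in >> token && !from_string(token, p)) {
        in.setstate(std::ios::failbit);   // report bad text as built-in types do
    }
    return in;
}
```

With these in place, a Point can be used with the templated read routines and iterators exactly like an int.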
When adapted for output, the iterator can also contain formatting flags and delimiters.
This leads to raising the level of the routine.

Better, for many but not all purposes, would be a copy routine (or a move routine, if the input is consumed) following the STL syntax; here, the returned iterator is set to end() for the conditions flagged by the caller. For some applications, which need a lower level involvement in handling special conditions, selected end() conditions can be processed by the client routine, with begin() used to allow an attempt at resumption of input.
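A simplified sketch of such a copy routine with client-side resumption follows. As a simplification of my own, it reports the stopping condition through a return value rather than through the iterator, and all names (Stop, copy_items, skip_bad_token, copy_all_ints) are hypothetical.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

enum class Stop { EndOfInput, UnreadableData, SourceFailure };

// Copies items of type T from `in` to `out` until input stops being readable,
// then says why. Modelled loosely on std::copy.
template <typename T, typename OutIt>
Stop copy_items(std::istream& in, OutIt out) {
    T item;
    while (in >> item) {
        *out++ = item;
    }
    if (in.bad()) return Stop::SourceFailure;
    if (in.eof()) return Stop::EndOfInput;
    return Stop::UnreadableData;
}

// Client-side resumption: clear the failure and discard the offending token.
inline void skip_bad_token(std::istream& in) {
    in.clear();
    std::string discarded;
    in >> discarded;
}

// Copies every readable int and returns the number of tokens skipped.
int copy_all_ints(std::istream& in, std::vector<int>& out) {
    int skipped = 0;
    for (;;) {
        Stop why = copy_items<int>(in, std::back_inserter(out));
        if (why != Stop::UnreadableData) {
            break;   // end of input or source failure: nothing to resume
        }
        skip_bad_token(in);
        ++skipped;
    }
    return skipped;
}
```

For the input "1 2 x 3 y", copy_all_ints stores 1, 2 and 3 and returns 2. A client that prefers to stop at the first unreadable token simply calls copy_items once and inspects the result.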

And these seventeen progressive steps, I think, provide an outline of a fairly complete solution to the problem of creating a code structure for simply, safely, and effectively transferring input data into an application framework, and by simple extension output data (the homework exercise?). Various interfaces can be made more general and more sophisticated as necessary, without impact on client code. Alternatively, if client code needs to adapt to additional conditions, these can be added in a consistent and compatible manner.

The final result, or outline for a result, is considerably more complicated than the initial small example, but there are many valuable pedagogical reasons for developing it. In particular, it should be emphatically taught when not to use code that is error agnostic.

The fundamental lesson here is that there is a considerable difference between production code and code for beginning exercises or prototyping. This is easily spouted as a general principle, but is difficult to teach effectively. The sample problem here provides an ideal basis for illustrating this issue systematically and indicating approaches to dealing with it.

The next most fundamental lesson is to assign responsibilities appropriately, then to design interfaces that handle the responsibilities, and finally to allow flexibility by providing mechanisms to delegate responsibility for policies appropriately. Here there are separate responsibilities in several places:

- for the input routine, in being complete in some definable sense,
- for the client interface, in specifying a request,
- for a higher level routine, in parameterizing the request according to design parameters and constraints,
- for the input data class, in maintaining consistency and integrity constraints according to class invariants,
- for consistent error handling and reporting policies at the application level, and for flexibility for appropriate interventions by the using client.

Understanding tradeoffs of where and how to apply generality, simplicity, ease of use, and allowance for specific conditions is fundamental. The solution should illustrate use of templates, constructors, default parameters, and environment variables and routines (including exception handlers), as appropriate, to design and apply constraints and policy.

Also fundamental is the realization that error handling is basic for any significant code that is actually to be employed for useful purposes. By analogy, perhaps, with a numerical analysis computation, the result is generally not of value, other than as a guess at usefulness, unless error analysis has been performed to determine how good the result actually is.

One basic tenet about error handling, that emphatically applies here, is that applications need to catch all erroneous inputs at the external interfaces. This can then limit significantly the data validity testing needed later.

Since the student will undoubtedly be exposed to them, the lesson might include tradeoffs in various approaches to error returns: through special values (e.g., end()), through pairs, through bit flags, through status objects, through exceptions, etc. The lesson can emphasize the dangers, particularly for critical application interfaces, of starting with more limited approaches that are inflexible and that do not scale.

The final result may seem more complex than needed for what seems like a simple problem, but I would respectfully disagree with the premise. The problem posed is not trivial; and ignoring basic issues makes for an incomplete solution, not a simple solution. Reasonably simple solutions can still be arrived at by dealing with each issue separately and appropriately.

Techniques

The lessons here are general but the implementations, if they are to be illustrated in C++ code, are admittedly non-trivial. As examples though, the techniques can be easily taught as idioms, to be imitated, and these idioms are also useful in many broader contexts.

From a teaching and learning perspective, there are only two roads to writing useful code in C++. The first is to understand the C++ language and library standard, and particular compiler deviations from it, in detail (not particularly to be recommended). The second is by extensive reading and following of useful models (which is what all the worthwhile C++ beginner and intermediate texts provide). Ideally this is accomplished with a mentor.

The basic techniques here include:

- Basic bit flag masks to indicate status or state, supported by enums that are powers of two, operations on sets of flags, and status.isXXX() type predicates.
- Rudiments of exception handling.
- Type generalization through templates, with basic template specialization.
- STL iterator concepts, at least at a high level, and their use in general algorithms such as copy.
  
  In particular, a strong preference, if only for consistency, for using STL constructs and concepts where appropriate can be inculcated. For instance, encapsulating iteration (here copy) in a library routine, rather than using a variety of for, while and do constructs, is worthwhile.
- Parameterization options through template parameters, typedef statements, constructor arguments, default function parameters and environment support (here, at least, exception handlers).
- Testing types and objects for extended interfaces through compile time (template based) and run time (dynamic cast) techniques. Here, the solution tries to allow existing data objects and iterators to be used, but takes advantage of additional capabilities if provided.
- And yes, idiosyncrasies of various input mechanisms can also be explored.

Perhaps the final lesson is my perception of C++ as a really ugly tool for developing beautiful constructs. As one mentor once said, "You don't ask a cow why it works the way it does, you just learn to milk it."

The goal of making C++ more accessible to novices is admirable, but oversimplifying the issues does not appear useful; nor does dwelling on details of std::istream to the exclusion of more basic issues.

The discussion above leads to approaches to that goal on two levels:

- At the client level, the final copy routine is indeed simple, and can illustrate the power of the tailoring mechanisms to provide a significant range of underlying functionality including: comprehensive handling of unusual conditions, full reporting of error conditions, the ability to adapt to any input source, the ability to map data from different sources to common types, scaling, formats and representations, and the ability to filter extraneous input.
- At the development level, the analysis of problems and solutions illustrates both design considerations needed for building code that can adapt to a broad range of application needs, as well as coding considerations in the use of C++ facilities for accomplishing this. This surely is a worthwhile introduction to what programming is all about.
