Journal Articles

CVu Journal Vol 30, #4 - September 2018 + Programming Topics

Browse in :

All > Journals > CVu > 304 (10)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: On Francisâ€™s Challenge #4

Author: Bob Schmidt

Date: 07 September 2018 18:40:12 +01:00 or Fri, 07 September 2018 18:40:12 +01:00

Summary: Andreas Gieriet presents his solution (ab-)using expression- and jump-statements.

Body:

This is the winning entry to Francisâ€™s Challenge #4, presented in CVu 30.3. Both Francis and Steve felt this entry deserved an article of its own.

I like this kind of challenge! Especially Francisâ€™s sentence â€œI cannot think of a way to do this in C other than...â€ triggered my curiosity. It asks for lateral thinking and invites me to explore the (sometimes dark) corners of the languages C and C++.

For the fun of it, I allow myself to do rather dirty but â€˜legalâ€™ hacks here and there. The shown approaches are not exhaustive â€“ there are for sure many more possible solutions.

As a disclaimer: I do not suggest that any of the shown code snippets are good practice nor that they should be used in productive code. The snippets show what is possible, not what is desirable.

The challenge

To the challenge: write a (C/C++) program that prints integral numbers from two run-time boundaries â€˜firstâ€™ to â€˜lastâ€™ (inclusive). The only constraints are not to use anything classed as either an iteration_statement (while, do-while, for) or a selection_statement (if, switch). There are explicitly no other constraints (so, use that freedom!).

The tools

What language and library tools do we have at hand to circumvent the constraints?

language: all but the selection_statement and iteration_statement. E.g. C/C++: labeled_statement, expression_statement, jump_statement; C++: try_block, etc.
See [2] and [3] and Figure 1, which shows C++ statements.

Note: for the purpose of simplicity, I refer to C++ syntax descriptions, knowing that C is in most regards a subset of those, i.e. for the scope of this text, C is considered to be covered by these syntax references).
standard libraries: all libraries, types and functions. E.g. C/C++: assert, exit; C++: std::for_each, etc.
See [2] and [3].
compiler generated implicit code: all. E.g. C++: constructors, etc.

Figure 1

Letâ€™s dive into the world of missing iteration_statements and selection_statements!

We will talk about short-circuit evaluation and the comma operator to mimic if, about goto to produce some iteration, about various recursions as replacement for iterations, touch lookup tables and function factories, and finally get into some C++ approaches like range iterator and benefiting from constructors.

All this is mingled with code snippets, standard references and some more exotic ideas like assert to terminate a program and longjmp to mimic iteration.

Letâ€™s start!

Compile-time versus run-time boundaries

If we had compile-time boundaries for printing the range of numbers, we could use template â€˜magicâ€™.

With run-time boundaries, we cannot use any template â€˜magicâ€™ directly AFAIK.

Since we are bound to run-time values, we must first read these two values. No constraint is given if we read these values via command line arguments or via some input stream. Neither is there any error handling required.

BTW: it is not even required that the program terminates or that it terminates gracefully. E.g. it might loop forever once all the numbers are printed, or it might crash when the task has completed.

Conclusion: Template magic is not of any direct use in this challenge.

Short-circuit evaluation

Conditional code execution is needed e.g. to read the values from the command line, or to conditionally terminate the print loop or to terminate recursion, etc.

C/C++ provides the short-circuit evaluation of the logical_and_expression (a && b) as well as of the logical_or_expression (a || b) as well as of the conditional operator (c ? a : b)

See [2], section 5.14 Logical AND operator:

[...] The && operator groups left-to-right. The operands are both contextually converted to bool [...]. The result is true if both operands are true and false otherwise. Unlike &, && guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is false. [...]

See [2], section 5.15 Logical OR operator

[...] The || operator groups left-to-right. The operands are both contextually converted to bool [...]. It returns true if either of its operands is true, and false otherwise. Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not evaluated if the first operand evaluates to true. [...]

See [2], section 5.16 Conditional operator

[...] Conditional expressions group right-to-left. The first expression is contextually converted to bool [...]. It is evaluated and if it is true, the result of the conditional expression is the value of the second expression, otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second or third expression. [...]

We can take benefit of that well-defined behaviour, e.g.

  argc == 3 || print_usage_and_exit(argv);

where, for example

  bool print_usage_and_exit(const char* argv[])
  {
    printf("usage: %s [first] [last]\n", argv[0]);
    exit(1);
    return true; // not reached but returns boolean
    // for convenience in short-circuit expressions
  }

Note: There is no need to have an if around argc == 3 nor to have the result assigned to any variable. The languages allow for this construct (see Figure 2: expression_statement; Figure 3: expression; Figure 4: assignment_expression).

Figure 2

Figure 3

Figure 4

For details on the exit function, see [2], section 18.5, paragraph 8.

See [2], section 6.2.

[...] Expression statements have the form [...]

expression_opt ;

The expression is a discarded-value expression [...] An expression statement with the expression missing is called a null statement. [...]

And [2], section 5, paragraph 11.

[...] In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. [...]

Note: Short-circuit evaluation is useable for expression statements only.

E.g. the following is not possible since return is a jump-statement and not an expression(_statement):

  argc == 3 || return; // won't compile

Note also, that an else or else-if part can be emulated e.g. by the respective not-if condition as prefix for an else part.

Caution: Do not get confused by expression statements which form an assignment. They are still discarded-value expressions (even there is an assignment, e.g. v = 5;). The reason is that an assignment itself yields a value which allows to chain assignments (e.g. a = b = c = 5;). The left-most assignment yields its value which then is discarded.

Conclusion: Short-circuit evaluation can emulate some if constructs as long as the code after the condition is an expression(-statement) itself.

Comma operator

The comma operator is useful to sequence multiple expression-statements or to have non-return functions in a (short-circuit) boolean expression. E.g.

  argc == 3
  || (printf("usage: %s [first] [last]\n, argv[0]),
  exit(1), true);

See [2], section 5.18 Comma operator.

[...] A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression [...]. Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand [...]

Conclusion: The comma operator is useful in connection with the short-circuit evaluation under the constraints that no if nor switch is allowed.

Parse arguments by short-circuit evaluation and comma-operator

A possible C solution to parse command line arguments would be as shown in Listing 1. (C++ was similar but with C++-ish includes, io, bool, and namespaces.)

#include <stdio.h>
#include <stdlib.h>
void print_usage_and_exit(const char* argv[])
{
  printf("usage: %s [first] [last]\n", argv[0]);
  exit(1);
}
int parse_int(int argc, const char* argv[],
  int pos)
{
  argc > pos && pos > 0 ||
    (print_usage_and_exit(argv), 1);
  char* next = 0;
  int value = strtol(argv[pos], &next, 10);
  next && !*next || (print_usage_and_exit(argv),
    1);
  return value;
}
int main(int argc, const char* argv[])
{
  int first = parse_int(argc, argv, 1);
  int last = parse_int(argc, argv, 2);
  // more to come here
}

Listing 1

Loop forever by label and goto

Loop â€˜foreverâ€™ is the basic iterative control flow for our challenge. Since no iteration_statement is allowed, we have to divert to other means.

One is a label-statement (label:...) and one of the jump-statements (goto label;). E.g.

  int first = ...
  int last = ...
loop:
  first <= last || (exit(0), 1);
  //or for C++ with (..., true) instead of (..., 1)
   printf("%d\n", first);
  first < last || (exit(0), 1);
  //needed for the case where last == INT_MAX
  first++;
  goto loop;

Note: The unconstrained version with a while-loop (or any other iteration_statement) would also need special handling of the last == INT_MAX case, i.e. you must not increment beyond INT_MAX. (E.g. see Listing 2.)

int i = first;
while (i < last)
{
  printf("%d\n", i);
  i++;
}
if (i == last)
{
  printf("%d\n", i);
}

Listing 2

Conclusion: The label/goto pair form a robust and quite straight forward solution for the given challenge.

Exotic: How about assert(...) to let evaluate the stop condition?

Instead of the short-circuit evaluation, we could also use the assert function to terminate the program. This is not so graceful a termination but it does not violate the constraints, e.g.

    int first = ...
    int last = ...
  loop:
    assert(first <= last);     // terminates the
          // program by a crash if the condition
          // is not satisfied
    printf("%d\n", first);
    assert(first < last);
      // covers last == INT_MAX
    first++;
    goto loop;

Note: This usually only works in Debug mode. See also [2], section 19.3 Assertions, as well as [3], section 7.2 Diagnostics <assert.h>.

Conclusion: Fulfils the constraints, not so nice termination.

Exotic: How about to not terminate the program at all?

We could also let the program run forever and print the range and then do nothing other than looping idle, e.g.

    int first = ...
    int last = ...

    int verbose = 1;
  loop:
    verbose = verbose && (first <= last);
    verbose && printf("%d\n", first);
    first < last || (verbose = 0);
      // INT_MAX safe end handling
    first++;
    goto loop;

Conclusion: Not user friendly solution since it (silently) does not terminate at all.

Simple recursion

Recursion is the alternative to iteration. In its simplest form, it requires a stop condition and a conditional call to itself with modified parameters. E.g.

  int print_range_recursive(int current, int last)
  {
    current <= last && printf("%d\n", current);
    current <  last &&
      print_range_recursive(current+1, last);
    return 1;
  }

Note: Even there are no local variables, the return address must still go on the stack. This means that for large ranges, you probably run out of memory or run into a stack overflow condition.

Note: the stack size is assumed to be linearly proportionate to the range, e.g. a range of 10 to the power of 9 results in a stack usage proportional to 10 to the power of 9.

Conclusion: Not robust for large ranges.

Tail recursion

Tail recursion is the ability of the compiler to not call the function recursively under certain conditions but rather perform an implicit goto to the function label and only return from the function on its stop condition.

The construct is usually as follows (assuming the compiler is capable to detect this and act accordingly):

  void f(args)
  {
    // return if stop...
    // ...do some stuff...
    f(args+1); // must be last statement:
               // that's where the term "tail
               //   recursion" comes from
  }

The trouble here: how to do a conditional return which is not the last statement (under the given challengeâ€™s constraints: no if).

In C, I see no decent solution other than to exit the program at that point, e.g.

  void print_range_tail_recursive(int current,
    int last)
  {
    current <= last || (exit(0), 1);
    printf("%d\n", current);
    current < last || (exit(0), 1);
    print_range_tail_recursive(current+1, last);
      // assuming the compiler optimizes
      // this to tail recursion
}

In C++ we could use a try-catch block, like Listing 3.

void print_range_tail_recursive_cpp(int current,
  int last)
{
  try
  {
    current <= last || (throw 0, true);
    // or give a more decent exception name/value

    printf("%d\n", current);
    current < last || (throw 0, true);
    // safe end handling
  }
  catch(...)
  {
    return;
  }
  print_range_tail_recursive_cpp(current+1,
    last);
// assuming the compiler optimizes this to
// tail recursion
}

Listing 3

Note: If the compiler applies tail recursion, this solves the resource usage of the simple recursion solution above.

Conclusion: If the compiler supports and actually applies here the tail recursion, stack usage is constant, i.e. independent of the range.

Balanced recursion

The idea is to replace the simple recursion by cutting the range into two halves in each recursion and then traverse each half recursively in the same way. This results (in absence of tail recursion) in maximum stack usage proportional to log(range)/log(2). E.g. for a range of 10 to the power of 9, the maximum function call depth would be about 30 (compared to 10 to the power of 9). For example, see Listing 4.

int print_range_balanced_by_two_recursive(
  int current, int last)
{
  int cut = (current + last) / 2; // Troublesome!
                              // See notes below.
  current >  last // nop
  || current == last && printf("%d\n", current)
  || print_range_balanced_by_two_recursive
    (current, cut)
  && print_range_balanced_by_two_recursive
    (cut + 1, last)
  ;
  return 1;
}

Listing 4

Note: the term (current + last)/2 might get out of range, e.g. (INT_MAX + INT_MAX)/2 has an intermediate result of twice INT_MAX!.

This can be avoided by a bit more elaborate expressions like the following below: each term never exceeds the max range and neither does the overall sum.

  int cut = current/2 + last/2 +
    (current%2 + last%2)/2;

But other trouble lies ahead: This only works for positive limits (x/2 rounds towards 0 while x >> 1 rounds towards INT_MIN; e.g. -3/2 = -1 while -3 >> 1 = -2). The correct solution is to use arithmetic right-shift, since this simply cuts off one bit on the right and preserves the sign. So, a version which also works for negative limits and respects full number range is:

  int cut = (current>>1) + (last>>1) +
    (((current&1) + (last&1))>>1);

But unfortunately: the right-shift operator is â€˜implementation definedâ€™ if working on signed negative values! See [2], section 5.8 Shift operators, paragraph 3:

[...] The value of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined. [...]

Life could be so easy... On most machines this will work. Sometimes you have to make a decision to still use â€˜implementation definedâ€™ features of a language and to make it safe, add assertions to document your assumptions. E.g.

  // implementation defined: arithmetic shift with
  // signed negative values see C++ standard
  // (http://www.open-std.org/jtc1/sc22/wg21/)
  // section 5.8 Shift operators, paragraph 3
  assert(-2 >> 1 == -1);
  assert(-3 >> 1 == -2);
  assert(-4 >> 1 == -2);

Conclusion: Robust and (almost) compiler-independent solution which heavily reduces stack usage.

State lookup table

The idea is to have a state (comparison of first versus last value) which chooses between actions. The first simple version was to have the conditions map into a table index.

In C, this works out-of-the-box (the condition evaluates to 0 or 1, which can be used as index into the table).

Caution: you must make sure that the condition expression yields an index of 0 and 1 (and not 0 and != 0).

In C++ (and C) you might translate by the ternary operator from the condition (which leads to a boolean value) to the index. E.g.

  first <= last ? 0 : 1

An example of a condition to index translation is in Listing 5.

void end(int* current, int last)
{
  exit(0);
}
void print_and_increment(int* current, int last)
{
  printf("%d\n", *current);
  (*current)++;
}
typedef void (*action_t)(int* current, int last);
action_t action[] =
{
  print_and_increment,
  end,
};
...
 first = ...
 last = ...

loop:
  action[first <= last ? 0 : 1](&first, last);
  goto loop;

Listing 5

If you want to do more elaborate tables, e.g. with safe end handling, you can do something like Listing 6.

void end(int* current, int last)
{
  exit(0);
}
void print_and_end(int* current, int last)
{
  printf("%d\n", *current);
  end(current, last);
}
void print_and_increment(int* current, int last)
{
  printf("%d\n", *current);
  (*current)++;
}
typedef void (*action_t)(int* current, int last);
action_t action[] =
{
  print_and_increment,
  print_and_end,
  end,
};
...
  first = ...
  last = ...
loop:
  action[first < last ? 0 : first == last ? 1
    : 2](&first, last);
  goto loop;

Listing 6

Function factory

A similar approach to the above was to have a function returning an action based on the condition. (See Listing 7.)

// ...elided the function definitions from above
// example...
typedef void (*action_t)(int* current, int last);
action_t get_action(current, last)
{
  action_t action = end;
  current < last && (action = print_and_increment);
  current == last && (action = print_and_end);
  return action;
}
...
  first = ...
  last = ...
loop:
  get_action(first, last)(&first, last);
  goto loop;

Listing 7

C++ range iterator

An iterator mimics pointer logic (++p, p != end, *p) to allow the traversing of containers e.g. by means of std::foreach. The standard libraries currently do not provide any range class (AFAIK). So, we can build our own and still comply with the constraints of the given challenge.

Special care is taken again for safe end handling: an isEnd_ flag is needed since otherwise incrementing beyond the last (e.g. INT_MAX value) might break the logic. See Listing 8.

#include <iostream>
#include <algorithm>
#include <cstdlib>
using namespace std;
class Range
{
public:
  class It
  {
    private:
      friend class Range;
      It(int first, int last) : current_{first},
        last_{last}, isEnd_{first > last}
      { }
      It(int last)
      : current_{last}, last_{last}, isEnd_{true}
      { }
    public:
      It& operator++()
      {
        isEnd_ = isEnd_ || current_ == last_;
        isEnd_ || ++current_;
        return *this;
      }
      int operator*() const { return current_; }
      bool operator==(It const& other) const
      {
        return isEnd_ &&  other.isEnd_
          || !isEnd_ && !other.isEnd_
          && last_ == other.last_ && current_
          == other.current_;
      }
      bool operator!=(It const& other) const
      { return !operator==(other); }
    private:
      int current_;
      int last_;
      bool isEnd_;
  };
  Range(int first, int last): begin_{first,
    last}, end_{last} {}
  It const& begin() const { return begin_; }
  It const& end()   const { return end_; }

private:
  It const begin_;
  It const end_;
};
...
  int first = ...
  int last = ...
  Range range{first,last};
  for_each(range.begin(), range.end(),
    [](int const& n) { cout << n << endl; });
    ...

Listing 8

Note: Some non-standard libraries like boost provide ranges, though.

Conclusion: Provides iterators on a range which mimics a container without actual elements.

Constructors of an array

Let the constructor print the items. See Listing 9.

...
int first = 0;
int last = 0;

struct Item
{
  Item()
  {
     static bool verbose = true;
       // safe end handling...
     verbose = verbose && first <= last;
     verbose && printf("%d\n", first);
     first < last && (first++, true)
       || (verbose = false);
  }
};
int main(int argc, const char* argv[])
{
  first = parse_int(argc, argv, 1);
  last = parse_int(argc, argv, 2);
  first <= last
    || (print_usage_and_exit(argv),  1);
  auto data = new Item[last-first+1];
}

Listing 9

Conclusion: Not so robust since inadvertently constructed Items might disturb the output.

Exotic: longjmp

The standard libraries provide the capability to jump around beyond the capabilities of pure label/goto pairs.

See the comments in the code below to get the basics of longjmp. See also [3], section 7.13 Nonlocal jumps <setjmp.h>.

Note: I elided safe end handling here since it is complex enough with the setjmp/longjmp code anyway. See Listing 10.

#include <limits.h> // see [3], section 5.2.4.2.1
                    // Sizes of integer types <limits.h>
...
int main(int argc, const char* argv[])
{
  // if not volatile, the value might be cached
  // in a register
  volatile int first = parse_int(argc, argv, 1);
  int last = parse_int(argc, argv, 2);
  // INT_MAX not supported in this code snippet
  last < INT_MAX
    || (print_usage_and_exit(argv), 1);
  // opaque variable for the context where to
  // jump back (like a label)
  jmp_buf jmp_ctx;
  // 0 = init,
  // else came back from "longjmp(..., 1)"
  // this mimics the label
  int jmp_return = setjmp(jmp_ctx);
  // do action (e.g. "printf") if came from
  // "longjmp(..., 1)" since !=0
  jmp_return && printf("%d\n", first++);
  // if first <= last, goto "setjmp" location
  first <= last && (longjmp(jmp_ctx, 1), 1);
}

Listing 10

Note: Use of this in C++ is has some constraints regarding object construction/destruction. See [2], section 18.10, paragraph 4.

Conclusion: Really not the way to do it, I mean, it works, but itâ€™s â€˜goto squaredâ€™! ☺

Epilogue

Working on the challenge was fun! Iâ€™m convinced that there are many more ways than the ones described above. E.g. an esoteric one might be to use bsearch for ranges smaller than 31 values...

I mainly focussed on C implementations. I wonder what (other) C++ solutions for the given challenge are around!

Thanks to Francis again for triggering my synapses.

References

For those who want to get into more details, the following sources might be of some use:

[1] https://www.onlinegdb.com/online_c_compilerOnline C/C++ compiler which allows for the ad hoc try out of code in various languages and dialects without the need to install any tool chain.

[2] http://www.open-std.org/jtc1/sc22/wg21/Lists among other documents the latest publicly available C++ standard draft (e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf).

[3] http://www.open-std.org/jtc1/sc22/wg14/www/standards.htmlLists among other documents the latest publicly available C standard draft (e.g. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).

[4] http://www.externsoft.ch/media/swf/cpp11-iso.htmlHtml linked syntax productions which I compiled several years ago for C++ 1998 and 2011 variants.

[5] http://dotnet.jku.at/applications/Visualizer/Nice EBNF visualization tool. I used it to generate the figures in this text.

Andreas Gieriet Andreas Gieriet holds an MSc in electrical engineering, and freelances in several roles in the SW engineering food chain. He has a passion for lecturing and still gets pleasure from â€œfinding things outâ€ after many years. He can be contacted at andreas.giriert@externsoft.ch.

Notes:

More fields may be available via dynamicdata ..