Journal Articles
Browse in : |
All
> Journals
> CVu
> 323
(11)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 124
Author: Bob Schmidt
Date: 03 July 2020 16:55:56 +01:00 or Fri, 03 July 2020 16:55:56 +01:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Last issue’s code
A rather shorter critique than usual this time, and in C, to give some variety.
Please find a bug in my code based on finding whether a positive integer can be expressed as pow(a, b) where a and b are non negative integers and b>1. My code outputs for 1000 (103), 0 where it should be 1. Please find the bug…
Thanks to Francis Glassborow for passing this example on to me, originally from alt.comp.lang.learn.c-c++. He also points out that this competition is now in its 21st year (he originated it). That means it has been running for longer than most software/developer magazines.
The code is in Listing 1 (pow.c) and Listing 2 (test.c). Can you help?
/** * @input A : Integer * * @Output Integer */ int isPower(int A) { if(A==1) return 1; else { int i; if(A%2==0) { for(i=2; i<=sqrt(A); i=i+2) if(pow(i, (int)(log(A)/log(i)))==A) return 1; } else { for(i=3; i<=sqrt(A); i=i+2) if(pow(i, (int)(log(A)/log(i)))==A) return 1; } return 0; } } |
Listing 1 |
#include <stdio.h> int main(void) { int A = 1000; printf("isPower(1000) => %i\n", isPower(A)); return 0; } |
Listing 2 |
Critiques
Dave Simmonds
First of all lets tidy up a bit.
These issues would be pointed out by any decent compiler – we need to declare the functions we are using.
Listing 2 (Listing 1 in this edition of CVu) needs:
#include <math.h>
Listing 3 (now Listing 2) needs:
extern int isPower(int);
The bug is that when we divide the log
s, we get a double
which we convert to an int
. Rounding here gives us too low a number.
So, if we add 0.5 before rounding down to an int
, we should be good:
if (pow(i, (int)(log(A)/log(i) + 0.5)) == A)
This solves the particular issue as reported.
Two other issue with int
/double
conversions:
pow
returns adouble
and we truncate toint
– that could give us another rounding issue. The easiest solution would be to write a version ofpow
that only deals withint
s.sqrt
returns adouble
, we then compare withint
– would be reasonable to add 0.5 here too.
Other criticisms:
The tabbing – for
, if
, return
all at same tabbing level.
i=i+2
would be better as i+=2
.
We repeat the same loop with a different starting point – let’s just set the starting point appropriately and use the same code.
#include <math.h> static int intpow(int a, int b) { int r = 1; while(b--) r *= a; return r; } int isPower(int A) { if (A == 1) return 1; else { int i = 2; if (A%2) i = 3; for (; i <= (int)(sqrt(A) + 0.5); i+=2) { if (intpow(i, (int)(log(A)/log(i) + 0.5)) == A) return 1; } } return 0; }
James Holland
Computers can only approximate real numbers. Not all numbers can be calculated or represented exactly. This can cause problems when performing mathematical calculations. For example, performing the calculation 1.0 / (0.1 / 0.3) on my machine resulted in 2.9999999999999995559. The result should be 3 exactly.
A similar problem occurs when the supplied program is executed. Part of the program calculates log(A) / log(i)
where A
is 1000 and i
is 10. The result is 2.9999999999999995559 (on my machine) and is not the mathematically correct value of exactly 3. After performing this calculation, the program takes the integer part of the result (which is 2) and raises i
to this value and compares the result with A
. In other words, 102 is not equal to 1000 and so the program erroneously concludes that 1000 is not equal to 103.
One possible solution to this problem is to add a very small amount (such as 0.00000000000001) to log(A) / log(i)
to ensure the integer part of the number is correct. The question now is can adding this small amount cause any other problems. This is difficult to answer and so I leave it to the reader to ponder.
There is quite a bit of duplicated code in the supplied program that can be reduced by writing isPower()
as shown below.
int isPower(int A) { if (A == 1) return 1; else { int i; for (i = A % 2 == 0 ? 2 : 3; i <= sqrt(A); i = i + 2) if (pow(i, (int)(log(A) / log(i) + 0.00000000000001)) == A) return 1; } return 0; }
Also, for clarity, statements in the original code should be indented to show that the return
statement is part of the if
statement and that the if
statement is part of the for
statement, for example.
Hans Vredeveld
By just compiling the code with gcc, the compiler will already inform you of the first problem: all the mathematical functions in pow.c and the function isPower
in test.c are implicitly declared. According to the C90 standard, the compiler should assume a declaration of the form int f();
that is, a function that takes any number of arguments, also of any type, and that returns an int
. When it comes to the mathematical functions, gcc is a bit smarter and knows that, for example, the signature of sqrt
should be double sqrt(double)
and generates object code accordingly. C99 made implicitly declared functions an error, but gcc still treats them as a warning.
To improve the code and to make it C99 compliant, we have to add an #include <math.h>
to pow.c and a declaration int isPower(int)
to test.c (or better, put that declaration in a file pow.h and #include
pow.h to both test.c and pow.c, so the compiler can see that the definition matches the declaration).
One thing to keep in mind when working with floating point numbers is that many numbers cannot be represented exactly, but only approximated. This also goes for the result of calculations. We should therefore consider two floating point numbers equal when their difference is small. What is small depends on the magnitude of the numbers.
Also note that casting a double
to an int
truncates. So (int)2.99999999999999955591 results in 2, not 3.
Given the definition
const double margin = 10 * DBL_EPSILON;
(where DBL_EPSILON
is defined in float.h) and introducing the functions toInt
and isEqual
as
static int toInt(double d) { return d * (1.0 + margin); } static int isEqual(double a, double b) { return fabs(a - b) < ((a + b) * (1.0 + margin)); }
this means the following for the code in isPower
:
i<=sqrt(A)
has to be replaced byi <= toInt(sqrt(A));
(int)(log(A)/log(i))
has to be replaced bytoInt(log(A) / log(i));
pow(…)==A
has to be replaced byisEqual(pow(…), A)
.
These replacements have to be made in both branches of the if(A%2==0)
statement. Looking closer at these branches, we see that the only difference is that in the for
-loop i
is initialised to 2 when A
is even and to 3 when A
is odd. We can remove the if
-statement and one of the branches when we change the initialisation in the remaining for
-statement to i = (A%2 == 0) ? 2 : 3
.
Some other improvements to isPower
are:
in the first if
-statement the else
keyword is not needed; with or without it, the code is only executed when A != 1
;
i
is only used in the for
-statement, and can be declared in the initialisation of the for
-statement instead of before it;
instead of i = i + 2
, write i += 2
, to better signal to a reader that we are only incrementing i
and don’t do something more fancy with it;
properly indent the body of the for
-statement, so it is immediately clear what is executed as part of what;
as an optimisation, calculate toInt(sqrt(A))
once before the loop, instead of in each iteration;
as an optimisation, calculate log(A)
once before the loop, instead of in each iteration.
With all these changes, the complete function now looks as follows:
int isPower(int A) { if (A == 1) return 1; int root_A = toInt(sqrt(A)); double log_A = log(A); for (int i = (A % 2 == 0) ? 2 : 3; i <= root_A; i += 2) if (isEqual(pow(i, toInt(log_A / log(i))), A)) return 1; return 0; }
Finally a note on the doxygen-like documentation. It only says that isPower
takes an integer input named A
and outputs an integer. This can already be inferred from the function signature. A more useful documentation would state what isPower
does, what the restrictions are for the input parameter and what it represents, and what possible values the function can return and what those values mean. A possible improvement of the documentation is
/** * Find whether a positive integer @c A can be * expressed as pow(a, b), where a and b are * non-negative integers and b > 1. * The function returns 1 when such numbers a * and b exist, and 0 otherwise. */
Marcel Marré and Jan Eisenhauer
The main issue with the code is that it mixes the use of floating point numbers with that of integers. For computers, these are two very different things. Often, both need roughly the same memory, but have very different requirements, leading to very different and at times surprising behaviour.
Both in mathematics and computer floating point numbers, there may be more than one way to write a number. For example, 9.99 recurring is actually the same as 10, being 9 + 9/9. If you naively took the integer part of 9.99 recurring, you might say this is 9. And that is what happens in this case.
Generally, to find a bug it is a good idea to narrow down where the error lies either in an interactive debugger or by adding debug output. We added debug outputs displaying log(A)/log(i)
versus int(log(A)/log(i))
. And for the case of A
= 1000 and i
= 10 the float value is displayed as 3.000000, but the integer as 2, showing the exact problem described above.
Also note that the given code, irrespective of rounding errors, yields isPower(2) == 1
, which is incorrect.
One could work with proper rounding or checking whether either ceil(log(A)/log(i))
or floor(log(A)/log(i))
yield the desired result, but while this will work for A
= 1000, this could still go awry for larger numbers. We decided to take an approach using integers exclusively. Rather than using log
this means trying to find a suitable base and exponent. Note that we use a long int
where we expect a possible overflow.
#include <limits.h> #include <math.h> #include <stdio.h> int isPower(int A) { if (A == 1) { return 1; } else { long int a; for (a = 2; a * a <= A; a = a + 1) { if (A % a == 0) { int e = 2; long int p = a * a; while (p < A) { e = e + 1; p = p * a; } if (p == A) { // printf("%i = %i^%i\n", A, a, e); return 1; } } } return 0; } }
We also developed a version that uses binary search to find the right base within a linear search to find the right exponent, which, for the numbers 1 to 1,000,000 was 24x faster than the float
version in addition to being correct. However, we feel this is beyond the scope of this code critique.
Ovidiu Parvu <accu@ovidiuparvu.me>
Before making any (potentially breaking) changes to the isPower()
function unit tests were implemented. The unit tests have been implemented using the [Check](https://libcheck.github.io/check/) unit testing framework for C and are given in the unit_tests.c file included at the end of this submission.
Some of the implemented unit tests failed to run successfully (as expected). Any issues highlighted by the failing unit tests and issues found during code inspection, together with the suggested changes, are described in the Subsections below.
Issue 1: Numerical approximation errors
The main issue with the implementation of the isPower()
function is that numerical approximation errors that can be introduced when representing real numbers using fixed-width floating point variables are not taken into consideration. Specifically when calling isPower(1000)
the following numerical approximation errors occur:
log(1000) = 6.907755278982137 log(10) = 2.302585092994046 log(1000)/log(10) = 2.9999999999999996 // Should be 3.0 (int) log(1000)/log(10) = 2 // Should be 3 pow(10, (int) log(1000)/log(10)) = 100 // Should be 1000
Such numerical approximation errors can be avoided in the isPower()
function by rounding the result of the logarithm values division, namely replacing log(A)/log(i)
expressions with round(log(A)/log(i))
.
Issue 2: Not handling non-positive input values correctly
A second issue is that the code does not explicitly handle non-positive input values correctly. If the isPower()
function is called with a value of zero then it returns an incorrect result of 0 instead of 1 (given that ‘02 = 0’). Also if the isPower()
function is called with a negative value then that negative value is passed into the sqrt()
function, which would result in a domain error (as per https://www.cplusplus.com/reference/cmath/sqrt/).
To avoid this issue, two conditional statements can be used at the beginning of the isPower()
function to handle any input values less than or equal to 1:
int isPower(int A) { if (A < 0) return 0; if (A <= 1) return 1; ... }
Issue 3: Code duplication
The isPower()
function contains two loops which are identical except for the value with which the variable i
is initialized. By extracting the code used to initialize i
outside of the two loops, the two loops can be replaced by a single loop as follows:
int isPower(int A) { ... i = (A % 2 == 0) ? 2 : 3; for (; i <= sqrt(A); i = i + 2) { if (pow(i, (int)(round(log(A) / log(i)))) == A) return 1; } ... }
Other minor issues
Other minor issues that can be addressed are:
- Including the missing
<math.h>
header in pow.c. - Declaring the
isPower()
function in a pow.h header file in order for the function declaration to be visible in test.c (after including pow.h in test.c). - Rewriting the comment corresponding to the
isPower()
function declaration such that it explains what the function does rather than state what type the result and input parameter have, which is already recorded by their corresponding types.
Code listings
The code that includes all the improvements presented above and the corresponding unit tests are given below.
--- pow.h --- /** * Returns: * - 1, if A is a positive number that can be * expressed as A = a^b, * where a > 0 and b > 1. * - 0, otherwise. */ int isPower(int A); --- pow.c --- #include <math.h> #include "pow.h" int isPower(int A) { int i; if (A < 0) return 0; if (A <= 1) return 1; i = (A % 2 == 0) ? 2 : 3; for (; i <= sqrt(A); i += 2) { if (pow(i, (int)(round(log(A) / log(i)))) == A) return 1; } return 0; } --- test.c --- #include <stdio.h> #include "pow.h" int main() { const int A = 1000; printf("isPower(%i) => %i\n", A, isPower(A)); } --- unit_tests.c --- #include <limits.h> #include <math.h> #include <stdlib.h> #include <string.h> #include <check.h> #include "pow.h" int logarithm(int base, int value) { return log(value) / log(base); } START_TEST(test_is_power_negative) { ck_assert_int_eq(isPower(INT_MIN), 0); ck_assert_int_eq(isPower(-4), 0); ck_assert_int_eq(isPower(-1), 0); } END_TEST // [Editor] additional tests for // test_is_power_zero, // test_is_power_one, // test_is_power_positive, // test_is_power_one_thousand, and // test_is_power_non_negative_numbers // elided, to save space Suite *power_suite(void) { Suite *s; TCase *tc_core; s = suite_create("isPower"); tc_core = tcase_create("TestCase"); tcase_add_test(tc_core, test_is_power_negative); // [Editor] likewise for the elided tests suite_add_tcase(s, tc_core); return s; } int main(void) { int number_failed; Suite * s; SRunner *sr; s = power_suite(); sr = srunner_create(s); srunner_run_all(sr, CK_VERBOSE); number_failed = srunner_ntests_failed(sr); srunner_free(sr); return (number_failed == 0) ? 0 : 1; }
Paul Floyd
First, a superficial answer. The reason that this code does not work is floating point rounding. Both pow
and log
take double
s and thus have limited precision. In this case a cast is used
(int)(log(A)/log(i))
which means that the rounding used is truncation. If, due to the limited precision, the result of the log division is a tiny bit less than a full integral value (say 4.99999) then it gets truncated to 4 rather than the ‘correct’ 5 that might be hoped for.
The quick fix would be to use some tolerance. So the pseudo-code would be something like
if (abs(round(log division) - (log division)) < magic tol) { // use round to nearest and test like before } else { // not really a candidate anyway }
Problem solved?
Not really, in my opinion. What should be used for magic tol
? If your name is William Kahan you might be able to work out exactly what the upper bound is for the error in the log division expression. If I were to attempt this then I’d probably guess something around 10 units in the last place (ULP) and then try to test this assumption. My feeling is that it would work reasonably well for int
but would have problems if ever a long int
version were required. The danger is that if you choose a magic tol
that is too lax then close misses might become false positives.
To avoid any issue of floating point precision, I would probably solve this in one of two ways:
- use repeated integer division
- use a reverse lookup table. If you can afford 384 kbytes of memory then you can construct a table of all 48,036 possible candidates for a 32bit int, and then use a binary search. That’s assuming you allow inputs up to
INT_MAX
, but again doesn’t extend well tolong int
.
Usual nits: missing <math.h> header.
Commentary
This critique is quite short, but does demonstrate several problems.
The first problem is the behaviour of a C program when there is no function prototype provided, which occurs when:
isPower()
useslog()
,pow()
, orsqrt()
main()
usesisPower()
The C standard explicitly encourages a warning, but warnings are easy to ignore... and the behaviour of this program is undefined. (C++ solves this problem by not allowing undeclared functions to be called.)
For the case of the functions log()
, pow()
, and sqrt()
that are defined in <math.h>, some compilers (eg. gcc) try to be helpful and implicitly provide the function prototypes while other compilers (eg. MSVC) do not – and so they follows the C rule of assuming the functions return an int
. They also does not know what the argument types are and so perform default argument promotions. The result is almost meaningless as the binary values passed will be misinterpreted by the called function, and the return value will be misinterpreted by the caller!
In the case of isPower()
, since the actual arguments and the implied return type do match the actual function definition, this code is valid – but potentially risky, as if the function signature changes the program could misbehave.
The second problem is the mix of floating point and integer types resulting in a loss of precision. While many integer values will convert to an exact floating point value, calculations done in floating point will often end up very close to the exact value, but not exactly equal.
As the critiques observed, the troubling calculation here is (log(A)/log(i))
as we need the exact integer result from the (int)
cast to give the correct answer, and this is not in general obtained. Fortunately C provides functions that will round a floating point value to the nearest integer, which is exactly what we need here, so a solution using round()
rather than the cast would resolve this.
The third problem is the failure of the algorithm to deal correctly with values less than one. This may not be a problem that needs fixing, depending on the values expected to be encountered in practice, but if not fixed then the preconditions on the values to provide are neither stated nor checked. Defensive programming practices would encourage a check of some kind.
The fourth item I was expecting someone to comment on was the optimisation of only checking even candidates for even numbers and odd candidates for odd numbers. While this is a sensible micro-optimisation I would have liked to see the addition of an explanation – or a meaningful variable name – to help with understanding this code.
The Winner of CC 123
All the critiques identified the problem with the floating point value being truncated to an integer value different from the mathematically correct value.
James, Hans, and Ovidiu all show the numeric approximation (2.990...) that causes the problem in the case presented, which I think is useful as it helps understanding the issue when you see what's actually going on. (Of course, the decimal expansion they each show is in turn only an approximation to the exact value of the floating point number.)
Dave lists the missing <math.h> as the first thing to fix, which is good, although he didn’t explain why it should be fixed. While Hans did recognise that gcc was being helpful in providing an implicit prototype, no-one drew attention to the danger of the missing <math.h> if the code is compiled with a less ‘helpful’ compiler.
Several people eliminated the code duplication of the repeated loops, and either implicitly or explicitly tidied up the poor formatting of the original code.
Hans and Ovidiu both suggest giving some meaning to the ‘doxygen-like’ comment at the heading of the function. This would allow documentation of the preconditions and semantics of the function.
Marcel and Jan helpfully describe the process of narrowing down the error by, in this case, adding debug print statements. This may well be something the writer of the code would find useful both for this problem but also for their future debugging.
I like Ovidiu’s approach of starting by producing some unit tests – this helps to think about a wider range of use cases before prematurely just fixing the presenting case!
I couldn’t reproduce Marcel and Jan’s remark that the original code produced 1 for isPower(2)
; but since the original program contains undefined behaviour this is a possible outcome!
I thought the mention of alternatives such as doing the whole calculation with integers, using a combination of binary and linear search, or using a lookup table, was good; for cases like this where the inputs and the outputs are all integers avoiding floating point may spare a great deal of head scratching. Additionally, Hans provided a couple of caching opportunities that would reduce the cost of the algorithm.
The critiques all provided a lot of illumination of the code and how to improve it. Overall I decided that Ovidiu’s contribution wins the prize, by a short head.
Code Critique 124
(Submissions to scc@accu.org by Aug 1st)
I am trying to write a little program to add up amounts of money and print the total, but it isn’t working as I expected. I realise I need to use a locale but I think there’s something I don’t understand.
I run the program like this:
$ totals en_GB First item: 212.40 Next item: 1,200.00 Next item: ^D
And I get this result:
Total: £2.13
I wanted to get:
Total: £1,412.40
Can you help? “
The code (totals.c) is in Listing 3.
#include <iomanip> #include <iostream> #include <locale> #include <sstream> int main(int, char**argv) { int t; // total std::string ln; std::string m; if (argv[1]) std::cout.imbue(std::locale(argv[1])); std::cout << "First item: "; std::getline(std::cin, ln); std::istringstream(ln) >> std::get_money(m); t = std::stoi(m); std::cout << "Next item: "; while (std::getline(std::cin, ln)) { if (std::istringstream(ln) >> std::get_money(m)) t += std::stoi(m); std::cout << "Next item: "; } std::cout << std::showbase << "Total: " << std::put_money(t) << std::endl; } |
Listing 3 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal).
This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..