Title: Further Comments on Code Critique 123
Author: Bob Schmidt
Date: Wed, 02 September 2020 17:02:39 +01:00
Summary: Additional information has been sent in response to an earlier Code Critique Competition.
Body:
Steven Singer <steven@pertinentdetail.org> writes in with some additional reflections on the last issue’s code critique. For reference, the code is in Listing 1 (pow.c) and Listing 2 (test.c).
/**
 * @input A : Integer
 *
 * @Output Integer
 */
int isPower(int A)
{
    if(A==1)
        return 1;
    else
    {
        int i;
        if(A%2==0)
        {
            for(i=2; i<=sqrt(A); i=i+2)
                if(pow(i, (int)(log(A)/log(i)))==A)
                    return 1;
        }
        else
        {
            for(i=3; i<=sqrt(A); i=i+2)
                if(pow(i, (int)(log(A)/log(i)))==A)
                    return 1;
        }
        return 0;
    }
}

Listing 1
#include <stdio.h>

int main(void)
{
    int A = 1000;
    printf("isPower(1000) => %i\n", isPower(A));
    return 0;
}

Listing 2
This is the original problem:
Please find a bug in my code based on finding whether a positive integer can be expressed as pow(a, b) where a and b are non-negative integers and b > 1. My code outputs 0 for 1000 (10^3) where it should be 1. Please find the bug…
Steven writes…
There is a significant optimisation that was missed in the answers to code critique #123, and I think it falls into the category of things CVu readers like to read about.
Marcel Marré mentioned something in passing at the end of his critique that could be interpreted as the same optimisation but it was bound up in comments about avoiding floating point. I think the optimisation is useful in its own right, and simple enough to explain if we stick to using floats.
But, before I get there, I learnt a couple of lessons whilst writing code to show the optimisation of interest, and I think those lessons are also worth passing on.
In these examples, I’m not going to worry about the return value for inputs less than one. That was covered adequately in the existing code critiques and I want to emphasise the core of the functions.
To test my code, I built a test framework to run the various code samples so I could benchmark them and check the results. I ran test cases for all the results that should produce a positive result and all the adjacent integers that should produce a negative result for a total of a little over 140,000 test cases.
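The framework itself isn't reproduced in the article. A minimal sketch of such a harness, using a deliberately slow reference implementation to supply the expected answers, might look like the following (the names and enumeration strategy here are illustrative assumptions, not the actual framework):

#include <stdio.h>
#include <limits.h>

typedef int domain;
extern int isPower(domain A);   /* the function under test */

/* Slow but obviously-correct reference used to supply expected answers. */
static int is_power_ref(long long n)
{
    if (n == 1) return 1;
    for (long long a = 2; a * a <= n; a++)
        for (long long p = a * a; p <= n; p *= a)
            if (p == n) return 1;
    return 0;
}

int main(void)
{
    long cases = 0, failures = 0;
    /* Enumerate every perfect power up to INT_MAX together with its two
       neighbours. A neighbour can itself be a perfect power (8 and 9),
       so the expected result comes from the reference rather than being
       assumed to be 0. */
    for (long long a = 2; a * a <= INT_MAX; a++)
        for (long long p = a * a; p <= INT_MAX; p *= a)
            for (long long n = p - 1; n <= p + 1; n++) {
                if (n < 1 || n > INT_MAX) continue;
                cases++;
                if (isPower((domain)n) != is_power_ref(n))
                    failures++;
            }
    printf("%ld cases, %ld failures\n", cases, failures);
    return failures != 0;
}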
I’m going to show some code samples. I’ll use the typedef domain for the integer type used for the input and calculations, to distinguish it from the integers used for other purposes (such as return values) and to allow me to try other types (unsigned and/or long integers).
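The definition of domain isn't shown in the article; a plausible baseline (the other variants would swap in unsigned or long types) is simply:

typedef int domain;  /* assumed; the article doesn't show the actual typedef */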
The first thing I learnt was that the compiler I was using would quite happily hoist the sqrt(A) used as the loop limit out of the loop, but it wouldn’t hoist the log(A). That surprised me: I expected it to spot both or neither. I created a basic function with these optimisations applied by hand that I could use as a timing baseline (see Listing 3).
int simple(domain A)
{
    if (A == 1)
        return 1;
    domain root_A = round(sqrt(A));
    double log_A = log(A);
    for(domain i = A % 2 == 0 ? 2 : 3; i <= root_A; i += 2)
        if (pow(i, round(log_A / log(i))) == A)
            return 1;
    return 0;
}

Listing 3
Note that the equality comparison between the result of the pow() and A is safe, as the mantissa of the double has many more bits than the integer being used to hold A, and we’re giving pow() representable integer inputs. If the mantissa weren’t big enough, then we’d need either to use a longer floating point type or a whole different strategy. Adding a tolerance is not good enough: if the answer is out by one, then we need to return zero, even if our input is 2^128 − 1.
On the random machine I was using (a 2.16 GHz Celeron N2840, so moderately modern but bottom of the range, with GCC 5.4.0 -O3), this took 243 seconds to run all my test cases. Code samples from the other answers using the same strategy had similar performance, varying slightly depending on whether the log(A) had been hoisted.
The interesting exception was the code from Marcel Marré, which was significantly faster. It took 76 seconds to run the tests. I could get this down to 56 seconds by rewriting the code to avoid long integers as follows:
int marre_no_long(domain A)
{
    if (A == 1)
    {
        return 1;
    }
    else
    {
        domain root_A = round(sqrt(A));
        domain a;
        for(a = A % 2 == 0 ? 2 : 3; a <= root_A; a++)
        {
            if (A % a == 0)
            {
                domain limit = A / a;
                domain p = a * a;
                while(p <= limit)
                    p *= a;
                if (p == A)
                    return 1;
            }
        }
        return 0;
    }
}
At first, this result makes no sense. Marcel’s code may be avoiding floating point operations in the loop, but those are fast on modern computers (not as fast as integer operations, but fast nonetheless). Instead, he has an inner loop which must surely be expensive.
The answer lies in the pre-test he does. Before going into the long loop, he checks whether the target is a multiple of the base that’s been chosen (the line if (A % a == 0)). Clearly, if the target, which is greater than one, is not a multiple of the base, then it can’t be a positive integer power of the base. This modulo operation is fast: it’s a single integer operation instead of (at least) two floating point operations. Nearly all choices of base fail this test (for example, when the loop index is seven, six out of seven target values will fail the test). That was the second lesson.
That suggests we can optimise the simple code to perform this integer test first. It will cost us an extra integer operation in a few cases, but it will save us two floating point operations (log and pow, with their associated rounding and comparisons) in most cases. This gives code like this:
int check_mod(domain A)
{
    if (A == 1)
        return 1;
    domain root_A = round(sqrt(A));
    double log_A = log(A);
    for(domain i = A % 2 == 0 ? 2 : 3; i <= root_A; i += 2)
        if (A % i == 0 && pow(i, round(log_A / log(i))) == A)
            return 1;
    return 0;
}
This reduces the run time to 29 seconds: eight times faster than the moderately optimised float version, without any significant change in strategy. Not bad for such a small change.
But we can do a lot better than this, and that involves going back to the problem we’re trying to solve and optimising our algorithm, not our code.
The problem we’re trying to solve is: given a number, A, find out whether it can be written as one integer to the power of another, x^y with y ≥ 2. It’s not possible to solve this directly for both variables. The technique chosen in the solutions so far is to make an exhaustive search of x and then, for each x, look to see if there’s a solution for y. This is fairly easy to do by taking the log to base x of both sides of the equation x^y = A to give y = log_x(A), followed by a test to see whether y is an integer.
The trick is to spot that we can do this the other way round: we can make a guess at y and find x. We can take the yth root of both sides to give x = A^(1/y) and then check whether x is an integer.
As with the original approach, instead of directly testing whether we get an integer result, we round the candidate x, raise it back to the power y, and check that we recover the original value.
We need to find a limit for the search. In both cases, the limit for the value we’re varying is found by looking at the minimum value of the other variable. So, when we’re varying x, we need to allow for y being two, which means that we need to search up to x = √A. When we’re varying y, we need to allow for x being 2, which means we need to search up to y = log_2(A). That gives us the following code:
int root_not_base(domain A)
{
    if (A == 1)
        return 1;
    int limit = log(A)/log(2) + 0.0001;
    for(int root = 2; root <= limit; root++)
        if (pow(round(pow(A, 1.0/root)), root) == A)
            return 1;
    return 0;
}
This runs through the test cases in just 0.82 seconds, which is about three hundred times faster than the original. This is not a bad solution, and it’s quite short and therefore readable.
The next obvious question is: can we do better? Well, of course we can, but we’re going to have to sacrifice a little readability.
For large inputs, we end up searching a lot of roots which will have a result between two and three. That’s because the largest power of two below INT_MAX is 2^30, but the largest power of three is only 3^19.
It turns out that, since contemporary computers hold numbers in base two and can perform bit operations quickly, there’s a neat trick to check whether a number is a power of two. If we use that to catch the powers of two, then we can terminate our search at log_3(A). We have to reject two explicitly, as that’s 2^1 and so doesn’t satisfy our requirements, but, on the other hand, the trick gives us the answer we want for one. This code uses a slightly different way of exiting at the limit:
int root_opt(domain A)
{
    if (A == 2)
        return 0;
    /* Power of 2 or zero: a power of two has a single bit set,
       so clearing its lowest set bit leaves zero */
    if ((A & (A - 1)) == 0)
        return 1;
    for(int root = 2; ; root++)
    {
        double base = pow(A, 1.0/root);
        if (pow(round(base), root) == A)
            return 1;
        if (base < 3)
            break;
    }
    return 0;
}
This has got our run time down to about 0.53 seconds.
Can we do better? Well, yes. We’ve not learnt our lesson, and once again we’ve fallen into the trap of micro-optimising too early. We need to go back to the problem and attack it with some more maths. We can note that if we can find an integer x for some power y, then we can also find a solution for any factor of y. Reversing that, if any of the factors of y doesn’t have an integral solution for x, then there’s no solution for y. As an example, one million is 10^6, but we don’t need to find that out by taking the sixth root. Since six is two times three, there are also the solutions 100^3 and 1000^2. Since we care only whether an answer exists, we can just look at prime values of y:
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))

int root_primes(domain A)
{
    if (A == 2)
        return 0;
    /* Power of 2 or zero */
    if ((A & (A - 1)) == 0)
        return 1;
    static const double primes[] =
        { 2, 3, 5, 7, 11, 13, 17, 19 }; /* Enough for uint32 */
    for(size_t i = 0; i < ARRAY_SIZE(primes); i++)
    {
        double root = primes[i];
        double base = pow(A, 1.0/root);
        if (pow(round(base), root) == A)
            return 1;
        if (base < 3)
            break;
    }
    return 0;
}
Instead of using floating point operations to try at most twenty-nine roots, we’re only trying at most eight.
This has now got our run time down to about 0.23 seconds, a thousand times faster than the original micro-optimised code (for the mix of values in the test cases I was using).
This is as far as I have got in the optimisation. Perhaps other people can take it further.
We can look at the overall limit on the performance of the different approaches, mostly because no discussion of algorithms is complete without using Big O notation. The original approach was O(√N). This approach is O(π(log N)), where π(x) is the number of primes less than or equal to x. (For 32-bit inputs, the power-of-two test removes base 2, so only primes up to log_3 of the maximum, about 20, matter: precisely the eight in the table.) This can be approximated to give an overall order of O(log N / log log N), but that limit becomes relevant only at extremely large N. Using long ints and increasing the maximum input value to sixty-eight trillion needs only one more prime.
For comparison, a binary search of a table, as mentioned by Paul Floyd, is blindingly fast even just using bsearch(): ten milliseconds in my tests, which is probably below the accuracy of my very dumb benchmarking. But as Paul correctly noted, it comes with the cost of a lot of memory. The size of the table scales roughly as the square root of the maximum value in the table, so extending it to sixty-eight trillion increases the size of the table to over eight million entries.
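Paul’s actual code isn’t shown here, but a minimal sketch of the table approach (the helper names, the 50,000-entry bound and the static array are assumptions for illustration) might look like this:

#include <stdlib.h>
#include <limits.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Roughly sqrt(INT_MAX) squares plus a few thousand higher powers. */
#define MAX_POWERS 50000
static int powers[MAX_POWERS];
static size_t n_powers;

static void build_table(void)
{
    powers[n_powers++] = 1;
    for (long long a = 2; a * a <= INT_MAX; a++)
        for (long long p = a * a; p <= INT_MAX; p *= a)
            powers[n_powers++] = (int)p;
    /* Duplicates (64 = 2^6 = 4^3 = 8^2) remain, but bsearch doesn't mind. */
    qsort(powers, n_powers, sizeof powers[0], cmp_int);
}

static int is_power_table(int A)
{
    return bsearch(&A, powers, n_powers, sizeof powers[0], cmp_int) != NULL;
}

After one call to build_table(), each query is a single bsearch() over roughly 48,000 entries, which is consistent with the square-root scaling described above.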
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.