Journal Articles
Browse in : |
All
> Journals
> CVu
> 324
(9)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 125
Author: Bob Schmidt
Date: 03 September 2020 18:12:07 +01:00 or Thu, 03 September 2020 18:12:07 +01:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Last issue’s code
I am trying to write a little program to add up amounts of money and print the total, but it isn’t working as I expected. I realise I need to use a locale but I think there’s something I don’t understand.
I run the program like this:
$ totals en_GB First item: 212.40 Next item: 1,200.00 Next item: ^D
And I get this result:
Total: £2.13
I wanted to get:
Total: £1,412.40
Can you help? “
The code (totals.c) is in Listing 1.
#include <iomanip> #include <iostream> #include <locale> #include <sstream> int main(int, char**argv) { int t; // total std::string ln; std::string m; if (argv[1]) std::cout.imbue(std::locale(argv[1])); std::cout << "First item: "; std::getline(std::cin, ln); std::istringstream(ln) >> std::get_money(m); t = std::stoi(m); std::cout << "Next item: "; while (std::getline(std::cin, ln)) { if (std::istringstream(ln) >> std::get_money(m)) t += std::stoi(m); std::cout << "Next item: "; } std::cout << std::showbase << "Total: " << std::put_money(t) << std::endl; } |
Listing 1 |
Critiques
Dave Simmonds
At first glance, it looks like the issue is t
being an int
. put_money
is supposed to work with long double
or basic_string
.
But, the int
will be promoted to a long double
, and our string money variables will convert to int
ok, so this isn’t actually an issue.
As long as we don’t overflow the int
.
The issue is that we’ve imbued std::cout
with the locale, but we haven’t imbued the istringstream
we use for get_money
.
Another issue is that we are not checking whether we have valid input on the first get_money
– if not we will throw an exception.
We should check it (and let the operator know if there is an issue).
So, we need to change the first get_money
from:
std::istringstream(ln) >> std::get_money(m); t = std::stoi(m);
to something like:
auto iss = std::istringstream(ln); if (argv[1]) iss.imbue(std::locale(argv[1])); if (iss >> std::get_money(m)) t = std::stoi(m); else { std::cout << "bad input" << std::endl; t = 0; }
And the second one from:
if (std::istringstream(ln) >> std::get_money(m)) t += std::stoi(m);
to:
iss = std::istringstream(ln); if (argv[1]) iss.imbue(std::locale(argv[1])); if (iss >> std::get_money(m)) t += std::stoi(m); else std::cout << "bad input" << std::endl;
The program will then work as intended.
Other issues:
If we do not supply a locale, we will be working only with integers. Is that what is intended? Should we insist that a locale is supplied?
Is en_GB
correct? Maybe we want en_GB.UTF-8
– depending on how our terminal emulation works. en_GB
will try to print a pound sign as char 163 – en_GB.UTF-8
will print as 194, 163. If we are using putty in UTF-8 mode, which is common, we want en_GB.UTF-8
.
This one is pretty obscure, but apparently it’s possible for argc
to be zero and argv
to be NULL
. It’s allowed by the standard, although I’ve never seen it. So, just in case, we should define argc
(change int main(int, char**argv)
to int main(int argc, char**argv)
) and change if (argv[1])
(in a few places) to if (argc > 1)
.
Hans Vredeveld
The OP is right that she needs to use locales. She correctly set the locale on std::cout
to format the total properly, but forgot to set the locale on the temporaries std::istringstream(ln)
to read the input amounts properly. One way to do this is to set the locale on the temporary streams before an amount is read from it (this would imply that those temporaries have to be named). For an application like this, that has to use the same locale everywhere it uses a locale, it is worth considering to set the locale globally with a call to:
std::locale::global(std::locale("argv[1]"))
at the start of the program. Any stream that is now created is automatically imbued with our locale. We still have to imbue std::cout
with the locale, as it was created before we set the global locale.
Now that we have a correctly working program, let’s see what else we can improve.
The first thing is if (argv[1])
. In all the cases that I know of, this happens to work, but that is not guaranteed by the standard. argv
is an array of char *
of length argc+1
, where argv[argc]
is a null pointer. If argc
is at least 1
, argv[0]
is a string representing the program’s name and any following entries are the program’s arguments. Although I have never seen it, argc
can be 0
and argv[1]
would reference memory past the end of the array. For the second and later program arguments, this way of testing whether the command line argument exists, references memory past the end of the array when the program is called without any arguments. A robust implementation will check whether argc
has a large enough value before it references an entry in argv
. In our case that would be: if (argc > 0)
.
Next, the code reads only one amount per line, and only if that amount is the first thing on the line. This may or may not be what you want. If it should be possible to enter more than one amount per line, we could replace
std::getline(std::cin, ln); std::istringstream(ln) >> std::get_money(m);
with
std::cin >> std::get_money(m);
(don’t forget to imbue std::cin
with the locale).
Now, let’s assume that we want only one amount per line and that it is the first item on the line. Between the code for the first item and next items, there is some code duplication with a twist. If there is no first line, or if the line does not contain an amount, the program will fail with an exception. If, at one point, there is no next line to read, the program will stop parsing input and show the results. If a next line does not contain an amount, the program will silently skip the line. Another difference is that when reading the first item, we initialize the variable t
, and when reading the next items, we add to it. We avoid some code duplication, and make the program more robust, when outputting the text “First item: †is immediately followed by the while
-loop. Of course, then we also have to initialize t
to 0 in its declaration.
Speaking of t
, it is a non-descriptive name that needs a comment at its declaration. That comment gives us an excellent name for the variable: total
.
The next variable that is declared is ln
. Like t
, this is rather non-descriptive and I like to replace it with the more descriptive name line
. We also note that its type is std::string
, but that the header <string>
is not included directly, and that we rely on it being included via one of the other headers. This makes the code fragile, so let’s #include <string>
directly. Lastly, line
, as it’s now called, is only used in the while-loop, so it would be nice if we can restrict its scope to that loop. We can do that if we change the while-statement to a for-statement:
for (std::string line; std::getline(std::cin, line);)
The last variable is m
. We rename it to money
right away. Note that it is declared at the top of the function, but that it is only used in the if
-statement inside the for
-loop (former while
-loop). Using C++17, we can use the init-statement for if
to declare it right there. In the if
-statement, we read an amount into a string and then convert that string to a number that is added to the total. We can also read an amount into a long double
instead of an std::string
. That would remove the need for the conversion from std::string
to int
. Putting it all together, the if
-statement now becomes
if (long double money; std::istringstream(line) >> std::get_money(money)) total += money;
To avoid implicit conversions, we also have to make total
a long double
. This also solves an issue with the use of the template std::put_money
. When total
was an int
, the template argument was int
, but the standard only allows for long double
or a specialization of std::basic_string
.
Finally some words on std::get_money
. Playing with it for this code critique, I ran into some unexpected behaviour, at least for me.
First, the input stream must have the amount in the correct number of decimals. With the British locale (en_GB
), “1200.00†and “1,200.00†will be processed correctly as £ 1,200.00. Unfortunately, “1200â€, “1,200†and “1,200.0†will all be consumed from the input stream, but won’t result in an amount being put in the argument to std::get_money
. On the other hand, if the input stream contains “1,200.004â€, the amount will be £ 1,200.00 and the stream still contains the character ‘4’.
Another surprise for me was the thousands separator. Being Dutch, I’m used to the thousands separator being a point (and the decimal separator a comma). With the locale set to nl_NL
, I input “1.200,00†and got out € 0,00 instead of € 1.200,00. Instead, I had to use a space as thousands separator (which was introduced as thousands separator to solve the ambiguity with the point and comma) and input “1 200,00†(or “1200,00â€). Going a little to the east, to Germany (locale de_DE
), I still have to use a point as thousands separator and can’t use the space. (If you think this isn’t messy enough, go to Belgium. With the locale nl_BE
, the thousands separator is a space, and with fr_BE
, it is a point.) Of course, the results are dependent on the locale database on my computer. Results can differ with a different locale database.
In the end, I advise to not use std::get_money
for anything but a throw-away program or for when you have full control over the input (although, in that case you can probably read the amount directly as an int
or double
without problem). In all cases, test it with the locales that you plan to use, and on the machines that you want to use it on.
On the other hand, std::put_money
can be useful to quickly get the output formatted in the way that your user expects.
James Holland
The first thing that struck me is the legality of the statement if (argv[1])
. What is the value of argv[1]
and is the expression always valid? The student does not make use of the first parameter of main()
, usually called argc
. The C++ standard states that “The value of argv [argc]
shall be 0
â€. The standard does not say anything about argv[x]
when x
is greater than argc
. If argc
is ever 0
then accessing argv[1]
would represent undefined behaviour. Can argc
ever be zero? Running the following code on a Linux system, where totals
is an executable (containing main()
), results in argc
of totals
being zero.
char * arguments[] = {nullptr}; execv("./totals", arguments);
So, yes, it can be done. If there is a possibility of argc
being zero it should be guarded against. I suggest making sure that argc
is equal to 2 before imbuing std:cout
.
if (argc == 2) std::cout.imbue(std::locale(argv[1]));
Now we come to the name of the locale as provided by the program argument. There is only one locale name that is standard, specifically, C
. There is no guarantee that the one supplied will be recognised. On my system en_GB
is not recognised and results in imbue()
throwing an exception of type std::runtime_error
.
One way to discover the user’s locale of a particular system is to execute the following statement.
std::locale("").name();
On my machine, this returns "en_GB.UTF-8"
. Using this string as the program argument is accepted by imbue()
.
My compiler failed to compile the statement
std::istringstream(ln) >> std::get_money(m);
although the same version of the compiler on Compiler Explorer (clang 9.0.0) compiled without error. I never discovered why. I had to separate the statement into two as shown below.
std::istringstream iss(ln); iss >> std::get_money(m);
I had to give the if
statement within the while loop similar treatment. The supplied code now compiles, runs and gives the problematic output described by the student.
It turns out that the wrong stream is being imbued with the locale. Imbuing std::cout
has no effect because a complete line of text is being read from std::cout
and placed in an std::string
, no formatting is involved. The contents of the std::string
is then transferred to an input string stream. This input string stream should be imbued with the locale. When this is done, the program works as desired.
Generally, it is better to leave the definition of variables as late as possible. The variable t
, for example, can be defined and given a value in one statement, namely int t = std::stoi(m);
. The definition of variables ln
and m
can also be moved down the code to just before they are used.
The readability of the supplied code would be improved considerably if more expressive variable names were used. Names such as ln
, m
and t
convey very little information. I suggest names such as line
, money
and total
would be more appropriate. This would remove the need for a comment in the supplied code explaining what t
meant, for example.
We are encouraged not to use std::endl
anymore. std::endl
performs two functions; it writes '\n'
to the stream and then flushes the stream. There are very few occasions where it is required to flush the stream explicitly. All the programmer needs to do in this case is write the new line character to the stream. This shows exactly what is being written to the stream and leaves the flushing to the compiler/operating system. This is a minor point, however, and will make little or no difference to the performance of the student’s program.
The student’s code can be simplified to some extent. Instead of transferring the contents of std::cin
to an std::istringstream
via an std::string
and then imbuing the std::istringstream
, std::cin
can be imbued directly. This results in the simplified program shown below.
#include <iomanip> #include <iostream> #include <locale> int main(int argc, char**argv) { if (argc != 2) { std::cout << "Please provide a locale\n"; return 0; } std::cin.imbue(std::locale(argv[1])); std::cout.imbue(std::locale(argv[1])); int total = 0; std::cout << "First item: "; for (std::string money; std::cin >> get_money(money);) { total += std::stoi(money); std::cout << "Next item: "; } std::cout << std::showbase << "Total: " << std::put_money(total) << ‘\n’; }
Incidentally, it is curious that the student’s program is named totals.c rather than totals.cpp. The program is definitely written in C++.
Ovidiu Parvu
We will start by describing why the total computed by the program is incorrect, and then we will continue to explore other improvements that can be made to avoid undefined behaviour, make the program more maintainable and improve the user experience.
Computing the total correctly
The reason why the total computed by the program is incorrect is that the locale provided as input to the program is not used when reading data from the standard input. Therefore, in most cases, the amounts used by the program to compute the total are different from the amounts input by the user.
For instance, the incorrect amounts used by the program when the user inputs the values from the problem description are given in the comments below:
std::istringstream("212.40") >> std::get_money(m); std::cout << std::showbase << std::put_money(std::stoi(m)); // £2.12 std::istringstream("1,200.00") >> std::get_money(m); std::cout << std::put_money(std::stoi(m)); // £.01
The program can be fixed by using the locale when reading the amounts of money, specifically by calling the imbue()
function on the std::istringstreams
before they are used to read the values input by the user, as shown below.
std::istringstream firstItemStream("212.40"); firstItemStream.imbue(std::locale("en_GB")); firstItemStream >> std::get_money(m); std::cout << std::showbase << std::put_money(std::stoi(m)); // £212.40 std::istringstream secondItemStream( "1,200.00"); secondItemStream.imbue(std::locale("en_GB")); secondItemStream >> std::get_money(m); std::cout << std::put_money(std::stoi(m)); // £1200.00
Improving code used to read amounts of money
There are several changes that can be made to the code used to read amounts of money in order to make it more maintainable and improve the user experience.
First of all the code for reading amounts of money is duplicated, namely it is defined once for reading the first amount and once for reading the next amounts. Instead the code could be extracted into a function and therefore be defined only once.
Secondly the std::istringstream
objects that are used to read the amounts of money are constructed using a copy of the line of input read from the standard input. Since C++20 it is possible to move-construct std::istringstream
objects using a std::string
rvalue reference. However, an alternative approach, which we will use here, that is not restricted to C++20 is to stop using std::istringstream
objects and read the input directly from std::cin
. Note that this means that the imbue()
function should be called on std::cin
in order for the locale chosen by the user to be employed. Also the <sstream>
header include can be removed.
Thirdly if the value input by the user is an invalid amount of money then the std::stoi(m)
method call will throw an exception which is not caught, the program therefore crashes and the total so far is not printed to the standard output. If I were a user of this program the last thing I would want is for the program to crash when I am entering the tenth amount incorrectly after inputting the previous nine amounts correctly, therefore losing all of the correctly input values. For a better user experience whenever an incorrect amount is entered by the user the program could output the total so far, print a useful error message and exit gracefully. The total so far could then be used as the first amount when re-running the program which means none of the amounts correctly entered previously would need to be re-entered.
Fourthly for each line of input, the amount of money is read from the beginning of the line and any remaining input is silently ignored. For transparency purposes the program should print a message whenever any user input is ignored.
Finally the names of variables m
and t
do not indicate what these variables represent. Providing details about the semantics of variable t
in a comment next to the variable declaration does not address the naming issue because it requires future maintainers of the code to jump from the place in the code where t
is used to the variable declaration in order to find the comment which describes what the variable represents. A better solution would be to use a more descriptive variable name, for both variables m
and t
. For instance, m
and t
could be renamed to newAmount
and total
, respectively.
All of the improvements described in this section have been implemented in the tryGetNewAmount()
function which is called from the main()
function as shown below.
std::optional<int> tryGetNewAmount( std::istream& in) { std::string newAmount; in >> std::get_money(newAmount); if (in.fail()) { std::cerr << "Ignoring invalid amount " "of money...\n"; return {}; } std::string extraInput; std::getline(std::cin, extraInput); if (!extraInput.empty()) { std::cerr << "Ignoring the extra input: " << extraInput << '\n'; } return std::stoi(newAmount); } int main(int, char** argv) { ... if (argv[1]) { const std::locale locale(argv[1]); std::cout.imbue(locale); std::cin.imbue(locale); } std::cout << "First item: "; int total = tryGetNewAmount(std::cin).value_or(0); while (std::cin) { std::cout << "Next item: "; total += tryGetNewAmount(std::cin).value_or(0); } std::cout << std::showbase << "Total: " << std::put_money(total) << std::endl; }
Avoiding undefined behaviour
The approach used to verify if the locale was specified on the command line is to check if argv[1]
is not null. Using this approach can lead to undefined behaviour in a certain scenario.
Specifically if the number of arguments arg`
is zero then the value of argv[1]
is undefined. As it stands, the program will access the value of argv[1]
even when argc
is zero. Note that the C++(17) standard specifies in Section 6.6.1, paragraph 2 that:
The value of argc shall be non-negative
(i.e.
argc
can be zero).If
argc
is nonzero, these arguments shall be supplied inargv[0]
throughargv[argc-1]
as pointers to the initial characters of null-terminated multibyte stringswhere by “these arguments†the standard refers to the “arguments passed to the program from the environment in which the program is runâ€.
The value of argv[argc] shall be 0.
To the best of my knowledge, the C++ standard does not specify what the value of argv[argc + 1]
should be.
An example of a scenario in which argc
would be zero is if the program used to compute the total amount of money, called totals
, would be executed using an exec
-type function to which an array of nullptr
command line arguments was provided as input, an example of which is given below.
#include <unistd.h> int main() { char* args[] = {nullptr}; execv("./totals", args); }
In order to eliminate the opportunity for undefined behaviour the value of argc
should be used to determine if the locale was provided as input on the command line.
In addition the program should check if the locale specified by the user is recognized by the operating system, and if it is not, then it should not crash due to an unhandled exception thrown by the std::locale
constructor. Instead a user-friendly message should be printed and the program should exit gracefully.
The updated code used to handle the locale is given below.
std::optional<std::locale> tryGetLocale( char* cmdLineArg) { try { return std::locale(cmdLineArg); } catch (const std::runtime_error&) { return {}; } } int main(int argc, char** argv) { if (argc != 2) { std::cerr << "The program should be " "executed using a single command-line " "argument representing the locale " " and it was not. " "Exiting...\n"; return 1; } std::optional<std::locale> locale = tryGetLocale(argv[1]); if (!locale) { std::cerr << "The operating system does " "not have a locale named: " << argv[1] << ". Exiting...\n"; return 2; } std::cout.imbue(locale.value()); std::cin.imbue(locale.value()); ... }
Other minor improvements
Additional minor improvements that could be implemented are:
- The totals.c file contains C++ code, not C code. Therefore the file extension should be changed from .c to .cpp.
std::strings
are used but the<string>
header is not explicitly included. It should be.- There does not seem to be a reason for flushing when printing the total at the end of the
main()
function. Thereforestd::endl
could be replaced with'\n'
.
Conclusions
In conclusion the recommended improvements are to:
- Use the locale when reading the values input by the user;
- Refactor the code used to read amounts of money;
- Improve the usablity of the program by catching exceptions, printing useful error messages and exiting gracefully;
- Eliminate undefined behaviour by using
argc
to check the number of arguments provided on the command line.
For reproducibility purposes, the updated source code containing all the recommended improvements is given below.
--- totals.cpp --- #include <iomanip> #include <iostream> #include <locale> #include <optional> #include <string> std::optional<std::locale> tryGetLocale( char* cmdLineArg) { try { return std::locale(cmdLineArg); } catch (const std::runtime_error&) { return {}; } } std::optional<int> tryGetNewAmount( std::istream& in) { std::string newAmount; in >> std::get_money(newAmount); if (in.fail()) { std::cerr << "Ignoring invalid amount of" " money...\n"; return {}; } std::string extraInput; std::getline(std::cin, extraInput); if (!extraInput.empty()) { std::cerr << "Ignoring the extra input: " << extraInput << '\n'; } return std::stoi(newAmount); } int main(int argc, char** argv) { if (argc != 2) { std::cerr << "The program should be " "executed using a single command-line " "argument representing the locale " "and it was not. " "Exiting...\n"; return 1; } std::optional<std::locale> locale = tryGetLocale(argv[1]); if (!locale) { std::cerr << "The operating system does " "not have a locale named: " << argv[1] << ". Exiting...\n"; return 2; } std::cout.imbue(locale.value()); std::cin.imbue(locale.value()); std::cout << "First item: "; int total = tryGetNewAmount(std::cin).value_or(0); while (std::cin) { std::cout << "Next item: "; total += tryGetNewAmount(std::cin).value_or(0); } std::cout << std::showbase << "Total: " << std::put_money(total) << '\n'; }
Commentary
This critique is a simple attempt to use get_money()
and put_money()
. As Hans in particular points out, there are a number of issues with how the money handling in C++ works, especially when changing locale, and it does appear to hard to use the facility for the input of data whose formatting is not in your control.
It is also a little strange that get_money()
specializations are defined only for specializations of std::basic_string
and for long double
, and it also seems unfortunate that there is very little flexibility in the format of the input string. I share the conclusion that get_money()
should probably generally be avoided.
I think between them the entries covered pretty well everything I might have said, so I’ve got nothing else to add.
The winner of CC 124
All the critiques correctly identified that the root error was the failure to imbue
the desired locale into the std::istringstream
objects used for the get_money()
calls. A couple of people suggested simplifying the program to read std::cin
directly, rather than via an ancillary std::string
. It would, however, be a good idea to check before changing the code as it’s not clear which parts of the behaviour the current program should be retained.
Again, several critiques noticed the duplication of the code to read the amount of money from the input and suggested ways to avoid this. Creating a helper function, such as Ovidiu’s tryGetNewAmount()
, is a useful lesson in code design for the original author.
Hans realized that, since the numeric value of the input amount was required, using the overload of get_money()
that takes a long double
avoids the need in the original code to convert the string into a number (by calling std::stoi()
.)
Locale handling is troublesome in C++ because there is little standardization in the set of supported locales. Hence it is usually a good idea important to anticipate problems and provide helpful diagnostics to assist the user of the program in diagnosis of failure.
As Dave points out, it is by no means clear the en_GB
used in the problem description is the best choice, even for a British English user.
There were other pieces of advice given to the writer of the code that should prove useful – such as using good names for variables, and reducing the scope of variables when possible.
While there were several good entries this time round, I think that Hans’ critique was (just!) the best one for I have awarded him the prize for this critique.
As always, thank you to all those who participated!
Code Critique 125
(Submissions to scc@accu.org by Oct 1st)
I was reading about approximating π by working out the area under the quarter circle by dividing it into thinner and thinner slices, and treating each slice as a quadrilateral, and was curious about to know rapidly the approximation approached the actual value. So, I wrote a program where you provide the accuracy (or accuracies) you want and it tells you how many slices you need to get this close to the right value. However, when I run it, I noticed I always seemed to get a power of 2 for the number of slices – and I don’t think this can be correct. What have I done wrong?
The code is in Listing 2.
#define _USE_MATH_DEFINES // for MSVC #include <cmath> using std::abs, std::pow, std::sqrt; // Area of the quadrilateral inside this slice of // the quadrant from [x, x+width) double slice(double x, double width) { double left, right; // By Pythagorus left = sqrt(1 - pow(x, 2)); x += width; right = sqrt(1 - pow(x, 2)); return (left + right) / 2 * width; } #include <iostream> #include <string> int main(int argc, char **argv) { int slices; double accuracy, deltax, pi, quad, x; if (argc < 2) { std::cout << "missing argument(s)"; return 1; } for (int arg = 1; arg < argc; arg++) { accuracy = std::stod(argv[arg]); slices = 1; for (;;) { deltax = 1.0 / slices; quad = 0; for (x = 0.0; x < 1.0; x += deltax) { quad += slice(x, deltax); } pi = quad + 4; if (abs(M_PI - pi) < accuracy) break; slices++; } std::cout << slices << "=>" << pi << '\n'; } } |
Listing x |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal).
This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..