Journal Articles
Browse in : |
All
> Journals
> CVu
> 275
(10)
All > Topics > Programming (877) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 96
Author: Martin Moene
Date: 04 November 2015 09:11:52 +00:00 or Wed, 04 November 2015 09:11:52 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: If you would rather not have your critique visible online, please inform me. (We will remove email addresses!)
Last issue’s code
I have some C code that tries to build up a character array using printf
calls but I’m not getting the output I expect. I’ve extracted a simpler program from my real program to show the problem.
With one compiler I get "Rog"
and with another I get "lburp@"
.
I’m expecting to see:
"Roger: 10
Bill: 5
Wilbur: 12"
What have I done wrong?
Can you give some advice to help this programmer?
The code is in Listing 1.
#include <stdio.h> #define ARRAY_SZ(x) sizeof(x)/sizeof(x[0]) typedef struct _Score { char *name; int score; } Score; void to_string(Score *scores, size_t n, char *buffer, size_t len) { for (size_t i = 0; i < n; i++) { size_t printed = snprintf(buffer, len, "%s:\t%u\n", scores[i].name, scores[i].score); buffer += printed; len -= printed; } } void process(char buffer[]) { Score sc[] = { { "Roger", 10 }, { "Bill", 5 }, { "Wilbur", 12 }, }; to_string(sc, ARRAY_SZ(sc), buffer, ARRAY_SZ(buffer)); } int main() { char buffer[100]; process(buffer); printf(buffer); } |
Listing 1 |
Critiques
Mathias Gaunard
The main problem of this snippet is the ARRAY_SZ
macro, meant to compute the size of an array. This macro will accept pointers as input but provide the wrong answer, in this case sizeof(char*)/sizeof(char)
, which is the word size, 4 bytes for 32-bit systems. This explains why the result is "Rog"
on some systems; only 4 bytes were written, 3 characters plus the null byte.
With C++, it is possible to write a function that provides the same functionality but that will lead to errors whenever pointers are passed, by passing the array by reference and using templates to deduce its size:
template<class T, size_t N> size_t array_sz(T(&)[N]) { return N; }
By using this function instead of ARRAY_SZ
, we can easily isolate the errors. One way to fix this is to modify process
to also take the array by reference.
template<size_t N> void process(char (&buffer)[N]);
This will make the code work and display the expected result, however, it still has an issue with how it handles the case where the buffer is not large enough to hold all score listings. As we saw earlier, the code incorrectly claimed that the buffer was word-size-sized, but it should still have always given "Rog"
as a result, and never "lburp@"
. snprintf
can return a negative value on failure, and returns the number of characters it would have written if the buffer is not big enough. It is easy to trigger erroneous output like "lbur"
simply by hardcoding printed
to (size_t)-1
.
It is therefore necessary to do
printed = min(printed, len)
to avoid writing past the end or before the start of the buffer. Alternatively, one could use dynamically-sized buffers to not impose arbitrary limits, which can be done easily by using C++ iostreams.
Other miscellaneous issues:
_Score
is a name reserved for the implementation, and should not be used. All names starting with an underscore followed by an uppercase letters are reserved. By using C++, the typedef/struct becomes redundant anyway, so one should just usestruct Score
.- The type of the
name
member inScore
should beconst char*
rather thanchar*
, since it is initialized from string literals, which areconst
in C++, and shouldn’t be modified regardless since it is undefined behaviour to do so. - The first argument to
printf
should be a format, it is therefore potentially dangerous to callprintf(buffer)
directly in case we ever chose to put special characters in the score entries. Instead, one should writeprintf("%s", buffer);
.
Peter Sommerlad
First of all, if a C program misbehaves, I suggest compiling it with a C++ compiler.
Sidenote: Well, that is what I would do and also I would highly recommend to no longer use C at all. With the exception of very, very small targets in the embedded area (8 bit controllers), decent C++ implementations should be available and C should no longer be taught to students. It just lacks any means of abstraction and is too hard to use correctly and even if it is, it is too easy to break things by ‘maintenance’.
Now to the problem at hand. Our IDE Cevelop and my C++ compiler tells me several problematic points, including the culprit of the underlying issue.
Let us start our journey in main()
:
int main() { char buffer[100];
The array line is something I wouldn’t have done in C++ at all and I was recommending of not allocating arrays on the stack in C at all, if you can not control the size written. Code like that is behind most security vulnerabilities in the past decades. Cevelop (set to parse as C++14) will claim
- Found C-Array: {0}
- Un- or ill-initialized variable found
The first note is that we try to rewrite code using plain arrays to use the better and copyable alternative std::array
instead (or std::string
). The second comes from our C++11 checker that looks for variables not initialized with the initializer list syntax and transforms them. Even in C code this might be a good idea, since the memory of buffer can contain arbitrary values (only by luck some bytes are 0 if we use C-style string handling in the general case).
Before we apply the quick fixes let us go further to the code of main()
.
process(buffer);
Here an array is passed as a function argument and our student should know that any array passed as such degenerates to a pointer to the first element and the dimension of the array is lost. This is still true in C++ and therefore I recommend to never use a plain array as a function parameter or as an argument. It just works without any safeguards and having a programmer deal with array bounds checking explicitly is very error prone and again leads to another big share of software vulnerabilities.
printf(buffer);
Even the innocent looking last line is problematic. The printf
function takes at least one char
pointer argument, but interprets its content. Since we will not know what content our process function is storing in the buffer array, we do not know if printf
would be expecting more arguments, because buffer contains any special printf
formatting symbols, such as "%s"
, for example. This could lead to leaking unintended information or in the better case to program crashes, because of illegal memory addresses accessed. It is a similar problem as with the XKCD comic (https://xkcd.com/327/) with printf
syntax instead of SQL.
A better C function to output a plain C string would be
puts(buffer);
instead, which at least just expects buffer to be NUL
terminated.
Wow, a full screen full of text with just 3 lines of code. Roger selected a good example this time!
Now let us look one step down in the call chain to process:
void process(char buffer[])
as said above, this array parameter is equivalent to writing it as a pointer parameter
void process(char *buffer)
and that will make the real error below (stay tuned) much more evident.
{ Score sc[] = { { "Roger", 10 }, { "Bill", 5 }, { "Wilbur", 12 }, };
The lines above again triggers two messages, one from our plug-in and one from the C++ compiler:
- Found C-Array: {0}
- ISO C++ forbids converting a string constant to '
char*
'
The first one tells us again that plain arrays should be verboten. We can automatically change that to use std::array<Score,3> instead, but that would be C++ already. The second warning tells us something about the differences between C and C++, where C++ makes string literal const char arrays, whereas classic C made those char pointers (even without const, which wasn't invented then). So in theory, one could overwrite the literal values for the names given here. This also tells something about the type Score that keeps plain char pointers and no size information. In production code, this is a no-no, because you can not make any safe memory management around such a string representation.
And now to the definitive culprit.
to_string(sc, ARRAY_SZ(sc), buffer, ARRAY_SZ(buffer)); }
ARRAY_SZ(buffer)
expands to
sizeof(buffer)/sizeof(buffer[0])
and this generates the compiler warning:
<ArticleBodyIndent> 'sizeof' on array function parameter 'buffer' will return size of 'char*' [-Wsizeof-array-argument]</ArticleBodyIndent>which we can compute on a 64 bit computer to become 8 instead of the 100 our student expected and the equivalent of the array with the pointer makes it obvious. to_string
is by the way a bad name for new code, because it is now a function overload in C++’s std
namespace and could make problems, when this code is naively ported to C++ with a using namespace std
in scope (BTW, Cevelop provides a refactoring away from such practice). Cevelop also suggest to change the macro for ARRAY
to a C++ inline (template) function and can do so. In addition this will avoid the potential bug, since the macro is not having parenthesis around its expansion and not around the second use of the parameter. This can cause parsing havoc if used with non-literals as arguments.
So a more correct version of
#define ARRAY_SZ(x) sizeof(x)/sizeof(x[0])would be
#define ARRAY_SZ(x) (sizeof(x)/sizeof((x)[0]))but even better in C++11:
template<typename T1> inline constexpr auto ARRAY_SZ(T1&& x) -> decltype(sizeof (x) / sizeof (x[0])) { return (sizeof (x) / sizeof (x[0])); }
But again that is to no avail in the correct working of process. There are several problems to be resolved with its design. First, passing the buffer from the outside, requires cautions use of its memory resource, so we need to pass the size of the available space. But there is still a problem, we do not know, if the size was sufficient or not. In such a case, it might be better to return an indication if the size was sufficient. This could be a bool
, denoting success (not very helpful), the size actually used and an error code, if insufficient (i.e., -1) or best, the size required for the output, if it was too small, which could be a burden to implement and to use. In any case, passing in memory (buffer) to be used as function output is a hassle. In C++ I would recommend to use an ostringstream
which will automatically expand an underlying string object and return that by value. This is one area where C is really a burden vs. C++.
It is also unrealistic, that the Score
array sc
is local to the function, so this must be passed as well, again explicitly passing its size, meaning we need define 4 parameters and pass 4 interdependent arguments. Using a std::vector<Score>
in C++ would simplify that again.
Now to the last function in the code where the student actually shows, that she/he understands a bit about passing arrays as pointers with an explicit size. But again, such an API (inherent in C) is easy to use wrongly (as we have seen) by passing inconsistent sizes. Also a return type of void
signals side effects, the code can have quite unintended ones.
void to_string(Score *scores, size_t n, char *buffer, size_t len) { for (size_t i = 0; i < n; i++) { size_t printed = snprintf(buffer, len, "%s:\t%u\n", scores[i].name, scores[i].score); buffer += printed; len -= printed; } }
Again, here C gets in the way of safe usage of memory. First snprintf
’s return type is int
, and there are obscure cases, where it might return a negative number. Second, if the space allowed is too small, it still returns the number of characters that would be needed, so the code will miscalculate the value of len
, which might underflow. While underflow and overflow does not trigger undefined behaviour with unsigned types, such as size_t
, the resulting value is still well beyond the capacity of the buffer and thus can easily lead to overwriting memory far away from the allotted memory, resulting in another severe security issue or potential crashes from overwriting return addresses.
A quick fix, would be to at least check if printed is smaller than len
and only then proceed and otherwise return from the function.
if(printed >= len) return;
I think I covered most of the problems by now. So how would a refactored C++ version look?
Here is one, that uses the recommendation I put above and getting rid of most of the bad C habits. Its overall design is still not what I consider splendid, but without more context it would be hard to judge if Score
should have a constructor, be const
or have member functions/related functions, such as an overloaded output operator<<
.
#include <cstdio> #include <sstream> #include <vector> struct Score { std::string name; int score; }; using ScoreList=std::vector<Score>; std::string to_string(ScoreList const &scores) { std::ostringstream out{}; for (auto const &score:scores) { out << score.name << '\t' << score.score << '\n'; } return out.str(); } std::string process() { ScoreList sc { { "Roger", 10 }, { "Bill", 5 }, { "Wilbur", 12 }, }; return to_string(sc); } int main() { std::puts(process().c_str()); }
Hope that helps some programmers to become better programmers and write less code.
Gareth Ansell <gareth.ansell@sky.com>
While trying to solve this puzzle I was initially tempted to dive straight for the compiler and start hacking away at the code. However, I managed to resist this urge and engaged brain instead. After which the solution was quite easy to spot. I then modified the code to test my hypothesis, which was proved correct.
The initial problem is in the process()
function, where the buffer array is passed as an argument. Since this is decomposed to a pointer, it is not possible for the subsequent call to the to_string()
function to determine the size of buffer[]
by using sizeof
(in the ARRAY_SZ
macro).
In C the solution to this is to add a length
parameter to the process()
function, and use this in the call to to_string()
. In C++, a templated solution could be used.
Apart from this there are a few minor niggles:
- In
_Score
score
is anint
, but in the call tosnprintf
it is referenced as anunsigned
. - The last element of the
sc[]
array inprocess[]
has a trailing comma - The
for
loop into_string()
compares a signed to unsignedint
.
My working solution is shown below:
#include <stdio.h> #define ARRAY_SZ(x) sizeof(x)/sizeof(x[0]) typedef struct _Score { char *name; unsigned int score; } Score; void to_string(Score *scores, size_t n, char *buffer, size_t len) { for(unsigned int i = 0; i < n; i++) { size_t printed = snprintf(buffer, len, "%s:\t%u\n", scores[i].name, scores[i].score); buffer += printed; len -= printed; } } void process(char buffer[], size_t len) { Score sc[] = { { "Roger", 10 }, { "Bill", 5 }, { "Wilbur", 12 } }; to_string(sc, ARRAY_SZ(sc), buffer, len); }
Matthew Wilson
This one is rather simple, I think: it’s an issue of array->pointer decay.
The ARRAY_SZ()
macro follows a familiar pattern in many codebases, used to fill the obvious missing dimensionof()
operator that should exist in C (and C++). It works by creating a compile-time constant representing the number of elements in an array by dividing the total size of the array by the size of an element. Works fine for arrays.
The problem is, it also works for pointers and, in C++, for types defining the subscript operator. And in such cases the sizes obtained is almost never correct (and sometimes not compile-time constant). This is all discussed at length in chapter 14 – Arrays and Pointers – of my book Imperfect C++ [1], so I won’t belabour the point here.
Also discussed in the same chapter of IC++ is the phenomenon of array-pointer decay. Briefly, it allows an array to be used in circumstances where a pointer is required. Also, a function declaration involving an array is interpreted, in a similar vein, as a pointer. Hence, the declaration of process()
as
void process(char buffer[])
is exactly equivalent to the declaration
void process(char* buffer)
Expressed in this form, the problem is all too easy to see. The size of ARRAY_SZ(buffer)
is going to be 4 (32-bit) or 8 (64-bit), and certainly not the 100 of the actual buffer buffer
declared in main()
.
The obvious fix is to change process()
to have the signature
void process(char* buffer, size_t len)
which is hinted at strongly in the earlier defined to_string()
.
When so amended – along with some const-correction, VC++ compatibility, and use of STLSoft’s STLSOFT_NUM_ELEMENTS()
(which makes application of ARRAY_SZ()
to a pointer a compile-time, rather than runtime, error) – the code looks like:
#include <stlsoft/stlsoft.h> #include <stdio.h> #if defined(_MSC_VER) # define snprintf _snprintf #endif #define ARRAY_SZ(x) STLSOFT_NUM_ELEMENTS(x) typedef struct _Score { char const* name; int score; } Score; void to_string(Score const* scores, size_t n, char *buffer, size_t len) { for (size_t i = 0; i < n; i++) { size_t printed = snprintf(buffer, len, "%s:\t%u\n", scores[i].name, scores[i].score); buffer += printed; len -= printed; } } void process(char* buffer, size_t len) { Score const sc[] = { { "Roger", 10 }, { "Bill", 5 }, { "Wilbur", 12 }, }; to_string(sc, ARRAY_SZ(sc), buffer, len); } int main() { char buffer[100]; process(buffer, ARRAY_SZ(buffer)); printf(buffer); }
Reference
[1] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004.
James Holland <James.Holland@babcockinternational.com>
My compiler did not provide any helpful hints as the student’s program compiles without any errors or warnings. When I ran the program, "Rog" was displayed as the student observed. After a little investigation, I find that the problem is in the parameter of process
. I suspect than the student is under the impression that an array is being passed to process
, after all the type of the parameter, char buffer[]
, looks like an array declaration. Unfortunately, despite its looks, the parameter type is equivalent to char * buffer
. When the latter declaration is used, it becomes clear that a pointer to buffer
is being passed to process
and not buffer
itself. From within process
, it is the size of the pointer that is passed to to_string
and not the size of buffer
. On my machine, pointers are 4 bytes in length and so the value 4 is being passed to to_string
. This explains why the program outputs "Rog". The function to_string
, and ultimately snprintf
, thinks that buffer
is only 4 characters long, enough for three characters and the null terminator.
Unfortunately, it is not possible to pass to a function an array by value. Only a pointer to an array can be passed. This presents no great difficulty, however. If in addition to passing a pointer to the array, the length of the array is passed, process
will have all the information it needs about buffer
. Within process
, the length of the output buffer can be passed straight to to_string
as shown below.
void process(char * buffer, size_t length) { Score sc[] = {{"Roger", 10}, {"Bill", 5}, {"Wilbur", 12},}; to_string(sc, ARRAY_SZ(sc), buffer, length); }
When this modification is made, things look a lot better and the program produces the desired result. There are, however, some unresolved problems than could manifest themselves in the student’s real program.
It is assumed that the student is using snprintf
(as opposed to sprintf
) because he/she does not want data to be written beyond the limits of buffer
. This is a laudable desire but unfortunately the code, as it stands, does not provide that protection in general. There is still a potential problem when the size of the data to be written exceeds the size of buffer
. This is demonstrated more conveniently if the size of buffer
is reduced instead of increasing the size of the data.
Suppose, for example, that instead of buffer
being declared 100 characters long, it had been declared 17 characters long. The program would correctly write to buffer
the string representing the data for Roger. snprintf
returns the length of the string (not including the null terminator) that it attempts to write, in this case, 10 characters. This value is assigned to the variable printed
. This value is then subtracted from len
(that currently has a value of 17) leaving 7. The program then attempts to write the data for Bill. This string is 8 characters long. The program (or more specifically snprintf
) writes as much of the data to buffer
as it can without overflowing buffer
. Although the output string has been truncated, nothing disastrous (as far as program execution is concerned) has occurred. Next, the length of string for Bill is subtracted from len
. The value of len
is currently 7 and the length of the string for Bill is 8 characters. The subtraction to be performed is, therefore, 7 minus 8. This is where problems start. The variables printed
and len
are both of type size_t
(an unsigned type). The result of the subtraction is not -1, as expected, but a very large positive number. On my machine the value of len
at this point is 4294967295. The program then goes around the loop once more to write the data for Wilbur and, thinking there is plenty of space in buffer
(because len
is a large number), writes Wilbur’s string beyond the end of buffer
, probably with disastrous consequences.
This behaviour can easily be prevented by inserting the following code just after the snprintf
statement. It will prevent the program going around the loop again when buffer is full and will also print a message explaining the problem.
if (printed >= len) { printf("Buffer exhausted. Results may be " "incorrect.\n"); break; }
While on the subject of buffer length, there is another problem that occurs if buffer
is declared as having zero size. I admit this is unlikely to occur in an otherwise fully debugged program but there is, at least, a theoretical point to be made here. An array of zero bytes will have an address, so it can be referenced. Such an array will, in all probability and not unreasonably, occupy zero bytes of memory. Given this, printf
(as used in the last statement of main
) will start printing at a memory location that has nothing to do with buffer. It could print anything of any length, depending on what data it stumbles across. This behaviour must be prevented if there is any possibility of buffer
being of zero length. A simple if
statement around printf
would suffice. I leave the details to the student.
There is another problem with the printf
statement; it is being used in a potentially unsafe way. printf
is declared (in stdio.h) as having one or more parameters. The first parameter is a format control string. The student’s printf
statement has one parameter only and so the parameter must be the control string. It can be seen from an inspection of the printf
statement that the control string is the contents of buffer
. The contents of buffer
came originally from the array sc
. If one of the names in sc
were to be changed, say, from "Roger" to "%sRoger", the program will probably crash in spectacular fashion. This is because printf
’s format control string now says there is a string parameter to be printed despite there not actually being one. The result is undefined behaviour. To prevent this, a literal string should be supplied as the format control string and the string to be printed (in this case buffer
) supplied as the second parameter as shown below.
printf("%s", buffer);
printf
will now simply print the contents of buffer
without interpreting any characters sequence as format control characters, as required.
Although it does not show itself in the student’s test program, there is another potential problem lurking in the wings. It is the definition of the macro ARRAY_SZ
. Suppose the student decides to make use of this macro somewhere else in his or her code. The student may write something like the following statement, for whatever reason.
size_t x = 25 % ARRAY_SZ(sc);
Let us assume that sc
is the same array as defined in the student’s test program. ARRAY_SZ(sc)
should, therefore, produce the value 3. The expression becomes 25 % 3 which is equal to 1. The variable x
should, therefore, be initialised with the value of 1. In fact x
is assigned a value of 0, contrary to all expectation. How can this be? All becomes clear if the macro is expanded to produce the expression seen by the compiler, as illustrated below.
size_t x = 25 % sizeof(sc) / sizeof(sc[0]);
The expression will be evaluated from left to right. So the first term that is evaluated is 25 % sizeof(sc)
or 25 % 24 which equals 1. Next, the term sizeof(sc[0])
is evaluated, this equals 8. Finally, 8 is divided into 1 giving zero. It can now be seen that brackets are required around sizeof(sc) / sizeof(sc[0])
so that this part of the expression is evaluated first. This is achieved by enclosing the definition of ARRAY_SZ(x)
in parentheses.
#define ARRAY_SZ(x) (sizeof(x)/sizeof(x[0]))
In fact it is good practice to enclose the definition of all but the simplest macros in parentheses to prevent this type of error.
There are a couple of inconsistencies in the program. Score::score
is of type int
and yet it is being printed by snprintf
as if it were unsigned. Assuming Score::score
is meant to be signed, then snprintf
’s format control string should be "%s:\t%d\n"
.
Also, quoted text strings are considered constant and yet Score::name
is declared as a pointer that can modify what it is pointing to. The declaration const char * name
provides the remedy. This will prevent accidentally attempting to write to a constant string as a compile-time error will result.
I noticed that the student provided a tag name when defining the Scores
structure by use of a typedef
statement. A structure need only have a tag name when the structure makes reference to pointers of the same type. This is not the case in the student’s program and so the tag name is not required. Incidentally, it looks as if the student has made an attempt to prevent the tag name from clashing with the type name by use of a leading underscore character. This is not necessary as the two names are in different ‘name spaces’. The tag and the type could both be named Score
.
There is a redundant comma at the end of SC
’s initialiser list. The compiler is perfectly happy with this and it does not affect the meaning of the program. It is allowed, at least in part, to make the job of automatic code generators a little easier. However, as it is not required, I prefer not to see a trailing comma in hand-written code.
Finally, it might be constructive to discuss the choice of variable names. To some degree this is a matter of personal style but there are some guidelines that should be observed. The names of variables (and other identifiers) should reflect their meaning and should not be excessively abbreviated. I would prefer to see length_of_buffer
rather than len
and expected_print_length
rather than printed
, for example. Selecting suitable names can be quite tricky and I am not suggesting my examples are ideal but I do think they help in understanding how the program works.
I have made quite a few corrections and suggestions in reviewing the student’s code. This should not be seen as a damning criticism designed to dishearten the student. On the contrary, it is designed to encourage the student to complete his or her project in particular and to continue to learn about the fascinating topic of computer programming and software development in general.
Commentary
With five pretty comprehensive entries I’m not sure there’s much for me to add. About the only thing no-one remarked on was the inconsistent brace positioning!
It seems a shame that snprintf()
doesn’t provide a foolproof safe replacement for sprintf()
– examples like this code show how easy it is, given the variety of return values and the dangers of implicit conversion between signed and unsigned integer values, to use snprintf()
in an unsafe way.
One additional point that might be worth making is that Visual Studio 2015 does provide snprintf()
in <stdio.h> (although I haven’t yet found where this is documented on MSDN.) This function seems to follow the behaviour required by the C11 standard. However, the Microsoft specific function _snprintf()
is subtly different; in particular it does not ensure the target buffer is null terminated. This is an additional source of potentially dangerous confusion around this function call. Even beyond that, the implementation of snprintf()
in the mingw implementation of g++ 4.9.2 on Windows also fails to ensure the buffer is null terminated.
The moral of this critique is to be extremely careful if you use snprintf()
in C code – or in C++.
The Winner of CC 95
There were five good critiques and I think each one would have helped the person with the original problem to understand what was wrong and to fix it. However, I think Peter’s critique covered the most ground and I have awarded him the prize for this issue’s critique.
Code Critique 96
(Submissions to scc@accu.org by Oct 1st)
Thanks are due to Hubert Matthews for the idea that inspired this critique.
I have written some code to read in a CSV file and handle quoted strings but I seem to get an extra row read at the end, not sure why.
If I make a file consisting of one line:
--- Book1.csv ---
word,a simple phrase,"we can, if we want, embed commas"
--- ends ---
I get this output from processing the file:
Rows: 2 Cells: 3 word a simple phrase "we can, if we want, embed commas" Cells: 1
What have I done wrong?
The code is in Listing 2.
// Reading CSV with quoted strings. #include <iostream> #include <string> #include <vector> typedef std::string cell; typedef std::vector<cell> row; typedef std::vector<row> table; table readTable() { char ch; table table; // the table row *row = 0; // the row cell *cell = 0; // the cell char quoting = '\0'; while (!std::cin.eof()) { char ch = std::cin.get(); switch (ch) { case '\n': case ',': if (!quoting) { cell = 0; if (ch == '\n') { row = 0; } break; case '\'': case '"': if (quoting == ch) { quoting = '\0'; } else if (!cell) { quoting = ch; } } default: if (!row) { table.push_back({}); row = &table.back(); } if (!cell) { row->push_back({}); cell = &row->back(); } cell->push_back(ch); break; } } return table; } int main() { table t = readTable(); std::cout << "Rows: " << t.size() << "\n"; for (int r = 0; r != t.size(); ++r) { std::cout << "Cells: " << t[r].size() << "\n"; for (int c = 0; c != t[r].size(); ++c) { std::cout << " " << t[r][c] << "\n"; } } } |
Listing 2 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Notes:
More fields may be available via dynamicdata ..