Title: 5 Reasons NOT to Use std::ostream for Human-Readable Output
Author: Bob Schmidt
Date: Tue, 03 April 2018 16:42:39 +01:00
Summary: C++’s ostream can be hard to use. Sergey Ignatchenko suggests we use the {fmt} library instead.
Body:
Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Hare, and do not necessarily coincide with the opinions of the translators and Overload editors; also, please keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented an exact translation. In addition, the translator and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.
This is NOT yet another printf-vs-cout debate
First of all, to avoid being beaten really hard, I have to say that I am perfectly aware of all the arguments presented in favour of the 30+-year-old std::ostream (that is, compared to printf(), which arguably comes from the 50+-year-old BCPL) – and moreover, that I am NOT going to argue for printf() in this article.
The arguments usually used to push cout over printf() are the following [C++ FAQ]:
- iostream is type-safe.
  ‘No Bugs’ comment: I am the last person to argue about this one.
- it is less error-prone (referring to reducing redundancy).
  ‘No Bugs’ comment: while saying that reducing redundancy is the same as being less error-prone is a bit of a stretch in general (in quite a few cases, redundancy is exactly what keeps us from making silly mistakes), in the context of the cout-vs-printf() debate, I can agree with it.
- it is extensible (allowing you to specify your own classes to be printed).
  ‘No Bugs’ comment: very nice to have indeed.
- std::ostream and std::istream are inheritable, which means you can have other user-defined things that look and act like streams, yet that do whatever strange and wonderful things you want.
  ‘No Bugs’ comment: TBH, I fail to see why being inheritable is an advantage per se; especially as extending existing functionality doesn’t depend on inheritance (at least, as long as no virtual functions are involved, and I don’t see many of them in std::ostream as such). The best I can make out of this one is understanding it as ‘being able to provide my own underlying streambuf to be used by ostream’, which does qualify as an advantage (at least over printf(), which doesn’t provide such an option at all: more on this below).
Once again, I am NOT going to argue with the points above (doing so would certainly start another World Flame War); instead, I just want to take them as a starting point (clarifying the one which isn’t obvious to me, so it is specific enough for our purposes).
ostream is far from perfect
Even with all its advantages over the 50+-year-old printf(), ostream is still far from perfect – at least for human-readable outputs.
Ok, so far we have seen the good side of ostream; however (conspicuously omitted from [C++ FAQ]), it has quite a few downsides too, especially if we concentrate on specific use cases for std::ostream. A whole bunch of very popular use cases for ostreams involve formatting output which is intended to be read by human beings. Two popular examples of such formatting include:
- Formatting output which is shown to the end-user (usually in some kind of UI, whether graphical or not).
- Formatting output which is sent to text-based logs (which tends to apply both to the Client-Side and to the Server-Side).
Note that, strictly speaking, Server-Side text-based logs can be divided into (a) text logs used for monitoring purposes, and (b) text logs for post-mortem analysis, with a recent movement towards making (a) structured rather than free-text based. Still, I am sure that (b) is there to stay as free-text based, so the text logging use case will still stand even if the movement towards structured logging for monitoring purposes succeeds.
As it says on the tin, we’re going to concentrate on output intended for human beings – and while we’re at it, we’ll keep in mind the two major use cases above. And, as I am going to present a point of view which – while it was articulated previously in [Moria] and [NoBugs] – is certainly not as popular as the four points above (yet?), I am going to be significantly more verbose than [C++ FAQ].
So, in no particular order, here they are: the major drawbacks of ostreams when used to format human-readable outputs.
Drawback number 1: i18n
“Vantage number one!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Try and eat a little now.”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’
The first major problem with using ostream-like chevron-based formatting for human-readable strings is internationalization. Let’s take a look at a piece of code which formats a simple message for the UI of an online poker game:
    some_ostream << winner.name << " shows "
      << winner.cards << " and wins $"
      << pot_size / 100 << "."
      << std::setw(2) << std::setfill('0')
      << pot_size % 100;
      // we have pot_size stored in cents, but have to
      // display it in a more conventional manner
NB: for our purposes, let’s skip the discussion about localizing currency signs and dots-vs-commas; in particular, for online games, the former happens to be not a question of locality, but a question of what currency this site really uses, and nobody gives a damn about the latter.
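To make the fragment above easy to try out, here is a minimal self-contained sketch of it. The `Winner` struct and the `win_message()` wrapper are assumptions added purely for illustration (the article only shows the chevron expression), and an `std::ostringstream` stands in for `some_ostream`.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for the article's `winner` object.
struct Winner {
    std::string name;
    std::string cards;
};

// Builds the UI message with chevrons; pot_size is stored in cents.
std::string win_message(const Winner& winner, long pot_size) {
    std::ostringstream some_ostream;
    some_ostream << winner.name << " shows " << winner.cards
                 << " and wins $" << pot_size / 100 << "."
                 << std::setw(2) << std::setfill('0') << pot_size % 100;
    return some_ostream.str();
}
```

With these assumptions, `win_message(Winner{"Alice", "AhKh"}, 12305)` yields `Alice shows AhKh and wins $123.05`. Note that the English phrase exists only as scattered fragments between the chevrons.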
When trying to translate this code, it happens to suffer from two huuuuge (actually, bordering on insurmountable) problems, namely:
- Translations NEVER work by translating isolated words. In other words, there is no point in asking a translator to translate a fragment such as “shows” into a different language. Such translations (even if translators are silly enough to do them) will never work, simply because for translations context is everything – but with the code above, the context is buried within C++ code, and is not easily extractable (we DON’T want to teach translators C++, do we?).
- Moreover, the order of the parameters we want to substitute (the winner.name, winner.cards, and pot_size) can be different in a human-readable language other than English; with the code above, this would mean that potentially we have to rewrite the code for each target human language (ok, for 3 parameters, we can say that there aren’t more than 3!=6 possible combinations we have to code, but IMNSHO it is still 6x too much).
Now, let’s come to specific examples; to illustrate better-than-ostream alternatives throughout this article, I (by definition) have to use something different from ostream. However, as I don’t want to use printf() for this purpose (to make it even more clear that I am NOT advocating a return to printf()), I’ll use one of Python’s format options (the curly-braced one) to illustrate how things can be done. In Python, our formatting looks as follows:
    print("{0} shows {1} and wins ${2}.{3:02d}"
          .format(winner.name, winner.cards,
                  pot_size // 100, pot_size % 100))
Here, we have our string (with placeholders in curly brackets) and can easily pass it to the translation team. While we will still have to replace our original literal with something read from a file at runtime, it is still nothing compared to the need to rewrite the whole ostream-based thing (with all the possible variations for the order of parameters). Most importantly, with Python-like formatting, both our i18n-related points above are addressed:
- Our original phrase to be translated exists as a self-contained literal. As practice shows, these tend to be perfectly translatable (in some cases, comments about the meaning of {0}, {1}, and {2} may need to be added to help translators better understand the context – but that’s about it, and most real-world phrases are already more or less self-contained).
- If there is a need to use a different order of parameters in the translated version, this can easily be done by the translator without any involvement from developers (which, BTW is exactly the way it should be).
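The reordering point can be sketched in C++ without any library. The `format_positional()` helper below is a hypothetical toy, not part of any library discussed here: it substitutes only plain `{N}` placeholders and does not understand format specs such as `{3:02d}`. It assumes the arguments have already been converted to strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each "{N}" in tmpl with args[N]. Anything that is not a
// well-formed, in-range placeholder is copied through unchanged.
std::string format_positional(const std::string& tmpl,
                              const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < tmpl.size()) {
        if (tmpl[i] == '{') {
            const std::size_t close = tmpl.find('}', i + 1);
            if (close != std::string::npos && close > i + 1) {
                std::size_t index = 0;
                bool numeric = true;
                for (std::size_t k = i + 1; k < close; ++k) {
                    if (tmpl[k] < '0' || tmpl[k] > '9') {
                        numeric = false;
                        break;
                    }
                    index = index * 10 + static_cast<std::size_t>(tmpl[k] - '0');
                }
                if (numeric && index < args.size()) {
                    out += args[index];  // substitute the N-th argument
                    i = close + 1;
                    continue;
                }
            }
        }
        out += tmpl[i++];  // ordinary character
    }
    return out;
}
```

The calling code passes the same arguments in the same order for every language; only the template string (which in a real program would be loaded from a translation file) changes, for example from `"{0} shows {1} and wins ${2}"` to a phrase that mentions `{2}` first.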
Drawback number 2: multithreading
“Vantage number two!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Don’t you think the sun is very hot here?”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’
While i18n is mostly in the realm of strings intended for some kind of UI, our second drawback is mostly related to logging in a multithreaded environment.
NB: for this drawback, I’ll use different example code – which is more typical for logging than for formatting for a UI, and that’s where this particular problem is more likely to manifest itself.
If you have ever written innocent-looking code such as
    logging_stream << "Event #"
      << std::setw(8) << std::setfill('0') << std::hex
      << event_id << ": a=" << std::dec << a
      << " b=" << b << "\n";
and then tried to run it in two different threads simultaneously, you know that the code above can easily generate all kinds of weird outputs, including such beauties as
Event #Event #0089a1b2c3d4e5f6: a=12: a= b= b=345678 <\n> <\n>
In addition to being completely unreadable, there is absolutely no way to figure out how digits from ‘345678’ were distributed between one a and two bs coming from different threads (and in which order BTW).
The reason for it is simple: with ostream, instead of calling one implementation function, we’re calling several separate << operators; in turn, this means that the largest possible synchronization unit for the cout stream is not a phrase (~= “one line we want to output”), but merely each of the items between << chevrons. This inevitably leads to potentially having outputs such as the one above.
Sure, somebody can say “Hey, you should place a mutex lock above that line” – and it would help; however, placing such mutex locks is not just error-prone, but error-prone-squared because (a) it is easy to forget to place it, and (b) it is even easier to forget to unlock it right after the cout line (which, in turn, can easily lead to a huuuuge performance degradation for no reason whatsoever).
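For reference, one way to keep the lock as short as possible is sketched below. It is only an illustration under stated assumptions: the `log_event()` wrapper and its parameter list are hypothetical, and the line is formatted into a private `std::ostringstream` first so that the mutex is held only for a single write.

```cpp
#include <iomanip>
#include <mutex>
#include <ostream>
#include <sstream>

// Formats the whole line into a private buffer first, then takes the lock
// only for the single write to the shared stream.
void log_event(std::ostream& logging_stream, std::mutex& logging_stream_mutex,
               unsigned long long event_id, int a, int b) {
    std::ostringstream line;  // private to this call, so no lock is needed yet
    line << "Event #" << std::setw(8) << std::setfill('0') << std::hex
         << event_id << ": a=" << std::dec << a << " b=" << b << "\n";
    {
        std::lock_guard<std::mutex> guard(logging_stream_mutex);
        logging_stream << line.str();
    }  // guard is released here, even if the write throws
}
```

The inner braces address point (b): the unlock cannot be forgotten, because it happens when `guard` goes out of scope. Point (a) still stands, since nothing forces every caller to go through such a wrapper.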
A better alternative is proposed in [P0053R7], where a special temporary object (an instance of class osyncstream, which is derived from ostream) is constructed on top of our real ostream object (such as cout). Then, the osyncstream object will buffer all the output written to it via << operators, and will write to the underlying cout only at the point of being destructed. This ensures that all the output written to osyncstream is guaranteed to be written in one piece <phew />. IMO, osyncstream is indeed a pretty good workaround for this particular problem (at any rate, much better than mutexes), but it still has the following significant issues:
- unless we limit ourselves to one-line uses of our osyncstream object (more precisely, to creating an osyncstream instance only temporarily), writing to the underlying stream in the destructor becomes rather counterintuitive, and it is easy to forget to limit the scope of our osyncstream, which can lead to reordering of whole ‘phrases’ in our log (it won’t look as bad as reordering of the words shown above, but can still cause significant confusion when reading the logs);
- extra buffering won’t come for free (especially as the current proposal seems to use allocations <ouch !/>); and
- [P0053R7] won’t help with the other issues discussed in the article (though maybe it might help to deal with our next drawback – sticky flags – too).
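The buffer-then-write-on-destruction idea can be sketched with standard components only. The `buffered_line` class below is a hypothetical illustration of the mechanism, not the interface proposed in [P0053R7]; in particular, it assumes the caller supplies the mutex that protects the target stream.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>

// Hypothetical helper: collects output and writes it to the target stream
// in one piece when the object goes out of scope.
class buffered_line {
public:
    buffered_line(std::ostream& target, std::mutex& target_mutex)
        : target_(target), mutex_(target_mutex) {}

    buffered_line(const buffered_line&) = delete;
    buffered_line& operator=(const buffered_line&) = delete;

    ~buffered_line() {
        try {
            std::lock_guard<std::mutex> guard(mutex_);
            target_ << buffer_.str();  // the one and only write
        } catch (...) {
            // Never let an exception escape a destructor.
        }
    }

    // Everything streamed in goes to the private buffer first.
    template <typename T>
    buffered_line& operator<<(const T& value) {
        buffer_ << value;
        return *this;
    }

private:
    std::ostream& target_;
    std::mutex& mutex_;
    std::ostringstream buffer_;  // this is where the extra allocations come from
};
```

Used as a temporary, as in `buffered_line(log, log_mutex) << "a=" << a << "\n";`, the write happens at the end of the statement. Given a name and a longer scope, the same object delays its write until the closing brace, which is exactly the reordering risk described in the first bullet. Function-template manipulators such as `std::endl` would need extra overloads in this sketch.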
Drawback number 3: sticky flags
Anyone who has tried to do some formatting which goes beyond the textbook using cout has encountered a huuuge problem:
With ostream, formatting modifiers (such as hex-vs-dec, filler, etc.) are considered an attribute of the stream, not of the output operation.
In other words: formatting flags, once applied, ‘stick’ to the stream. This, in turn, means that if you forget to revert them back, you’ll obtain an unexpectedly formatted output (and of course, it won’t be noticed until production, and will manifest itself in exactly the place where it causes the maximum possible damage).
This problem becomes especially bad in scenarios where we have one global stream (such as cout or a log file). In fact, it means that our formatting flags become a part of the GLOBAL mutable program state – and last time I checked, everybody of sane mind (including those people who are arguing for cout), agrees that global mutable state is a Bad Thing™.
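A minimal sketch of how such a leak looks in practice follows. The function names are hypothetical, and an `std::ostringstream` stands in for the shared stream so that the result can be inspected.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Some helper, somewhere in the code base, prints an id in hex and
// forgets to switch the stream back to decimal.
void print_id(std::ostream& os, unsigned id) {
    os << "id=" << std::hex << id;
}

// Unrelated code that later writes to the same stream.
std::string sticky_demo() {
    std::ostringstream shared;  // stands in for cout or a log file
    print_id(shared, 255);
    shared << " count=" << 255;  // std::hex is still in effect here
    return shared.str();
}
```

Here `sticky_demo()` returns `id=ff count=ff`: the second write never asked for hex, but inherits it from the stream's state.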
In fact, this problem is so bad, that Boost even has a special class to deal with it! With Boost’s ios_flags_saver, our code will look like:
    boost::io::ios_flags_saver ifs(logging_stream);
    logging_stream << "Event #"
      << std::setw(8) << std::setfill('0') << std::hex
      << event_id << ": a=" << std::dec << a
      << " b=" << b << "\n";
However, even with such an RAII-based workaround, once again it is error-prone: it is easy to forget to add the ios_flags_saver – especially if the policy is to use it only when some sticky manipulators are applied (and if our project Guidelines say ‘always use ios_flags_saver’, it would be a violation of the ‘not paying for what we don’t use’ principle, and would still be rather error-prone).
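For readers without Boost, the RAII idea can be sketched using only the standard library. The `format_guard` class and the `log_event_id()` function below are hypothetical names introduced for illustration, not Boost's implementation. Because the fill character is not part of `flags()`, the sketch saves and restores it separately.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical standard-library-only guard: restores the format flags and
// the fill character when it goes out of scope.
class format_guard {
public:
    explicit format_guard(std::ostream& os)
        : os_(os), flags_(os.flags()), fill_(os.fill()) {}

    ~format_guard() {
        os_.flags(flags_);
        os_.fill(fill_);
    }

    format_guard(const format_guard&) = delete;
    format_guard& operator=(const format_guard&) = delete;

private:
    std::ostream& os_;
    std::ios_base::fmtflags flags_;
    char fill_;
};

void log_event_id(std::ostream& os, unsigned long long event_id) {
    format_guard guard(os);  // hex and '0' fill end with this scope
    os << "Event #" << std::setw(8) << std::setfill('0') << std::hex << event_id;
}
```

After `log_event_id()` returns, later writes to the same stream are decimal and space-padded again. The guard still has to be remembered at every call site, which is the error-proneness described above.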
Drawback number 4: readability
“Vantage number three!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Now how do you feel about being spanked again?”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’
Now, let’s try to write down our full examples of formatting human-readable output using ostream (while keeping all the considerations above in mind). To summarize, our rather simple formatting code examples will look like Listing 1.
    // UI formatting
    // guard is probably NOT required here, as we’re not
    // likely to work with UI strings from multiple
    // threads
    boost::io::ios_flags_saver ifs(some_ostream);
    some_ostream << winner.name << " shows "
      << winner.cards << " and wins $"
      << pot_size / 100 << "."
      << std::setw(2) << std::setfill('0')
      << pot_size % 100;

    // logging
    // std::unique_lock rather than std::lock_guard,
    // as only the former has unlock()
    std::unique_lock<std::mutex> guard(logging_stream_mutex);
    boost::io::ios_flags_saver ifs(logging_stream);
    logging_stream << "Event #"
      << std::setw(8) << std::setfill('0') << std::hex
      << event_id << ": a=" << std::dec << a
      << " b=" << b << "\n";
    guard.unlock(); // as discussed above, we don’t want
                    // to keep lock longer than really necessary
Listing 1
When looking at the code in Listing 1, I cannot help but think that it has either been spanked by the Elephant’s Child or has fallen from the Ugly Tree™ (hitting all the ugly branches on the way down). And whenever somebody tells me that this code is readable, I can only ask them to compare it with the way the same thing is done in pretty much all other languages but C++ (yes, even in C – though using an unmentionable function); in particular, in Python it would look like Listing 2.
    # UI formatting
    print("{0} shows {1} and wins ${2}.{3:02d}"
          .format(winner.name, winner.cards,
                  pot_size // 100, pot_size % 100))

    # Logging
    print("Event #{0:08x}: a={1:d} b={2:d}"
          .format(event_id, a, b))
Listing 2
Formally speaking, the ostream-based code above has between 2x and 4x more characters, and between 2.5x and 5x more non-whitespace YACC tokens, than the demonstrated format-string based alternative, and while brevity does not necessarily equate to better readability, in the case of a 300–400% overhead, it usually does.
And if looking at it informally, with just (hopefully) an unbiased programmer’s eyes:
I think the answer to ‘which of the two pieces of code above can be seen as readable’ is very obvious (hint: I do NOT think that the ostream-based one qualifies as such).
Drawback number 4.5: writing customized underlying stream could be better
Yet another drawback of the ostream (BTW, this one stands regardless of whether it is being used for human-readable output) is that the process of writing the underlying stream is rather non-obvious and is seriously error-prone. I don’t want to go into details here (it is way too long since the last time I did it myself) but [Tomaszewski] describes what I remember pretty well, including observations such as “Properly deriving from std::streambuf is not easy and intuitive because its interface is complicated”, and making “a very subtle bug which took me several hours to detect”.
To be perfectly honest, it is still MUCH better than being unable to write a customized stream at all (as is the case for printf()), but – as I noted above – I am not speaking in terms of printf(), and being prone to subtle bugs is certainly not a good thing for those who need to rewrite an underlying streambuf.
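To give a feel for the interface in question, here is a minimal sketch of about the simplest possible output-only streambuf: an unbuffered one that appends everything to a string. The class name `string_sink_buf` is hypothetical, and the sketch deliberately avoids the harder parts (put areas, `sync()`, input), which is where, as far as I can tell, most of the subtle bugs tend to live.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered streambuf that appends everything to a string.
class string_sink_buf : public std::streambuf {
public:
    const std::string& data() const { return data_; }

protected:
    // Called per character, because we never set up a put area.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            data_.push_back(traits_type::to_char_type(ch));
        }
        // Must return something other than eof() to signal success.
        return traits_type::not_eof(ch);
    }

    // Optional fast path for bulk writes; must return the count consumed.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        data_.append(s, static_cast<std::string::size_type>(n));
        return n;
    }

private:
    std::string data_;
};
```

An `std::ostream os(&buf);` constructed over a `string_sink_buf buf;` then behaves like any other stream. Even in this tiny case, the contract of `overflow()` (return anything except `eof()` on success, and be prepared to be called with `eof()` itself) is easy to get wrong.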
Drawback number 5: something MUCH better exists
All the musing about the drawbacks of ostream would remain a rather pointless ranting if not for one thing: a library exists which has all the ostream-like advantages listed in [C++ FAQ], and none of the drawbacks listed above.
Actually, there are several such libraries (Boost format, FastFormat, tinyformat, {fmt}, and FollyFormat – and probably something else which I have forgotten to mention). I have to note that, personally, I don’t really care too much which one of the competing new-generation format libraries makes it into the standard (except, probably, for Boost format, which is way too resource-intensive when compared to the alternatives). In general, I (along with a very significant portion of the C++ community) just want some standard and better-than-iostream way of formatting human-readable data.
Out of such newer formatting libraries I happen to know {fmt} by Victor Zverovich the best, and it certainly looks very good, satisfying all the points from [C++ FAQ], and avoiding all the iostream problems listed above. As {fmt} is also the only new-generation library with an active WG21 proposal [P0645R1], it is the one I’m currently keeping my fingers crossed for. (NB: in the past, there was another proposal, [N3506], but it looks pretty much abandoned.)
In this article, I am not going to go into lengthy discussion about {fmt} vs the alternatives – but will just mention that with {fmt}, our examples will look like:
    fmt::print("{0} shows {1} and wins ${2}.{3:02d}",
               winner.name, winner.cards,
               pot_size / 100, pot_size % 100);
    // this is C++, folks!

    fmt::print("Event #{0:08x}: a={1:d} b={2:d}",
               event_id, a, b);
This alone allows us to avoid most of the problems listed above (and FWIW, I’d argue it is even more readable than Python); in addition, {fmt} is type-safe, extensible, supports both ostream and FILE* as underlying streams (with the ability to add your own stream easily), beats ostream performance-wise, et cetera, et cetera.
C++ Developer Community on formatting approaches
After all the theorizing about different formatting approaches, let’s see what real-world developers are saying about the different libraries available for this purpose. First, I have to note that even before the advent of the new generation of format libraries – and in spite of enormous pressure exerted by quite a few C++ committee members via their numerous publications in favour of cout – real-world C++ developers were badly split on the question “what is better – cout or printf()” (see, for example, statistics in [StackOverflow] and [Quora]). Now, with {fmt} available, developers seem to agree that it is the best real-world option out there [Reddit]; just two quotes from top-upvoted comments (which prove nothing, of course, but do count as anecdotal evidence):
- I am already using {fmt} all over my projects but having it in the std would be great.
- So happy this is steadily transitioning in std. One of the best formatting (and i/o) libs out there overall. Even without the localization argument, I’ve always found iostreams to be less convenient.
Yes, I know it sounds like a bad commercial, but I am pretty sure these comments are genuine.
Oh, and if somebody in WG21 still has any doubts about what C++ developers want to use for human-readable formatting, please let me know: I’ll organize a survey to get more formal numbers.
Conclusion
We took a look at std::ostream and issues with its real-world usage when formatting output intended for human beings. As a side note, we observed that most of the problems with std::ostream in this context arise from it working as a stream (either char stream, or word/token stream) while human beings tend to communicate in phrases or sentences, and one thing std::ostream is badly lacking is support for those phrases/sentences so ubiquitous in the real world.
Moreover, as we noted, there is more than one library out there which not only has all the advantages of ostream over printf() but also fixes all the drawbacks of the ostream we listed above. IMNSHO, there is no question of ‘what is better to use’ (that is, for human-readable outputs). This means that our (= ‘real-world C++ developers’) course of action is very clear:
- Start using {fmt} as much as possible (well, you may choose some other library over {fmt}, but IMO fragmentation is a bad thing for such a library, so unless you have some very specific requirements, I suggest using {fmt} as a de facto standard). Aside from the direct benefits we’ll get from using it, it might help to iron out any subtle issues left (such as ‘how to implement compile-time type safety’), and to make the proposal to WG21 more solid.
- Keep our fingers crossed hoping that WG21 will take the [P0645R1] proposal into the standard (though with the pace of changes making it through WG21, I will have to pray really hard that it happens before I retire <sad-wink />).
References
[C++ FAQ] C++ FAQ, https://isocpp.org/wiki/faq/input-output#iostream-vs-stdio
[fmt] A modern formatting library, https://github.com/fmtlib/fmt
[Loganberry04] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html
[Moria] IOStream Is Hopelessly Broken, https://www.moria.us/articles/iostream-is-hopelessly-broken/
[N3506] Zhihao Yuan, A printf-like Interface for the Streams Library
[NoBugs] ‘No Bugs’ Hare, #CPPCON2017. Day 1. Hope to get something-better-than-chevron-hell, http://ithare.com/cppcon2017-day-1-hope-to-get-something-better-than-chevrone-hell/
[P0053R7] Lawrence Crowl, Peter Sommerlad, Nicolai Josuttis, Pablo Halpern, C++ Synchronized Buffered Ostream, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0053r7.pdf
[P0645R1] Victor Zverovich, Lee Howes. Text Formatting. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0645r1.html
[Quora] When would you use fprintf instead of cerr/iostream in C++?, https://www.quora.com/When-would-you-use-fprintf-instead-of-cerr-iostream-in-C++
[Reddit] A Chance to Get Readable Formatting: {fmt}, https://www.reddit.com/r/cpp/comments/72krvy/a_chance_to_get_readable_formatting_fmt/
[StackOverflow] ‘printf’ vs. ‘cout’ in C++, https://stackoverflow.com/questions/2872543/printf-vs-cout-in-c
[Tomaszewski] Krzysztof Tomaszewski, Deriving from std::streambuf, https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/
Acknowledgement
Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague