Journal Articles

Overload Journal #144 - April 2018 + Programming Topics

Title: 5 Reasons NOT to Use std::ostream for Human-Readable Output

Author: Bob Schmidt

Date: 03 April 2018 16:42:39 +01:00

Summary: C++’s ostream can be hard to use. Sergey Ignatchenko suggests we use the {fmt} library instead.

Body: 

Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Hare, and do not necessarily coincide with the opinions of the translators and Overload editors; also, please keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented an exact translation. In addition, the translator and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.

This is NOT yet another printf-vs-cout debate

First of all, to avoid being beaten really hard, I have to say that I am perfectly aware of all the arguments presented in favour of the 30+-year-old std::ostream (that is, compared to printf(), which arguably comes from 50+-year-old BCPL) – and moreover, that I am NOT going to argue for printf() in this article.

The arguments usually used to push cout over printf are the following [C++ FAQ]:

  1. iostream is type-safe.

    ‘No Bugs’ comment: I am the last person to argue about this one.

  2. it is less error-prone (referring to reducing redundancy)

    ‘No Bugs’ comment: while saying that reducing redundancy is the same as being less error-prone is a bit of stretch in general (in quite a few cases, redundancy is exactly what keeps us from making silly mistakes), in the context of the cout-vs-printf() debate, I can agree with it.

  3. it is extensible (allowing you to specify your own classes to be printed).

    ‘No Bugs’ comment: very nice to have indeed.

  4. std::ostream and std::istream are inheritable, which means you can have other user-defined things that look and act like streams, yet that do whatever strange and wonderful things you want.

    ‘No Bugs’ comment: TBH, I fail to see why being inheritable is an advantage per se; especially as extending existing functionality doesn’t depend on inheritance (at least, as long as no virtual functions are involved, and I don’t see many of them in std::ostream as such). The best I can make out of this one is understanding it as ‘being able to provide my own underlying streambuf to be used by ostream’, which does qualify as an advantage (at least over printf(), which doesn’t provide such an option at all: more on this below).

Once again, I am NOT going to argue with the points above (doing so would certainly start another World Flame War); instead, I just want to take them as a starting point (clarifying the one which isn’t obvious to me, so it is specific enough for our purposes).

ostream is far from perfect

Even with all its advantages over 50+year-old printf(), ostream is still far from perfect – at least for human-readable outputs.

Ok, so far we have seen the good side of ostream; however (conspicuously omitted from [C++ FAQ]), it has quite a few downsides too, especially if we concentrate on specific use cases for std::ostream. A whole bunch of very popular use cases for ostreams involve formatting output which is intended to be read by human beings. Two popular examples of such formatting are messages shown in a UI and entries written to a log.

As it says on the tin, we’re going to concentrate on output intended for human beings – and while we’re at it, we’ll keep in mind the two major use cases above. And, as I am going to present a point of view which – while it was articulated previously in [Moria] and [NoBugs] – is certainly not as popular as the four points above (yet?), I am going to be significantly more verbose than [C++ FAQ].

So, in no particular order, here they are: the major drawbacks of ostreams when used to format human-readable outputs.

Drawback number 1: i18n

“Vantage number one!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Try and eat a little now.”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’

The first major problem with using ostream-like chevron-based formatting for human-readable strings is internationalization. Let’s take a look at a piece of code which formats a simple message for the UI of an online poker game:

  some_ostream << winner.name << " shows " 
    <<   winner.cards << " and wins $" 
    << pot_size / 100 << "." << std::setw(2) 
    << std::setfill('0') << pot_size % 100;
    // we have pot_size stored in cents, but have to
    // display it in a more conventional manner

NB: for our purposes, let’s skip the discussion about localizing currency signs and dots-vs-commas; in particular, for online games, the former happens to be not a question of locality, but a question of what currency this site really uses, and nobody gives a damn about the latter.

When trying to translate this code, it happens to suffer from two huuuuge (actually, bordering on insurmountable) problems, namely:

Now, let’s come to specific examples; to illustrate better-than-ostream alternatives throughout this article, I (by definition) have to use something different from ostream. However, as I don’t want to use printf() for this purpose (to make it even more clear that I am NOT advocating a return to printf()), I’ll use one of Python’s format options (the curly-braced one) to illustrate how things can be done. In Python, our formatting looks as follows:

  print("{0} shows {1} and wins ${2}.{3:02d}"
    .format(winner.name, winner.cards, pot_size // 100,
            pot_size % 100))

Here, we have our string (with placeholders in curly brackets) and can easily pass it to the translation team. While we will still have to replace our original literal with something read from a file at runtime, it is still nothing compared to the need to rewrite the whole ostream-based thing (with all the possible variations for the order of parameters). Most importantly, with Python-like formatting, both our i18n-related points above are addressed:

Drawback number 2: multithreading

“Vantage number two!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Don’t you think the sun is very hot here?”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’

While i18n is mostly in the realm of strings intended for some kind of UI, our second drawback is mostly related to logging in a multithreaded environment.

NB: for this drawback, I’ll use different example code – which is more typical for logging than for formatting for a UI, and that’s where this particular problem is more likely to manifest itself.

If you have ever written innocent-looking code such as

  logging_stream << "Event #" << std::setw(8) 
  << std::setfill('0') << std::hex << event_id 
  << ": a=" << std::dec << a << " b=" << b << "\n";

and then tried to run it in two different threads simultaneously, you know that the code above can easily generate all kinds of weird outputs, including such beauties as

  Event #Event #0089a1b2c3d4e5f6: a=12: a= b=
  b=345678
  <\n>
  <\n>

In addition to being completely unreadable, there is absolutely no way to figure out how digits from ‘345678’ were distributed between one a and two bs coming from different threads (and in which order BTW).

The reason for it is simple: with ostream, instead of calling one implementation function, we’re calling several separate << operators; in turn, this means that the largest possible synchronization unit for the stream is not a phrase (~= “one line we want to output”), but merely each of the items between << chevrons. This inevitably leads to potentially having outputs such as the one above.

Sure, somebody can say “Hey, you should place a mutex lock above that line” – and it would help; however, placing such mutex locks is not just error-prone, but error-prone-squared because (a) it is easy to forget to place it, and (b) it is even easier to forget to unlock it right after the cout line (which, in turn, can easily lead to a huuuuge performance degradation for no reason whatsoever).

A better alternative is proposed in [P0053R7], where a special temporary object (an instance of class osyncstream, which is derived from ostream) is constructed on top of our real ostream object (such as cout). Then, the osyncstream object will buffer all the output written to it via << operators, and will write to the underlying cout only at the point of being destructed. This ensures that all the output written to osyncstream is guaranteed to be written in one piece <phew />. IMO, osyncstream is indeed a pretty good workaround for this particular problem (at any rate, much better than mutexes), but it still has the following significant issues:

  1. unless we limit ourselves to one-line uses of our osyncstream object (more precisely, to creating an osyncstream instance only temporarily), writing to the underlying stream in the destructor becomes rather counterintuitive, and it is easy to forget to limit the scope of our osyncstream, which can lead to reordering of whole ‘phrases’ in our log (it won’t look as bad as reordering of the words shown above, but can still cause significant confusion when reading the logs);
  2. extra buffering won’t come for free (especially as the current proposal seems to use allocations <ouch !/>); and
  3. [P0053R7] won’t help with the other issues discussed in the article (though maybe it might help to deal with our next drawback – sticky flags – too).

Drawback number 3: sticky flags

Anyone who has tried to do some formatting which goes beyond the textbook using cout has encountered a huuuge problem that

With ostream, formatting modifiers (such as hex-vs-dec, filler, etc.) are considered an attribute of the stream, not of the output operation.

In other words: formatting flags, once applied, ‘stick’ to the stream. This, in turn, means that if you forget to revert them back, you’ll obtain an unexpectedly formatted output (and of course, it won’t be noticed until production, and will manifest itself in exactly the place where it causes the maximum possible damage).

This problem becomes especially bad in scenarios where we have one global stream (such as cout or a log file). In fact, it means that our formatting flags become a part of the GLOBAL mutable program state – and last time I checked, everybody of sane mind (including those people who are arguing for cout), agrees that global mutable state is a Bad Thing™.

In fact, this problem is so bad, that Boost even has a special class to deal with it! With Boost’s ios_flags_saver, our code will look like:

  boost::io::ios_flags_saver ifs(logging_stream);
  logging_stream << "Event #" << std::setw(8) 
    << std::setfill('0') << std::hex << event_id
    << ": a=" << std::dec << a << " b=" << b 
    << "\n";

However, even with such an RAII-based workaround, once again it is error-prone: it is easy to forget to add the ios_flags_saver – especially if the policy is to use it only when some sticky manipulators are applied (and if our project Guidelines say ‘always use ios_flags_saver’, it would be a violation of the ‘not paying for what we don’t use’ principle, and would still be rather error-prone).

Drawback number 4: readability

“Vantage number three!” said the Bi-Coloured-Python-Rock-Snake. “You couldn’t have done that with a mere-smear nose. Now how do you feel about being spanked again?”
~ Bi-Coloured-Python-Rock-Snake from ‘Just So Stories’

Now, let’s try to write down our full examples of formatting human-readable output using ostream (while keeping all the considerations above in mind). To summarize, our rather simple formatting code examples will look like Listing 1.

// UI formatting
// guard is probably NOT required here, as we’re not
// likely to work with UI strings from multiple
// threads
boost::io::ios_flags_saver ifs(some_ostream);
some_ostream << winner.name << " shows " 
  << winner.cards << " and wins $" 
  << pot_size / 100 << "." << std::setw(2) 
  << std::setfill('0') << pot_size % 100;

//logging
// std::unique_lock rather than std::lock_guard,
// as only the former provides unlock()
std::unique_lock<std::mutex>
  guard(logging_stream_mutex);
boost::io::ios_flags_saver ifs(logging_stream);
logging_stream << "Event #" << std::setw(8) 
  << std::setfill('0') << std::hex << event_id 
  << ": a=" << std::dec << a << " b=" 
  << b << "\n";
guard.unlock();//as discussed above, we don’t want
// to keep lock longer than really necessary
			
Listing 1

When looking at the code in Listing 1, I cannot help but think that it has been spanked by the Elephant’s Child (or rather, has fallen from the Ugly Tree™, hitting all the ugly branches on the way down). And whenever somebody tells me that this code is readable, I can only ask them to compare it with the way the same thing is done in pretty much all other languages but C++ (yes, even in C – though using an unmentionable function); in particular, in Python it would look like Listing 2.

  # UI formatting
  print("{0} shows {1} and wins ${2}.{3:02d}"
    .format(winner.name, winner.cards, pot_size // 100,
            pot_size % 100))

  # Logging
  print("Event #{0:08x}: a={1:d} b={2:d}"
    .format(event_id, a, b))
			
Listing 2

Formally speaking, the ostream-based code above has between 2x and 4x more characters, and between 2.5x and 5x more non-whitespace YACC tokens, than the demonstrated format-string based alternative, and while brevity does not necessarily equate to better readability, in the case of a 300–400% overhead, it usually does.

And if looking at it informally, with just (hopefully) an unbiased programmer’s eyes:

I think the answer to ‘which of the two pieces of code above can be seen as readable’ is very obvious

(hint: I do NOT think that the ostream-based one qualifies as such).

Drawback number 4.5: writing a customized underlying stream could be better

Yet another drawback of the ostream (BTW, this one stands regardless of whether it is being used for human-readable output) is that the process of writing the underlying stream is rather non-obvious and is seriously error-prone. I don’t want to go into details here (it is way too long since the last time I did it myself) but [Tomaszewski] describes what I remember pretty well, including observations such as “Properly deriving from std::streambuf is not easy and intuitive because its interface is complicated”, and making “a very subtle bug which took me several hours to detect”.

To be perfectly honest, it is still MUCH better than being unable to write a customized stream at all (as is the case for printf()), but – as I noted above – I am not speaking in terms of printf(), and being prone to subtle bugs is certainly not a good thing for those who need to rewrite an underlying streambuf.

Drawback number 5: something MUCH better exists

All the musing about the drawbacks of ostream would remain a rather pointless ranting if not for one thing: a library exists which has all the ostream-like advantages listed in [C++ FAQ], and none of the drawbacks listed above.

Actually, there are several such libraries (Boost format, FastFormat, tinyformat, {fmt}, and FollyFormat – and probably something else which I have forgotten to mention). I have to note that, personally, I don’t really care too much which one of the competing new-generation format libraries makes it into the standard (except, probably, for Boost format, which is way too resource-intensive when compared to the alternatives). In general, I (along with a very significant portion of the C++ community) just want some standard and better-than-iostream way of formatting human-readable data.

Out of such newer formatting libraries, I happen to know {fmt} [fmt] by Victor Zverovich the best, and it certainly looks very good, satisfying all the points from [C++ FAQ], and avoiding all the iostream problems listed above. As {fmt} is also the only new-generation library with an active WG21 proposal [P0645R1], it is the one I’m currently keeping my fingers crossed for. (NB: in the past, there was another proposal, [N3506], but it looks pretty much abandoned).

In this article, I am not going to go into lengthy discussion about {fmt} vs the alternatives – but will just mention that with {fmt}, our examples will look like:

  fmt::print("{0} shows {1} and wins ${2}.{3:02d}",
    winner.name, winner.cards, pot_size / 100,
    pot_size % 100);
  //this is C++, folks!

  fmt::print("Event #{0:08x}: a={1:d} b={2:d}",
    event_id, a, b);

This alone allows us to avoid most of the problems listed above (and FWIW, I’d argue it is even more readable than Python); in addition, {fmt} is type-safe, extensible, supports both ostream and FILE* as underlying streams (with the ability to add your own stream easily), beats ostream performance-wise, et cetera, et cetera.

C++ Developer Community on formatting approaches

After all the theorizing about different formatting approaches, let’s see what real-world developers are saying about the different libraries available for this purpose. First, I have to note that even before the advent of the new generation of format libraries – and in spite of enormous pressure exerted by quite a few C++ committee members via their numerous publications in favour of cout – real-world C++ developers were badly split on the question “what is better – cout or printf()” (see, for example, statistics in [StackOverflow] and [Quora]). Now, with {fmt} available, developers seem to agree that it is the best real-world option out there [Reddit]; just two quotes from top-upvoted comments (which prove nothing, of course, but do count as anecdotal evidence):

Yes, I know it sounds like a bad commercial, but I am pretty sure these comments are genuine.

Oh, and if somebody in WG21 still has any doubts about what C++ developers want to use for human-readable formatting, please let me know: I’ll organize a survey to get more formal numbers.

Conclusion

We took a look at std::ostream and issues with its real-world usage when formatting output intended for human beings. As a side note, we observed that most of the problems with std::ostream in this context arise from it working as a stream (either char stream, or word/token stream) while human beings tend to communicate in phrases or sentences, and one thing std::ostream is badly lacking is support for those phrases/sentences so ubiquitous in the real world.

Moreover, as we noted, there is more than one library out there which not only has all the advantages of ostream over printf() but also fixes all the drawbacks of the ostream we listed above. IMNSHO, there is no question of ‘what is better to use’ (that is, for human-readable outputs). This means that our (= ‘real-world C++ developers’) course of action is very clear:

References

[C++ FAQ] C++ FAQ, https://isocpp.org/wiki/faq/input-output#iostream-vs-stdio

[fmt] A modern formatting library, https://github.com/fmtlib/fmt

[Loganberry04] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html

[Moria] IOStream Is Hopelessly Broken, https://www.moria.us/articles/iostream-is-hopelessly-broken/

[N3506] Zhihao Yuan, A printf-like Interface for the Streams Library

[NoBugs] ‘No Bugs’ Hare, #CPPCON2017. Day 1. Hope to get something-better-than-chevron-hell, http://ithare.com/cppcon2017-day-1-hope-to-get-something-better-than-chevrone-hell/

[P0053R7] Lawrence Crowl, Peter Sommerlad, Nicolai Josuttis, Pablo Halpern, C++ Synchronized Buffered Ostream, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0053r7.pdf

[P0645R1] Victor Zverovich, Lee Howes. Text Formatting. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0645r1.html

[Quora] When would you use fprintf instead of cerr/iostream in C++?, https://www.quora.com/When-would-you-use-fprintf-instead-of-cerr-iostream-in-C++

[Reddit] A Chance to Get Readable Formatting: {fmt}, https://www.reddit.com/r/cpp/comments/72krvy/a_chance_to_get_readable_formatting_fmt/

[StackOverflow] ‘printf’ vs. ‘cout’ in C++, https://stackoverflow.com/questions/2872543/printf-vs-cout-in-c

[Tomaszewski] Krzysztof Tomaszewski, Deriving from std::streambuf, https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/

Acknowledgement

Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague
