Most developers know how to implement a timeout so that an operation can be attempted for a certain period of time before stopping or giving up. Something like Listing 1.
[Web Editor's Note: Listings 1 through 6 are incomplete in the print and PDF versions of Overload 156; they are corrected here and in the ePub version.]
    #include <time.h>
    #include <stdbool.h>

    bool try_to_do_something(void);

    bool do_something_for_a_while(void)
    {
        const time_t expire = time(NULL) + 5;
        while (time(NULL) < expire) {
            if (try_to_do_something())
                return true;
        }
        return false;
    }

Listing 1
Or, perhaps, in C++, as in Listing 2.
    #include <chrono>
    #include <stdexcept>

    bool try_to_do_something();

    void do_something_for_a_while()
    {
        using namespace std::chrono;
        auto const expire = system_clock::now() + seconds(5);
        while (system_clock::now() < expire) {
            if (try_to_do_something())
                return;
        }
        throw std::runtime_error("Timed out");
    }

Listing 2
This pattern even works when efficiently blocking for something to happen using a condition variable as in Listing 3.
    #include <chrono>
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <stdexcept>

    void do_something(std::deque<int> &q);

    void do_something_for_a_while(std::deque<int> &q,
                                  std::mutex &protect_q,
                                  std::condition_variable &q_changed)
    {
        using std::chrono::system_clock;
        std::unique_lock<std::mutex> lock(protect_q);
        auto const expire = system_clock::now() + std::chrono::seconds(5);
        while (system_clock::now() < expire) {
            // Block until the queue is non-empty or the deadline passes.
            if (q_changed.wait_until(lock, expire, [&q] { return !q.empty(); }))
                do_something(q);
        }
        throw std::runtime_error("Timed out");
    }

Listing 3
In these examples, I’m using std::chrono::system_clock because in libstdc++ that is equivalent to std::chrono::high_resolution_clock. You may want to check your standard library documentation to determine which would be best for you.
But what happens if someone changes the system clock during the loop? Not every device has a real-time clock to keep time when the device is off. Even if a device does, it may not be particularly accurate. This could mean that the system clock warps (jumps) by a few seconds or even a few decades after a power cycle when the device does eventually find an accurate time source, perhaps via NTP [Wikipedia-1], when it gains an Internet connection. This can lead to strange hard-to-reproduce bug reports from the field and unhappy users. Note that std::chrono::system_clock is required to be Coordinated Universal Time (UTC), which does not change due to daylight saving. Time zones and daylight saving are a completely different subject, one that is not well addressed in standard C++ until C++20 [Hinnant18].
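To make the failure concrete, here is a minimal sketch using an invented stand-in clock, fake_wall_clock, whose reading can be changed by hand. Neither the clock nor the time_left helper appears in the listings; they exist only to show how a deadline computed as "now plus five seconds" behaves when the clock is later moved.

```cpp
#include <chrono>

// Hypothetical stand-in for a wall clock whose reading can be changed
// at will, as happens when a badly wrong system clock is corrected.
struct fake_wall_clock {
    using duration = std::chrono::seconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<fake_wall_clock, duration>;
    static constexpr bool is_steady = false;

    static time_point now() { return time_point(reading()); }

    // Move the clock forwards (positive) or backwards (negative).
    static void warp(duration delta) { reading() += delta; }

private:
    static duration &reading()
    {
        static duration r{1000000};  // arbitrary starting value
        return r;
    }
};

// How long a loop like Listing 2 would keep trying before giving up.
fake_wall_clock::duration time_left(fake_wall_clock::time_point expire)
{
    return expire - fake_wall_clock::now();
}
```

If expire is set to now() plus five seconds and the clock is then warped back by an hour, time_left reports an hour and five seconds, so the loop runs far longer than intended. A forward warp makes time_left negative and the loop gives up immediately.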
How do we avoid this problem? When possible, just use relative timeouts. Use std::condition_variable::wait_for rather than std::condition_variable::wait_until. An absolute timeout is like setting an alarm clock: if you change the time shown on the clock then you affect how long it will be until the alarm sounds. A relative timeout is like setting an egg timer and leaving it alone: the time shown on your clock does not affect how long it will be until the alarm sounds.
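As a rough illustration of the egg-timer approach, the sketch below performs a single wait with a relative timeout. The helper name pop_with_timeout and its parameters are invented for this example.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical helper: waits up to `timeout` for the queue to become
// non-empty, then removes the front element and stores it in `out`.
// Returns false if the timeout expired with the queue still empty.
bool pop_with_timeout(std::deque<int> &q, std::mutex &protect_q,
                      std::condition_variable &q_changed,
                      std::chrono::milliseconds timeout, int &out)
{
    std::unique_lock<std::mutex> lock(protect_q);
    // One wait, one relative timeout: the calling code never handles
    // an absolute time point.
    if (!q_changed.wait_for(lock, timeout, [&q] { return !q.empty(); }))
        return false;
    out = q.front();
    q.pop_front();
    return true;
}
```

This only suits the case where the whole timeout covers one wait.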
Unfortunately a relative timeout doesn’t work well for the examples above because the timeout may cover multiple waits. It’s possible to recalculate a relative timeout but that’s easy to get wrong and it risks the timeout being extended unintentionally as small errors accumulate over many loops.
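If recalculation cannot be avoided, one way to limit the accumulated error is to derive each wait's budget from a single fixed deadline, rather than subtracting each wait's nominal length from a running total. The helper below is a hypothetical sketch of that calculation. It is only as trustworthy as the clock that supplies deadline and now, so it does not by itself solve the warping problem.

```cpp
#include <chrono>

// Hypothetical helper: the wait budget still available before `deadline`,
// clamped at zero. Works with the time_point type of any clock, provided
// both arguments come from the same clock.
template <typename TimePoint>
typename TimePoint::duration remaining(TimePoint deadline, TimePoint now)
{
    return now < deadline ? deadline - now : TimePoint::duration::zero();
}
```

Each pass round the loop would pass the result of remaining to the relative wait, and stop once it returns zero.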
A better solution is to use a monotonic or steady clock that is immune to the warping of the system clock. Such a clock is defined to keep running at an approximately-consistent rate without warping either forwards or backwards. If the machine has access to an accurate clock source, often via NTP, the monotonic clock can be slewed slightly in order to try to keep it running correctly relative to real time. Clock slewing means slowing down or speeding up the clock by small amounts in order to keep time accurately on average over a longer period of time.
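In C++ terms, the property described above is advertised by a clock's is_steady member, which the standard requires to be true for std::chrono::steady_clock. The sketch below checks that at compile time and measures elapsed time against that clock. The helper name elapsed_ms is an assumption made for this example.

```cpp
#include <chrono>

// The standard guarantees this. The assertion documents the property
// the timeout code relies on.
static_assert(std::chrono::steady_clock::is_steady,
              "timeouts need a clock that never jumps");

// Hypothetical helper: whole milliseconds elapsed since `start`,
// measured against the steady clock.
std::chrono::milliseconds elapsed_ms(std::chrono::steady_clock::time_point start)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```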
On POSIX systems, this clock is known as CLOCK_MONOTONIC and the current time can be retrieved using the clock_gettime POSIX function. Unfortunately, the lack of 64-bit types back when this function was invented means that the seconds and nanoseconds are stored separately in a structure. Listing 4 uses a function to tell whether a specified timeout has expired.
    #include <time.h>
    #include <stdbool.h>
    #include <unistd.h>

    bool try_to_do_something(void);

    /* Returns true once the monotonic clock has passed *expire. */
    bool expired(const struct timespec *expire)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec < expire->tv_sec)
            return false;
        if (now.tv_sec > expire->tv_sec)
            return true;
        return now.tv_nsec > expire->tv_nsec;
    }

    bool do_something_for_a_while(void)
    {
        struct timespec expire;
        clock_gettime(CLOCK_MONOTONIC, &expire);
        expire.tv_sec += 5;
        while (!expired(&expire)) {
            if (try_to_do_something())
                return true;
        }
        return false;
    }

Listing 4
This gets more complex if the timeout is not a whole number of seconds because extra housekeeping is required to ensure that the nanoseconds part is kept within permitted bounds. If you find yourself needing to do this then gnulib [GNU] provides helpful functions.
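As an illustration of the housekeeping involved, here is a minimal hand-rolled sketch that advances a struct timespec by a number of milliseconds. The name timespec_add_ms is invented for this example and is not a gnulib function. The code assumes a non-negative ms and an input whose tv_nsec is already in range, and it is written to compile as either C or C++.

```cpp
#include <time.h>

/* Hypothetical helper: returns ts advanced by ms milliseconds (ms >= 0),
   keeping tv_nsec within 0..999999999. */
struct timespec timespec_add_ms(struct timespec ts, long ms)
{
    const long nsec_per_sec = 1000000000L;
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= nsec_per_sec) {
        /* One carry is enough: both addends were below one second. */
        ts.tv_sec += 1;
        ts.tv_nsec -= nsec_per_sec;
    }
    return ts;
}
```

With this helper, the deadline in Listing 4 could be set to 2.5 seconds ahead by calling timespec_add_ms(expire, 2500) instead of adjusting tv_sec directly.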
libstdc++ and libc++ use CLOCK_MONOTONIC to implement C++ std::chrono::steady_clock, which provides a much easier way to work with absolute timeouts. Using it is just a matter of changing system_clock to steady_clock in Listing 2 to get Listing 5 and in Listing 3 to get Listing 6.
    #include <chrono>
    #include <stdexcept>

    bool try_to_do_something();

    void do_something_for_a_while()
    {
        using std::chrono::steady_clock;
        auto const expire = steady_clock::now() + std::chrono::seconds(5);
        while (steady_clock::now() < expire) {
            if (try_to_do_something())
                return;
        }
        throw std::runtime_error("Timed out");
    }

Listing 5
    #include <chrono>
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <stdexcept>

    void do_something(std::deque<int> &q);

    void do_something_for_a_while(std::deque<int> &q,
                                  std::mutex &protect_q,
                                  std::condition_variable &q_changed)
    {
        using std::chrono::steady_clock;
        std::unique_lock<std::mutex> lock(protect_q);
        auto const expire = steady_clock::now() + std::chrono::seconds(5);
        while (steady_clock::now() < expire) {
            // Block until the queue is non-empty or the deadline passes.
            if (q_changed.wait_until(lock, expire, [&q] { return !q.empty(); }))
                do_something(q);
        }
        throw std::runtime_error("Timed out");
    }

Listing 6
The compiler’s type checking ensures that you can’t accidentally compare time points from different clock sources against each other, making this much safer than the C version, which must rely on the clock passed to clock_gettime being consistent.
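The type checking can be observed directly. The sketch below uses an invented detection trait, is_less_comparable, to ask the compiler whether a comparison between two time point types is well-formed. The trait is not part of the standard library.

```cpp
#include <chrono>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when `A < B` is a valid expression.
template <typename A, typename B, typename = void>
struct is_less_comparable : std::false_type {};

template <typename A, typename B>
struct is_less_comparable<A, B,
    decltype(void(std::declval<A>() < std::declval<B>()))> : std::true_type {};

using steady_tp = std::chrono::steady_clock::time_point;
using system_tp = std::chrono::system_clock::time_point;

// Time points from the same clock can be compared...
static_assert(is_less_comparable<steady_tp, steady_tp>::value,
              "same clock: comparable");
// ...but mixing clocks is rejected at compile time.
static_assert(!is_less_comparable<steady_tp, system_tp>::value,
              "different clocks: not comparable");
```

Writing steady_clock::now() < system_clock::now() directly would simply fail to compile. The trait is only a way to demonstrate that without breaking the build.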
Both of these examples are now immune to system clock changes.
If you’re stuck using C++98 or C++03 then Boost [Boost] provides boost::chrono. It also provides a precursor named boost::posix_time, but that should probably be avoided for new code.
Time points measured against a monotonic clock will usually not be comparable between machines. On Linux, CLOCK_MONOTONIC is effectively the system uptime (not counting any time the machine spends suspended). In a distributed system, such as video playback synchronised across multiple screens, you may have NTP or PTP [Wikipedia-2] working hard to keep the system clock synchronised across multiple devices. In that case it makes more sense to use std::chrono::system_clock to agree a specific time to start playback and to control the playback speed. I imagine that a similar situation could occur in other distributed systems.
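A minimal sketch of sharing such a start time between devices follows. The wire format (milliseconds since the system_clock epoch as a 64-bit integer) and the names to_wire and from_wire are assumptions made for this example. The sketch also assumes every device's system_clock uses the same epoch, which is true of common implementations and guaranteed from C++20.

```cpp
#include <chrono>
#include <cstdint>

using std::chrono::system_clock;

// Hypothetical wire format: milliseconds since the system_clock epoch.
std::int64_t to_wire(system_clock::time_point tp)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        tp.time_since_epoch()).count();
}

// Rebuild a local time point from the shared value.
system_clock::time_point from_wire(std::int64_t ms)
{
    return system_clock::time_point(
        std::chrono::duration_cast<system_clock::duration>(
            std::chrono::milliseconds(ms)));
}
```

One device would pick a start time slightly in the future and send to_wire(start) to the others. Each device could then call std::this_thread::sleep_until(from_wire(value)) before starting playback.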
If we follow the advice above to use relative timeouts where we can and CLOCK_MONOTONIC or std::chrono::steady_clock where we can’t, then all will be lovely, right? Well, yes and no. Unfortunately, current versions of GNU libstdc++ and Clang libc++ lack full support for using std::chrono::steady_clock timeouts for thread-synchronisation primitives and tend to convert silently back to std::chrono::system_clock, which makes the timeouts subject to misbehaviour when the system clock warps again (although in some cases the window of opportunity can be very small due to the actual wait being a relative one again). They need to do this because POSIX doesn’t currently provide suitable equivalents of the thread functions that are capable of accepting CLOCK_MONOTONIC timeouts [Crowe18]. There are new functions [AustinGroup] planned to address this. Glibc v2.30 and later contain these new functions, and various patches have already been accepted for libstdc++ to use these functions in GCC 10 to fix the methods on std::condition_variable, std::timed_mutex and std::shared_timed_mutex that accept timeouts. Unfortunately some of the patches [Crowe20] to fix std::future didn’t make it in before the freeze, but they’ll hopefully be in GCC 11. I believe that similar changes are making their way into Clang libc++ too. If you are stuck using earlier versions then I believe that at least some of these problems are resolved in the Boost equivalents of the standard library functions.
If you follow the advice above, then the situation will be slightly better than if you’d used std::chrono::system_clock in your code right now and you will automatically get the fixes when your code is compiled with newer standard library versions. Many of the functions involved are inline so the fixes require more than upgrading the shared library.
Summary
- Use relative timeouts to standard library functions when performing a single operation.
- Use CLOCK_REALTIME or std::chrono::system_clock when your times and timeouts relate to time in the real world and you want to react to someone warping the system clock. For example, a calendar or public transport tracking application.
- Use CLOCK_MONOTONIC or std::chrono::steady_clock when your times relate to elapsed time that should not change if someone warps the system clock. For example, network timeouts and refresh intervals.
- Use CLOCK_REALTIME and std::chrono::system_clock when the devices involved are known to have their clocks synchronised and you wish to share timestamps between those devices.
- Keep your toolchain up to date (and apply patches if you can) to ensure that you have the latest fixes. If you can’t then look at using Boost instead.
Thanks
Thanks to members of the Austin Group, glibc and libstdc++ maintainers for helping me to turn the scratching of one small itch (in std::condition_variable) into fixing this class of problems more widely across POSIX and the C++ standard library. Thanks to the ACCU Overload reviewers and Jean-Marc Beaufils for providing feedback.
References
[AustinGroup] Mike Crowe in Austin Group Defect Tracker: https://www.austingroupbugs.net/view.php?id=1216
[Boost] https://www.boost.org
[Crowe18] Mike Crowe ‘The clock used for waiting on a condition variable is a property of the wait, not the condition variable’, at: http://randombitsofuselessinformation.blogspot.com/2018/10/the-clock-used-for-waiting-on-condition.html
[Crowe20] Mike Crowe in GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93542
[GNU] ‘Gnulib – The GNU Portability Library’ at: https://www.gnu.org/software/gnulib/
[Hinnant18] Howard E. Hinnant and Tomasz Kaminski, ‘Extending <chrono> to Calendars and Timezones’ posted 16 March 2018 at: https://howardhinnant.github.io/date/d0355r7.html
[Wikipedia-1] ‘Network Time Protocol’ at: https://en.wikipedia.org/wiki/Network_Time_Protocol
[Wikipedia-2] ‘Precision Time Protocol’ at: https://en.wikipedia.org/wiki/Precision_Time_Protocol
Mike became a C++ and embedded Linux developer by accident twenty-odd years ago and hasn’t managed to escape yet. Working for small companies means that he gets to work on a wide range of high and low-level software, as well as release processes and build tools to stop him getting bored.
Overload Journal #156 - April 2020 + Design of applications and programs