Overload Journal #139 – June 2017

Title: Allocator for (Re)Actors with Optional Kinda-Safety and Relocation

Author: Martin Moene

Date: 06 June 2017

Summary: How do you deal with memory for (Re)Actors? Sergey Ignatchenko proposes an allocation scheme.


Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Hare, and do not necessarily coincide with the opinions of the translators and Overload editors; also, please keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented an exact translation. In addition, the translator and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.

What is it about

As it says on the tin, this article is about allocators within the context of (Re)Actors (a.k.a. Reactors, Actors, ad hoc FSMs, Event-Driven Programs, and so on).

The main benefits we get from our C++ allocator (with (Re)Actors and the proposed allocation model as a prerequisite) are optional ‘kinda-safety’ – turning dangling-pointer accesses into exceptions rather than silent memory corruption – and optional relocation of allocated objects, enabling heap compaction; and when these features are switched off, allocation overhead stays close to zero.

Let’s note that while what we’ll be doing allows us to achieve benefits which are comparable to using traditional non-C++ mark-compact garbage collectors, we’re achieving those benefits in a significantly different manner. On the other hand, I don’t want to argue whether what we’re doing really qualifies as ‘automated garbage collection’, or if the name should be different. In the form described in this article, it is not even reference-counted garbage collection (though a similar approach can be applied to allocation models based on std::shared_ptr<> + std::weak_ptr<> – as long as we’re staying within (Re)Actors).

What is important, though, is to concentrate on what the approach actually buys us in practice, rather than on what to call it.

Message-passing is the way to go

Before starting to speak about memory allocation, we need to define what those (Re)Actors we’re about to rely on actually are (and why they’re so important).

For a long while, I have been a strong proponent of message-passing mechanisms over mutex-based thread sync for concurrency purposes (starting from [NoBugs10]). Fortunately, I am not alone in this view; just as one example, the Go language’s concept of “Do not communicate by sharing memory; instead, share memory by communicating” [Go2010] is pretty much the same thing.

However, it was only after returning from ACCU2017 – and listening to a brilliant talk [Henney17] – that I realized we’re pretty much at the point of no return, and are about to reach a kinda-consensus that

Message-passing is THE way to implement concurrency at app-level

(as opposed to traditional mutex-based thread sync).

The reasons for this choice are numerous – and range from “mutexes and locks are there to prevent concurrency” (as was pointed out in [Henney17]) to “doing both thread sync and app-level logic at the same time tends to exceed the cognitive limits of the human brain” [NoBugs15].

For the time being, it is not clear which of the message passing mechanisms will win (and whether one single mechanism will win at all) – but as I have had very good experiences with (Re)Actors (a.k.a. Actors, Reactors, ad hoc FSMs, and Event-Driven Programs), for the rest of this article I will concentrate on them.

Setting

To be a bit more specific, let’s describe what I understand as (Re)Actors.

Let’s use Generic Reactor as the common denominator for all our (Re)Actors. This Generic Reactor is just an abstract class, and has a pure virtual function react():

  class GenericReactor {
  public:
    virtual void react(const Event& ev) = 0;
    virtual ~GenericReactor() = default; //we delete via base-class pointer
  };

Let’s name any piece of code which calls GenericReactor’s react() the ‘Infrastructure Code’. Quite often, this call is within the so-called ‘event loop’:

  std::unique_ptr<GenericReactor> r
         = reactorFactory.createReactor(...);
  while(true) {             //event loop
    Event ev = get_event(); //from select(), libuv, ...
    r->react(ev);
  }

Let’s note that the get_event() function can obtain events from wherever we want – from select() (which is quite typical for servers) to libraries such as libuv (which is common for clients).

Also let’s note that an event loop, such as the one above, is by far not the only way to call react(): I’ve seen implementations of Infrastructure Code ranging from one running multiple (Re)Actors within the same thread, to another one which deserialized the (Re)Actor from a DB, then called react(), and then serialized the (Re)Actor back to the DB. What’s important, though, is that even if react() can be called from different threads – it MUST be called as if it is one single thread (=‘if necessary, all thread sync should be done OUTSIDE of our (Re)Actor, so react() doesn’t need to bother about thread sync regardless of the Infrastructure Code in use’).

Finally, let’s name any specific derivative from Generic Reactor (which actually implements our react() function), a Specific Reactor:

  class SpecificReactor : public GenericReactor {
    void react(const Event& ev) override;
  };

Also, let’s observe that whenever a (Re)Actor needs to communicate with another (Re)Actor – adhering to the ‘Do not communicate by sharing memory; instead, share memory by communicating’ principle – it merely sends a message, and it is only this message which will be shared between (Re)Actors.
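As a minimal illustration of this principle – a sketch only, with the queue type and its interface being my own invention rather than anything prescribed here – passing a message can amount to moving its ownership into the target (Re)Actor’s event queue, so the sender keeps no reference to the data it has sent:

  #include <memory>
  #include <mutex>
  #include <queue>

  struct Event { /* payload as needed */ };

  //Toy inter-(Re)Actor queue: the sender std::move()s the message in,
  //  so after sending it has nothing left pointing to the data -
  //  nothing is ever shared between (Re)Actors.
  class EventQueue {
  public:
    void push(std::unique_ptr<Event> ev) {
      std::lock_guard<std::mutex> lock(mtx_); //sync lives OUTSIDE react()
      q_.push(std::move(ev));
    }
    std::unique_ptr<Event> pop() { //returns nullptr if queue is empty
      std::lock_guard<std::mutex> lock(mtx_);
      if (q_.empty()) return nullptr;
      std::unique_ptr<Event> ev = std::move(q_.front());
      q_.pop();
      return ev;
    }
  private:
    std::mutex mtx_;
    std::queue<std::unique_ptr<Event>> q_;
  };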

Trivial optimization: single-threaded allocator

Armed with (Re)Actors, we can easily think of a very simple optimization for our allocation techniques. As all the processing within a (Re)Actor is single-threaded, we can easily say that all allocations and deallocations on a given (Re)Actor’s heap happen (as if) from one single thread – so the allocator serving that heap needs no thread synchronization whatsoever.

OK, let’s write down that our (Re)Actor allocator is single-threaded – and we’ll rely on this fact for the rest of this article (and everybody who has written a multi-threaded allocator will acknowledge that writing a single-threaded one is a big relief).
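To illustrate what this buys us, here is a minimal sketch of a single-threaded fixed-size-block allocator (my own toy design, not the allocator discussed in this article): because all calls come (as if) from one thread, the free list is maintained with plain pointer operations – no atomics, no CAS loops, no locks:

  #include <cstddef>
  #include <cstdlib>

  class SingleThreadedPool {
  public:
    explicit SingleThreadedPool(std::size_t block_size)
        : block_size_(block_size < sizeof(Node) ? sizeof(Node)
                                                : block_size) {}
    void* allocate() {
      if (free_list_) { //plain pointer ops - no sync needed
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
      }
      return std::malloc(block_size_); //slow path: grab a fresh block
    }
    void deallocate(void* p) {
      Node* n = static_cast<Node*>(p);
      n->next = free_list_; //single-threaded by contract
      free_list_ = n;
    }
  private:
    struct Node { Node* next; };
    std::size_t block_size_;
    Node* free_list_ = nullptr; //blocks recycled, never returned to OS
  };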

However, we’ll go MUCH further than this rather trivial observation.

Allocation model: owning refs, soft refs, naked refs

At this point, we need to note that in C++ (as mentioned, for example, in [Sutter11]), it is impossible to provide compacted heaps “without at least a new pointer type”. Now, let’s see what can be done about it.

Let’s consider how we handle memory allocations within our (Re)Actor. Let’s say that within our (Re)Actor:

- each object allocated on the (Re)Actor’s heap is pointed to by exactly one ‘owning ref’ – destroying or overwriting the ‘owning ref’ destroys the object;
- any other long-lived reference to a heap object is a ‘soft ref’ – ‘soft refs’ don’t affect the object’s lifetime, and may therefore be left ‘dangling’;
- ‘naked refs’ (ordinary C++ pointers and references) are allowed only as stack-based temporaries while we’re inside react() – they are never stored within the (Re)Actor’s state;
- there are no non-const globals.

With this memory allocation model in mind, I am very comfortable saying that

It is sufficient to represent ANY data structure, both theoretically and practically

The theoretical part can be demonstrated by establishing a way to represent an arbitrary graph with our allocation model. This can be achieved in two steps: (a) first, we replace all the refs in the arbitrary graph with ‘soft’ refs, and (b) second, there is always a subset of refs which makes every node in the graph reachable exactly once (a spanning tree/forest); by replacing exactly this subset of references with our ‘owning’ refs, we get the original arbitrary graph represented with our ‘owning refs’+‘soft refs’.

As for the practical part – IMO, it is quite telling that I’ve seen a very practical over-a-million-LOC codebase which worked exactly like this, and it worked like a charm too.
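As a tiny concrete illustration of the two-step argument above (using std::unique_ptr<> for ‘owning’ refs and, for now, naked pointers for ‘soft’ refs): in a doubly-linked list, the next pointers form exactly such a ‘reachable exactly once’ subset, so they become ‘owning’ refs, while the cycle-closing back-pointers become ‘soft’ refs:

  #include <memory>

  struct Node {
    int value = 0;
    std::unique_ptr<Node> next; //'owning' ref: spanning-set edge
    Node* prev = nullptr;       //'soft' ref: cycle-closing edge
  };

  struct List {
    std::unique_ptr<Node> head; //owns the whole chain
  };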

BTW,

most of the findings in this article are also applicable to a more-traditional-for-C++11-folks allocation model of ‘shared ptr’+‘weak ptr’

(though with single-threaded access only, so atomicity requirements don’t apply; also, we’ll still need to avoid ‘naked’ pointers within the state of our (Re)Actor). However, it is a bit simpler to tell the story from the point of view of ‘owning’ refs + ‘soft’ refs, so for the time being we’ll stick to the memory allocation model discussed above.

An all-important observation

Now, based on our memory allocation model, we’re able to make an all-important

Observation 1. Whenever our program counter is within the Infrastructure Code but is outside of react(), there are no ‘naked pointers’ to (Re)Actor’s heap.

This observation follows directly from the prohibition on having ‘naked pointers’ within the (Re)Actor’s state: when we’re outside of react(), there are no ‘naked pointers’ (pointing to the heap of our (Re)Actor) on the stack; and as there are no non-const globals, and no ‘naked pointers’ within the heap itself either – well, we’re fine.

Modes of operation

Now, let’s see how we can implement these ‘owning refs’ and ‘soft refs’. Actually, the beauty of our memory model is that it describes WHAT we’re doing, but doesn’t prescribe HOW it should be implemented. This leads us to several possible implementations (or ‘modes of operation’) for ‘owning refs’/‘soft refs’. Let’s consider some of these modes.

‘Fast’ mode

In ‘Fast’ mode, ‘owning refs/pointers’ are more or less std::unique_ptr<>s – and ‘soft refs/pointers’ are implemented as simple ‘naked pointers’.
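In code, ‘Fast’ mode might be little more than a pair of aliases (the names owning_ptr/soft_ptr are mine, for illustration):

  #include <memory>

  template <class T>
  using owning_ptr = std::unique_ptr<T>; //exactly one owner per object

  template <class T>
  using soft_ptr = T*; //plain pointer: no checks, no relocation support

This makes the zero-overhead claim tangible: whatever code we write against owning_ptr/soft_ptr compiles down to exactly what a careful C++ programmer would have written by hand.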

With this ‘Fast’ mode, we get the best possible speed, but we don’t get any safety or relocation goodies. Still, it might be perfectly viable for some production deployments where speed is paramount (and crashes are already kinda-ruled out by thorough testing, by running new code in production in ‘kinda-Safe’ mode for a while, etc. etc.).

‘kinda-Safe’ mode

In a ‘kinda-Safe’ mode, we’ll be dealing with ‘dangling pointers’; the idea is to make sure that ‘dangling pointers’ (if there are any) don’t cause memory corruption but cause an exception instead.

First of all, let’s note that because of the semantics of ‘owning pointers’, they cannot be ‘dangling’, so we only need to handle ‘soft’ and ‘naked’ pointers and references.

‘Dangling’ soft references/pointers

To deal with ‘dangling’ soft-pointers/references, we could go the way of double reference counting (similar to what is done by std::weak_ptr<> – which actually uses the ages-old concept of tombstones), but we can do something better (and BTW, the same technique might be usable to implement std::weak_ptr<> too – though admittedly, generalizing our technique to a multi-threaded environment is going to be non-trivial).

Our idea will be to:

- have the allocator assign each allocation a unique, never-reused ID, stored right next to the allocated item;
- have each ‘soft ref’ store, alongside the pointer itself, a copy of the ID of the allocation it expects to find there;
- on each dereference of a ‘soft ref’, compare the ID stored within the ref with the ID currently residing at the target address – and if they differ, the original object is already gone, so we throw an exception instead of silently corrupting memory.

NB: of course, it IS still possible to use double-ref-counting/tombstones to implement ‘kinda-Safe mode’ – but at this time, I prefer an ID-based implementation as it doesn’t require an extra indirection (and such indirections, potentially costing as much as 150 cycles, can hurt performance pretty badly). OTOH, if it happens that for some of the real-world projects tombstones work better, it is always still possible to implement ‘kinda-Safe mode’ via a traditional tombstone-based approach.
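Here is a minimal sketch of the ID-based check, under the assumption that our allocator places a small header containing the allocation ID immediately before each allocated object (and clobbers that ID on deallocation); all the names are mine, and a real implementation would also need to keep the underlying page readable – which is exactly what the ‘page map’ discussed below is for:

  #include <cstdint>
  #include <stdexcept>

  struct AllocHeader { //written by the allocator just before each object
    uint64_t alloc_id; //unique, never reused; clobbered on deallocation
  };

  template <class T>
  class soft_ptr { //kinda-safe: detects dangling dereferences
  public:
    soft_ptr(T* p, uint64_t id) : p_(p), expected_id_(id) {}
    T* get() const {
      const AllocHeader* h = reinterpret_cast<const AllocHeader*>(
          reinterpret_cast<const char*>(p_) - sizeof(AllocHeader));
      if (h->alloc_id != expected_id_) //object already destroyed?
        throw std::runtime_error("dangling soft_ptr dereference");
      return p_;
    }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
  private:
    T* p_;
    uint64_t expected_id_; //ID of the allocation we expect to find
  };

Note that, as promised, the check costs one comparison against a header sitting right next to the object (and therefore likely sharing its cache line) – no extra indirection involved.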

‘Dangling’ naked references/pointers

With naked references/pointers – well, strictly speaking, we cannot provide strict guarantees of their safety (that’s why the mode is ‘kinda-Safe’, and not ‘really-Safe’). However, quite a few measures are still possible both to detect such accesses while debugging, and to mitigate the impact if they do happen in production.

‘Safe with relocation’ mode

In ‘Safe with relocation’ mode, in addition to dealing with ‘dangling’ soft refs, we’ll allow our allocated objects to be relocated. This lets us eliminate the dreaded ‘external fragmentation’, which tends to cause quite a bit of trouble for long-running systems: lots of CPU pages end up kept committed by a single remaining allocated object each – and if we cannot relocate those single objects, this tends to cause lots of wasted memory.

To implement relocation, in addition to the trickery discussed for ‘kinda-Safe’ mode, we’ll be doing the following:

- whenever we decide to move an object (to compact the heap), we record its old-address → new-address mapping in a ‘relocation map’, together with a count of the not-yet-fixed references to it;
- whenever we decommit a page (after everything has been moved out of it), we record it in a ‘page map’, so that a stale pointer into such a page can be recognized rather than causing a crash;
- on each dereference of a ‘soft ref’ (or an ‘owning ref’), we consult these maps and, if the target has been moved, rewrite (‘fix’) the stored pointer to the new location – decrementing the reference count, so that fully-fixed entries can eventually be dropped from the ‘relocation map’.

Bingo! We’ve got a (kinda-)safe implementation – and with the ability to compact our heap too, if we wish.
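Under these assumptions, the pointer-fixing part of the dereference path might look roughly as follows (a sketch only: the map types, the names, and the one-big-map layout are mine, and a production implementation would certainly be far more optimized):

  #include <unordered_map>

  struct RelocationState {
    //old address -> new address; real entries would also carry the
    //  count of not-yet-fixed refs mentioned above
    std::unordered_map<void*, void*> relocation_map;
  };

  RelocationState& reloc_state() { //one such state per (Re)Actor
    static RelocationState s;
    return s;
  }

  //Called on dereference, before the ID check: if the object has been
  //  moved, rewrite ('fix') the stored pointer to its new location.
  template <class T>
  T* fix_if_relocated(T*& stored) {
    auto& m = reloc_state().relocation_map;
    auto it = m.find(stored);
    if (it != m.end())
      stored = static_cast<T*>(it->second);
    return stored;
  }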

Traversing SpecificReactor state

In spite of all the efforts discussed above, there might still be situations when the size of our ‘page map’ – and especially of our ‘relocation map’ – grows too large. While I expect such situations to be extremely rare, it is still nice to know that there is a way to handle them.

If we say that for every object within our class SpecificReactor there is a traverse() function (with traverse() at each level doing nothing but calling traverse() for each of its child objects), then after calling traverse() for the whole SpecificReactor we can be sure that all the pointers have been dereferenced – and therefore fixed where applicable. As a result, after such a traverse(), our ‘relocation map’ is no longer necessary and can be cleared (BTW, if we’re doing traverse() frequently enough, we may avoid storing the reference count mentioned above in the context of cleaning up the ‘relocation map’).

Moreover, after such a call to SpecificReactor::traverse(), we can be sure that there are no more pointers to decommitted pages, which means that the ‘page map’ can be cleared too.
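A sketch of what such a traverse() hierarchy might look like (the members here are invented purely for illustration; the only requirement is that each level visits every ref it stores, so that each stored pointer gets dereferenced – and hence fixed – at least once):

  struct Inventory {
    //... 'owning'/'soft' refs to the items we hold ...
    void traverse() {
      //dereference each stored ref, calling traverse() on each child
    }
  };

  class SpecificReactor : public GenericReactor {
  public:
    void react(const Event& ev) override;
    void traverse() { //visit the whole (Re)Actor state
      inventory.traverse();
      //... and so on for every other member holding refs ...
    }
  private:
    Inventory inventory;
  };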

On the one hand, let’s note that for (Re)Actors with a large state, traversing the whole state may take a while (especially if the state is large enough to spill out of the CPU caches) – which may be undesirable for latency-critical apps. On the other hand, in such cases it is usually possible to implement traversing in an incremental manner (relying on the observation that any newly created objects are not a problem) – but all methods I know for such incremental traversals require us to be very careful about object moves (from a not-traversed-yet into a supposedly-already-traversed area) and about invalidating collection iterators. Still, it is usually possible and fairly easy to write such an incremental traversal – albeit an ad hoc one (i.e. taking the specifics of the app into account).

Further discussion planned

Actually, this is not the end of the discussion about (Re)Actors and their allocators. In particular, I hope to discuss how to use such allocators to implement (Re)Actor serialization (as mentioned in [NoBugs17], serialization of the (Re)Actor state is necessary to achieve quite a few (Re)Actor goodies, including such big things as Replay-Based Regression Testing and production post-factum debugging).

Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague

References

[Go2010] ‘Share Memory By Communicating’, The Go Blog, https://blog.golang.org/share-memory-by-communicating

[Henney17] Kevlin Henney, ACCU2017, ‘Thinking Outside the Synchronisation Quadrant’

[Loganberry04] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html

[NoBugs10] ‘No Bugs’ Hare, ‘Single Threading: Back to the Future?’, Overload #97–98, June–Aug 2010

[NoBugs15] ‘No Bugs’ Hare, ‘Multi-threading at Business-logic Level is Considered Harmful’, Overload #128, Aug 2015

[NoBugs17] ‘No Bugs’ Hare, ‘Deterministic Components for Interactive Distributed Systems’, ACCU2017, http://ithare.com/deterministic-components-for-interactive-distributed-systems-with-transcript/

[Sutter11] Herb Sutter, ‘Garbage Collection Synopsis, and C++’, https://herbsutter.com/2011/10/25/garbage-collection-synopsis-and-c/
