Programming Topics + CVu Journal Vol 30, #6 - January 2019


Title: C++ Tagged Reference Types

Author: Bob Schmidt

Date: Wed, 09 January 2019 16:24:47 +00:00

Summary: Pete Cordell proposes an extension to C++ move syntax.

Body:

I’m a big fan of C++ move semantics. They are a significant development that not only improves C++, but also moves computer programming language concepts forward. But a question raised recently on the ACCU General mailing list makes me wonder if they are still a ‘work in progress’.

The question asked was, when is it best to use l-value refs, r-value refs or pass-by-value to pass non-trivial parameters to functions? Even amongst the experts of ACCU, there didn’t seem to be much consensus. And there seemed to be no resolution on how to pass potentially moveable types through a derived class constructor on to a base class constructor.

To me, if there is no clear consensus among ACCU members on the best way to do this, then it seems too complicated for an average C++ programmer like myself. This is a bad situation for C++. It led me to idly muse whether something better could be achieved.

A key observation is that passing parameters as l-value refs, r-value refs or values is essentially an optimisation problem, and conventional wisdom is to let the compiler do low level optimisation. Therefore, would it be possible for the compiler to handle this situation too?

The core of the problem is that a function (or class) wants to end up with its own copy of a value, such as a string or more complex data structure, passed in as a parameter. If the input parameter is an l-value ref, then the function has no choice but to do a copy operation. But if the parameter is an r-value ref, then the most efficient operation is typically a move.

Outside of templates, a programmer would have to implement two functions to cater for both these situations, and the two are likely to have almost identical code. This is not desirable.
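As a minimal sketch of that duplication in today's C++, the pair of overloads might look like the following. The `Widget` class, `set_name` and `overload_example` are hypothetical names invented for this illustration; they do not come from the mailing list discussion.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class, for illustration only: it wants its own copy of a name.
class Widget {
public:
    // l-value overload: the caller keeps its string, so we have to copy.
    void set_name(std::string const& name) { name_ = name; }

    // r-value overload: the caller's string is expiring, so we can move from it.
    void set_name(std::string&& name) { name_ = std::move(name); }

    std::string const& name() const { return name_; }

private:
    std::string name_;
};

// Usage: the two bodies differ only in std::move, yet both must be written.
void overload_example() {
    Widget w;
    std::string s = "copied";
    w.set_name(s);                      // selects the const& overload
    assert(w.name() == "copied");
    assert(s == "copied");              // the caller's string is untouched
    w.set_name(std::string("moved"));   // selects the && overload
    assert(w.name() == "moved");
}
```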

I could wish for a syntax that allowed a function to declare that ‘this parameter can be either an l-value ref or an r-value ref’ and the compiler would be responsible for generating functions for both the l-value ref and r-value ref variants. So, using a triple ampersand (&&&) to denote such a function parameter, given a function definition of:

  int func( Foo const &&& f ) {...}

the compiler would generate code for:

  int func( Foo const & f ) {...}
  int func( Foo && f ) {...}

where the ... would be identical C++ code in both cases.

This is certainly an option, but it doesn’t readily cater for the more general case where a function needs to take a number of such parameters, such as:

  int func( Foo const &&& f, Bar const &&& b,
    Dee const &&& d ) {...}

A definition like this would require eight functions to be generated automatically, one for each combination of l-value and r-value arguments. Feasible, but not ideal.
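For comparison, the template route mentioned earlier can already cover all eight combinations with one definition, by using forwarding references and `std::forward`. The sketch below is an illustration in existing C++, not the proposed `&&&` syntax, and it has the cost that the function becomes a template. `Probe`, `Record`, `set` and `forwarding_example` are names made up for the example.

```cpp
#include <cassert>
#include <utility>

// Illustrative type whose assignment operators record how they were invoked.
struct Probe {
    int copies = 0;
    int moves = 0;
    Probe() = default;
    Probe(Probe const&) = default;
    Probe& operator=(Probe const&) { ++copies; return *this; }
    Probe& operator=(Probe&&) noexcept { ++moves; return *this; }
};

// Hypothetical holder that keeps its own copies of three values.
struct Record {
    Probe f, b, d;

    // One template stands in for the eight const& / && overload combinations.
    // std::forward keeps l-values as l-values (copy) and r-values as r-values (move).
    template <class F, class B, class D>
    void set(F&& f_in, B&& b_in, D&& d_in) {
        f = std::forward<F>(f_in);
        b = std::forward<B>(b_in);
        d = std::forward<D>(d_in);
    }
};

void forwarding_example() {
    Probe x, y, z;
    Record r;
    r.set(x, std::move(y), z);                    // l-value, r-value, l-value
    assert(r.f.copies == 1 && r.f.moves == 0);    // copied
    assert(r.b.copies == 0 && r.b.moves == 1);    // moved
    assert(r.d.copies == 1 && r.d.moves == 0);    // copied
}
```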

An alternative in such a case is to move away from having a compile-time solution and adopt a run-time one instead. This would require the generated reference to include whether it is an l-value ref or an r-value ref. In C++ terms it might look something like:

  template< class T >
  struct tagged_ref {
    T & ref;
    enum { lvalue, rvalue } form;
  };

But rather than being a standard library type, it would be built into the compiler, and the compiler would be able to optimise its format and usage as necessary. For example, if found to be more efficient, the reference part could be passed to a function in a register and the ‘form’ part on the stack. (Or even, for simple functions that don’t have multiple parameters, the compiler could use the approach of auto-generating both l-value and r-value reference forms of the function, as previously mentioned.)

I’ll call the type a ‘tagged reference’.
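A library-level approximation of such a type can be written today, which may help make the idea concrete. This is only a sketch under assumptions: the proposal is for a built-in type, and the constructors, the `const` reference member, and the helper functions `arrived_as_rvalue`, `length_of` and `tagged_ref_example` are all inventions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Library emulation of the tagged reference idea (assumption: the real thing
// would be built into the compiler rather than written like this).
template <class T>
struct tagged_ref {
    enum form_t { lvalue, rvalue };

    T const& ref;   // the referenced object
    form_t form;    // how the caller supplied it

    // Selected when the caller passes an l-value (or a const r-value).
    tagged_ref(T const& x) : ref(x), form(lvalue) {}
    // Selected when the caller passes a non-const r-value.
    tagged_ref(T&& x) : ref(x), form(rvalue) {}

    // Converting to a const l-value reference is trivial.
    operator T const&() const { return ref; }
};

// Hypothetical function taking the emulated tagged reference.
bool arrived_as_rvalue(tagged_ref<std::string> s) {
    return s.form == tagged_ref<std::string>::rvalue;
}

// An ordinary function with a const l-value reference parameter.
std::size_t length_of(std::string const& s) { return s.size(); }

void tagged_ref_example() {
    std::string s = "hello";
    assert(!arrived_as_rvalue(s));                   // l-value argument
    assert(arrived_as_rvalue(std::string("tmp")));   // temporary
    assert(arrived_as_rvalue(std::move(s)));         // tagged only; nothing is moved here
    tagged_ref<std::string> t(s);
    assert(length_of(t) == 5);                       // uses the conversion operator
}
```

The tag is set by ordinary overload resolution on the constructors, which is one way of standing in for the work the compiler would do at the call site.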

Converting a tagged reference to a const l-value reference would be trivial. This could happen when using the tagged reference in an expression or passing it to a function parameter that has an l-value reference signature.

Assigning a tagged reference to another variable becomes more interesting. The generated code would have to look at the tagged reference form and decide whether a copy or move should be done. While the program code might look like:

  void func( Foo const &&& f ) {
    a = f;
  }

The code generated by the compiler might look more like:

  void func( Foo const &&& f ) {
    if( f.form == tagged_ref<Foo>::lvalue )
      a = f;
    else
      a = std::move( f );
  }
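The run-time dispatch can be imitated in current C++ by passing the reference and its tag as two separate arguments, with thin wrappers doing the tagging. The following is a rough sketch only: `Probe`, `ref_form`, `func_impl` and `dispatch_example` are hypothetical names, and the `const_cast` is an artefact of the emulation rather than part of the proposal.

```cpp
#include <cassert>
#include <utility>

// Illustrative type whose assignment operators record how they were invoked.
struct Probe {
    int copies = 0;
    int moves = 0;
    Probe() = default;
    Probe(Probe const&) = default;
    Probe& operator=(Probe const&) { ++copies; return *this; }
    Probe& operator=(Probe&&) noexcept { ++moves; return *this; }
};

enum class ref_form { lvalue, rvalue };

Probe a;  // stands in for the article's variable `a`

// One shared body; the reference and its tag travel as separate arguments.
void func_impl(Probe const& f, ref_form form) {
    if (form == ref_form::lvalue)
        a = f;                                   // copy
    else
        // Move. Casting away const is safe here only because the rvalue tag
        // is set exclusively by the Probe&& wrapper below.
        a = std::move(const_cast<Probe&>(f));
}

// Thin wrappers standing in for the tagging a compiler would do at the call site.
void func(Probe const& f) { func_impl(f, ref_form::lvalue); }
void func(Probe&& f)      { func_impl(f, ref_form::rvalue); }

void dispatch_example() {
    int const c0 = a.copies;
    int const m0 = a.moves;
    Probe x;
    func(x);                                       // l-value: copy branch
    assert(a.copies == c0 + 1 && a.moves == m0);
    func(Probe());                                 // temporary: move branch
    assert(a.copies == c0 + 1 && a.moves == m0 + 1);
}
```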

Note that once the tagged reference has been assigned to another variable, its value may have been destroyed by a move operation. To avoid problems, the tagged reference variable shouldn’t be used again after that point. It needs to effectively go out of scope:

  void func( Foo const &&& f ) {
    a = f; // Fine, but could be a copy or a move
    b = f; // Error: f is no longer in scope.
           // Use b = a instead.
  }

The same applies to passing it to another function as a parameter that is a tagged reference or r-value ref. (For this reason, assigning a tagged reference to another variable, or passing it to a tagged reference or r-value ref function parameter, can’t be done inside a loop, since a later iteration could see a value that an earlier iteration had already moved from. Another little thing for a compiler to look out for!)
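The loop hazard is easy to reproduce with ordinary r-values in existing C++. The sketch below uses `std::unique_ptr` because its moved-from state is guaranteed to be null, which makes the effect observable; `sinks_that_got_a_value` and `loop_example` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hands the same value to n sinks in a loop and reports how many sinks
// actually received it.
std::size_t sinks_that_got_a_value(std::unique_ptr<int> f, std::size_t n) {
    std::vector<std::unique_ptr<int>> sinks;
    for (std::size_t i = 0; i < n; ++i)
        sinks.push_back(std::move(f));   // after the first pass, f is already null

    std::size_t filled = 0;
    for (auto const& s : sinks)
        if (s) ++filled;
    return filled;
}

void loop_example() {
    // Three iterations, but only the first one sees the value.
    assert(sinks_that_got_a_value(std::unique_ptr<int>(new int(42)), 3) == 1);
}
```

Today's compilers accept this code without complaint, which is the kind of mistake the proposed scoping rule would turn into a compile-time error.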

Other than that, a tagged reference type looks like a reasonably simple feature for a compiler to support. Programmers would still be able to use the separate forms if they needed utmost efficiency.

In conclusion, tagged reference types could significantly reduce the burden on programmers of making the most of move semantics in a consistent, correct and efficient way. The compiler would automagically be able to ‘do the right thing’. Source code would be smaller through de-duplication, easier to understand, and would gain all the other benefits of DRY code. All of which is in line with many of the other improvements made to C++ in recent years. Move semantics could then become ‘magic move semantics’.

In the next article: The const? keyword, wherein the compiler generates two versions of a function, one with the const? keyword replaced with const and the other with the empty string.

Pete Cordell: Pete started with V = IR many decades ago and has been slowly working up the stack ever since. Pete runs his own company, selling tools to make using XML in C++ easier.
