Exported templates are one of the two standard template compilation models. Exported template implementations have only recently become available. During the time they were defined but unavailable, they were the subject of much expectation, some of it unreasonable and some of it the consequence of confounding compilation models and instantiation mechanisms.
This article reviews the compilation models and instantiation mechanisms defined by the C++ standard, and then looks at some related issues with C++ templates and examines what we can expect from the export keyword.
According to [Vandevoorde-],
The compilation model determines the meaning of a template at various stages of the translation of a program. In particular, it determines what the various constructs in a template mean when it is instantiated. Name lookup is an essential ingredient of the compilation model.
Although name lookup is an essential ingredient of the compilation model, the standard models share the same name lookup rules, known as two phase lookup. For this article it is sufficient to state that names independent of the template parameters are looked up in the context of the definition of the template (so only names visible from the definition are found), while names dependent on the template parameters are looked up in both the definition and instantiation contexts (so names visible from the place where the instantiation is used may also be found). [Vandevoorde-] provides a more complete description, including a more precise definition of what the definition and instantiation contexts are.
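The two lookup phases can be seen in a small example. This is only a sketch: the names describe, call_independent, call_dependent and shapes::Circle are invented for illustration, and the behaviour described in the comments assumes a compiler that implements two phase lookup.

```cpp
#include <string>

std::string describe(double) { return "describe(double)"; }

template <typename T>
std::string call_independent(T) {
    // The argument 1 does not depend on T, so `describe` is looked up here,
    // in the definition context, where only describe(double) is visible.
    return describe(1);
}

// Declared after the template definition: too late for call_independent.
std::string describe(int) { return "describe(int)"; }

namespace shapes {
struct Circle {};
}

template <typename T>
std::string call_dependent(const T& value) {
    // The argument depends on T, so functions found through the argument's
    // namespace in the instantiation context are also considered.
    return describe(value);
}

namespace shapes {
// Declared after call_dependent, but visible from the instantiation context.
std::string describe(const Circle&) { return "shapes::describe(Circle)"; }
}
```

On a conforming compiler, call_independent(0) selects describe(double) even though describe(int) would be a better match, while call_dependent(shapes::Circle()) finds the overload declared after the template.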
Note that while these name lookup rules were introduced in the draft standard in 1993, until 1997 all compilers looked up both dependent and independent names only in the instantiation context; the first compilers to correctly implement the name lookup rules found so many errors in programs that they had to output warnings instead of errors.
At the heart of this article is another ingredient of the template compilation model: how the definitions of non-class templates are found. At first, this part of the compilation model was not clearly specified. For example, Bjarne Stroustrup [Stroustrup] wrote:
When a function definition is needed for a class template member function for a particular type, it is the implementation's job to find the template for the member function and generate the appropriate version. An implementation may require the programmer to help find template source by following some convention.
CFront, which was the first implementation of C++ templates, used some conventions which are described in the appendix. However, the standard provides two ways, both of which differ from the CFront approach.
The inclusion model is the only commonly provided compilation model: the definition of a template has to be provided in the compilation unit where the template is instantiated[1].
In an effort to be able to compile source code written for CFront, some compilers provide a variant where the source file which would be used by CFront is automatically included when needed.
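As a minimal sketch of the inclusion model, the fragment below shows a header and a user of it in one listing. The file names, to_text and describe_answer are hypothetical. Note that the header has to include <sstream> only because the definition needs it.

```cpp
// ---- to_text.hpp (hypothetical header) ----
#include <sstream>  // needed only by the definition, yet seen by every user
#include <string>

// Declaration and definition live together so that every compilation unit
// instantiating to_text has the definition available.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// ---- user.cpp (hypothetical; would normally #include "to_text.hpp") ----
std::string describe_answer() {
    // to_text<int> is instantiated in this compilation unit.
    return "answer=" + to_text(42);
}
```

Every other compilation unit calling to_text would include the same header and therefore parse the same definition and the same standard headers again.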
When using the separation model, template declarations have to be marked as exported[2] (using the export keyword). A definition of the template has to be compiled in one (and only one, by the one definition rule) compilation unit, and the implementation has to cope with that.
It should be noted that in the inclusion model the two contexts where names are looked up are usually not very different (but remember the problems found by the first implementations of the two phase name lookup rules). In the separation model the differences may be far greater and give rise to some surprising consequences, especially in combination with overloading and implicit conversions.
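The following sketch shows how a template might be marked as exported. The macros USE_EXPORT and EXPORT_TEMPLATE, the file names and the functions are assumptions made for illustration. The keyword is hidden behind a macro because most compilers do not implement exported templates, and the export keyword for templates was removed from the language in C++11; without USE_EXPORT the listing compiles as ordinary inclusion-model code.

```cpp
// Hypothetical configuration macro: define USE_EXPORT only for a compiler
// that implements C++98 exported templates.
#if defined(USE_EXPORT)
#  define EXPORT_TEMPLATE export
#else
#  define EXPORT_TEMPLATE
#endif

// ---- twice.hpp (hypothetical header): users see only this declaration ----
EXPORT_TEMPLATE template <typename T> T twice(T value);

// ---- twice.cpp (hypothetical): in the separation model this definition is
// compiled in exactly one compilation unit; in the inclusion model the
// header would have to pull it in instead ----
EXPORT_TEMPLATE template <typename T>
T twice(T value) {
    return value + value;
}

// ---- user.cpp (hypothetical user of the header) ----
int twice_of_21() {
    return twice(21);  // requests the instantiation twice<int>
}
```

Marking both the declaration and the definition is the conservative choice given the uncertainty described in note [2].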
According to [Vandevoorde-][3],
The instantiation mechanisms are the external mechanisms that allow C++ implementations to create instantiations correctly. These mechanisms may be constrained by the requirements of the linker and other software building tools.
One may consider there to be two kinds of instantiation mechanisms:
-
local, where all the instantiations are done when considering each compilation unit, and
-
global, where the instantiations are done when considering all the compilation units in the program or library.
CFront used a global mechanism: it tried a link and used the error messages describing missing symbols to deduce the necessary template instantiations. It then generated them and retried the link until all required instantiations were found.
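The iterated scheme can be modelled as a loop that runs until no symbol is missing. What follows is a toy model and not CFront's implementation: SymbolGraph, LinkResult, missing_symbols and iterate_links are hypothetical names, and a real pre-linker would read linker error messages rather than walk a map.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each symbol maps to the symbols its code refers to.
typedef std::map<std::string, std::vector<std::string> > SymbolGraph;

struct LinkResult {
    std::set<std::string> generated;  // instantiations produced on demand
    int link_attempts;                // number of links that were tried
};

// Stand-in for a link attempt: report every symbol that is referenced by
// available code but is not itself available.
std::set<std::string> missing_symbols(const SymbolGraph& graph,
                                      const std::set<std::string>& available) {
    std::set<std::string> missing;
    for (std::set<std::string>::const_iterator sym = available.begin();
         sym != available.end(); ++sym) {
        SymbolGraph::const_iterator refs = graph.find(*sym);
        if (refs == graph.end()) continue;
        for (std::size_t i = 0; i < refs->second.size(); ++i) {
            if (available.count(refs->second[i]) == 0) {
                missing.insert(refs->second[i]);
            }
        }
    }
    return missing;
}

// Retry the link, generating whatever was missing, until nothing is missing.
LinkResult iterate_links(const SymbolGraph& graph,
                         std::set<std::string> available) {
    LinkResult result;
    result.link_attempts = 0;
    for (;;) {
        ++result.link_attempts;
        std::set<std::string> missing = missing_symbols(graph, available);
        if (missing.empty()) return result;
        available.insert(missing.begin(), missing.end());
        result.generated.insert(missing.begin(), missing.end());
    }
}
```

In this model an instantiation that itself needs another instantiation costs one extra link attempt, which is why chains of dependent templates made the approach slow.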
Borland's compiler introduced the local mechanism: it added all the instantiations to every object file and relied on the linker to remove any duplicates.
Sun's compiler also uses a local mechanism. It also generates all the needed instantiations when compiling a compilation unit, but instead of putting them in the object file, it puts them in a repository. The linker does not need to be able to remove duplicates and an obvious optimisation is generating an instantiation only if it is not already present in the repository.
Comeau's and HP's compilers have a global mechanism: they use a pre-linker to detect the needed instantiations[4]. These are then assigned to compilation units which can generate them and these compilation units are recompiled. The assignment is cached so that when recompiling a compilation unit to which instantiations have been assigned, they are also regenerated; the pre-linking phase is then usually simply a check that all the needed instantiations are provided, except when starting a compilation from a clean state.
Comeau's compiler has an additional mode where several objects are generated from a compilation unit to which instantiations have been assigned. This removes the need to link a compilation unit (and other compilation units upon which it depends) only because an instantiation has been assigned to it.
It should be noted that the compilation model and the instantiation mechanisms are mostly independent: it is possible (though not always convenient or especially useful) to implement the standard compilation models with each of the instantiation mechanisms described. A consequence is that one should not expect the separation compilation model to solve problems related to instantiation mechanisms.
The need in the inclusion model to provide the source code of the template definition is seen by some as a problem. Is the separation model a solution?
First, it should be noted that the standard formally ignores such issues and so a compiler could always demand that the source code be present until link time. Not going to such extremes, some compilers delay code generation until link time and so generate high level object files[5].
It should also be noted that a compiler could provide a way to accept encrypted source or a high level intermediate format (something very similar to what is done with precompiled headers). So, if there is enough demand, compiler makers can provide a solution even with the inclusion model. It would not be perfect, but it would probably be good enough for most purposes, as the approach is used in other languages; the main problem would probably be generating useful error messages when encrypted code is all that is available.
These remarks made, we'll consider the related but quite different question: can the separation model be implemented in such a way that only low level information is needed to instantiate templates?
The two phase lookup rule and other modifications made during the standardisation process allowed compilers to check the template definition for syntactic errors, but most semantic ones can only be detected at instantiation time. Indeed, most operations done in a template depend on the template parameters, and so the parameters must be known to get the precise meaning.
So, obviously the answer is no: the separation model may not prevent the need to furnish the template definition as a high level description.
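The dependence of a template's meaning on its parameters can be illustrated with a short sketch; length_of is a hypothetical function invented for this purpose.

```cpp
#include <cstddef>
#include <string>
#include <vector>

template <typename T>
std::size_t length_of(const T& value) {
    // This line is syntactically valid for any T, but whether `size` exists,
    // and which function it names, is only known once T is known.
    return value.size();
}

// length_of(std::string("abc")) and length_of(std::vector<int>(5)) are fine.
// A call such as length_of(42) would be rejected, but only when the template
// is instantiated for int, not when the definition is first parsed.
```

Since the check can only happen at instantiation time, whatever performs the instantiation needs a description of the template rich enough to resolve value.size() for each new argument type.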
Templates are often blamed for long compilation times. Is this attribution correct?
Concerning the compilation model, in the inclusion model, every compilation unit using a template has to include the definition of the template and so everything needed for the definition. So more code has to be read and parsed than for the export model, but some techniques such as precompiled headers can reduce the overhead.
The separation model does not have obvious overhead in forcing redundant work to be done, even if the current implementations force a reparsing of the definition for each instantiation.
The main overhead of the global instantiation mechanisms lies in the way the required instantiations are detected. CFront's approach of retrying links until closure was costly. More modern methods, such as those of HP and Comeau, are less costly, but they still increase the link time of a clean build. The global mechanisms also incur an overhead in recompiling the compilation units to which instantiations have been assigned, although this overhead exists only when doing a clean build.
There is a serious overhead in the local mechanism when no repository is used: the instantiations are compiled several times, and the optimisation and code generation phases of a compiler usually take a significant part of the process. Doing this work only to throw the result away is wasteful; it also creates bigger object files, and it complicates and slows down the linker.
In this section, we'll examine what recompilations are needed when a file is modified, and if the recompilation can be done automatically.
When modifying a type used as a template argument, all the files using this type should be recompiled and the compilation model has no influence on that.
The normal use of makefiles[6] triggers the recompilation in all combinations of compilation models and instantiation mechanisms.
When modifying a template definition, things are noticeably different.
With the inclusion model, the normal use of makefiles triggers a recompilation of all compilation units including the definition of the template, and so the needed instantiations will be recompiled whatever instantiation mechanism is used.
With the separation model, the normal use of makefiles will trigger a recompilation of the compilation unit providing the exported definition and trigger a relinking. Is this enough?
When using a local mechanism, all compilation units using the template should be recompiled, so additional dependencies should be added to the makefile. In practice, a tool aware of exported templates should be used to generate the makefile dependencies.
When using a global mechanism, the pre-link phase should be able to trigger the needed recompilation: it only needs to be able to detect that the instantiations are out of date; being able to launch recompilations is inherent to this mechanism. Exported templates provide a natural way to trigger the pre-link phase and to allow it to check the consistency of the objects.
When a definition becomes available when it was previously not, the used instantiations need to be provided. That can be considered as a modification of a stub definition and the needed recompilations would be the same.
Compared to the inclusion model, what are the expected effects of using the separation model?
-
It removes the need to provide the definition of the function template along with the declaration. This mimics what is true for normal functions and a behaviour expected by most people starting to use templates.
-
It also removes the need to include all the declarations needed by the definition of the function templates, preventing a "pollution" of the user code.
The other effects are dependent on the instantiation mechanism used.
-
in conjunction with a local mechanism, without duplicate instantiation avoidance (like Borland's), it could need more parsing than the inclusion model as the headers needed for both the definition and the declarations have to be parsed twice if one requires the definition to be available at instantiation time (as does the only implementation).
-
in conjunction with a local mechanism with duplicate instantiation avoidance (like Sun's), it could reduce the needed file reading and parsing, but the disadvantage for the inclusion model may be reduced by using techniques such as precompiled headers
-
in conjunction with a global mechanism
-
it reduces file reading and parsing, but the disadvantage for the inclusion model may be reduced by using techniques such as precompiled headers.
-
it reduces the need for recompilations after a change in the template, as only the compilation unit providing the instantiation has to be recompiled.
I have performed some experiments with exported templates using Comeau's compiler, to check if they are usable. I wanted to see if it was possible to set up a makefile so that all needed recompilations were triggered automatically without adding dependencies manually, to see if it was possible to use them with libraries, and to see if it was possible to organize the code so that it could be compiled in both the inclusion model and separation one.
I also wanted to check if the expected effects on the instantiation mechanisms described above were measurable. As Comeau's compiler provides a global mechanism, I expected a reduction in compile time and a reduction in file reading and parsing, and I wanted to see how it compared with what could be obtained by using precompiled headers.
Obviously such effects depend on the code. The simple setup I used was designed to be favourable to export: a project made of a simple template function making use of the standard IOStream implementation, but not its interface, was instantiated for the same argument in ten files containing very little else. In such a setup if export did not provide a speed up in compilation time, there is little hope that it will in real life projects.
I measured
-
the time to build from scratch
-
the time to rebuild after touching the template definition file
-
the time to rebuild after touching the header defining the template argument type
For each kind of compilation[7]:
-
normal compilation
-
using precompiled headers
-
using export
The results are seen in this table:
Table 1.
Scenario | Normal build | Precompiled headers | Exported template |
From scratch | 10.2 | 5.2 | 3.7 |
Touching the type definition | 9.4 | 4.7 | 2.5 |
Touching the template definition | 9.3 | 4.7 | 2 |
One can see that, at least for this kind of use, exported templates have some benefit in build time. This is especially true when modifying the template definition (which for exported templates resulted in one file compilation and a link, while there were several file compilations for both the normal build and the build using precompiled headers). The effect of reduced parsing can also be seen when the same instantiation is used in several files and when the use of export reduces the need for includes (in the experiment, the <iostream> and <ostream> headers were only needed in the template definition).
Obviously, in more realistic scenarios, the proportion of the timing reduction would be different, and using export could even degrade build time when a template instantiation is used in only one compilation unit or when the use of export does not reduce the need to include files.
CFront compilation model and instantiation mechanism[8]
When instantiating templates, CFront compiled (in a special mode to ensure that only template instantiations were provided) a new compilation unit made up of
-
the file containing the template declaration,
-
a file expected to contain the template definition whose name was made up by changing the extension of the file containing the declaration,
-
a selection of files included in the file which requested the template instantiation,
-
special code triggering the wanted instantiations.
A name (dependent or independent) used in a template was looked up in the context of instantiation in this compilation unit. This context was different from, but usually very similar to, the context at the true instantiation point.
CFront compiled template instantiations at link time. A prelinker launched a link, deduced the required instantiations from the missing symbols and generated them if they were not already present in a repository. Then it restarted the process until all needed instantiations were available. The behaviour of CFront was reputed to be slow (linking takes a lot of time and doing several links takes even more) and fragile (needed recompilation of instantiations sometimes did not occur, and so the first step in handling a strange error was to clean the repository and recompile everything).
[Vandevoorde-] David Vandevoorde and Nicolai M. Josuttis, C++ Templates, The Complete Guide, Addison-Wesley, 2003
[Sutter.pt1] Herb Sutter, "export" restrictions, part 1, C/C++ Users Journal, September 2002, also available at http://www.gotw.ca/publications/mill23.htm.
[Sutter.pt2] Herb Sutter, "export" restrictions, part 2, C/C++ Users Journal, November 2002, also available at http://www.gotw.ca/publications/mill24.htm
[1] While the standard requires that the definition is either available in every compilation unit where the template is instantiated or exported, no compiler I know of checks this rule. They all simply do not instantiate a template in a compilation unit where the definition is not present, and some take advantage of this behaviour.
[2] The standard seems to imply that only the definition has to be marked as exported, but the only implementation demands that the declaration is marked and a defect report (the mechanism to report and correct bugs in the standard) on this issue has been introduced to demand it.
[3] The classification of instantiation mechanisms used in this book is different to the one presented here. They use greedy instantiation and queried instantiation for what we call local instantiation and use iterated instantiation for global instantiation.
[4] Unlike CFront, they do not detect them by examining the missing symbols from a failed link, but use a more efficient mechanism.
[5] I'll consider an intermediate format to be high level if the source code without the comments can be reconstructed; an intermediate format which is not high level will be qualified as low level. Obviously in practice the separation between low level and high level is not clear.
[6] That is where dependencies are generated by the preprocessor (with an option like -MM for gcc) or an external tool (like makedepend)
[7] I also tried to measure it for the combination of export and precompiled headers, but it triggered a bug in Comeau's compiler.
[8] I've never used CFront, so this description is not from a first hand experience but is the summary of information found at different places.
Overload Journal #54 - Apr 2003 + Design of applications and programs