Title: Linkers

Author:

Date: 03 June 2001 13:15:45 +01:00 or Sun, 03 June 2001 13:15:45 +01:00

Summary:

Body:

I was asked the other day by someone looking at the "build" menu in his new compiler what the "link" option was for. And having answered him, and thought further about it overnight, I thought that given the integrated program development workbenches favoured by the compiler vendors these days, perhaps it is generally unclear what a linker does. So I wrote these notes in case it is not clear to you either.

Briefly, the linker sorts out the addresses of all the variables and functions in your program, making sure they are in the correct location in memory for your hardware and operating system. It then places those addresses in the address fields of all the instructions that use them.

Do We Need a Linker?

So why do we need a linker? After all, is not all the necessary information available to the compiler, so can it not do this job?

And the answer is "yes": we can manage without a linker. In the early days of computing they did just that and programs were laboriously hand assembled to run at a fixed address. Subsequently, people have quite successfully written compilers which do not need linkers. But to write a program development system these days without supplying a linker would prove vastly inconvenient, if not impossible, for all but the simplest of programs. Which is about the level of sophistication achieved in the early days of computing, or by those using linker-less compilers.

Specifically, without a linker, you would lose access to modular program development, of which the run-time library is one example. You could circumvent this by compiling everything every time you change your program, but that would result in long program re-compilation times. It would also result in large program images because you would have to incorporate the entire run-time library, however little of it you actually used. Another consequence is that package suppliers would have to sell you their program in source code form, rather than the more tamper-proof, and technique hiding, relocatable binary form they actually use.

There is also the question of locating your code where the operating system wants it in memory. If you buy more main memory (or increase the virtual memory space), you might need to change this location. The linker will do this for you without requiring a total recompilation of your program.

So all in all, putting an extra stage in the path to building your new program will make life a lot easier for you in the longer term.

What a Linker Does

Having made the case for the linker, let me talk in more detail about what the linker does.

There are two main functions for a linker. The first is to place all the code and data in memory at addresses compatible with the operating system and the memory physically available on the machine. As it is only at this stage that the true addresses of all the names in the program are known, the linker must ensure that all the machine instructions that use names (the jumps, the function calls, the variable loads, etc) use the resulting addresses.

The linker will generally provide some ancillary functions, such as loading pre-compiled modules only if they contain things of interest to the program being built. That is, it will merge library files into the image under construction, but only if they add something needed by the image.

The linker will usually generate global symbol tables, of use to the debugger should that become necessary.

And the linker may write a preamble to the memory image which tells the operating system exactly where to place the program in memory, and how to start running it.

What the Compiler Does

To go further into what the linker does, I have to talk a little about how computers work, and what compilers and assemblers do. You know, of course, that if you write a C statement like

if (var > 12) func ();

(where everything is well defined), the compiler (notionally at least) will generate a text file containing assembler which might look like:

cmpw 12,var(pc)
ble label_1
bsr func
label_1:

Do not worry which processor this is for, the point is that not only has the compiler ignored where var and func will actually be located, but also it has invented a new symbol, label_1 so that it can generate code for the if statement. The compiler then hopes that something else will sort out where all these names are later.

The assembler is then run on the output from the compiler, and produces something like:

c5a6000c           cmpw   12,var(pc)
????
7a8006             ble    label_1
02????             bsr    func
              label_1:

This is the assembler listing from the example above, somewhat edited to fit the confines of this page, but is generally typical of such listings. On the right, the original source code is reproduced, with the various parts realigned in columns to comply with the assembler's idea of a good layout style (i.e., the assembler has done some 'pretty printing' for me). On the left, the compiler has listed, in hexadecimal, the machine code it generated for each instruction.

Relocatable Binary Files

I say 'something like' because the principal output of the assembler is not the listing, but the machine code resulting from its translation of the original assembly code. This the assembler will write directly to a binary file.

I also say 'something like' because in the example, you will see a lot of question marks, '?'. These the assembler has inserted into the listing where it knows it must generate a byte, but cannot deduce what the value of that byte is. These occur on the second line, where it is trying to load the address of the variable var, and on the fourth line, where it needs the address of func for the procedure call.

The assembler cannot write question marks to the binary file, but what it can do is append control information saying that when somebody finds out the addresses of var and func here is where they should be inserted. And that 'somebody' is the linker.

Note that the assembler was able to resolve one of the name, or symbol, references on its own in this example. The assembler found a definition of the symbol label_1. Whilst it does not know exactly where in memory this symbol will be placed, and so cannot work out its address, it does know that the ble instruction which uses it expects to be given the offset of, or distance to, label_1, not its address.

The assembler does not know the address of the ble instruction either, but it can count the number of bytes between that instruction and label_1 to find the offset. This will not change wherever the code is located in memory, so the assembler was able to calculate that offset and write it directly to the binary file.

To complement all the above, the assembler must also write information into the binary file giving the addresses, as best it can, of all the symbols defined in the file it has just assembled. Other modules will need this should they want to use any of the functions in the file.

The resulting binary file might be best termed a 'locatable binary file', because it contains all the information necessary to complete the assembly process by locating the resulting code anywhere in memory. However, common usage has ended up calling it a 'relocatable binary file', and so that is what it is. (I suspect 'relocatable' has come about because if you look at an assembler listing carefully, you will find the assembler has assumed that the code is located at address zero, and the binary file contains sufficient information for the linker to relocate the code somewhere else.)

Compiler Generated Binary Files

I have explained the above in terms of a compiler generating a text file that must subsequently be processed by an assembler to obtain the relocatable binary file. But most compilers for a very long time have generated the relocatable binary file directly. The reason is simply speed and efficiency.

There is little difference between writing the instruction mnemonic to a text file and the machine code equivalent to a binary file. True the compiler also has to insert the address information if it proceeds along the latter course, but that does not take much effort. The only reason for generating assembler mnemonics is if the compiler author does not know the relocatable binary file format for the target machine, in which case he hopes that the user will have an assembler that does.

One consequence is that whilst a compiler generates binary, it usually has an option for showing the equivalent assembler. And this in turn means that no matter how it does this, there is scope for the compiler claiming to have generated one instruction whilst it has actually generated another. But I think this will cause the author of the compiler's code generator more headaches than you.

The Linker in More Detail

I can now return to what the linker does.

The linker firstly asks where in memory the code is to be placed. This is because most systems reserve some of the address space for themselves and information about your process. In a single-user system, the system code itself will be present at these reserved locations (and woe betide he who overwrites any of them). In a multi-user system, the system may well have other means of retaining user (or process) dependent information, so perhaps your code can be placed anywhere in memory.

If the linker does not explicitly ask for the memory locations to use for your program, it will be using default values. But there should be a knob in there somewhere to adjust those values.

The linker then copies the contents of the relocatable binary files it has been given into memory at the chosen location. Actually, it probably copies the binary files into a temporary file that it assumes to be located in memory at the chosen location. This way it can link your program at the addresses you want, even if there is something else present at those addresses at the moment, or you have not even bought that memory. The former is necessary when re-linking the operating system, the latter convenient should you be about to purchase a memory up-grade.

Many computers work faster if variables and instructions are aligned on particular address. For instance, the Motorola 68000 series requires 2-byte (i.e., 16-bit) integers to be aligned on an even address. Intel processors do not require this, but if you do not do it you will require twice as many memory cycles to access an integer, so your program will run more slowly.

Whilst the compiler can generate code to assist in this objective, at the end of the day it is the linker that must make sure that this happens.

As it does all of this, the linker will extract the symbol definitions from the binary files it reads, and adjust them in its own tables according to where it has actually placed the code in memory.

All the code having been loaded, and assuming there to be no references to undefined symbols, the linker will pass through the memory image, patching, or fixing-up, all the address fields with their true values.

And after that, all the linker has to do is to close its temporary file and call it an executable memory image. To run the file, something in the operating system called the loader opens the executable file, copies its contents into memory and tells the operating system to pass control to it.

And that is what a simple linker does, and why.

Library Files

Most linkers offer more functions, as I have hinted at above. The first is library files.

Most programmers rely heavily on pre-compiled software from other sources. This may be a module they themselves wrote earlier, a bought-in graphics library or their compiler's run-time library.

Now on being told to use them, the linker could copy the entire contents of the specified library files into the executable image it is generating. But many libraries contain far more functions than any given program will actually use, or most programmers will ever use. So for the linker to merge everything into an executable image would result in an image far larger than was really necessary.

To avoid this, the linker can usually be told to merge in a relocatable binary file using library mode. That is, to take from the file only those bits of code necessary to define symbols which have not yet been defined. And conversely, when code is found in the library that defines a symbol which is not used, save space by not loading that code.

Now this tends to get a little incestuous, as one library function is often written in terms of another. (In C, I might find it convenient to write gets in terms of fgets and strchr.)

So some linkers, on being told to search a binary file in library mode, will make multiple passes through the file until they complete a pass without taking any more from it. Others make a single pass through the binary file. If there are unsatisfied external references after the pass has been completed, you merely search the library again. Here, you might use programs like lorder and tsort under 'Unix' to sort the various modules so as to minimise the number of passes needed to extract everything necessary from the library.

And what is a library file? Well, it may be no more than an ordinary relocatable binary file, though probably longer than usual due to the number of functions it contains.

Code Placement

Another important function of the linker is code placement. I said above that the linker will place your code where you tell it, within the constraints of the operating system's expectations.

There are two aspects of this. The first is that many computers, perhaps not the Pentium in your PC, but certainly the microcontroller in your TV that remembers all the channel settings, will have several sorts of memory. These will be selected from the basic types of ROM, RAM and NVRAM. There are other sorts, such as PROMs and Flash, but the first three are sufficient for this article.

ROM, Read Only Memory, can only be read, giving what ever the manufacturer programmed into it, but does not need power to remember what it was programmed with. RAM, Random Access Memory, can be both written to and read from, but remembers things only whilst power is applied. NVRAM, Non-Volatile Random Access Memory, can be read, can remember things without power being applied, but may only have a life of 100,000 write cycles (but many tens of millions read cycles). (So NVRAM is only suitable for variables that are rarely changed, like the channel settings of a TV set.)

The second aspect is that most of the larger microprocessors, and their operating systems, talk about memory segments, and the access rights, or permissions, associated with them. What they mean is that a typical, well behaved, program has instructions, which are only ever executed; constant data, which is only ever read; and variable data, which is both read and written. An attempt to execute data, or write to an instruction, is clearly wrong, and results in the operating system terminating the program before it can do any serious damage.

These two aspects are to some extent complimentary. Both instructions and constant data can be placed in ROM as neither should be written to. And variables may be placed in either RAM or NVRAM, according to whether they must be retained after the power has been turned off.

The linker has to place the code according to the constraints of these two aspects on the target system. This is not actually too difficult, though it does rely on clues being embedded in the relocatable binary file to say which particular type of segment or memory type the code should be placed in for best effect.

When generating code, a compiler knows which are the instructions it has created as a result of translating the original C statements, and which are the storage reservations resulting from the variables the user declared. But knowing which variables are actually constant data is more difficult, to the point that many compilers will place all variables in RAM just in case. In C, the 'const' keyword will help to some extent, but only on the declaration of a static variable with an initializer, not an argument.

An assembler will solve the problem by having pseudo-operations saying place the following code in this type of memory, and leave it to the hapless programmer to realise that the reason he cannot change the value of a variable is because he has accidentally placed it in a ROM segment.

Overlays

Associated with code placement is overlay generation. This is not so often come across these days, as the need for it has been largely supplanted by the availability of very large memory chips, memory management units and virtual memory systems. However, in the 1970's, when 64 KBytes of memory was all you got (if that) using the linker to generate overlays was a way of getting larger programs into the limited available memory. Indeed, automatic overlay generation was a feature of the then COBOL standard, though it can be applied to any programming language.

The basic idea is that if you look at the call graph of most programs (i.e., draw a diagram of who calls what), you will find that you have the main program at the top, the run time library at the bottom, and a number of columns of procedure calls between the two. If you are lucky, or have designed your program so, there will be few, if any, lines indicating a function in one column wants to use a function in another.

An example pertinent to this example is the programmer's workbench. The main program will contain the window handler and allow selection of the menu options, whilst the columns would be the main functions of the workbench of editor, compiler, assembler, linker and debugger. You will only be using one of those five functions at any given time, so the others can be removed to make space for it.

If your program is too big to fit in memory, it is clear that the functions to remove are those that are not in the column containing the function currently being executed. Then, when the current column of functions is finished with, because control has been returned to the main program, that column can be removed from memory, and the next required column brought in to the same memory space. The expense is the cost (and delay) of writing the current column of functions back to disk (or at least, all its global variables), and reading in the next overlay.

What I have termed 'columns of functions' would normally be called 'overlays' because they all overlay the same memory area.

The procedure for generating overlays is clear, though the linker will have to be guided as to which functions to place in which overlay to a greater or lesser extent. A root overlay is created containing the main program, the run time library and any frequently called user functions. There will also be an overlay manager in there to orchestrate the loading of overlays when required. The addresses of everything in the root overlay must be recorded for subsequent use by the linker when an overlay wants to call a function it contains.

For each overlay, or column of functions, in turn, the relevant functions are copied in and linked, and all the addresses fixed up, using the addresses in the root overlay as necessary. The overlay is then saved to disk for subsequent use.

And that is it. Instead of one file holding the memory image, you end up with three, or more. But after that, your program should run in less memory space.

A similar effect these days is achieved by all the executable images that make a Windows application program. Clicking on a button, or selecting a menu option often causes a fresh window to be opened in which a new executable image performs the requested function. That function completed, the window is closed and the memory space returned to the system. The code could all have been linked into a single, very large, executable image, but I suspect no-one would wait whilst a word processor bound as a single 500 MByte file was loaded into memory.

Something the Linker Does Not Know

Finally, there is one aspect of actually writing a linker that seems counter-intuitive. A linker does not actually need to know the name of the machine for which it is generating code.

A compiler needs to know this so that it can produce the correct assembly language: code for an 8080 will not run on a Pentium. The assembler also needs to know this: even if there are similar mnemonics on the two machines, the machine codes are probably different.

But the linker does not need to know the type of the target machine as all it does is copy byte streams, note the addresses of symbol definitions and insert the corresponding addresses into the instructions.

What the linker does need to know is the size of those addresses and the byte ordering to use (usually big-endian for Motorola, little-endian for Intel). This does not work out at very many combinations, even if you allow for 48 and 64 bit address spaces, so it is perfectly possible to write a single linker which will link for any processor, and then insert the right sort of address fields based on control information read from the relocatable binary files.

So there you have it: a quick exposition on what a linker does and why, together with some brief notes on what it is that a compiler, or assembler, hands across to the linker in the relocatable binary files.

Notes:

More fields may be available via dynamicdata ..