Driving Compilers

By Fabien Sanglard
May 3rd, 2023


The Linker (4/5)


driver
cpp cc ld* loader

The goal of the linker is to merge all relocatable sections together and create something the OS loader can load for execution. Since we are going to talk about it a lot on this page, let's clarify what relocation means, by quoting elf(5).

Relocation is the process of connecting symbolic references with symbolic definitions. 
Relocatable files must have information that describes how to modify their section
contents, thus allowing executable and shared object files to hold the right
information for a process's program image. Relocation entries are these data.

                                                                           - elf(5)

The linker starts by picking sections in the relocatable(s) generated by the compiler and merges them together. Along the way, it patches in missing symbols from static libraries and emits relocation information for symbols imported from dynamic libraries.

As the drawing above shows, a static library .a is nothing but a collection of relocatable .o files. It is built using the ar (archiver) command.
$ clang -c x.c y.c
$ ar -rv foolib.a x.o y.o

In the good old days you needed to run ranlib on it in order to build an index which speeds up the linking process. Nowadays ar builds this index by default.

Again, this article is only a high-level overview. If you want to deepen your knowledge of linkers, an excellent book on the topic is Linkers and Loaders by John R. Levine.

Output format

On Linux the output format is an ELF file (the same as the input). However, using readelf we can see that whereas compiler outputs only feature sections, linker outputs also feature segments. Segments point to and group sections together. These two views are called the Linking View (sections) and the Execution View (segments).

Let's compile hello.c and peek inside a.out.

// hello.c

#include <stdio.h>

int main() {
   printf("Hello, World!");
   return 0;
}

The -l flag asks readelf to show the segments (a.k.a. "program headers") instead of the sections.

$ clang -v hello.c
clang -cc1 -o /tmp/hello-9c2163.o hello.c
/usr/bin/ld -o a.out  /tmp/hello-9c2163.o /lib/crti.o -L/lib -lc -lgcc 
$ file a.out 
a.out: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), interpreter /lib/ld-linux-aarch64.so.1
$ readelf -l -W a.out

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x8c0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000000238 0x0000000000000238 0x00001b 0x00001b R   0x1
      [Requesting program interpreter: /lib/ld-linux-aarch64.so.1]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0008ac 0x0008ac R E 0x10000
  LOAD           0x000dc8 0x0000000000010dc8 0x0000000000010dc8 0x000270 0x000278 RW  0x10000
  DYNAMIC        0x000dd8 0x0000000000010dd8 0x0000000000010dd8 0x0001e0 0x0001e0 RW  0x8
  NOTE           0x000254 0x0000000000000254 0x0000000000000254 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x0007b0 0x00000000000007b0 0x00000000000007b0 0x00003c 0x00003c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000dc8 0x0000000000010dc8 0x0000000000010dc8 0x000238 0x000238 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01   .interp 
   02   .interp .gnu.hash .dynsym .dynstr .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame 
   03   .init_array .fini_array .dynamic .got .got.plt .data .bss 
   04   .dynamic 
   05   .note.gnu.build-id .note.ABI-tag 
   06   .eh_frame_hdr 
   07     
   08   .init_array .fini_array .dynamic .got 

The program headers tell where groups of sections are in the ELF file (Offset) and where the loader should map them in virtual memory (VirtAddr).

Linker(s)

As the verbose trace above shows, the clang driver invoked itself to compile the source file and then called /usr/bin/ld to link an executable.

There are many linkers available on Linux. The first one available on the platform was GNU's, commonly called ld. Later came gold which was built to improve speed. LLVM also released their own linker called lld. The path /usr/bin/ld is not enough to tell which one it is. But we can dig a little bit.

$ ll /usr/bin/ld
lrwxrwxrwx 1 root root 20 Nov  2 13:58 /usr/bin/ld -> aarch64-linux-gnu-ld*
$ /usr/bin/ld --version
GNU ld (GNU Binutils for Ubuntu) 2.38

The linker bottleneck

The linking stage is a bottleneck in the compilation pipeline. Contrary to the compiler which can be run in parallel on each translation unit and whose outputs can be cached between runs, the linker must wait until all object files are ready to start linking.

As a result, significant optimization efforts have targeted the linker. Among many: gold, Apple's multi-threaded work presented at WWDC 2022, and mold, which claims a 30x speed increase.

Incremental linking

The most important optimization is called "Incremental Linking". It consists in re-using work done during the previous linking operation. Few linkers can do it. GNU's ld, LLVM's lld, and Apple's ld64 can't do it.

gold can do it, but only if you pass a special linker flag, which typical build systems don't. Microsoft's LINK.EXE can also do it when given the special flag /INCREMENTAL.

How the linker finds resources

Like the preprocessor, the linker does not ship with a hard-coded list of folders and libraries to look up. The driver supplies them via -L and -l, respectively.

$ clang -v hello.c
clang -cc1 -o /tmp/hello-9a2af8.o  hello.c
ld  -o a.out \  
-L/usr/lib/gcc/aarch64-linux-gnu/11 \
-L/lib/aarch64-linux-gnu \
-L/usr/lib/aarch64-linux-gnu \
-L/usr/lib/llvm-14/lib \
-L/lib \
-L/usr/lib \
\  
-lgcc \
-lgcc_s \ 
-lc \
\  
/usr/lib/gcc/aarch64-linux-gnu/11/crtendS.o 
/lib/aarch64-linux-gnu/crtn.o 
/tmp/hello-9a2af8.o  
	

In the trace above, the linker is given six search folders (the -L flags) and three libraries (the -l flags), and must link together the objects passed as extra parameters (the trailing .o files).

The name of a library is prefixed with lib and suffixed with the dynamic library extension (on Linux .so) when looked up on the filesystem. Therefore you won't find a file at /lib/aarch64-linux-gnu/c but you will find /lib/aarch64-linux-gnu/libc.so.
If you peek inside libc.so, you will find out that it is not an ELF file. It is an ASCII text file.
$ file /lib/aarch64-linux-gnu/libc.so
/lib/aarch64-linux-gnu/libc.so: ASCII text

This text file is a linker script which points to /lib/aarch64-linux-gnu/libc.so.6.

Linking libraries

There are two types of library linking, named static and dynamic. As we saw earlier, a static library is nothing but a collection of object files packaged in a .a archive. These objects are included in the final binary.

Linking against a dynamic library is different. The linker looks up the dynamic library symbols but does not pull them into the final binary. Instead it emits a special section, .dynsym, which lists the names of the symbols to be found at runtime, along with a list of the dynamic library names where they may be found, in section .dynamic. We can see the dynamic libraries an executable needs with either readelf or ldd.

$ clang -o hello hello.c
$ readelf -d hello| grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
$ ldd hello
	linux-vdso.so.1 (0x0000ffff85df9000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff85bd0000)
	/lib/ld-linux-aarch64.so.1 (0x0000ffff85dc0000)	

Notice that the output of ldd resolves where the libraries are on the system. It also includes the interpreter path, which we will get to in the next chapter.

Using readelf, we can see that imported symbols are suffixed with the name of the dynamic library. The matching library features the same suffix in its exported symbols. If the dynamic library is versioned, this is also where the version appears (e.g. GLIBC_2.17 here).

$ readelf -s hello

Symbol table '.dynsym' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000000005b8     0 SECTION LOCAL  DEFAULT   11 .init
     2: 0000000000011028     0 SECTION LOCAL  DEFAULT   23 .data
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _[...]@GLIBC_2.34 (2)
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     5: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND _[...]@GLIBC_2.17 (3)
     6: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND abort@GLIBC_2.17 (3)
     8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.17 (3)
 
$ readelf -s /lib/aarch64-linux-gnu/libc.so.6 | grep printf

Symbol table '.symtab' contains 90 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    60: 000000000006cfe0   168 FUNC    GLOBAL DEFAULT   12 swprintf@@GLIBC_2.17
   259: 000000000006d090    56 FUNC    GLOBAL DEFAULT   12 vwprintf@@GLIBC_2.17
   437: 0000000000072184    40 FUNC    WEAK   DEFAULT   12 vasprintf@@GLIBC_2.17
   578: 0000000000050cb0   168 FUNC    GLOBAL DEFAULT   12 dprintf@@GLIBC_2.17
   761: 0000000000050920   168 FUNC    GLOBAL DEFAULT   12 fprintf@@GLIBC_2.17
  1137: 0000000000050d60    40 FUNC    WEAK   DEFAULT   12 vfwprintf@@GLIBC_2.17
  1188: 0000000000050c00   168 FUNC    WEAK   DEFAULT   12 asprintf@@GLIBC_2.17
  1302: 0000000000072530    40 FUNC    WEAK   DEFAULT   12 vsnprintf@@GLIBC_2.17
  1401: 0000000000072350    40 FUNC    WEAK   DEFAULT   12 vdprintf@@GLIBC_2.17
  1561: 000000000004be40    40 FUNC    GLOBAL DEFAULT   12 vfprintf@@GLIBC_2.17
  1911: 0000000000050b40   180 FUNC    GLOBAL DEFAULT   12 sprintf@@GLIBC_2.17
  1930: 000000000006cf30   168 FUNC    WEAK   DEFAULT   12 fwprintf@@GLIBC_2.17
  2123: 0000000000050a90   168 FUNC    WEAK   DEFAULT   12 snprintf@@GLIBC_2.17
  2146: 000000000006d4c0    40 FUNC    WEAK   DEFAULT   12 vswprintf@@GLIBC_2.17
  2229: 000000000004be70    56 FUNC    GLOBAL DEFAULT   12 vprintf@@GLIBC_2.17
  2315: 000000000006d0d0   188 FUNC    GLOBAL DEFAULT   12 wprintf@@GLIBC_2.17
  2837: 000000000006ba70   204 FUNC    WEAK   DEFAULT   12 vsprintf@@GLIBC_2.17
  2841: 00000000000509d0   188 FUNC    GLOBAL DEFAULT   12 printf@@GLIBC_2.17

Notice the WEAK binding of some symbols which we discussed earlier.

Library order in static linking

While we are on the topic of linker symbol resolution, you should *really* take a few minutes to read Eli Bendersky's explanation of linking order in static libraries. In fact, his whole website is a gem which partially inspired this series.

_start

What happens if the function where the program starts, main, is mistakenly named maib?

// mainb.c

#include <stdio.h>

int maib() {
  printf("Hello, World!");
  return 0;
}

Let's try to compile it.

$ clang mainb.c
/usr/bin/ld: /lib/aarch64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x1c): undefined reference to `main'
/usr/bin/ld: (.text+0x20): undefined reference to `main'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The linking fails because a mysterious object Scrt1.o features a function _start which calls main. That's because the execution of a program does not really begin at main. There are many things to set up before a program can run, among other things the stack must be initialized and the program arguments prepared.

In our example the piece of assembly in charge of initialization is called Scrt1.s. Only when everything is ready does the function _start call main. Scrt1.s can also sometimes be found named crt0. In both cases, the name is derived from C RunTime.

Likewise, a program's execution does not end after main returns. It is easy to verify using the atexit function, which registers a function that the C runtime executes after main returns.

// atexit.c

#include <stdio.h>
#include <stdlib.h>

void bye(void) {
  puts("Goodbye, cruel world....");
}

int main(void) { 
  atexit(bye);
  puts("This is the last function call");
  return 0;
}

Let's see the output.

$ clang atexit.c
$ ./a.out
This is the last function call
Goodbye, cruel world....

If you feel like going even deeper on the topic of C runtime, make sure to read the Tutorial on Creating Teensy ELF Executables.

Common error, when mixing static and dynamic libraries

Let's say we have a project with three source files. One of them holds a "singleton" char variable named c.

// main.c

#include <stdio.h>

char getChar();
void setChar(char ch);

int main() {
  setChar('a');
  putchar(getChar());
}

// static.c

char c = 'b';

char getChar() {
  return c;
}

// dynamic.c

extern char c;

void setChar(char ch) {
  c = ch;
}

We build the project as an object, a static library, and a dynamic library.

$ clang -o static.o -c                        static.c
$ ar rcs libmyStatic.a static.o
$ clang -o libmyShared.so -shared  -lmyStatic dynamic.c
$ clang -o main -lmyShared -lmyStatic         main.c

The dependency graph looks as follows.

What is the program going to display when it runs? Will it be a, b, or 42?

$ ./main
b	

main calls setChar to set the value of c to 'a' and then prints this very variable it just set. The output expected is therefore 'a'. But when we run, we see 'b' being printed.

This happened because the static library was linked twice. There are two copies of the variable c in the final program: one that is read by getChar() and another that is written by setChar(). If you are designing a complex project, try to stick to static libraries as much as possible.

Common error, the dreaded "duplicate symbol"

Some errors originate at the compiler level but surface at the linker level. This is the case for the beginners' dreaded "duplicate symbol" (a.k.a. LNK2005 in the Windows/Visual Studio world). Here is a mini-project to show the problem.

// counter.h

#pragma once

int counter = 0;
int incCounter();

// counter.c

#include "counter.h"

int incCounter() {
  counter++;  
}

// main.c

#include <stdio.h>
#include "counter.h"  

int main() {
  incCounter();
  printf("%d\n", counter);
}

This is a simple program with a main part and a counter part. It fails to build.

$ clang counter.c main.c
1 warning generated.
duplicate symbol '_counter' in:
    /var/folders/sp/tmp/T/counter-c84ff0.o
    /var/folders/sp/tmp/T/main-3e41f8.o
ld: 1 duplicate symbol for architecture x86_64

Let's inspect what is going on. First at the translation unit level and then at the symbol level.

$ clang -E -o counter.tu counter.c
$ cat counter.tu

int counter = 0;
int incCounter();

int incCounter() {
  counter++;
}
$ clang -E -o main.tu main.c
$ cat main.tu

int counter = 0;
int incCounter();

int main() {
  incCounter();
  printf("%d\n", counter);
}

Let's look at the symbols now.

$ clang -c -o counter.o counter.c
$ nm counter.o
0000000000000000 B counter
0000000000000000 T incCounter

$ clang -c -o main.o main.c
$ nm main.o
0000000000000000 B counter
                 U incCounter
0000000000000000 T main
                 U printf

Due to the siloed nature of the translation unit, the compiler will happily produce object files, only for the linker to scream bloody murder when it finds duplicate symbols (like in our example counter) without a way to know which one to use.

Avoid these kinds of errors by never defining anything in a header. Headers should only contain declarations, and only expose the strict minimum. If you need to share a storage symbol, use extern.

Linker trust

There is a certain level of trust when the linker combines object files. For example there is no verification that imported and exported symbol types match.

// trick.c

#include <stdio.h>

extern short i;

int main() {
  printf("i=%d\n", i);
  return 0;
}

 // i.c



const char* i = "a string!";






The defined type and the declared type of i did not match but the linker happily combined the object files.

$ clang trick.c i.c
$ ./a.out
i=2034

Section pruning

In the compiler page, the "Section Management" part mentioned how to create one section per symbol. This is usually combined with linker flags so that only what is needed ends up in the final product. The compiler driver forwards such flags to the linker via -Wl.

$ clang -v -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,--as-needed main.c
clang -cc1 -o /tmp/main-476f21.o -x c main.c
ld  --gc-sections --as-needed /tmp/main-476f21.o

The executable size reduction will vary depending on the project and translation units structures.

$ clang -v -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,--as-needed main.c
$ ll a.out
-rwxrwxr-x 1 leaf leaf 8840 Apr  4 22:53 a.out*
$ clang  main.c
$ ll a.out
-rwxrwxr-x 1 leaf leaf 9064 Apr  4 22:56 a.out*

Libc implementations may look like they use one source file per function in order to reduce code size (bionic, GNU libc). However, this is more likely done to avoid inlining and allow symbol pre-emption.

Linker script

The output of the linker is configured by a linker script. It is a powerful mechanism which, among other things, tells where each section should go in the output file and where it should be mapped in memory by the loader.

Linkers such as ld have a default script (visible with the command ld --verbose), so users usually don't have to worry about it. Custom scripts are mandatory for toolchains targeting machines with exotic memory mappings.

Let's take the example of ccps, a toolchain to compile for Capcom CPS-1 (arcade machines of the early 90s). The (partial) memory mapping expected by the hardware is as follows.

Address              Purpose
0x000000-0x3FFFFF    ROM
0x900000-0x92FFFF    GFXRAM
0xFF0000-0xFFFFFF    RAM

ccps achieves this mapping with the following linker script.

// cps1 Linker Script

OUTPUT_FORMAT("binary")
OUTPUT_ARCH(m68k)
ENTRY(_start)

MEMORY
{
  rom (rx)    : ORIGIN = 0x000000, LENGTH = 0x200000
  gfx_ram(rw) : ORIGIN = 0x900000, LENGTH = 0x2FFFF
  ram(rw)    : ORIGIN = 0xFF0000, LENGTH = 0xFFFF
}

First, three memory regions are created, each with an origin and a length. Then sections are mapped to memory regions.

	
SECTIONS {
  .text : {
    *(.text)
    *(.text.*)
    . = ALIGN(4);
  } > rom

  .rodata : {
    *(.rodata)
    *(.rodata.*)
    . = ALIGN(4);
  } > rom

  .gfx_data : {
  } > gfx_ram

  .bss : {
    __bss_start = .;
    *(.bss)
    *(.bss.*)
    _end = .;
    . = ALIGN(4);
  } > ram

  .data : {
    *(.data)
    *(.data.*)
    . = ALIGN(4);
  } > ram
}		
		
The ngdevkit toolchain targets the Neo-Geo arcade machine. Its linker script is much more elaborate.

Next


Loader (5/5)

