Code generation is one of the last steps of the compiler. This is where the compiler emits actual machine code for the IR that was previously created.
This is the eighth post in our Compiler series. Other posts:
- LLVM Everywhere
- Compilers 101 – Overview and Lexer
- Compilers 102 – Parser
- Compilers 103 – Semantic Analyzer
- Compilers 104 – IR Generation
- Compilers 105 – Back End Overview
- Compilers 106 – Optimizer
- Compilers 107 – Optimizer Loop Unrolling
This is the simple code we started with in the Compilers 101 blog post:
sum = 3.14 + 2 * 4 // calculation
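Evaluated exactly, the expression is 11.14; the examples here use the integer constant 11, as they would if sum were an integer variable. After the IR is generated and optimized, the whole statement can boil down to a single machine instruction, which will vary by processor and architecture. Machine code is just binary and not human-readable, so the examples that follow show what the equivalent Assembly code might look like.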
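Before getting to the Assembly code, here is a minimal Python sketch of how that constant could be computed at compile time. The tuple-based expression tree and the fold function are hypothetical illustrations, not the data structures a real compiler uses, and the final conversion assumes sum is an integer variable.

```python
import operator

# Hypothetical miniature expression tree: a node is either a number
# or a tuple of (operator, left, right).
OPS = {"+": operator.add, "*": operator.mul}


def fold(node):
    """Recursively evaluate a tree that contains only constants."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](fold(left), fold(right))
    return node


# sum = 3.14 + 2 * 4, with the multiplication binding tighter than the addition
expr = ("+", 3.14, ("*", 2, 4))

folded = fold(expr)      # roughly 11.14 as a floating-point value
sum_value = int(folded)  # assumption: sum is an integer, so the fraction is dropped

print(folded, sum_value)
```

Once the expression has been reduced to one number like this, the code generator only has to load that number into a register.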
This is the Assembly code for 32-bit ARM:
movs r0, #11
This is the Assembly code for ARM64:
movz w0, #11
x86 and x86-64 can both use this Assembly code (shown in AT&T syntax):
movl $11, %eax
This is the tricky part of making a multi-platform compiler: the same IR has to be translated into different Assembly code for each processor and architecture.
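To make the difference concrete, here is a rough Python sketch of a code generator that emits a "load constant" instruction for each of the three targets above, along with the bytes an assembler would produce for it. The function names and the TARGETS table are hypothetical, the encoders only handle the one register used in this post, and the 32-bit ARM encoding assumes the 16-bit Thumb form of movs.

```python
import struct


def encode_thumb_movs_r0(value):
    # Thumb "movs r0, #imm8": bits 00100 | Rd (3 bits, 0 for r0) | imm8
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate does not fit in 8 bits")
    return struct.pack("<H", 0x2000 | value)


def encode_arm64_movz_w0(value):
    # ARM64 "movz w0, #imm16": base opcode 0x52800000, imm16 in bits 5..20, Rd = 0
    if not 0 <= value <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    return struct.pack("<I", 0x52800000 | (value << 5))


def encode_x86_movl_eax(value):
    # x86 / x86-64 "movl $imm32, %eax": opcode B8 followed by a little-endian imm32
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("immediate does not fit in 32 bits")
    return b"\xB8" + struct.pack("<I", value)


# Hypothetical target table: assembly template plus matching encoder.
TARGETS = {
    "arm": ("movs r0, #{v}", encode_thumb_movs_r0),
    "arm64": ("movz w0, #{v}", encode_arm64_movz_w0),
    "x86": ("movl ${v}, %eax", encode_x86_movl_eax),
}


def emit_load_constant(target, value):
    """Return (assembly text, machine code bytes) for the given target."""
    template, encoder = TARGETS[target]
    return template.format(v=value), encoder(value)


for name in TARGETS:
    asm, code = emit_load_constant(name, 11)
    print(f"{name:6} {asm:18} {code.hex(' ')}")
```

Even for this one-instruction program, the three targets disagree on the mnemonic, the operand order, the instruction length, and where the constant sits inside the encoding. A production compiler handles this with a full back end per target rather than a lookup table.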
Once you have machine code that the computer can run, the last step is to link all the pieces together so that you have an app that the OS can run. This is done by the Linker.