A compiler is a computer program that translates source code written in a particular programming language into a form that a computer (microprocessor or microcontroller) can execute directly. This translation process is also known as compilation or transformation. The reverse process, i.e. translating machine language back into the source code of a particular programming language, is called decompilation, and the programs that perform it are called decompilers.
A compiler typically performs the following operations: lexical analysis, preprocessing, parsing (syntactic analysis), semantic analysis, and generation of optimized code. Compilation is often followed by a linking step that produces an executable file. When the compiled program (object code) is meant to run on a computer whose processor or operating system differs from that of the machine running the compiler, the process is called cross-compilation.
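As an illustration of the first of these operations, the following is a minimal sketch of lexical analysis for a toy expression language. The token classes (`NUMBER`, `IDENT`, `OP`) and the function name `tokenize` are assumptions made for this example and are not taken from any particular compiler.

```python
import re

# Hypothetical token classes for a toy expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Split source text into (kind, text) pairs; reject unknown characters."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":  # whitespace carries no meaning here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("x = 3 + 42 * y"))
```

The resulting token list would then be handed to the parser, which is not shown here.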
In the analysis phase, the code is analyzed, structured, and checked for errors. Overall, compilation can be broken down into the following steps:
- Syntax checking: The compiler checks whether the source code is a valid program, i.e. whether it conforms to the syntax of the source language. Any errors detected are reported. The result is an intermediate representation of the source code.
- Analysis and optimization: The intermediate representation is analyzed and optimized. The scope of this step varies greatly depending on the compiler and the user's settings, ranging from simple efficiency optimizations to program analysis.
- Code generation: The optimized intermediate representation is translated into the corresponding instructions of the target language. Target-specific optimizations can also be applied at this stage. Most modern compilers no longer perform code generation themselves and delegate it to a separate code generator.
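The three steps can be sketched in a few lines for arithmetic expressions. This is only a toy under stated assumptions: it borrows Python's standard `ast` module as the parser, uses constant folding as the single optimization, and emits instructions for a hypothetical stack machine whose opcodes (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) are invented for this example.

```python
import ast


def check_syntax(source):
    """Step 1: parse the source; return a syntax tree, or report the error."""
    try:
        return ast.parse(source, mode="eval")
    except SyntaxError as err:
        print(f"syntax error: {err.msg} (line {err.lineno})")
        return None


def _is_int(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, int)


class ConstantFolder(ast.NodeTransformer):
    """Step 2: replace an operation on two integer constants by its result."""

    FOLDABLE = {
        ast.Add: lambda a, b: a + b,
        ast.Sub: lambda a, b: a - b,
        ast.Mult: lambda a, b: a * b,
    }

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = self.FOLDABLE.get(type(node.op))
        if fold and _is_int(node.left) and _is_int(node.right):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


# Hypothetical opcodes of the toy target machine.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def generate(node):
    """Step 3: emit (opcode, operand) pairs in evaluation order."""
    if isinstance(node, ast.Expression):
        return generate(node.body)
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.Name):
        return [("LOAD", node.id)]
    if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        return (generate(node.left) + generate(node.right)
                + [(OPCODES[type(node.op)], None)])
    raise NotImplementedError(type(node).__name__)


def compile_expression(source):
    tree = check_syntax(source)
    if tree is None:
        return None
    return generate(ConstantFolder().visit(tree))


if __name__ == "__main__":
    print(compile_expression("x + 2 * 3"))
```

In this sketch, `2 * 3` is folded to `6` before any instruction is emitted, so the generated code for `x + 2 * 3` contains a single `PUSH` of the folded constant rather than a multiplication.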
In the frontend, the code is analyzed, structured, and checked for errors. The frontend is itself divided into phases. For some languages, such as modern C++, ambiguities in the grammar prevent a clean separation into lexical analysis, syntactic analysis, and semantic analysis. Their compilers are correspondingly complex.
The synthesis phase generates the program code of the target language from the attributed syntax tree created by the frontend. Many modern compilers first generate intermediate code from the syntax tree, which may already be relatively close to machine level, and carry out program optimizations on this intermediate code. This is especially useful for compilers that support multiple source languages or multiple target platforms. In such cases, the intermediate code can also serve as an exchange format.
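One common shape for such intermediate code is three-address code, in which every instruction has at most one operator and stores its result in a named temporary. The sketch below lowers an arithmetic expression into that form. It is an illustration only: Python's `ast` module stands in for the frontend, and the function name `lower_to_tac` and the temporary names `t1`, `t2`, ... are assumptions (a real compiler would also make sure temporaries cannot clash with user variables).

```python
import ast
from itertools import count


def lower_to_tac(source):
    """Lower an arithmetic expression to three-address instructions.

    Returns (code, result), where code is a list of
    (target, left, operator, right) tuples and result names the
    operand that holds the final value.
    """
    temps = count(1)
    code = []
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def lower(node):
        # Leaves need no instruction; they are used directly as operands.
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in symbols:
            left = lower(node.left)
            right = lower(node.right)
            target = f"t{next(temps)}"  # fresh temporary for this result
            code.append((target, left, symbols[type(node.op)], right))
            return target
        raise NotImplementedError(type(node).__name__)

    result = lower(ast.parse(source, mode="eval").body)
    return code, result


if __name__ == "__main__":
    instructions, result = lower_to_tac("a + b * 2")
    for target, left, op, right in instructions:
        print(f"{target} = {left} {op} {right}")
    print("result in", result)
```

Because each instruction is this simple, the same list could in principle be produced by frontends for different source languages and consumed by backends for different targets.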
The intermediate code is the basis of many program optimizations. During code generation, the program code of the target language is generated either directly from the syntax tree or from the intermediate code. If the target language is a machine language, the result can be either an executable program or an object file, which is turned into a library or an executable program by linking it with the runtime library and possibly other object files. All of this is done by the code generator, which is part of the compiler system, sometimes as a component of the compiler itself and sometimes as a standalone module.
Some compilers translate a source language into the language of a virtual machine (called an intermediate language), that is, into code (usually binary) executed by a virtual machine: a program that emulates the main features of a computer. Such languages are called semi-compiled. Porting a program then only requires porting the virtual machine, which is either an interpreter or a second translator (for multi-target compilers). For example, compilers translate Pascal into P-code and, more recently, Java source code into Java bytecode (object code).
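To make the idea of a virtual machine concrete, here is a minimal sketch of an interpreter for a hypothetical stack-based instruction set. The opcodes (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`) are invented for this example and do not correspond to P-code or Java bytecode.

```python
import operator

# Hypothetical binary opcodes of the toy virtual machine.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run(bytecode, variables=None):
    """Execute (opcode, operand) pairs and return the final variable table."""
    env = dict(variables or {})
    stack = []
    for opcode, operand in bytecode:
        if opcode == "PUSH":        # push a literal value
            stack.append(operand)
        elif opcode == "LOAD":      # push the value of a variable
            stack.append(env[operand])
        elif opcode == "STORE":     # pop the top of the stack into a variable
            env[operand] = stack.pop()
        elif opcode in BINARY_OPS:  # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return env


if __name__ == "__main__":
    # Bytecode for: y = x + 2 * 3
    program = [
        ("LOAD", "x"),
        ("PUSH", 2),
        ("PUSH", 3),
        ("MUL", None),
        ("ADD", None),
        ("STORE", "y"),
    ]
    print(run(program, {"x": 4}))
```

The program itself is plain data, so running it on another platform only requires that `run` (the virtual machine) be available there, which is the portability argument made above.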