Today's Schedule
- Self-introduction
- About this course
- Difficulty of processing input
- Compiler structure
- Lexical analysis and parsing
- Example of an automaton
- Example of a formal language grammar
The Importance of Compilers
Blog by Steve Yegge
Executive Summary:
- If you don't know how compilers work, then you don't know how computers work.
- If you're not 100% sure whether you know how compilers work, then you
don't know how they work.
Course Contents
 | Theory | Compilers | Other applications
Front end | language theory, automata | lexical analysis, parsing | regular expressions, text/data formats
Back end | | optimization, code generation |
- Concentrate on input, not output
- Use various representations from theory to application
Example of Difference between Input and Output
(Computer Practice I, problems 04A1 and 04C1)
Character itself (internal representation) | Escaping in HTML/XML (external representation)
' | &apos;
" | &quot;
< | &lt;
> | &gt;
& | &amp;
Which direction is more difficult?
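The asymmetry between the two directions can be sketched in a few lines of Python. This is a minimal illustration, not a full HTML/XML implementation: it assumes only the five predefined escapes (&amp;, &lt;, &gt;, &quot;, &apos;), ignores numeric character references, and the names `escape` and `unescape` are hypothetical helpers chosen for this example.

```python
# The five escapes predefined in XML; helper names here are illustrative only.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}
ENTITIES = {entity: char for char, entity in ESCAPES.items()}


def escape(text):
    """Output direction: every character maps to exactly one string."""
    return "".join(ESCAPES.get(char, char) for char in text)


def unescape(text):
    """Input direction: the text must be scanned and may turn out to be malformed."""
    result = []
    pos = 0
    while pos < len(text):
        if text[pos] != "&":
            result.append(text[pos])
            pos += 1
            continue
        end = text.find(";", pos)
        if end == -1:
            raise ValueError(f"unterminated escape at position {pos}")
        entity = text[pos:end + 1]
        if entity not in ENTITIES:
            raise ValueError(f"unknown escape {entity!r} at position {pos}")
        result.append(ENTITIES[entity])
        pos = end + 1
    return "".join(result)
```

Escaping never fails, while unescaping has to find where each escape ends and reject input such as a bare `&`. That extra scanning and error handling is typical for the input side.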
Difficulties for Input
- Not structured (just a sequence of bytes/characters)
- Anything goes (including errors)
- Deciding whether some input is correct or not is a model for computation
in general
- Deciding whether input is correct is equivalent to recognition
The Function of a Compiler
Bridge between software and hardware
- Input: Program that can be understood by humans
- Language: high-level programming language
- Medium: source (file/program)
- Output: Program that can be understood by a machine
- Language: assembly language, machine language
- Medium: object code, machine code
Example Compiler Input/Output
Input fragment:
sum += price * 25;
Output (assembly language):
LOAD R1, price ; load from price into R1 (register 1)
CONST R2, 25 ; put constant 25 into R2 (register 2)
MUL R1, R1, R2 ; put the product of R1 and R2 into R1
LOAD R2, sum ; load from sum into R2
ADD R2, R1, R2 ; put the sum of R1 and R2 into R2
STORE sum, R2 ; store the contents of R2 into sum
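To check that the assembly output really computes `sum += price * 25`, the six instructions can be run on a toy simulator. The sketch below is an assumption-laden illustration: the instruction semantics are taken from the comments in the listing (first operand is the destination, except for STORE, whose first operand is the memory location), and the function name `run` and the dictionary-based memory are hypothetical.

```python
PROGRAM = """
LOAD R1, price   ; load from price into R1
CONST R2, 25     ; put constant 25 into R2
MUL R1, R1, R2   ; put the product of R1 and R2 into R1
LOAD R2, sum     ; load from sum into R2
ADD R2, R1, R2   ; put the sum of R1 and R2 into R2
STORE sum, R2    ; store the contents of R2 into sum
"""


def run(program, memory):
    """Execute the toy assembly on a dict-based memory and return that memory."""
    registers = {}
    for line in program.strip().splitlines():
        code = line.split(";", 1)[0].strip()  # drop the comment
        if not code:
            continue
        op, rest = code.split(None, 1)
        args = [arg.strip() for arg in rest.split(",")]
        if op == "LOAD":
            registers[args[0]] = memory[args[1]]
        elif op == "CONST":
            registers[args[0]] = int(args[1])
        elif op == "MUL":
            registers[args[0]] = registers[args[1]] * registers[args[2]]
        elif op == "ADD":
            registers[args[0]] = registers[args[1]] + registers[args[2]]
        elif op == "STORE":
            memory[args[0]] = registers[args[1]]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return memory


memory = run(PROGRAM, {"sum": 100, "price": 4})
print(memory["sum"])  # 100 + 4 * 25 = 200
```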
Logical Structure of a Compiler
- [preprocessor]
- Lexical analysis
- Parsing (syntax analysis)
- Semantic analysis
- Optimization (may also come after code generation)
- Code generation (may also come before optimization)
- [assembler]
- [linker, loader]
Compiler Types and Related Software
- One pass compiler
- X-pass compiler (x between 1 and 70 (IBM's PL/1 compiler in the
- Cross-compiler (e.g. compiling on PC for smartphone)
- Dynamic/just-in-time (JIT) compiler
- Preprocessor (runs before the compiler)
- Interpreter (e.g. for Ruby)
Example of Lexical Analysis
Fragment of input program (sequence of characters):
s | u | m | + | = | p | r | i | c | e | * | 2 | 5 | ; | \n
Output (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Overview of Lexical Analysis
- Convert a sequence of characters to a sequence of tokens
- Several characters become one token
- Similar to finding words in a natural language text
- Tokens can have attributes
Examples: id (identifier, with name), int (integer, with value)
- Many token names are derived from the symbol shape
(example: * is asterisk, not mult, because it can also be
used for pointers)
- White space is used during lexical analysis, but then discarded
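The points above can be sketched as a small lexer built on Python's `re` module. This is one possible approach under stated assumptions, not the method used in the course: the token names `id`, `int`, `plusequal`, `asterisk` and `semicolon` come from the example, while `plus`, `minus`, `slash`, `equal`, `lparen`, `rparen` and the function name `tokenize` are hypothetical additions.

```python
import re

# Order matters: "+=" must be tried before "+" so that several
# characters become one token.
TOKEN_SPEC = [
    ("int", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("plusequal", r"\+="),
    ("plus", r"\+"),
    ("minus", r"-"),
    ("asterisk", r"\*"),
    ("slash", r"/"),
    ("equal", r"="),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("semicolon", r";"),
    ("space", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert a string into a list of (token name, attribute) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        text = match.group()
        if kind == "int":
            tokens.append(("int", int(text)))  # attribute: integer value
        elif kind == "id":
            tokens.append(("id", text))        # attribute: name
        elif kind != "space":                  # white space separates tokens, then is dropped
            tokens.append((kind, None))
        pos = match.end()
    return tokens


print(tokenize("sum += price * 25;\n"))
```

White space is matched like any other token so that it can separate `sum` from `+=`, but it never reaches the output list.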
Example of Parsing
Input (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Output (syntax tree):
Details of Syntax Trees
- Tokens become nodes in the syntax tree
- For expressions:
- Operators become parents (and represent the operation)
- Operands (including subexpressions) become children
- The structure of the tree shows the priority and associativity of operators
(operations at the bottom are evaluated first)
- Abstract constructs (e.g. statement,...) are added as parents
- Parentheses and other separators (",", ";", ...) are used during parsing, but then discarded
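How a token sequence turns into such a tree can be sketched with a small recursive-descent parser. This is only one of several parsing techniques and is chosen here for brevity. The nested-tuple tree format, the function name `parse_statement`, and the token names `equal`, `plus`, `minus`, `slash`, `lparen` and `rparen` are assumptions made for this example; leaves are `(name, attribute)` pairs, and operator nodes are `(operator, left, right)` triples.

```python
def parse_statement(tokens):
    """Parse one assignment statement into a tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(*kinds):
        nonlocal pos
        if peek() not in kinds:
            raise SyntaxError(f"expected {' or '.join(kinds)} at token {pos}, found {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        if peek() == "lparen":       # parentheses guide the parser, then vanish
            take("lparen")
            node = expression()
            take("rparen")
            return node
        return take("id", "int")     # operands become leaves

    def term():                      # higher priority: ends up lower in the tree
        node = factor()
        while peek() in ("asterisk", "slash"):
            operator = take("asterisk", "slash")[0]
            node = (operator, node, factor())   # left-associative
        return node

    def expression():                # lower priority: ends up higher in the tree
        node = term()
        while peek() in ("plus", "minus"):
            operator = take("plus", "minus")[0]
            node = (operator, node, term())
        return node

    target = take("id")
    operator = take("equal", "plusequal")[0]
    value = expression()
    take("semicolon")                # the separator is checked, then discarded
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()} after statement")
    return ("statement", (operator, target, value))


tokens = [("id", "sum"), ("plusequal", None), ("id", "price"),
          ("asterisk", None), ("int", 25), ("semicolon", None)]
print(parse_statement(tokens))
```

The operator becomes the parent, its operands become the children, and an abstract `statement` node is added on top. The semicolon does not appear in the tree.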
More Examples
price = pretax / 100 * (108 - discount);
if (a > 5)
b = 15;
Example of an Automaton
Very simple automatic vending machine:
- Input: 50 Yen coins
- Output: Water bottle (price: 150 Yen)
State transition diagram:
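The vending machine can also be written down as a transition table. The sketch below assumes three states that record the amount inserted so far (0, 50 and 100 Yen) and assumes that the third coin releases the bottle and returns the machine to the start state. The state names and the function `run_vending_machine` are hypothetical and may differ from the diagram used in the lecture.

```python
COIN = 50  # the only accepted input symbol

# state (Yen inserted so far) -> (next state, output or None)
TRANSITIONS = {
    0: (50, None),
    50: (100, None),
    100: (0, "water bottle"),  # third coin reaches 150 Yen: deliver and reset
}


def run_vending_machine(coins):
    """Feed a sequence of coins to the automaton; return (final state, outputs)."""
    state = 0
    outputs = []
    for coin in coins:
        if coin != COIN:
            raise ValueError(f"only {COIN} Yen coins are accepted, got {coin}")
        state, output = TRANSITIONS[state]
        if output is not None:
            outputs.append(output)
    return state, outputs


print(run_vending_machine([50, 50, 50]))  # (0, ['water bottle'])
```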
Language and Grammar
- Language is defined on two layers:
- Structure (syntax)
- Meaning (semantics)
- Grammar defines (restricts) the structure of a language
- Example (simple imperative sentences):
eat bread, read books, play music
- Grammar for imperative sentences:
ImperativeSentence → Verb Noun
- The grammar allows sentences such as
"read bread" or "play books",
but semantically, they are problematic
Grammar of Imperative Sentences
ImperativeSentence → Verb Noun
Verb → eat
Verb → read
Verb → play
Noun → bread
Noun → music
Noun → books
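The seven rules above can be stored as data and used both to generate every sentence of the language and to check whether a given sentence belongs to it. This is a minimal sketch that only works for finite grammars like this one; the names `GRAMMAR`, `generate` and `is_sentence` are hypothetical.

```python
from itertools import product

# Each nonterminal maps to the right-hand sides of its rules.
GRAMMAR = {
    "ImperativeSentence": [["Verb", "Noun"]],
    "Verb": [["eat"], ["read"], ["play"]],
    "Noun": [["bread"], ["music"], ["books"]],
}


def generate(symbol):
    """Return every word sequence derivable from symbol (finite grammars only)."""
    if symbol not in GRAMMAR:          # a terminal symbol stands for itself
        return [[symbol]]
    sentences = []
    for right_side in GRAMMAR[symbol]:
        expansions = [generate(part) for part in right_side]
        for combination in product(*expansions):
            sentences.append([word for words in combination for word in words])
    return sentences


def is_sentence(text):
    """Check the structure (syntax) only; meaning is not considered."""
    return text.split() in generate("ImperativeSentence")


print(len(generate("ImperativeSentence")))  # 3 verbs x 3 nouns = 9
print(is_sentence("read bread"))            # True: syntactically fine
```

The grammar accepts "read bread" because it only restricts structure; rejecting it would require semantic information that these rules do not carry.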
Problem: For the one-line C program fragment below, based on the examples
given in this lecture, write down:
- the result of lexical analysis
- the result of parsing
- the output of the compiler (in assembly language; comments are not
needed; use
for subtraction, and DIV for division)
grade = math + english/2 - absent*10;
