Code generation
(コード生成)
13th lecture, July 1, 2016
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture13.html
Martin J. Dürst
© 2005-15 Martin J. Dürst, Aoyama Gakuin University (青山学院大学)
Today's Schedule
- Schedule from now on
- Summary and homework from previous lecture
- Code generation
- Overview
- Very simple assembly language
- Restricted C language
- Code generation for if, if/else, and while statements
- Code generation for conditions and logical OR
- Code generation for function calls
Summary of Previous Lecture
- Turing machines were proposed by Alan Turing in 1936
- They use a tape of infinite length
- They can be used to recognize phrase structure languages
- Universal Turing machines are a model of computers
- Many common extensions (e.g. multiple tapes, nondeterminism) do not make the Turing machine more powerful
- There are many other very simple mechanisms that are Turing-complete
Remaining Schedule
- July 1: Code generation
- July 8: Optimization
- July 15: Execution environment: virtual machines, garbage collection, ...
- [July 22: Tuesday lectures]
- July 25 to August 2: Final exams
Final Exam
- Past problems and example solutions (85 minutes): 2015, 2014, 2013 (60 minutes), 2012, 2011 (45 minutes), 2010, 2009, 2008, 2007, 2006, 2005
- How to interpret the example solutions:
- Solutions are only examples; other solutions may be possible, and some solutions are missing
- Best way to use:
- Simulate actual exam by solving a full set of problems
- Check solutions
- Find weak areas and review content
- Repeat
Compilation Stages
- Lexical analysis
- Parsing (syntax analysis)
- Semantic analysis
- Optimization (stage 4, or stage 5)
- Code generation (stage 5, or stage 4)
Relationship between Code Generation and Optimization
Many variations are possible:
- Optimization on abstract syntax tree, then code generation
- Analysis of generated code, then optimization
- In practice, a mixture of both orders
Methods of Code Generation
- Do not create a syntax tree; generate code directly for each rewriting rule of the grammar (examples: stack machine, conditional statements)
- Traverse the syntax tree and generate code for each node
- Compare subtrees of the syntax tree with tree patterns and generate code for matching patterns
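The first method can be sketched in C as a recursive-descent parser whose grammar functions emit stack-machine code as soon as a rule has been recognized, without ever building a syntax tree. This is a minimal sketch for arithmetic expressions only; the grammar, the instruction names PUSH, LOAD, ADD, SUB, MUL, DIV, and the function compile_expr are assumptions for illustration, not part of the lecture.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;   /* current position in the input expression */
static char out[1024];    /* generated stack code, one instruction per line */

/* Append one instruction (with an optional operand) to the output. */
static void emit(const char *instr, const char *operand) {
    size_t len = strlen(out);
    if (operand)
        snprintf(out + len, sizeof out - len, "%s %s\n", instr, operand);
    else
        snprintf(out + len, sizeof out - len, "%s\n", instr);
}

static void skip_spaces(void) {
    while (isspace((unsigned char)*src)) src++;
}

static void expr(void);

/* factor := NUMBER | NAME | '(' expr ')' */
static void factor(void) {
    char token[64];
    size_t n = 0;
    skip_spaces();
    if (*src == '(') {
        src++;
        expr();
        skip_spaces();
        if (*src == ')') src++;   /* error handling omitted in this sketch */
        return;
    }
    int is_number = isdigit((unsigned char)*src);
    while (isalnum((unsigned char)*src) && n < sizeof token - 1)
        token[n++] = *src++;
    token[n] = '\0';
    emit(is_number ? "PUSH" : "LOAD", token);   /* code for a leaf */
}

/* term := factor { ('*' | '/') factor } */
static void term(void) {
    factor();
    for (;;) {
        skip_spaces();
        char op = *src;
        if (op != '*' && op != '/') return;
        src++;
        factor();                               /* both operands are now on the stack */
        emit(op == '*' ? "MUL" : "DIV", NULL);
    }
}

/* expr := term { ('+' | '-') term } */
static void expr(void) {
    term();
    for (;;) {
        skip_spaces();
        char op = *src;
        if (op != '+' && op != '-') return;
        src++;
        term();
        emit(op == '+' ? "ADD" : "SUB", NULL);
    }
}

/* Translate one expression directly to stack code, without building a tree. */
const char *compile_expr(const char *text) {
    src = text;
    out[0] = '\0';
    expr();
    return out;
}
```

For example, compile_expr("sum + price * 25") yields LOAD sum, LOAD price, PUSH 25, MUL, ADD (one per line). Because each operator is emitted only after the code for both of its operands, the output is in postfix order, which is exactly what a stack machine executes.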
Difficulty of Code Generation
- Available instructions differ for each machine
- The number of instructions is large (>100), and it is difficult to select the best one
Main Machine Types
- Stack machine:
All operations are executed on the stack
(many virtual machines, e.g. Java JVM, Ruby YARV)
- RISC (Reduced Instruction Set Computer):
Arithmetic operations work only on registers; memory is accessed only with very simple load/store instructions
(example: ARM processors, used in most smartphones)
- CISC (Complex Instruction Set Computer):
Large number of complex instructions
(example: Intel x86, used in PCs and Macs)
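To make the stack machine model concrete, here is a minimal C interpreter for a hypothetical stack instruction set (PUSH_CONST, PUSH_VAR, STORE_VAR, ADD, MUL; the names and encoding are assumptions, not those of the JVM or YARV). It runs stack code for sum += price * 25, in which no instruction names a register.

```c
/* Hypothetical stack-machine instruction set (not the real JVM or YARV encoding). */
enum op { PUSH_CONST, PUSH_VAR, STORE_VAR, ADD, MUL };

struct instr {
    enum op op;
    int arg;   /* constant value or variable index; unused for ADD and MUL */
};

/* Run a program; vars[] plays the role of memory. */
void run(const struct instr *prog, int n, int vars[]) {
    int stack[64];
    int sp = 0;   /* number of values currently on the stack */
    for (int pc = 0; pc < n; pc++) {
        const struct instr *cur = &prog[pc];
        switch (cur->op) {
        case PUSH_CONST: stack[sp++] = cur->arg; break;
        case PUSH_VAR:   stack[sp++] = vars[cur->arg]; break;
        case STORE_VAR:  vars[cur->arg] = stack[--sp]; break;
        /* binary operators pop two values and push the result */
        case ADD: sp--; stack[sp - 1] = stack[sp - 1] + stack[sp]; break;
        case MUL: sp--; stack[sp - 1] = stack[sp - 1] * stack[sp]; break;
        }
    }
}

enum { SUM, PRICE };   /* variable indices in vars[] */

/* sum += price * 25;  as stack code */
const struct instr sum_program[] = {
    { PUSH_VAR, SUM }, { PUSH_VAR, PRICE }, { PUSH_CONST, 25 },
    { MUL, 0 }, { ADD, 0 }, { STORE_VAR, SUM },
};
```

With sum = 100 and price = 4, running sum_program leaves 200 in sum. The operands of ADD and MUL are implicit (the top of the stack), which is what makes code generation for a stack machine so simple.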
Example of Assembly Language
Input (C):
sum += price * 25;
Output (assembly):
LOAD R1, price ; R1 (register 1) ← price
CONST R2, 25 ; R2 (register 2) ← constant 25
MUL R1, R1, R2 ; R1 ← R1*R2
LOAD R2, sum ; R2 ← sum
ADD R2, R1, R2 ; R2 ← R1+R2
STORE sum, R2 ; sum ← R2
Assembly Language Details
- Thin abstraction over machine language that makes it readable for humans
- Converted to machine language by an assembler
- Many different variants, but some commonalities:
- One machine instruction per line
- Each line has four parts (columns):
- Label (not on every line; followed by ':')
- Instruction (arithmetic operation, ...)
- Operands (registers, variable names, constants, ...)
- Comment (after ';')
Very Simple Assembly Language
(extreme example of a RISC architecture)
instruction | operands | explanation
LOAD | R1, a | Load the value from memory location (variable) a into register R1
STORE | a, R1 | Store the value in R1 to memory location (variable) a
CONST | R1, 5 | Set register R1 to the constant value 5
ADD | R1, R2, R3 | Add R2 and R3 and put the result into R1; the same register can be used two or three times. SUB, MUL, and DIV are also available.
JUMP | target | Unconditional jump to the instruction with label target
JUMP< | R1, target | Jump to target if R1 is smaller than 0; otherwise, continue with the next instruction. JUMP>=, JUMP!=, and so on are also available.
- Locations in memory are expressed using variable names (lower case)
- Registers are named R1, R2, ..., with no limit on their number
- For instructions that produce a result, the first operand is always the destination (where the result is stored)
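The instruction set above is small enough to simulate directly, which is handy for checking generated code by hand. This is a minimal sketch in C, not an official tool for the course: instructions are C structs rather than text, labels are modeled as separate LABEL pseudo-instructions instead of a label column, and JUMP<= and friends are spelled JUMP_LE etc. because < and = cannot appear in C identifiers. All names (execute, struct instr, struct var) are hypothetical.

```c
#include <string.h>

/* A subset of the very simple assembly language; JUMP_LE stands for JUMP<= etc. */
enum opcode { LOAD, STORE, CONST, ADD, SUB, MUL, DIV,
              JUMP, JUMP_LT, JUMP_LE, JUMP_GT, JUMP_GE, JUMP_NE, LABEL };

struct instr {
    enum opcode op;
    int a, b, c;        /* register numbers (1 for R1, ...); for CONST, b is the constant */
    const char *name;   /* variable (LOAD/STORE) or label (jumps, LABEL) */
};

struct var { const char *name; int value; };   /* memory: named variables */

static int *lookup(struct var *mem, int nvars, const char *name) {
    for (int k = 0; k < nvars; k++)
        if (strcmp(mem[k].name, name) == 0) return &mem[k].value;
    return NULL;   /* unknown variable: not handled in this sketch */
}

static int find_label(const struct instr *prog, int n, const char *label) {
    for (int k = 0; k < n; k++)
        if (prog[k].op == LABEL && strcmp(prog[k].name, label) == 0) return k;
    return n;      /* unknown label: stop execution */
}

void execute(const struct instr *prog, int n, struct var *mem, int nvars) {
    int R[16] = { 0 };   /* R1..R15 (the lecture allows any number) */
    int pc = 0;
    while (pc < n) {
        const struct instr *ins = &prog[pc++];
        int jump = 0;
        switch (ins->op) {
        case LOAD:    R[ins->a] = *lookup(mem, nvars, ins->name); break;
        case STORE:   *lookup(mem, nvars, ins->name) = R[ins->a]; break;
        case CONST:   R[ins->a] = ins->b; break;
        case ADD:     R[ins->a] = R[ins->b] + R[ins->c]; break;
        case SUB:     R[ins->a] = R[ins->b] - R[ins->c]; break;
        case MUL:     R[ins->a] = R[ins->b] * R[ins->c]; break;
        case DIV:     R[ins->a] = R[ins->b] / R[ins->c]; break;
        case JUMP:    jump = 1; break;
        case JUMP_LT: jump = R[ins->a] < 0; break;
        case JUMP_LE: jump = R[ins->a] <= 0; break;
        case JUMP_GT: jump = R[ins->a] > 0; break;
        case JUMP_GE: jump = R[ins->a] >= 0; break;
        case JUMP_NE: jump = R[ins->a] != 0; break;
        case LABEL:   break;   /* labels only mark positions */
        }
        if (jump) pc = find_label(prog, n, ins->name);
    }
}

/* sum += price * 25;  (the assembly example above) */
const struct instr sum_example[] = {
    { LOAD,  1, 0, 0, "price" },
    { CONST, 2, 25, 0, NULL },
    { MUL,   1, 1, 2, NULL },
    { LOAD,  2, 0, 0, "sum" },
    { ADD,   2, 1, 2, NULL },
    { STORE, 2, 0, 0, "sum" },
};
```

Executing sum_example with sum = 100 and price = 4 leaves 200 in sum. STORE keeps the register in field a and the variable in name, so the struct layout does not mirror the textual operand order "STORE sum, R2".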
Code Generation from (Abstract) Syntax Trees
- Traverse the syntax tree in postorder
- Generate a CONST instruction for leaves with constants
- Generate a LOAD instruction for leaves with variables
- Generate an arithmetic instruction for internal nodes with arithmetic operators
- After generating an instruction, replace the node with the number of the target register
- After converting an internal node, remove the subtrees
- For assignments, process the right-hand side first, then convert the assignment to STORE
- Keep track of which registers are used
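The traversal rules above can be sketched in C as follows. The node layout, the function names gen and gen_assign, and the register policy are assumptions made for this sketch: registers are handed out like a stack, so an operator's result goes into its left operand's register and the right operand's register is released. This is one simple way to keep track of which registers are used; the return value of gen plays the role of "replacing the node with the register number".

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical syntax tree node */
struct node {
    char kind;                  /* 'c' constant, 'v' variable, or '+', '-', '*', '/' */
    int value;                  /* constant value (kind 'c') */
    const char *name;           /* variable name (kind 'v') */
    struct node *left, *right;  /* operands (operator nodes) */
};

static char code[1024];
static int next_reg = 1;        /* R1 .. R(next_reg - 1) are in use */

static void emit(const char *fmt, ...) {
    size_t len = strlen(code);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(code + len, sizeof code - len, fmt, ap);
    va_end(ap);
}

/* Postorder: generate code for both subtrees, then for the node itself.
   Returns the register that now holds the node's value. */
static int gen(const struct node *n) {
    if (n->kind == 'c') {
        emit("CONST R%d, %d\n", next_reg, n->value);
        return next_reg++;
    }
    if (n->kind == 'v') {
        emit("LOAD R%d, %s\n", next_reg, n->name);
        return next_reg++;
    }
    int left = gen(n->left);
    int right = gen(n->right);
    const char *op = n->kind == '+' ? "ADD" : n->kind == '-' ? "SUB"
                   : n->kind == '*' ? "MUL" : "DIV";
    emit("%s R%d, R%d, R%d\n", op, left, left, right);
    next_reg = left + 1;        /* the right operand's register is free again */
    return left;
}

/* Assignment: process the right-hand side first, then STORE. */
const char *gen_assign(const char *var, const struct node *rhs) {
    code[0] = '\0';
    next_reg = 1;
    int r = gen(rhs);
    emit("STORE %s, R%d\n", var, r);
    return code;
}
```

For the tree of sum = sum + price * 25, this produces LOAD R1, sum / LOAD R2, price / CONST R3, 25 / MUL R2, R2, R3 / ADD R1, R1, R2 / STORE sum, R1. The instruction order differs slightly from the earlier hand-written example because the left operand (sum) is visited first.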
Code Generation for if Statements
- Convert conditions to conditional jump instructions
- Conditional jumps are often limited to using a flag from previous arithmetic operations or to comparison with 0
- Usually, the jump is needed when the condition is false
→ Invert the condition
- Often, the target location of the jump is not yet known
→ Use a label in assembly language
Example of Code Generation for if Statements
Original statement: if (a>10) b = 15;
Generated code:
LOAD R1, a
CONST R2, 10
SUB R3, R1, R2 ; R3 = a-10
JUMP<= R3, endif ; jump over 'if' part if a-10<=0
CONST R4, 15
STORE b, R4
endif:
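A template-based generator for the simple case shown above (a variable compared with a constant, and a body that assigns a constant) could look like this. It is a sketch under those assumptions: the names gen_if and invert are hypothetical, and numbering the labels (endif1, endif2, ...) is an addition so that several if statements in one program do not share a label.

```c
#include <stdio.h>
#include <string.h>

/* Invert a comparison: the jump is taken when the original condition is false. */
static const char *invert(const char *cmp) {
    static const char *pairs[][2] = {
        { ">", "<=" }, { "<=", ">" }, { "<", ">=" },
        { ">=", "<" }, { "==", "!=" }, { "!=", "==" },
    };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (strcmp(pairs[i][0], cmp) == 0) return pairs[i][1];
    return "?";    /* unsupported operator */
}

/* Generate code for: if (var CMP limit) target = value;
   The condition var CMP limit is rewritten as (var - limit) CMP 0. */
const char *gen_if(const char *var, const char *cmp, int limit,
                   const char *target, int value) {
    static char code[512];
    static int label_count = 0;   /* each if statement needs its own label */
    int label = ++label_count;
    snprintf(code, sizeof code,
             "LOAD R1, %s\n"
             "CONST R2, %d\n"
             "SUB R3, R1, R2\n"
             "JUMP%s R3, endif%d\n"
             "CONST R4, %d\n"
             "STORE %s, R4\n"
             "endif%d:\n",
             var, limit, invert(cmp), label, value, target, label);
    return code;
}
```

The first call gen_if("a", ">", 10, "b", 15) reproduces the example above, except that the label is named endif1.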
Restricted C Language
(An intermediate language for humans)
- No while loops or for loops; goto can be used instead
- After if, only goto can be used
- The condition in an if statement must be a comparison with 0
Example of Restricted C Language
Original statement:
if (a>10) b = 15;
Use comparison with 0 in the condition:
if (a-10 > 0) b = 15;
Add a label:
if (a-10 > 0)
b = 15;
endif:
Invert the condition and use goto:
if (a-10<=0)
goto endif;
b = 15;
endif:
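Because restricted C is still C, each rewriting step can be checked by compiling both versions and comparing their results. A minimal sketch, with the statement wrapped in hypothetical functions that take a and b as parameters and return the final value of b:

```c
/* Original statement */
int original_if(int a, int b) {
    if (a > 10) b = 15;
    return b;
}

/* Restricted C: comparison with 0, inverted condition, goto */
int restricted_if(int a, int b) {
    if (a - 10 <= 0)
        goto endif;
    b = 15;
endif:
    return b;
}
```

One caveat: a > 10 and a - 10 > 0 agree only as long as a - 10 does not overflow, so for values of a close to the smallest int the rewriting is not exact.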
Example of Code Generation for if/else
Original program:
if (a > b)
c = a;
else
c = b;
Rewriting to restricted C:
if (a-b <= 0)
goto else;
c = a;
goto end;
else:
c = b;
end:
Result of Code Generation for if/else
LOAD R1, a
LOAD R2, b
SUB R1, R1, R2 ; a-b > 0
JUMP<= R1, else
LOAD R1, a
STORE c, R1
JUMP end
else: LOAD R1, b
STORE c, R1
end:
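The if/else rewriting can be checked in the same way. In this sketch (function names hypothetical), the label else is renamed else_part, because else is a keyword in C and cannot be used as a label in a real compiler.

```c
/* Original program */
int original_ifelse(int a, int b) {
    int c;
    if (a > b)
        c = a;
    else
        c = b;
    return c;
}

/* Restricted C version */
int restricted_ifelse(int a, int b) {
    int c;
    if (a - b <= 0)
        goto else_part;   /* condition false: skip the "then" part */
    c = a;
    goto end;             /* skip the "else" part */
else_part:
    c = b;
end:
    return c;
}
```

Both functions return the larger of a and b (for values where a - b does not overflow).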
Code Generation for Logical Or
Original program:
if (a>10 || b < 3)
c = 5;
Rewriting to remove || (a first step towards restricted C):
if (a>10)
c = 5;
else if (b < 3)
c = 5;
Result of Code Generation for Logical Or
LOAD R1, a
CONST R2, 10
SUB R1, R1, R2 ; a-10 > 0
JUMP<= R1, else
CONST R1, 5
STORE c, R1
JUMP end
else: LOAD R1, b
CONST R2, 3
SUB R1, R1, R2 ; b-3 < 0
JUMP>= R1, end
CONST R1, 5
STORE c, R1
end:
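Applying the if/else rewriting once more turns the intermediate form into goto-only restricted C that matches the generated code line by line. A sketch with hypothetical function names; the label else is again renamed else_part because else is a C keyword. The second condition is only evaluated when the first one is false, which matches the short-circuit semantics of || in C.

```c
/* Original program */
int original_or(int a, int b, int c) {
    if (a > 10 || b < 3)
        c = 5;
    return c;
}

/* Restricted C version */
int restricted_or(int a, int b, int c) {
    if (a - 10 <= 0)
        goto else_part;   /* first condition false: test the second one */
    c = 5;
    goto end;
else_part:
    if (b - 3 >= 0)
        goto end;         /* second condition also false: skip the body */
    c = 5;
end:
    return c;
}
```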
Code Generation for while Loops
Original program:
while (a < 20)
a += 3;
Rewriting to restricted C:
next: if (a-20 >= 0) goto break;
a += 3;
goto next;
break:
Result of Code Generation for while Loops
next: LOAD R1, a
CONST R2, 20
SUB R1, R1, R2
JUMP>= R1, break
LOAD R1, a
CONST R2, 3
ADD R1, R1, R2
STORE a, R1
JUMP next
break:
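The loop rewriting can also be compiled and compared with the original. In this sketch (function names hypothetical), the label break is renamed loop_end, because break is a C keyword.

```c
/* Original program */
int original_while(int a) {
    while (a < 20)
        a += 3;
    return a;
}

/* Restricted C version: test at the top, jump back at the bottom */
int restricted_while(int a) {
next:
    if (a - 20 >= 0)
        goto loop_end;    /* condition false: leave the loop */
    a += 3;
    goto next;
loop_end:
    return a;
}
```

Starting from a = 0, both versions stop at 21, the first value of the form 3k that is not smaller than 20.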
Code Generation for Function Calls
- Special code is needed on both the caller side and the callee side
- The stack is used to store all necessary data
- The structure of the stack depends on the machine, the OS, and the language
Contents of a function call stack frame:
- Return address (address to return to when function execution ends)
- Arguments, (space for) return value
- Pointer to the base of the previous stack frame
- Space to save register values of the calling function during execution of the called function
- Local variables
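The role of some of these frame contents can be illustrated by simulating recursion by hand. In this sketch, the frame layout and all names are assumptions and do not follow any real calling convention: factorial keeps an explicit array of frames, and each frame records where to continue after the return (modeled as an enum value instead of a machine address), the argument, and space for the return value. Saved registers, the pointer to the previous frame, and local variables are omitted.

```c
/* Hypothetical "return addresses": where execution continues after a return */
enum return_point { DONE, AFTER_CALL };

/* Hypothetical frame layout: return address, argument, space for return value */
struct frame {
    enum return_point return_to;
    int n;        /* argument */
    int result;   /* return value */
};

/* factorial(n) with an explicit stack of frames instead of C recursion.
   Assumes n <= 12 (larger results overflow int). */
int factorial(int n) {
    struct frame stack[32];
    int top = 0;                                   /* index of the current frame */
    stack[top] = (struct frame){ DONE, n, 0 };     /* initial call */
call:
    if (stack[top].n <= 1) {
        stack[top].result = 1;
        goto ret;
    }
    /* caller side: push a frame with the return address and the argument */
    stack[top + 1] = (struct frame){ AFTER_CALL, stack[top].n - 1, 0 };
    top++;
    goto call;
after_call:
    /* the callee's frame (just popped) still holds its return value */
    stack[top].result = stack[top].n * stack[top + 1].result;
ret:
    /* callee side: continue at the point recorded in the frame */
    if (stack[top].return_to == DONE)
        return stack[top].result;
    top--;
    goto after_call;
}
```

Each call pushes one frame and each return pops one, so computing factorial(5) uses five frames at its deepest point.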
Homework
Deadline: July 7, 2016 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 single page (using both sides is okay; NO cover page, staple in top left corner if more than one page is necessary), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right
Problem 1: Code generation for logical AND: Convert the C fragment
if (c<4 && f>=12) a=d;
to "Restricted C Language" and "Very Simple Assembly Language".
Problem 2 (bonus problem): Code generation for the for statement:
Convert the C fragment
for (i=0; i<20; i++) x*=y;
to "Restricted C Language" and "Very Simple Assembly Language".
Hint (for both problems): Try to remove the construct in question (&& or for) by rewriting the C program (Example: convert for to while)
Glossary
- Turing-complete
- チューリング完全
- stack machine
- スタック・マシーン
- conditional statement
- 条件文
- assembly language
- アセンブリ言語
- assembler
- アセンブラ
- (machine) instruction
- (機械の) 命令
- very simple assembly language
- 超単純アセンブリ言語
- conditional jump instruction
- 条件付きジャンプ命令
- restricted C language
- 制限された C 言語
- stack frame
- 関数フレーム