Code Generation
(コード生成)
13th lecture, July 13, 2018
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2018/Compiler/lecture13.html
Martin J. Dürst
© 2005-18 Martin J. Dürst 青山学院大学
Today's Schedule
- Schedule from now on
- Summary and homework from previous lecture
- Code generation
- Overview
- Very simple assembly language
- Restricted C language
- Code generation for if, if/else, and while statements
- Code generation for conditions and logical or
- Code generation for function calls
Summary of Previous Lecture
- Turing machine: proposed by Alan Turing in 1936
- Uses a tape of infinite length
- Can be used to recognize phrase structure languages
- Universal Turing machines are a model of computers
- Many different kinds of extensions cannot make the Turing machine more
powerful
- There are many other very simple mechanisms that are
Turing-complete
Example Solution for bison Homework
(paper only)
Remaining Schedule
- July 13: Code generation
- July 20: Optimization
- July 24 (Friday lectures on Tuesday): Execution environment: virtual machines, garbage collection, ...
- July 27, 11:10-12:35: Term final exam (blue style switches added to all past exams)
Compilation Stages
1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Code generation (or 5)
5. Optimization (or 4)
Relationship between Code Generation and Optimization
Many variations are possible:
- Optimization on abstract syntax tree, then code generation
- Analysis of generated code, then optimization
- In practice, mixture of both orders
Methods of Code Generation
- Do not create a syntax tree, generate code directly for each rewriting
rule of the grammar
(Examples: stack machine, conditional statements)
- Traverse the syntax tree and generate code for each node
- Compare subtrees of the syntax tree with tree patterns and
generate code for matching patterns
Difficulty of Code Generation
- Available instructions differ for each machine
- The number of instructions is large (>100), and it is difficult to
select the best ones
Main Machine Types
- Stack machine:
All operations are executed on the stack
(many virtual machines, e.g. Java JVM, Ruby YARV)
- RISC (Reduced Instruction Set Computer):
All operations work on registers, load/store are very simple
(example: ARM processors, used in most smartphones)
- CISC (Complex Instruction Set Computer):
Large number of complex instructions
(example: Intel x86, used in PC/Mac)
Example of Assembly Language
Input (C):
sum += price * 25;
Output (assembly):
LOAD R1, price ; R1 (register 1) ← price
CONST R2, 25 ; R2 (register 2) ← constant 25
MUL R1, R1, R2 ; R1 ← R1*R2
LOAD R2, sum ; R2 ← sum
ADD R2, R1, R2 ; R2 ← R1+R2
STORE sum, R2 ; sum ← R2
Assembly Language Details
- Minor abstraction from machine language to make it readable by humans
- Conversion to machine language by assembler
- Many different variants, but some commonalities
- One machine instruction per line
- Each line has four parts (columns):
- Label (not for every line; followed by ':')
- Instruction (arithmetic operation,...)
- Operands (registers, variable names, constants,...)
- Comment (after ';')
Very Simple Assembly Language
(extreme example of a RISC architecture; specific for this lecture)
instruction | operands | explanation
LOAD | R1, a | Load the value from memory location (variable) a into register R1
STORE | a, R1 | Store the value in R1 to the memory location (variable) a
CONST | R1, 5 | Set register R1 to the constant value 5
ADD | R1, R2, R3 | Add R2 and R3 and put the result into R1. The same register can be used two or three times. SUB, MUL, and DIV are also available.
JUMP | target | Unconditional jump to the instruction at label target
JUMP< | R1, target | Jump to target if R1 is <0. Otherwise, continue to the next instruction. JUMP>=, JUMP!=, ... are also available.
- Locations in memory are expressed using variable names (lower case)
- Registers are named R1, R2, ..., without any limitation on their number
- The first operand is always the place where the result is placed
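The instruction set above is small enough to simulate. The following is a minimal sketch of such a simulator, not part of the lecture material: the struct-based encoding, the `OP_` names, the fixed register count, and the helper `run_sum_example` are all assumptions made for illustration. Variables are memory slots, and jump targets are instruction indices instead of labels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the lecture's instruction set (names are assumptions). */
enum Op { OP_LOAD, OP_STORE, OP_CONST, OP_ADD, OP_SUB, OP_MUL, OP_DIV,
          OP_JUMP, OP_JUMP_LT, OP_JUMP_LE, OP_JUMP_GE, OP_JUMP_GT,
          OP_JUMP_EQ, OP_JUMP_NE };

struct Instr {
    enum Op op;
    int a, b, c;  /* operands in the order written in the assembly text */
};

enum { NUM_REGS = 16 };  /* the lecture allows any number of registers; capped here */

/* Execute prog[0..n-1]. Variables are slots in mem; jump targets are
   instruction indices, and a target of n ends the program.
   Register and memory indices are not range-checked in this sketch. */
void run(const struct Instr *prog, size_t n, int *mem)
{
    int reg[NUM_REGS] = {0};
    size_t pc = 0;
    while (pc < n) {
        struct Instr in = prog[pc++];
        switch (in.op) {
        case OP_LOAD:  reg[in.a] = mem[in.b]; break;
        case OP_STORE: mem[in.a] = reg[in.b]; break;
        case OP_CONST: reg[in.a] = in.b; break;
        case OP_ADD:   reg[in.a] = reg[in.b] + reg[in.c]; break;
        case OP_SUB:   reg[in.a] = reg[in.b] - reg[in.c]; break;
        case OP_MUL:   reg[in.a] = reg[in.b] * reg[in.c]; break;
        case OP_DIV:   assert(reg[in.c] != 0);
                       reg[in.a] = reg[in.b] / reg[in.c]; break;
        case OP_JUMP:    pc = (size_t)in.a; break;
        case OP_JUMP_LT: if (reg[in.a] <  0) pc = (size_t)in.b; break;
        case OP_JUMP_LE: if (reg[in.a] <= 0) pc = (size_t)in.b; break;
        case OP_JUMP_GE: if (reg[in.a] >= 0) pc = (size_t)in.b; break;
        case OP_JUMP_GT: if (reg[in.a] >  0) pc = (size_t)in.b; break;
        case OP_JUMP_EQ: if (reg[in.a] == 0) pc = (size_t)in.b; break;
        case OP_JUMP_NE: if (reg[in.a] != 0) pc = (size_t)in.b; break;
        }
    }
}

/* Runs the lecture's code for "sum += price * 25" and returns the new sum. */
int run_sum_example(int price, int sum)
{
    enum { PRICE, SUM };            /* memory slots for the two variables */
    int mem[2];
    const struct Instr prog[] = {
        { OP_LOAD,  1,   PRICE, 0 },  /* LOAD  R1, price  */
        { OP_CONST, 2,   25,    0 },  /* CONST R2, 25     */
        { OP_MUL,   1,   1,     2 },  /* MUL   R1, R1, R2 */
        { OP_LOAD,  2,   SUM,   0 },  /* LOAD  R2, sum    */
        { OP_ADD,   2,   1,     2 },  /* ADD   R2, R1, R2 */
        { OP_STORE, SUM, 2,     0 },  /* STORE sum, R2    */
    };
    mem[PRICE] = price;
    mem[SUM] = sum;
    run(prog, sizeof prog / sizeof prog[0], mem);
    return mem[SUM];
}
```

For example, `run_sum_example(4, 100)` executes the six instructions of the earlier assembly example with price = 4 and sum = 100.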
Code Generation from (Abstract) Syntax Trees
- Traverse the syntax tree in postorder
- For (almost) every node, generate an instruction
- Leaf with constant → CONST instruction
- Leaf with variable → LOAD instruction
- Internal node with arithmetic operator → arithmetic instruction
- For assignments, process the right-hand side first, then convert the assignment to STORE
- After generating the instruction
- Replace the node with a label of the target register
- Remove the subtrees
- Keep track of which registers are used, and what they contain
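The traversal described above can be sketched as a short recursive function. This is only an illustration under stated assumptions: the `Node` and `Gen` types, the function names, and the output buffer size are hypothetical. The sketch also avoids register tracking by giving every node a fresh register, which the unlimited register supply of the very simple assembly language permits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum Kind { NUM, VAR, BINOP, ASSIGN };

/* Hypothetical syntax tree node; field names are assumptions for this sketch. */
struct Node {
    enum Kind kind;
    int value;                        /* NUM: the constant */
    const char *name;                 /* VAR: variable; ASSIGN: target variable */
    const char *instr;                /* BINOP: "ADD", "SUB", "MUL" or "DIV" */
    const struct Node *left, *right;  /* BINOP: operands; ASSIGN: right only */
};

struct Gen {
    char out[512];  /* generated assembly text */
    int next_reg;   /* next unused register number */
};

/* Append one instruction line to the output. */
static void emit(struct Gen *g, const char *line)
{
    assert(strlen(g->out) + strlen(line) + 2 <= sizeof g->out);
    strcat(g->out, line);
    strcat(g->out, "\n");
}

/* Postorder traversal: children first, then the node itself.
   Returns the number of the register that holds the node's value. */
int gen(struct Gen *g, const struct Node *n)
{
    char line[64] = "";
    int l, r, t = 0;
    switch (n->kind) {
    case NUM:   /* leaf with constant */
        t = g->next_reg++;
        snprintf(line, sizeof line, "CONST R%d, %d", t, n->value);
        break;
    case VAR:   /* leaf with variable */
        t = g->next_reg++;
        snprintf(line, sizeof line, "LOAD R%d, %s", t, n->name);
        break;
    case BINOP: /* internal node with arithmetic operator */
        l = gen(g, n->left);
        r = gen(g, n->right);
        t = g->next_reg++;
        snprintf(line, sizeof line, "%s R%d, R%d, R%d", n->instr, t, l, r);
        break;
    case ASSIGN:
        t = gen(g, n->right);  /* right-hand side first */
        snprintf(line, sizeof line, "STORE %s, R%d", n->name, t);
        break;
    }
    emit(g, line);
    return t;
}

/* Generates code for "sum = price * 25 + sum" and returns the assembly text. */
const char *gen_sum_example(void)
{
    static struct Gen g;
    struct Node price = { .kind = VAR, .name = "price" };
    struct Node c25   = { .kind = NUM, .value = 25 };
    struct Node mul   = { .kind = BINOP, .instr = "MUL", .left = &price, .right = &c25 };
    struct Node sum   = { .kind = VAR, .name = "sum" };
    struct Node add   = { .kind = BINOP, .instr = "ADD", .left = &mul, .right = &sum };
    struct Node asg   = { .kind = ASSIGN, .name = "sum", .right = &add };
    g.out[0] = '\0';
    g.next_reg = 1;
    gen(&g, &asg);
    return g.out;
}
```

The instruction sequence matches the hand-written example for sum += price * 25, except that this sketch uses registers R1 to R5 where the hand-written code reuses R1 and R2.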
Code Generation for if Statements
- There are no machine instructions that correspond directly to structured control flow statements such as if or while
- Convert conditions to conditional jump instructions
- Conditional jumps are often limited to:
- Using a flag from previous arithmetic operations, or
- Comparison with 0
- Usually, the jump is needed when the condition is false
→ Invert the condition
- Often, the target location of the jump is not yet known
→ Use a label in assembly language
Example of Code Generation for if Statements
Original statement: if (a>10) b = 15;
Generated code:
LOAD R1, a
CONST R2, 10
SUB R3, R1, R2 ; R3 = a-10
JUMP<= R3, endif1 ; jump over 'if' part if a-10<=0
CONST R4, 15
STORE b, R4
endif1:
Restricted C Language
(intermediate language for humans; specific for this lecture)
- No while/do-while/for loops, no switch, no else
- goto can be used
- After if, only goto can be used
- The condition in an if statement must be a comparison with 0
Example of Restricted C Language
Original statement:
if (a>10) b = 15;
Use comparison with 0 in condition:
if (a-10 > 0) b = 15;
Add a label:
if (a-10 > 0)
b = 15;
endif1:
Invert the condition and use goto:
if (a-10<=0)
goto endif1;
b = 15;
endif1:
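Because restricted C is still C, the rewriting can be checked by compiling both versions. The following is a small sketch; the function names and the wrapping into functions that return b are assumptions added for testing. Note that a-10 can overflow for values of a close to the smallest int, a case this sketch does not handle.

```c
#include <assert.h>

/* Original statement, wrapped in a function that returns b. */
int if_original(int a, int b)
{
    if (a>10) b = 15;
    return b;
}

/* Restricted C version: comparison with 0, inverted condition, goto. */
int if_restricted(int a, int b)
{
    if (a-10 <= 0)
        goto endif1;
    b = 15;
endif1:
    return b;
}

/* Both versions should agree, in particular around the boundary a == 10. */
void check_if(void)
{
    for (int a = 5; a <= 15; a++)
        assert(if_restricted(a, 0) == if_original(a, 0));
}
```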
Example of Code Generation for if/else
Original program:
if (a > b)
c = a;
else
c = b;
Rewriting to restricted C:
if (a-b <= 0)
goto else1;
c = a;
goto end1;
else1:
c = b;
end1:
Result of Code Generation for if/else
LOAD R1, a
LOAD R2, b
SUB R1, R1, R2 ; R1 = a-b
JUMP<= R1, else1 ; jump to 'else' part if a-b<=0
LOAD R1, a
STORE c, R1
JUMP end1
else1: LOAD R1, b
STORE c, R1
end1:
Code Generation for Logical Or
Original program:
if (a>10 || b < 3)
c = 5;
Rewriting to separate conditions:
if (a > 10)
c = 5;
else if (b < 3)
c = 5;
Rewriting to restricted C:
if (a-10 <= 0)
goto else1;
c = 5;
goto end1;
else1:
if (b-3 >= 0)
goto end1;
c = 5;
end1:
Result of Code Generation for Logical Or
LOAD R1, a
CONST R2, 10
SUB R1, R1, R2 ; R1 = a-10 (original condition: a-10 > 0)
JUMP<= R1, else1
CONST R1, 5
STORE c, R1
JUMP end1
else1: LOAD R1, b
CONST R2, 3
SUB R1, R1, R2 ; R1 = b-3 (original condition: b-3 < 0)
JUMP>= R1, end1
CONST R1, 5
STORE c, R1
end1:
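The rewriting above repeats c = 5; once per condition. One possible variation, which is not from the lecture, jumps to a shared 'then' part as soon as the first condition holds, so the body appears only once. The sketch below wraps the original statement, the restricted C version above, and this variation into functions (names are assumptions) and checks that they agree.

```c
#include <assert.h>

/* Original statement, wrapped in a function that returns c. */
int or_original(int a, int b, int c)
{
    if (a>10 || b < 3)
        c = 5;
    return c;
}

/* Restricted C rewriting as shown above, with "c = 5;" appearing twice. */
int or_lecture(int a, int b, int c)
{
    if (a-10 <= 0)
        goto else1;
    c = 5;
    goto end1;
else1:
    if (b-3 >= 0)
        goto end1;
    c = 5;
end1:
    return c;
}

/* Variation (an assumption of this sketch): the first condition is not
   inverted and jumps directly to a shared 'then' part. */
int or_shared(int a, int b, int c)
{
    if (a-10 > 0)
        goto then1;
    if (b-3 >= 0)
        goto end1;
then1:
    c = 5;
end1:
    return c;
}

/* All three versions should agree around both boundaries. */
void check_or(void)
{
    for (int a = 8; a <= 12; a++)
        for (int b = 1; b <= 5; b++) {
            assert(or_lecture(a, b, 0) == or_original(a, b, 0));
            assert(or_shared(a, b, 0) == or_original(a, b, 0));
        }
}
```

In the very simple assembly language, the variation would need one conditional jump per condition and no unconditional jump, at the cost of mixing inverted and non-inverted conditions.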
Code Generation for while Loop
Original program:
while (a < 20)
a += 3;
Rewriting to restricted C:
next: if (a-20 >= 0) goto break1;
a += 3;
goto next;
break1:
Result of Code Generation for while Loop
next: LOAD R1, a
CONST R2, 20
SUB R1, R1, R2
JUMP>= R1, break1
LOAD R1, a
CONST R2, 3
ADD R1, R1, R2
STORE a, R1
JUMP next
break1:
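The loop rewriting can be checked in the same way as the if rewriting, by compiling both forms. This is a sketch with hypothetical function names. Because break is a reserved word in C, the exit label is spelled break1 here.

```c
#include <assert.h>

/* Original loop, wrapped in a function that returns the final value of a. */
int while_original(int a)
{
    while (a < 20)
        a += 3;
    return a;
}

/* Restricted C version: the test sits at the top of the loop with the
   condition inverted, and an unconditional goto closes the loop. */
int while_restricted(int a)
{
next:
    if (a-20 >= 0)
        goto break1;
    a += 3;
    goto next;
break1:
    return a;
}

/* Both versions should agree, including when the body never runs. */
void check_while(void)
{
    for (int a = 0; a <= 30; a++)
        assert(while_restricted(a) == while_original(a));
}
```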
Code Generation for Function Calls
- Special code needed on caller side and callee side, both for call and for
return
- Stack is used to store all necessary data
- The structure of the stack depends on the machine, the OS, and the
language
Contents of function call stack frame:
- Return address (address to return to when function execution ends)
- Arguments, (space for) return value
- Pointer to base of previous stack frame
- Space to save register values of calling function during execution of
called function
- Local variables
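The list above can be pictured as a C struct. This is only a sketch: real frames are laid out by the compiler according to the machine's and OS's calling convention, and the field order, the fixed array sizes, and the helpers `push_frame` and `pop_frame` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_ARGS = 4, MAX_SAVED = 4, MAX_LOCALS = 4, MAX_DEPTH = 16 };

/* Hypothetical stack frame with one field per item in the list above. */
struct Frame {
    size_t return_address;      /* where to continue when the function ends */
    int args[MAX_ARGS];         /* arguments */
    int return_value;           /* space for the return value */
    struct Frame *previous;     /* base of the previous (caller's) frame */
    int saved_regs[MAX_SAVED];  /* register values of the calling function */
    int locals[MAX_LOCALS];     /* local variables */
};

struct CallStack {
    struct Frame frames[MAX_DEPTH];
    size_t depth;               /* number of frames currently in use */
};

/* Call: create a new frame on top of the stack and link it to the caller's. */
struct Frame *push_frame(struct CallStack *s, size_t return_address)
{
    assert(s->depth < MAX_DEPTH);
    struct Frame *f = &s->frames[s->depth];
    f->previous = s->depth > 0 ? &s->frames[s->depth - 1] : NULL;
    f->return_address = return_address;
    s->depth++;
    return f;
}

/* Return: discard the top frame and report where execution continues. */
size_t pop_frame(struct CallStack *s)
{
    assert(s->depth > 0);
    s->depth--;
    return s->frames[s->depth].return_address;
}
```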
For an example of call stack structure (for Ruby), see RubyKaigi2018 talk
Also see example images
Homework
Deadline: July 19, 2018 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 single page (using both sides is okay; NO cover page, staple in
top left corner if more than one page is necessary), easily readable
handwriting (NO printouts), name (kanji and kana) and student number at the top
right
Problem 1: Code generation for logical AND: Convert the C fragment
if (c<4 && f>=12) a=d;
to "Restricted C Language" and "Very Simple Assembly Language".
Problem 2 (bonus problem): Code generation for for statement: Convert the C fragment
for (i=0; i<20; i++) x*=y;
to "Restricted C Language" and "Very Simple Assembly Language".
Hint (for both problems): Try to remove the construct in question (&& or for) by rewriting the C program (Example: convert for to while)
Additional homework: Bring your notebook computer with you next time
Glossary
- Turing-complete: チューリング完全
- stack machine: スタック・マシーン
- conditional statement: 条件文
- assembly language: アセンブリ言語
- assembler: アセンブラ
- (machine) instruction: (機械の) 命令
- very simple assembly language: 超単純アセンブリ言語
- conditional jump instruction: 条件付きジャンプ命令
- restricted C language: 制限された C 言語
- stack frame: 関数フレーム