Error Processing, Intermediate Representation, Semantic Analysis
(エラー処理、中間表現、意味解析)
11th lecture, June 16, 2017
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2014/Compiler/lecture11.html
Martin J. Dürst
© 2005-17 Martin J. Dürst 青山学院大学
Today's Schedule
- Summary and leftovers of last lecture
- Additional information and hints for homework
- Repetition: Grammar patterns
- Error processing
- Intermediate Representations
- Semantic analysis
Leftovers from Last Week
Summary of Last Week
- There are many different orders of derivation, in particular leftmost
derivation and rightmost derivation
- Recursive descent parsing corresponds to leftmost derivation
- Bottom-up parsing (LALR parsing) uses the reverse order of rightmost
derivation
- LALR parsing operations: shift/reduce/goto/accept
- shift pushes a token and a state onto the stack
- reduce converts a number of tokens and nonterminals at the top of the
stack to a single nonterminal
- These operations can be checked in the .output file produced by bison -v, and when debugging by setting #define YYDEBUG 1
- For ambiguous grammars, bison reports shift/reduce and reduce/reduce conflicts
Hints for Homework
Deadline: June 29, 2017 (next Thursday), 19:00
Expand the simple calculator of calc.y to a small programming language (see
last week for details).
Now is your chance to ask questions!
Additional hints:
For the variables, use an array of appropriate length
Example of inputs and outputs: test.in; test.check
Processing Syntax Errors
- Why is error processing difficult
- Requirements for error processing
- Techniques for error processing
Why is Error Processing Difficult
- To keep programs short, 'unnecessary' (redundant) symbols are avoided, which leaves little redundancy for detecting errors
- For each correct program, there are many different erroneous programs
- It is difficult for a compiler to distinguish errors that humans easily make from errors that humans rarely make
- Parsing is based on language theory, but there is not much theory for
error processing
Requirements for Error Processing
- Output error messages that are easy to understand
- Find as many actual errors as possible
- Avoid secondary errors
- Do not slow down processing of correct programs
- Do not make the compiler much more complicated
Techniques for Error Processing
- Throw away tokens until finding a token that matches the grammar
(panic mode)
- Try to add or exchange a small number of tokens
- Add productions to the grammar that catch errors (error
productions)
- Search for the correct program closest to the input
Error Processing in bison
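bison implements error productions with the predefined error token: after a syntax error, the parser discards states and input until it can shift error and resynchronize, typically at a statement separator. A minimal sketch (the rule and action shown are illustrative, not taken from calc.y):

```yacc
stmt: exp ';'    { printf("= %d\n", $1); }
    | error ';'  { yyerrok; /* skip to the next ';' and resume parsing */ }
    ;
```

The macro yyerrok tells the parser to leave error-recovery mode, so a new error immediately afterwards is reported rather than silently discarded.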
Compilation Stages
1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization
5. Code generation
(the order of stages 4 and 5 can vary)
Intermediate Representation: Symbol Table
- Functionality provided:
- Search of symbols
- Registration and removal (for local variables) of symbols
- Management of data for each symbol
- Main points:
- Frequent use, large number of symbols → efficiency is important
- The same symbol may be used for different things in different
contexts → distinction by scope and type is important
Data Stored by Symbol Table
- Kind of symbol (variable, argument, function, type,...)
- Locations of declarations/definition/use
- Type for variables, functions,...
- Scope where a symbol is visible (e.g. global, file, function, compound
statement,...)
- For variables,...: Size (amount of memory needed)
- For functions, variables,...: (relative) address
Example of Scope
extern int a;     // declaration only, global scope
static int b;     // file scope
int f (int a)     // parameter a: function scope
{
    int a;        // error in C: redeclares the parameter (same scope)
    static int b; // function scope, but persists across function calls
    while (...) {
        int a;    // block scope, shadows the outer a
    }
}
Intermediate Representation: Abstract Syntax Tree
How to construct an abstract syntax tree:
Create nodes of the syntax tree as attributes of the attributed grammar.
For example, rewrite this
exp: exp '+' term { $$ = $1 + $3; }
to this:
exp: exp '+' term
{ $$ = newnode(PLUS, $1, $3); }
(YYSTYPE has to be changed)
Most parts of an abstract syntax tree are binary (two branches), but for
some constructs (e.g. arguments of a function), special treatment is
necessary.
(For very simple programming languages (e.g. Pascal) and simple
architectures (e.g. stack machine), it is possible to create code during
parsing and to avoid the creation of an abstract syntax tree.)
Semantic Analysis
- Mainly analysis and processing of type information:
- Check whether types match
- If necessary, add automatic type conversion (to syntax tree)
- For C and similar languages, relatively easy
- For object-oriented languages, has to consider inheritance,...
- Some languages (e.g. Haskell) use type inference
- Timing: When abstract syntax tree is constructed or just before or during
code generation
Type Equivalence
There are different ways to define type equivalence:
- Same name, same type (simple, but inconvenient for user)
- Same components, same type (complicated)
- For object-oriented languages, many choices exist
Example for C: type-equivalence.c (does this program compile?)
Type Equivalence in Haskell
- Haskell: Functional programming language with strong theoretical
background
- Characteristics:
- No assignment (only initialization), no loops (only recursion)
- Lazy evaluation
- Type inference
- Example of type inference:
  - Type of function f: (a, Char) -> (a, [Char])
    (a function from a pair of an a and a Char to a pair of an a and a list of Chars)
  - Type of function g: (Int, [b]) -> Int
    (a function from a pair of an Int and a list of bs to an Int)
  - What is the type of the function h(x) := g(f(x))?
Topic Next Week: Turing Machines
Glossary
- syntax error: 構文エラー
- secondary error: 二次エラー
- symbol table: 名前表
- scope: 有効範囲、スコープ
- compound statement: 複文
- inheritance: 継承
- type inference: 型推論
- type equivalence: 型の等価
- functional (programming) language: 関数型 (プログラミング) 言語
- lazy evaluation: 遅延評価