Error Processing, Intermediate Representation, Semantic Analysis
11th lecture, June 28, 2019
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture11.html
Martin J. Dürst
© 2005-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Summary and leftovers of last lecture
- Additional information and hints for homework
- Error processing
- Intermediate Representations
- Semantic analysis
Leftovers from Last Lecture
Summary of Last Lecture
- There are many different orders of derivation, in particular leftmost
derivation and rightmost derivation
- Recursive descent parsing corresponds to leftmost derivation
- Bottom-up parsing (LALR parsing) uses the reverse order of rightmost
derivation
- LALR parsing operations: shift/reduce/goto/accept
- shift pushes a token and a state onto the stack
- reduce converts a number of tokens and nonterminals at the top of the
stack to a single nonterminal
- These operations can be checked in the .output file produced by
bison -v, and when debugging by setting #define YYDEBUG 1
- For ambiguous grammars, bison reports shift/reduce and reduce/reduce
conflicts
Hints for Homework
Deadline: July 4, 2019 (Thursday in one week), 19:00
Change the simple calculator of calc.y to a calculator that can handle
integers and rationals.
- Statements are separated with ;.
- Rationals are expressed as [numerator, denominator].
- Inside [], division is not allowed. Make sure this is checked by the
grammar.
- Nesting of [] is not allowed. Make sure this is checked by the grammar.
- Calculations are exact, not using floating point numbers.
- Print out the result of each statement.
- Results are given as irreducible fractions, with a minus sign on the
numerator if applicable.
- Example result:
Result is -53/17
- Use the grammar to define priorities and associativities (do NOT use
%left, %right, ...).
Additional hints:
- (also check hints in last lecture)
- How to deal with multiple types (integers, rationals):
- Make YYSTYPE a struct that can represent rationals and integers (in
both .lex and .y)
- Eliminate impossible calculations with the grammar (ex: division
inside rationals)
- Start from calc, but make sure you change the file names (incl.
makefile)
- Example input: rational.in.txt; example output: rational.check.txt
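The requirement for irreducible fractions with the sign on the numerator can be met by normalizing every value after each operation. The following is a minimal sketch of that idea, not a reference solution. The names Rational, rat_make, and rat_add are hypothetical, and it is assumed here that an integer n is stored as n/1.

```c
#include <assert.h>

/* Hypothetical value type; an integer n is stored as n/1. */
typedef struct {
    long num;  /* numerator, carries the sign */
    long den;  /* denominator, always > 0 after normalization */
} Rational;

/* Greatest common divisor of two non-negative numbers (Euclid). */
long gcd(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Build a normalized rational: irreducible, sign on the numerator. */
Rational rat_make(long num, long den)
{
    assert(den != 0);   /* a zero denominator is not handled in this sketch */
    if (den < 0) {      /* move the sign to the numerator */
        num = -num;
        den = -den;
    }
    long g = gcd(num < 0 ? -num : num, den);
    Rational r = { num / g, den / g };
    return r;
}

/* Exact addition; the result is normalized again by rat_make. */
Rational rat_add(Rational a, Rational b)
{
    return rat_make(a.num * b.den + b.num * a.den, a.den * b.den);
}
```

Because every constructor call goes through rat_make, the other operations only need to compute an unnormalized numerator and denominator. Overflow of long is ignored in this sketch.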
Now is your chance to ask questions!
Processing Syntax Errors
- Why is error processing difficult
- Requirements for error processing
- Techniques for error processing
Why is Error Processing Difficult
- To make programs shorter, 'unnecessary' symbols should be avoided
- For each correct program, there are many different erroneous programs
- It is difficult for programs to distinguish between errors easily made by
humans and errors rarely made by humans
- Parsing is based on language theory, but there is not much theory for
error processing
Requirements for Error Processing
- Output error messages that are easy to understand
- Find as many actual errors as possible
- Avoid secondary errors
- Do not slow down processing of correct programs
- Do not make the compiler much more complicated
Techniques for Error Processing
- Throw away tokens until finding a token that matches the grammar
(panic mode)
- Add productions to the grammar that catch errors (error
productions)
- Try to add or exchange a small number of tokens
- Search for a correct program closest to the input
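Panic mode, the first technique listed above, can be sketched in a few lines. The token kinds and the choice of ; as the synchronizing token are assumptions made for this illustration; a real parser would choose synchronizing tokens that fit its grammar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for a tiny calculator language. */
typedef enum { TOK_NUM, TOK_PLUS, TOK_SEMI, TOK_EOF } Token;

/*
 * Panic-mode recovery: starting at position pos (where the error was
 * detected), discard tokens until a synchronizing token (here ';') has
 * been consumed or the end of input is reached. Returns the position
 * at which parsing can resume.
 */
size_t panic_skip(const Token *tokens, size_t pos)
{
    assert(tokens != NULL);
    while (tokens[pos] != TOK_EOF && tokens[pos] != TOK_SEMI)
        pos++;                  /* throw away the offending tokens */
    if (tokens[pos] == TOK_SEMI)
        pos++;                  /* consume ';' and resume with the next statement */
    return pos;
}
```

Resuming at a statement boundary helps to avoid secondary errors, because the parser restarts in a state where it knows what to expect.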
Error Processing in bison
Compilation Stages
1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization (or 5)
5. Code generation (or 4)
Intermediate Representation: Symbol Table
- Functionality provided:
- Search of symbols (identifiers)
- Registration and removal (for local variables) of symbols
- Management of data for each symbol
- Main points:
- Frequent use, large number of symbols
→ efficiency is important
- The same symbol may be used for different things in different
contexts
→ distinction by scope and type is important
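The functionality listed above (search, registration, and removal at the end of a scope) can be sketched as follows. This is a deliberately simple array-based version; all names (Symbol, declare, lookup, enter_scope, exit_scope) are hypothetical, and the type is kept as a string only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 256
#define MAX_SCOPES   32

/* Hypothetical per-symbol record; a real compiler stores much more. */
typedef struct {
    const char *name;
    const char *type;   /* e.g. "int"; a string only for brevity */
    int depth;          /* nesting depth of the declaring scope */
} Symbol;

static Symbol symbols[MAX_SYMBOLS];
static size_t n_symbols = 0;
static size_t scope_start[MAX_SCOPES];  /* first symbol index of each open scope */
static int depth = 0;                   /* 0 = global scope */

void enter_scope(void)
{
    assert(depth + 1 < MAX_SCOPES);
    scope_start[++depth] = n_symbols;
}

/* Leaving a scope removes all symbols registered in it. */
void exit_scope(void)
{
    assert(depth > 0);
    n_symbols = scope_start[depth--];
}

void declare(const char *name, const char *type)
{
    assert(n_symbols < MAX_SYMBOLS);
    symbols[n_symbols++] = (Symbol){ name, type, depth };
}

/* Search from the newest entry backwards, so inner scopes hide outer ones. */
const Symbol *lookup(const char *name)
{
    for (size_t i = n_symbols; i > 0; i--)
        if (strcmp(symbols[i - 1].name, name) == 0)
            return &symbols[i - 1];
    return NULL;
}
```

The linear search keeps the sketch short but takes time proportional to the number of symbols. Since efficiency is important, production compilers typically use a hash table instead.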
Example of Scope
extern int a;         // declaration only, global scope
static int b;         // file scope
int f (int a)         // f: file scope; a: function scope (hides the global a)
{
    int c;            // function scope (redeclaring a here would be an error)
    static int b;     // function scope, but persists across function calls
    while (...) {
        int a;        // block scope (hides the parameter a)
    }
}
Data Stored by Symbol Table
- Kind of symbol (variable, argument, function, type,...)
- Locations of declarations/definition/use
- Type for variables, functions,...
- Scope where a symbol is visible (e.g. global, file, function, compound
statement,...)
- For variables,...: Size (amount of memory needed)
- For functions, variables,...: (relative) address
Intermediate Representation: Abstract Syntax Tree
How to construct an abstract syntax tree:
Create nodes of the syntax tree as attributes of the attributed grammar.
For example, rewrite this
exp: exp '+' term { $$ = $1 + $3; }
to this:
exp: exp '+' term
{ $$ = newnode(PLUS, $1, $3); }
(YYSTYPE has to be changed)
Most parts of an abstract syntax tree are binary (two branches), but for
some constructs (e.g. arguments of a function), special treatment is
necessary.
(For very simple programming languages (e.g. Pascal) and simple
architectures (e.g. stack machine), it is possible to create code during
parsing and to avoid the creation of an abstract syntax tree.)
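A possible shape for the newnode function used in the rewritten action is sketched below. The Node layout, the newnum helper, and the eval function are assumptions for illustration; with this layout, YYSTYPE would become a pointer to Node.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NUM, PLUS, TIMES } NodeKind;

/* Hypothetical node type; YYSTYPE would become Node * in the parser. */
typedef struct Node {
    NodeKind kind;
    int value;                  /* used only when kind == NUM */
    struct Node *left, *right;  /* children; NULL for leaves */
} Node;

Node *newnode(NodeKind kind, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* a sketch: no real out-of-memory handling */
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Leaf node for a number. */
Node *newnum(int value)
{
    Node *n = newnode(NUM, NULL, NULL);
    n->value = value;
    return n;
}

/* Walk the finished tree; the original action computed this during parsing. */
int eval(const Node *n)
{
    switch (n->kind) {
    case NUM:   return n->value;
    case PLUS:  return eval(n->left) + eval(n->right);
    case TIMES: return eval(n->left) * eval(n->right);
    }
    return 0;                   /* not reached for valid trees */
}
```

Separating tree construction from evaluation means that later stages (semantic analysis, optimization, code generation) can each make their own pass over the same tree.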
Semantic Analysis
- Mainly analysis and processing of type information:
- Check whether types match
- If necessary, add automatic type conversion (to syntax tree)
- For C and similar languages, relatively easy
- For object-oriented languages, has to consider inheritance,...
- Some languages (e.g. Haskell) use type inference
- Timing: When abstract syntax tree is constructed or just before or during
code generation
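Checking types and adding automatic conversions to the syntax tree can be sketched as a bottom-up pass. The example below assumes a tiny language with only int and double and a single binary operator; the names Expr, mk, and check are hypothetical. It follows the spirit of C's arithmetic conversions in a much simplified form.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { T_INT, T_DOUBLE } Type;
typedef enum { E_CONST, E_ADD, E_INT_TO_DOUBLE } ExprKind;

/* Hypothetical typed syntax tree node. */
typedef struct Expr {
    ExprKind kind;
    Type type;                  /* filled in by semantic analysis */
    struct Expr *left, *right;  /* right is NULL for unary nodes */
} Expr;

Expr *mk(ExprKind kind, Type type, Expr *left, Expr *right)
{
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);          /* a sketch: no real out-of-memory handling */
    e->kind = kind;
    e->type = type;
    e->left = left;
    e->right = right;
    return e;
}

/* Wrap e in a conversion node unless it already has type T_DOUBLE. */
static Expr *to_double(Expr *e)
{
    return e->type == T_DOUBLE ? e : mk(E_INT_TO_DOUBLE, T_DOUBLE, e, NULL);
}

/* Determine the type of each node bottom-up and insert conversions. */
Type check(Expr *e)
{
    if (e->kind == E_ADD) {
        Type l = check(e->left), r = check(e->right);
        if (l != r) {           /* int + double or double + int */
            e->left = to_double(e->left);
            e->right = to_double(e->right);
        }
        e->type = (l == T_INT && r == T_INT) ? T_INT : T_DOUBLE;
    }
    return e->type;             /* constants and conversions already carry a type */
}
```

After the pass, both operands of every addition have the same type, so code generation can select an integer or a floating-point instruction without further checks.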
Type Equivalence
There are different ways to define type equivalence:
- Same name, same type (simple, but inconvenient for user)
- Same components, same type (complicated)
- For object-oriented languages, many choices exist
Example for C: type-equivalence.c (does this program compile?)
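The difference between the first two definitions can be made concrete with a small type descriptor. The sketch below is independent of type-equivalence.c; the TypeDesc layout and the function names are hypothetical, and only basic types and structs of basic types or other structs are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { K_INT, K_DOUBLE, K_STRUCT } Kind;

/* Hypothetical type descriptor: a basic type or a named struct. */
typedef struct TypeDesc {
    Kind kind;
    const char *name;                /* name of a struct, NULL otherwise */
    size_t n_fields;                 /* number of members (structs only) */
    const struct TypeDesc **fields;  /* member types, in declaration order */
} TypeDesc;

/* Name equivalence: structs are the same type only if their names match. */
bool name_equal(const TypeDesc *a, const TypeDesc *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return false;
    if (a->kind != K_STRUCT) return true;
    return strcmp(a->name, b->name) == 0;
}

/* Structural equivalence: compare the components recursively, ignore names. */
bool struct_equal(const TypeDesc *a, const TypeDesc *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return false;
    if (a->kind != K_STRUCT) return true;
    if (a->n_fields != b->n_fields) return false;
    for (size_t i = 0; i < a->n_fields; i++)
        if (!struct_equal(a->fields[i], b->fields[i]))
            return false;
    return true;
}
```

Two descriptors named point and pair with the same member types are different under name_equal but equal under struct_equal. The sketch has no pointer types; with pointers, types can refer to themselves, and structural comparison then needs cycle detection, which is one reason it is the more complicated choice.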
Type Equivalence in Haskell
- Haskell: Functional programming language with strong theoretical
background
- Characteristics:
- No assignment (only initialization), no loops (only recursion)
- Lazy evaluation
- Type inference
- Example of type inference:
- Type of function f: (a, Char) -> (a, [Char])
(function from a pair of a and Char to a pair of a and a list of Chars)
- Type of function g: (Int, [b]) -> Int
(function from a pair of Int and a list of bs to an Int)
- What is the type of the function h(x) := g(f(x))?
Summary of this Lecture
- Error processing while parsing is important but difficult
- Symbol table and (abstract) syntax tree are the main intermediate
representations
- Semantic analysis is mainly concerned with types
Topic next week: Turing Machines
Glossary
- syntax error: 構文エラー
- secondary error: 二次エラー
- symbol table: 名前表
- scope: 有効範囲、スコープ
- compound statement: 複文
- inheritance: 継承
- type inference: 型推論
- type equivalence: 型の等価
- functional (programming) language: 関数型 (プログラミング) 言語
- lazy evaluation: 遅延評価