Error Processing, Intermediate Representation, Semantic Analysis

(エラー処理、中間表現、意味解析)

11th lecture, June 17, 2016

Language Theory and Compilers

http://www.sw.it.aoyama.ac.jp/2014/Compiler/lecture11.html

Martin J. Dürst

Today's Schedule

Summary and leftovers of last lecture
Additional information and hints for homework
Repetition: Grammar patterns
Error processing
Intermediate Representations
Semantic analysis

Leftovers from Last Week

Summary of Last Week

There are many different orders of derivation, in particular leftmost derivation and rightmost derivation
- Recursive descent parsing corresponds to leftmost derivation
- Bottom-up parsing (LALR parsing) uses the reverse order of rightmost derivation
LALR parsing operations: shift/reduce/goto/accept
- shift pushes a token and a state onto the stack
- reduce converts a number of tokens and nonterminals at the top of the stack to a single nonterminal
These operations can be checked in the .output file produced by bison -v,
and traced while debugging by compiling with #define YYDEBUG 1 and setting the variable yydebug to a nonzero value
For ambiguous grammars, bison reports shift/reduce conflicts
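
The shift and reduce operations can be illustrated with a small hand-written sketch in C. It is not how bison works internally: it omits the state numbers and the goto table that a real LALR parser keeps, and only shows the stack discipline for the toy grammar E: E '+' 'n' | 'n'. The function name shift_reduce and the single-character symbols are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Recognize the toy grammar   E: E '+' 'n' | 'n'   with an explicit stack.
   Returns the number of reduce steps, or -1 if the input is rejected. */
int shift_reduce(const char *input)
{
    char stack[64];
    int top = 0, reduces = 0;
    size_t len = strlen(input);

    if (len >= sizeof stack)
        return -1;                       /* input too long for this sketch */

    for (size_t i = 0; i <= len; i++) {
        /* reduce: replace a right-hand side at the top of the stack
           by the nonterminal on the left-hand side */
        for (;;) {
            if (top >= 3 && stack[top - 3] == 'E' &&
                stack[top - 2] == '+' && stack[top - 1] == 'n') {
                top -= 3;
                stack[top++] = 'E';      /* E: E '+' 'n' */
                reduces++;
            } else if (top == 1 && stack[0] == 'n') {
                stack[0] = 'E';          /* E: 'n' */
                reduces++;
            } else {
                break;
            }
        }
        /* shift: push the next input token */
        if (i < len)
            stack[top++] = input[i];
    }
    /* accept only if the whole input was reduced to the start symbol */
    return (top == 1 && stack[0] == 'E') ? reduces : -1;
}
```

For the input n+n+n the reductions happen in the reverse order of a rightmost derivation: first the leftmost n, then each E + n from left to right.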

Hints for Homework

Deadline: June 23, 2016 (Thursday in two weeks), 19:00

Create a calculator for complex numbers (for details, see last week's lecture)

Now is your chance to ask questions!

Additional hints:

For calculations, use the code that you wrote last year
To deal with the restrictions inside []: Use different (but similar) nonterminals and grammar rules for outside [] and inside []

Example of inputs and outputs (not very complete): test.in; test.check

Grammar Patterns: Repetition

One or more times:

items: items item
     | item 
;

Zero or more times:

items: items item
     |
;

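Instead of "items item" (left recursion), "item items" (right recursion) is also possible, but then bison has to push all items onto its stack before it can reduce for the first time, so long lists may overflow the stack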

Grammar Patterns: Associativity

Left associative:

big_exp: big_exp left_associative_operator small_exp
   | small_exp
;

Right associative:

big_exp: small_exp right_associative_operator big_exp
   | small_exp
;
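
The choice between the two patterns matters for operators such as subtraction. The following C sketch (the function names eval_left and eval_right are made up for this illustration) evaluates the same operand sequence with both groupings.

```c
#include <stddef.h>

/* Left associative: ((v[0] - v[1]) - v[2]) - ...
   corresponds to  big_exp: big_exp operator small_exp
   Requires n >= 1. */
int eval_left(const int *v, size_t n)
{
    int result = v[0];
    for (size_t i = 1; i < n; i++)
        result = result - v[i];
    return result;
}

/* Right associative: v[0] - (v[1] - (v[2] - ...))
   corresponds to  big_exp: small_exp operator big_exp
   Requires n >= 1. */
int eval_right(const int *v, size_t n)
{
    if (n == 1)
        return v[0];
    return v[0] - eval_right(v + 1, n - 1);
}
```

For the operands 8, 4, 2, the left-associative grouping gives (8 - 4) - 2 = 2, and the right-associative grouping gives 8 - (4 - 2) = 6.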

Grammar Patterns: Priority

(priority is small_exp > middle_exp > big_exp; assuming left associative)

big_exp: big_exp operator middle_exp
   | middle_exp
;
middle_exp: middle_exp operator small_exp
   | small_exp
;

Grammar Patterns: Parentheses

small_exp: open_paren big_exp close_paren
   | literal 
;

Processing Syntax Errors

Why is error processing difficult
Requirements for error processing
Techniques for error processing

Why is Error Processing Difficult

To keep programs short, languages avoid 'unnecessary' symbols, which leaves little redundancy for detecting and locating errors
For each correct program, there are many different erroneous programs
It is difficult for programs to distinguish between errors easily made by humans and errors rarely made by humans
Parsing is based on language theory, but there is not much theory for error processing

Requirements for Error Processing

Output error messages that are easy to understand
Find as many actual errors as possible
Avoid secondary errors
Do not slow down processing of correct programs
Do not make the compiler much more complicated

Techniques for Error Processing

Throw away tokens until finding a token that matches the grammar (panic mode)
Try to add or exchange a small number of tokens
Add productions to the grammar that catch errors (error productions)
Search for the correct program closest to the input
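
As an illustration of panic mode, here is a minimal C sketch for a toy language in which every statement has the form NUM PLUS NUM SEMI. The token kinds, the struct result, and the function names are assumptions made for this example only. After an error, tokens are thrown away up to the next SEMI, so one mistake is reported once and does not cause secondary errors in the following statements.

```c
#include <stddef.h>

/* Hypothetical token kinds for the toy language. */
enum token { NUM, PLUS, SEMI, END };

struct result {
    int statements;   /* statements parsed without error */
    int errors;       /* syntax errors reported */
};

/* Parse  NUM PLUS NUM SEMI  starting at *pos; return 1 on success.
   On failure, *pos is left at the offending token. */
static int parse_statement(const enum token *t, size_t *pos)
{
    static const enum token pattern[] = { NUM, PLUS, NUM, SEMI };
    for (size_t i = 0; i < 4; i++) {
        if (t[*pos] != pattern[i])
            return 0;
        (*pos)++;
    }
    return 1;
}

/* Parse a token sequence that is terminated by END. */
struct result parse_program(const enum token *t)
{
    struct result r = { 0, 0 };
    size_t pos = 0;

    while (t[pos] != END) {
        if (parse_statement(t, &pos)) {
            r.statements++;
        } else {
            r.errors++;
            /* Panic mode: discard tokens up to the synchronizing
               token SEMI, then resume with the next statement. */
            while (t[pos] != END && t[pos] != SEMI)
                pos++;
            if (t[pos] == SEMI)
                pos++;
        }
    }
    return r;
}
```

For the input NUM PLUS NUM SEMI NUM NUM PLUS SEMI NUM PLUS NUM SEMI, the second statement is wrong; the sketch counts two correct statements and one error.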

Error Processing in `bison`

Rules may contain a special error token
Example:
```
statement: ... SEMICOLON { ... }
         | error SEMICOLON { yyerror("Statement error.\n"); yyerrok; }
```
(yyerror is called automatically, therefore this call may be unnecessary)
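If there is an error, bison discards the tokens and nonterminals on its stack back to the point where the closest rule containing the error token can apply
It then skips input tokens until it finds one that can follow the error token in that rule (SEMICOLON in the example)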

Compilation Stages

1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization (or 5)
5. Code generation (or 4)

Intermediate Representation: Symbol Table

Functionality provided:
- Search of symbols
- Registration and removal (for local variables) of symbols
- Management of data for each symbol
Main points:
- Frequent use, large number of symbols → efficiency is important
- The same symbol may be used for different things in different contexts → distinction by scope and type is important

Data Stored by Symbol Table

Kind of symbol (variable, argument, function, type,...)
Locations of declarations/definition/use
Type for variables, functions,...
Scope where a symbol is visible (e.g. function, compound statement,...)
For variables,...: Size (amount of memory needed)
For functions, variables,...: (relative) address
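
A symbol table with nested scopes could be sketched in C as follows. This is only a minimal illustration: all names (struct symbol, declare, lookup, enter_scope, leave_scope) are assumptions, only a few of the data items listed above are stored, and the linear search would be replaced by a hash table in a real compiler, where efficiency matters.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 256

enum kind { VARIABLE, ARGUMENT, FUNCTION, TYPE_NAME };

struct symbol {
    const char *name;
    enum kind   kind;
    int         scope;   /* nesting depth at which the symbol was declared */
    int         size;    /* amount of memory needed, in bytes */
};

static struct symbol table[MAX_SYMBOLS];
static int count = 0;    /* number of symbols currently registered */
static int depth = 0;    /* current scope nesting depth (0 = global) */

void enter_scope(void)
{
    depth++;
}

/* Leaving a scope removes all symbols declared in it. They are always
   at the end of the table because inner scopes are left first. */
void leave_scope(void)
{
    while (count > 0 && table[count - 1].scope == depth)
        count--;
    depth--;
}

/* Register a symbol in the current scope; returns NULL if the table is full. */
struct symbol *declare(const char *name, enum kind kind, int size)
{
    if (count == MAX_SYMBOLS)
        return NULL;
    table[count].name  = name;
    table[count].kind  = kind;
    table[count].scope = depth;
    table[count].size  = size;
    return &table[count++];
}

/* Search from the most recent declaration backwards, so that
   inner declarations hide outer ones with the same name. */
struct symbol *lookup(const char *name)
{
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

If x is declared globally and again inside a compound statement, lookup("x") finds the inner declaration until leave_scope() is called, and the outer one afterwards.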

Intermediate Representation: Abstract Syntax Tree

How to construct an abstract syntax tree:
Create nodes of the syntax tree as attributes of the attributed grammar.

For example, rewrite this

exp: exp '+' term { $$ = $1 + $3; }

to this:

exp: exp '+' term
        { $$ = newnode(PLUS, $1, $3); }

(YYSTYPE has to be changed so that semantic values can hold pointers to tree nodes)

Most parts of an abstract syntax tree are binary (two branches), but for some constructs (e.g. arguments of a function), special treatment is necessary.
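
The function newnode used in the rule above is not defined in the lecture; one possible definition in C is sketched here. The node layout, the helper functions newleaf, eval and freetree, and the node types NUMBER and TIMES are assumptions for this illustration. With such a definition, the semantic value type of the grammar would be struct node *.

```c
#include <stdlib.h>

enum node_type { NUMBER, PLUS, TIMES };

struct node {
    enum node_type type;
    int            value;          /* used only by NUMBER leaves */
    struct node   *left, *right;   /* NULL for leaves */
};

/* Create an inner node with two branches. */
struct node *newnode(enum node_type type, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->type  = type;
    n->value = 0;
    n->left  = left;
    n->right = right;
    return n;
}

/* Create a leaf node for a number. */
struct node *newleaf(int value)
{
    struct node *n = newnode(NUMBER, NULL, NULL);
    n->value = value;
    return n;
}

/* Walk the finished tree; a compiler would generate code here
   instead of calculating a value. */
int eval(const struct node *n)
{
    switch (n->type) {
    case NUMBER: return n->value;
    case PLUS:   return eval(n->left) + eval(n->right);
    case TIMES:  return eval(n->left) * eval(n->right);
    }
    return 0;
}

/* Release a whole tree. */
void freetree(struct node *n)
{
    if (n == NULL)
        return;
    freetree(n->left);
    freetree(n->right);
    free(n);
}
```

The tree for 1 + 2 * 3 is then newnode(PLUS, newleaf(1), newnode(TIMES, newleaf(2), newleaf(3))). Parentheses and operator priority do not need nodes of their own, because they are expressed by the shape of the tree.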

(For very simple programming languages (e.g. Pascal) and simple architectures (e.g. stack machine), it is possible to create code during parsing and to avoid the creation of an abstract syntax tree.)

Semantic Analysis

Mainly analysis and processing of type information:
- Check whether types match
- If necessary, add automatic type conversion (to syntax tree)
- For C and similar languages, relatively easy
- For object-oriented languages, has to consider inheritance,...
- Some languages (e.g. Haskell) use type inference
Timing: When the abstract syntax tree is constructed, or just before or during code generation
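
Checking types and adding automatic conversions to the syntax tree can be sketched as a bottom-up pass over the tree. The C code below is a minimal illustration with only two types and one operator; struct expr, the node kind INT_TO_DOUBLE and all function names are assumptions, and freeing of nodes is left out for brevity.

```c
#include <stdlib.h>

enum type { T_INT, T_DOUBLE };
enum op   { LITERAL, ADD, INT_TO_DOUBLE };

struct expr {
    enum op      op;
    enum type    type;           /* for ADD: filled in by check() */
    struct expr *left, *right;
};

static struct expr *mk(enum op op, enum type type,
                       struct expr *left, struct expr *right)
{
    struct expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->op = op;
    e->type = type;
    e->left = left;
    e->right = right;
    return e;
}

struct expr *literal(enum type type)
{
    return mk(LITERAL, type, NULL, NULL);
}

struct expr *add(struct expr *left, struct expr *right)
{
    return mk(ADD, T_INT, left, right);   /* type is corrected by check() */
}

/* Wrap e in a conversion node if it is an int where a double is needed. */
static struct expr *widen(struct expr *e, enum type wanted)
{
    if (e->type == T_INT && wanted == T_DOUBLE)
        return mk(INT_TO_DOUBLE, T_DOUBLE, e, NULL);
    return e;
}

/* Compute the type of each node from its children and insert
   automatic int-to-double conversions where the types differ. */
enum type check(struct expr *e)
{
    if (e->op == ADD) {
        enum type lt = check(e->left);
        enum type rt = check(e->right);
        e->type = (lt == T_DOUBLE || rt == T_DOUBLE) ? T_DOUBLE : T_INT;
        e->left  = widen(e->left,  e->type);
        e->right = widen(e->right, e->type);
    }
    return e->type;
}
```

For an addition of an int literal and a double literal, check returns T_DOUBLE and the left operand is wrapped in an INT_TO_DOUBLE node, so that code generation only has to deal with operands of equal type.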

Type Equivalence

There are different ways to define type equivalence:

Same name, same type (simple, but inconvenient for user)
Same components, same type (complicated)
For object-oriented languages, many choices exist

Example for C: type-equivalence.c (does this program compile?)

Type Equivalence in Haskell

Haskell: Functional programming language with strong theoretical background
Characteristics:
- No assignment (only initialization), no loops (only recursion)
- Lazy evaluation
- Type inference
Example of type inference:
- Type of function f: (a, Char) -> (a, [Char])
  (function from a pair of a and Char to a pair of a and a list of Chars)
- Type of function g: (Int, [b]) -> Int
  (function from a pair of Int and a list of bs to an Int)
- What is the type of the function h(x) := g(f(x))?
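
The core step of type inference is unification: two type expressions are made equal by binding type variables. The C sketch below is a strongly simplified illustration, not the algorithm of an actual Haskell compiler; it has no occurs check, and struct type, resolve and unify are names assumed for this example.

```c
#include <stddef.h>

enum tag { TVAR, TINT, TCHAR, TLIST, TPAIR };

struct type {
    enum tag     tag;
    struct type *a, *b;      /* TLIST uses a; TPAIR uses a and b */
    struct type *binding;    /* for TVAR: bound type, or NULL if still free */
};

/* Follow variable bindings to the type that t currently stands for. */
struct type *resolve(struct type *t)
{
    while (t->tag == TVAR && t->binding != NULL)
        t = t->binding;
    return t;
}

/* Try to make x and y equal by binding type variables.
   Returns 1 on success, 0 if the types do not match. */
int unify(struct type *x, struct type *y)
{
    x = resolve(x);
    y = resolve(y);
    if (x == y)
        return 1;
    if (x->tag == TVAR) { x->binding = y; return 1; }
    if (y->tag == TVAR) { y->binding = x; return 1; }
    if (x->tag != y->tag)
        return 0;
    switch (x->tag) {
    case TLIST: return unify(x->a, y->a);
    case TPAIR: return unify(x->a, y->a) && unify(x->b, y->b);
    default:    return 1;    /* TINT with TINT, TCHAR with TCHAR */
    }
}
```

For h(x) := g(f(x)), the result type of f, (a, [Char]), has to be unified with the argument type of g, (Int, [b]). This binds a to Int and b to Char, so h has the type (Int, Char) -> Int.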

Topic Next Week: Turing Machines

Glossary

syntax error: 構文エラー
secondary error: 二次エラー
symbol table: 名前表
compound statement: 複文
inheritance: 継承
type inference: 型推論
type equivalence: 型の等価
functional (programming) language: 関数型 (プログラミング) 言語
lazy evaluation: 遅延評価