Error Processing, Intermediate Representation, Semantic Analysis

(エラー処理、中間表現、意味解析)

11th lecture, June 28, 2019

Language Theory and Compilers

http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture11.html

Martin J. Dürst

Today's Schedule

Summary and leftovers of last lecture
Additional information and hints for homework
Error processing
Intermediate Representations
Semantic analysis

Leftovers from Last Lecture

Summary of Last Lecture

There are many different orders of derivation, in particular leftmost derivation and rightmost derivation
- Recursive descent parsing corresponds to leftmost derivation
- Bottom-up parsing (LALR parsing) uses the reverse order of rightmost derivation
LALR parsing operations: shift/reduce/goto/accept
- shift pushes a token and a state onto the stack
- reduce converts a number of tokens and nonterminals at the top of the stack to a single nonterminal
These operations can be checked in the .output file produced by bison -v,
and traced while debugging by setting #define YYDEBUG 1 (the trace is only printed when the variable yydebug is nonzero)
For ambiguous grammars, bison reports shift/reduce and reduce/reduce conflicts

Hints for Homework

Deadline: July 4, 2019 (Thursday in one week), 19:00

Change the simple calculator of calc.y to a calculator that can handle integers and rationals.

Statements are separated with ;.
Rationals are expressed as [numerator, denominator].
Inside [], division is not allowed. Make sure this is checked by the grammar.
Nesting of [] is not allowed. Make sure this is checked by the grammar.
Calculations are exact, not using floating point numbers.
Print out the result of each statement.
Results are given as irreducible fractions, with a minus sign on the numerator if applicable.
Example result: Result is -53/17
Use the grammar to define priorities and associativities (do NOT use %left, %right,...).

Additional hints:

(also check hints in last lecture)
How to deal with multiple types (integers, rationals):
- Make YYSTYPE a struct that can represent rationals and integers (in both .lex and .y)
- Eliminate impossible calculations with the grammar (ex: division inside rationals)
Start from calc, but make sure you change the file names (incl. makefile)
Example input: rational.in.txt; example output: rational.check.txt
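
The requirements "calculations are exact" and "results are irreducible fractions" can be sketched in C as follows. This is only a minimal sketch: the names Rational, make_rational, and rat_add are hypothetical and not part of calc.y, and overflow of long is not handled.

```c
#include <assert.h>

/* Hypothetical value type; an integer n can be stored as n/1. */
typedef struct {
    long num;  /* numerator, carries the sign */
    long den;  /* denominator, always > 0 after normalization */
} Rational;

/* Greatest common divisor of two non-negative numbers (Euclid). */
static long gcd(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Build an irreducible fraction with the sign on the numerator. */
Rational make_rational(long num, long den)
{
    assert(den != 0);  /* assumption: a zero denominator is rejected earlier */
    if (den < 0) {
        num = -num;
        den = -den;
    }
    long g = gcd(num < 0 ? -num : num, den);  /* g >= 1 because den > 0 */
    Rational r = { num / g, den / g };
    return r;
}

/* Exact addition: a/b + c/d = (a*d + c*b) / (b*d), then reduce. */
Rational rat_add(Rational x, Rational y)
{
    return make_rational(x.num * y.den + y.num * x.den, x.den * y.den);
}
```

A struct of this kind could serve as (part of) YYSTYPE; the other operations follow the same pattern of computing on numerators and denominators and then normalizing.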

Now is your chance to ask questions!

Processing Syntax Errors

Why is error processing difficult
Requirements for error processing
Techniques for error processing

Why is Error Processing Difficult

To make programs shorter, 'unnecessary' (redundant) symbols are avoided, which leaves little redundancy for detecting and locating errors
For each correct program, there are many different erroneous programs
It is difficult for programs to distinguish between errors easily made by humans and errors rarely made by humans
Parsing is based on language theory, but there is not much theory for error processing

Requirements for Error Processing

Output error messages that are easy to understand
Find as many actual errors as possible
Avoid secondary errors
Do not slow down processing of correct programs
Do not make the compiler much more complicated

Techniques for Error Processing

Throw away tokens until finding a token that matches the grammar (panic mode)
Add productions to the grammar that catch errors (error productions)
Try to add or exchange a small number of tokens
Search for a correct program closest to the input
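
The first technique (panic mode) can be sketched for a hand-written parser as follows. The token kinds, the tiny grammar (statement: NUM { PLUS NUM } SEMICOLON), and all function names are assumptions made for illustration; SEMICOLON is chosen as the synchronizing token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; END terminates the token array. */
enum Token { NUM, PLUS, SEMICOLON, END };

/* Panic mode: discard tokens up to and including the next SEMICOLON
   (or stop at END). Returns the position where parsing can resume. */
static size_t skip_to_sync(const enum Token *t, size_t pos)
{
    while (t[pos] != END && t[pos] != SEMICOLON)
        pos++;                    /* throw away the offending tokens */
    if (t[pos] == SEMICOLON)
        pos++;                    /* consume the synchronizing token */
    return pos;
}

/* statement: NUM { PLUS NUM } SEMICOLON
   Returns 1 on success; on failure returns 0 with *pos at the bad token. */
static int parse_statement(const enum Token *t, size_t *pos)
{
    if (t[*pos] != NUM) return 0;
    (*pos)++;
    while (t[*pos] == PLUS) {
        (*pos)++;
        if (t[*pos] != NUM) return 0;
        (*pos)++;
    }
    if (t[*pos] != SEMICOLON) return 0;
    (*pos)++;
    return 1;
}

/* Parses all statements and returns the number of syntax errors found.
   Each bad statement is reported once, then skipped. */
int count_errors(const enum Token *t)
{
    size_t pos = 0;
    int errors = 0;
    assert(t != NULL);
    while (t[pos] != END) {
        if (!parse_statement(t, &pos)) {
            errors++;
            pos = skip_to_sync(t, pos);
        }
    }
    return errors;
}
```

Because everything up to the next SEMICOLON is skipped, one bad statement produces one error and no secondary errors, but a second error inside the same statement goes unreported.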

Error Processing in `bison`

Rules may contain a special error token
Example:
```
statement: ... SEMICOLON { ... }
         | error SEMICOLON { yyerror("Statement error.\n"); yyerrok; }
```
(yyerror is called automatically, therefore this call may be unnecessary)
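If there is an error, bison pops states (with their tokens and nonterminals) from the stack until it reaches a state in which the error token can be shifted
It then discards input tokens until it finds one that can follow the error token in the rule (SEMICOLON in the example above)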

Compilation Stages

1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization (or 5)
5. Code generation (or 4)

Intermediate Representation: Symbol Table

Functionality provided:
- Search of symbols (identifiers)
- Registration and removal (for local variables) of symbols
- Management of data for each symbol
Main points:
- Frequent use, large number of symbols
  → efficiency is important
- The same symbol may be used for different things in different contexts
  → distinction by scope and type is important
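
The three functions listed above (search, registration, removal) can be sketched with a stack of entries tagged by scope level. This is a deliberately simple sketch with linear search; all names are hypothetical, and a real compiler would typically use a hash table for efficiency.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 256

/* Hypothetical entry: a name plus the nesting level of its scope. */
struct Symbol {
    const char *name;
    int level;            /* 0 = global, 1 = function, 2+ = nested blocks */
};

static struct Symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;   /* number of entries in use */
static int current_level = 0;  /* nesting level of the current scope */

void enter_scope(void)
{
    current_level++;
}

/* Removal: drop all symbols declared in the scope being left. */
void leave_scope(void)
{
    assert(current_level > 0);
    while (symbol_count > 0 && symbols[symbol_count - 1].level == current_level)
        symbol_count--;
    current_level--;
}

/* Registration: add a symbol to the current scope. */
void declare(const char *name)
{
    assert(symbol_count < MAX_SYMBOLS);
    symbols[symbol_count].name = name;
    symbols[symbol_count].level = current_level;
    symbol_count++;
}

/* Search: newest entries first, so inner declarations hide outer ones.
   Returns the scope level of the visible declaration, or -1 if none. */
int lookup(const char *name)
{
    for (int i = symbol_count - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].level;
    return -1;
}
```

Searching from the newest entry backwards is what makes the same name resolve to different declarations in different contexts.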

Example of Scope

extern int a; // declaration only, global scope
static int b; // file scope
int f (int a) // f: file scope; a: function scope
{
    int c;    // function scope (another "int a;" here would clash with the parameter a)
    static int b; // function scope, but persists across function calls
    while (...) {
        int a;    // block scope
    }
}

Data Stored by Symbol Table

Kind of symbol (variable, argument, function, type,...)
Locations of declarations/definition/use
Type for variables, functions,...
Scope where a symbol is visible (e.g. global, file, function, compound statement,...)
For variables,...: Size (amount of memory needed)
For functions, variables,...: (relative) address
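
As an illustration, the data listed above could be collected in one record per symbol. The field names, the enum values, and the layout rule (variables placed one after another, without alignment padding) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum Kind { KIND_VARIABLE, KIND_ARGUMENT, KIND_FUNCTION, KIND_TYPE };
enum Type { TYPE_INT, TYPE_DOUBLE, TYPE_CHAR };

/* Hypothetical symbol table record. */
struct SymbolInfo {
    const char *name;
    enum Kind kind;
    enum Type type;
    int scope_level;   /* 0 = global, larger = more deeply nested */
    int decl_line;     /* source line of the declaration */
    size_t size;       /* amount of memory needed, in bytes */
    size_t address;    /* offset relative to the start of the frame */
};

/* Size of each type, taken from the machine compiling this sketch. */
static size_t type_size(enum Type t)
{
    switch (t) {
    case TYPE_INT:    return sizeof(int);
    case TYPE_DOUBLE: return sizeof(double);
    case TYPE_CHAR:   return sizeof(char);
    }
    return 0;
}

/* Fill in size and relative address for n variables laid out in order.
   Returns the total amount of memory needed. */
size_t assign_addresses(struct SymbolInfo *syms, size_t n)
{
    size_t offset = 0;
    assert(syms != NULL);
    for (size_t i = 0; i < n; i++) {
        syms[i].size = type_size(syms[i].type);
        syms[i].address = offset;
        offset += syms[i].size;
    }
    return offset;
}
```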

Intermediate Representation: Abstract Syntax Tree

How to construct an abstract syntax tree:
Create nodes of the syntax tree as attributes of the attributed grammar.

For example, rewrite this

exp: exp '+' term { $$ = $1 + $3; }

to this:

exp: exp '+' term
        { $$ = newnode(PLUS, $1, $3); }

(YYSTYPE has to be changed)

Most parts of an abstract syntax tree are binary (two branches), but for some constructs (e.g. arguments of a function), special treatment is necessary.
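
A possible shape for newnode is sketched below. Only the call newnode(PLUS, $1, $3) comes from the rule above; the Node struct, the helper newnumber, and the tree walk eval are assumptions. With this sketch, YYSTYPE would be Node *.

```c
#include <assert.h>
#include <stdlib.h>

enum NodeKind { NUMBER, PLUS, TIMES };

typedef struct Node {
    enum NodeKind kind;
    int value;                  /* used only when kind == NUMBER */
    struct Node *left, *right;  /* NULL for leaves */
} Node;

/* Create an inner node with two branches. */
Node *newnode(enum NodeKind kind, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch: no real out-of-memory handling */
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Create a leaf node for a number (hypothetical helper). */
Node *newnumber(int value)
{
    Node *n = newnode(NUMBER, NULL, NULL);
    n->value = value;
    return n;
}

/* A later pass walks the finished tree; here it simply evaluates it. */
int eval(const Node *n)
{
    switch (n->kind) {
    case NUMBER: return n->value;
    case PLUS:   return eval(n->left) + eval(n->right);
    case TIMES:  return eval(n->left) * eval(n->right);
    }
    return 0;
}
```

The parser only builds the tree; evaluation, type checking, or code generation become separate passes over it.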

(For very simple programming languages (e.g. Pascal) and simple architectures (e.g. stack machine), it is possible to create code during parsing and to avoid the creation of an abstract syntax tree.)

Semantic Analysis

Mainly analysis and processing of type information:
- Check whether types match
- If necessary, add automatic type conversion (to syntax tree)
- For C and similar languages, relatively easy
- For object-oriented languages, has to consider inheritance,...
- Some languages (e.g. Haskell) use type inference
Timing: When abstract syntax tree is constructed or just before or during code generation
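
Type checking with automatic conversion can be sketched as a bottom-up pass over the syntax tree. The node layout, the two types, and the rule "int plus double gives double" are assumptions chosen to resemble C; all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum Type { T_INT, T_DOUBLE };
enum Kind { K_CONST, K_ADD, K_INT_TO_DOUBLE };

typedef struct Node {
    enum Kind kind;
    enum Type type;             /* for K_ADD, filled in by check() */
    struct Node *left, *right;
} Node;

Node *mknode(enum Kind kind, enum Type type, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch: no real out-of-memory handling */
    n->kind = kind;
    n->type = type;
    n->left = left;
    n->right = right;
    return n;
}

/* Wrap an int operand in an explicit conversion node. */
static Node *to_double(Node *n)
{
    if (n->type == T_DOUBLE)
        return n;
    return mknode(K_INT_TO_DOUBLE, T_DOUBLE, n, NULL);
}

/* Computes the type of n bottom-up and inserts conversions where needed. */
enum Type check(Node *n)
{
    if (n->kind == K_ADD) {
        enum Type l = check(n->left);
        enum Type r = check(n->right);
        if (l != r) {           /* mixed operands: convert the int side */
            n->left = to_double(n->left);
            n->right = to_double(n->right);
        }
        n->type = (l == T_DOUBLE || r == T_DOUBLE) ? T_DOUBLE : T_INT;
    } else if (n->kind == K_INT_TO_DOUBLE) {
        check(n->left);
        n->type = T_DOUBLE;
    }
    /* K_CONST: the type was set when the node was created */
    return n->type;
}
```

After this pass, code generation no longer has to think about mixed types: every conversion is an explicit node in the tree.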

Type Equivalence

There are different ways to define type equivalence:

Same name, same type (simple, but inconvenient for user)
Same components, same type (complicated)
For object-oriented languages, many choices exist

Example for C: type-equivalence.c (does this program compile?)
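
The difference between the first two definitions can be sketched with small type descriptors, as a compiler might keep them internally. This sketch is independent of type-equivalence.c; the TypeDesc layout and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum Kind { K_INT, K_CHAR, K_POINTER, K_STRUCT };

typedef struct TypeDesc {
    enum Kind kind;
    const char *name;                   /* tag for structs, else NULL */
    const struct TypeDesc *elem;        /* target type for K_POINTER */
    const struct TypeDesc *members[4];  /* member types for K_STRUCT */
    size_t nmembers;
} TypeDesc;

/* Name equivalence: two structs are the same type only if their names match. */
int name_equivalent(const TypeDesc *a, const TypeDesc *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return 0;
    if (a->kind == K_STRUCT) return strcmp(a->name, b->name) == 0;
    if (a->kind == K_POINTER) return name_equivalent(a->elem, b->elem);
    return 1;
}

/* Structural equivalence: compare the components recursively.
   (Recursive types would need cycle detection, omitted in this sketch.) */
int struct_equivalent(const TypeDesc *a, const TypeDesc *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return 0;
    if (a->kind == K_POINTER) return struct_equivalent(a->elem, b->elem);
    if (a->kind == K_STRUCT) {
        if (a->nmembers != b->nmembers) return 0;
        for (size_t i = 0; i < a->nmembers; i++)
            if (!struct_equivalent(a->members[i], b->members[i]))
                return 0;
    }
    return 1;
}
```

The name-based check is a single comparison, while the structural check has to walk all components, which is why the second definition is marked as complicated.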

Type Equivalence in Haskell

Haskell: Functional programming language with strong theoretical background
Characteristics:
- No assignment (only initialization), no loops (only recursion)
- Lazy evaluation
- Type inference
Example of type inference:
- Type of function f: (a, Char) -> (a, [Char])
  (function from a pair of a and Char to a pair of a and a list of Chars)
- Type of function g: (Int, [b]) -> Int
  (function from a pair of Int and a list of bs to an Int)
- What is the type of the function h(x) := g(f(x))?
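
One way to work through the last question is to unify the result type of f with the argument type of g. The derivation below is a sketch of that reasoning, not the output of a Haskell compiler:

```latex
\begin{align*}
f &: (a, \mathrm{Char}) \to (a, [\mathrm{Char}]) \\
g &: (\mathrm{Int}, [b]) \to \mathrm{Int} \\
(a, [\mathrm{Char}]) = (\mathrm{Int}, [b]) &\;\Rightarrow\; a = \mathrm{Int},\quad b = \mathrm{Char} \\
h &: (\mathrm{Int}, \mathrm{Char}) \to \mathrm{Int}
\end{align*}
```

The type variables a and b are replaced by concrete types only because the composition forces them to match; f and g on their own stay polymorphic.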

Summary of this Lecture

Error processing while parsing is important but difficult
Symbol table and (abstract) syntax tree are the main intermediate representations
Semantic analysis is mainly concerned with types

Topic next week: Turing Machines

Glossary

syntax error: 構文エラー
secondary error: 二次エラー
symbol table: 名前表
scope: 有効範囲、スコープ
compound statement: 複文
inheritance: 継承
type inference: 型推論
type equivalence: 型の等価
functional (programming) language: 関数型 (プログラミング) 言語
lazy evaluation: 遅延評価