Use of Tools for Parsing
(Principles of yacc-style tools)
10th lecture, June 10, 2016
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture10.html
Martin J. Dürst
© 2005-16 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- About last week's homework
- Summary of last lecture
- How bison works:
  - Different orders of derivation
  - LALR parsing
  - How to read the .output file
  - How to debug bison grammars
Last Week's Homework
Summary of Last Lecture
- Creating a program with bison and flex needs many steps, so using make is important
- The input format for bison is very similar to the input format for flex, but there are also some differences
- bison uses attribute grammars to calculate the result of parsing
- The attributes are referenced as $$, $1, $2,... in the C program fragments
- Priority and associativity of operators are expressed in the rewriting
rules of the grammar
How to Express Priorities
- Use a separate nonterminal symbol for each priority level
- Write the grammar starting with the lowest priority (outside)
- On the right hand side of the rewriting rule for a given priority's nonterminal, use nonterminals of the same or one level higher priority
- How to select names for nonterminals:
  - Use mathematical terms (用語): term (項), factor (因子)
  - Use the type of operator, or one representative operator (shift_expression, mulExpression,...)
How to Express Associativity
- Left associative:
  - Use the same nonterminal on the left hand side of the rewriting rule and on the right hand side to the left of the operator
  - Use the nonterminal of the next higher priority level to the right of the operator
  - Unless the operator is mandatory, also add a rewriting rule whose right hand side is just the nonterminal of the next higher priority level
- Right associative:
  - Exchange the nonterminals to the right and the left of the operator
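The two recipes (one nonterminal per priority level, recursion on the left for left associativity) can be mirrored in a small hand-written evaluator. This is only a sketch under assumptions: it uses recursive descent instead of bison, so the left recursion of each rule is written as a loop, and the names peek, factor, term, expression and evaluate are made up for this example. It has no error handling.

```c
#include <ctype.h>

/* Skip blanks and return the next character without consuming it. */
static char peek(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;
    return **p;
}

/* factor: integer   (highest priority level) */
static int factor(const char **p)
{
    int v = 0;
    peek(p);                                   /* skip leading blanks */
    while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
    return v;
}

/* term: term '*' factor | factor
   Left associative: the loop accumulates the result from the left. */
static int term(const char **p)
{
    int v = factor(p);
    while (peek(p) == '*') { (*p)++; v *= factor(p); }
    return v;
}

/* expression: expression '-' term | term   (lowest priority level) */
static int expression(const char **p)
{
    int v = term(p);
    while (peek(p) == '-') { (*p)++; v -= term(p); }
    return v;
}

/* Evaluate a string such as "2 - 3 * 4". */
int evaluate(const char *input)
{
    return expression(&input);
}
```

Because expression only calls term, and term only calls factor, '*' binds more tightly than '-'. A right associative operator would call its own function recursively for the right operand instead of looping.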
How to Express Repetition (Lists)
- Example: A list of statements (statementList or statements)
- Two rewriting rules are needed
- Base rule:
  - Repetition of 0 or more times: A rewriting rule with an empty right hand side
  - Repetition of 1 or more times: A rewriting rule with a single element (e.g. statement) of the list on the right hand side
- Inductive rule:
  - Right hand side uses both list and single element nonterminals
  - If associativity is important, it determines the order of the two nonterminals
  - If associativity is not important, there are two choices:
    - List first: Left recursion; advantage: smaller stack
    - Element first: Right recursion
- Examples:
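The stack advantage of left recursion can be illustrated with a simple counting model. The following C sketch is not from the lecture and does not run a real parser: it only counts how many grammar symbols a bottom-up parser would hold on its stack while reading a list of n elements. The function names max_depth_left and max_depth_right are assumptions made for this example.

```c
#include <stddef.h>

/* list: list element | element   (left recursion)
   Each element can be reduced as soon as it has been shifted. */
size_t max_depth_left(size_t n)
{
    size_t depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                     /* shift one element */
        if (depth > max) max = depth;
        if (i > 0) depth--;          /* reduce "list element" to list: 2 symbols become 1 */
        /* for i == 0, "element" is reduced to list: 1 symbol stays 1 */
    }
    return max;
}

/* list: element list | element   (right recursion)
   Nothing can be reduced before the last element has been read. */
size_t max_depth_right(size_t n)
{
    size_t depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                     /* shift one element */
        if (depth > max) max = depth;
    }
    /* The last element becomes a list (depth unchanged); after that,
       each reduction of "element list" to list removes one symbol. */
    while (depth > 1) depth--;
    return max;
}
```

In this model, left recursion never needs more than two symbols on the stack, while right recursion needs as many as there are elements in the list.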
Order of Derivation: Leftmost and Rightmost Derivation
With leftmost derivation, always the leftmost nonterminal in the syntax tree is expanded
With rightmost derivation, always the rightmost nonterminal in the syntax tree is expanded
Simple example grammar:
E → E '+' T
  | T
T → integer
Example of input: 5 + 7 + 3
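For this input, the two orders of derivation can be written out step by step. The following is a worked sketch that assumes each integer token is shown with its value (5, 7, 3); both derivations use the same rules and produce the same syntax tree, only the order of the steps differs.

```latex
\begin{align*}
\text{leftmost:}\quad  & E \Rightarrow E + T \Rightarrow E + T + T \Rightarrow T + T + T
                         \Rightarrow 5 + T + T \Rightarrow 5 + 7 + T \Rightarrow 5 + 7 + 3 \\
\text{rightmost:}\quad & E \Rightarrow E + T \Rightarrow E + 3 \Rightarrow E + T + 3
                         \Rightarrow E + 7 + 3 \Rightarrow T + 7 + 3 \Rightarrow 5 + 7 + 3
\end{align*}
```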
Derivation Choices
Different choices may:
- Generate different words: This is necessary to be able to process
different inputs
- Generate the same word, but with different syntax trees: Ambiguous
grammar, needs to be avoided
- Generate the same syntax tree, but in different orders (leftmost/rightmost/... derivation): Different parsing algorithms
Kinds of Analysis Methods
- LL: Read input from the left, use leftmost derivation (used in top-down
parsing)
- LR: Read input from the left, use rightmost derivation in reverse order (used in bottom-up parsing)
- LL(1): LL, with one token lookahead
- LR(1): LR, with one token lookahead
- LALR: A kind of LR(1), used widely in yacc and bison
The labels are also used for grammars:
"This grammar is LL(1)" (meaning: this grammar can be used with an LL(1)
parser)
How to Observe and Debug bison
- bison -v creates a file with many interesting details (calc.y → calc.output)
- #define YYDEBUG 1 switches on debugging
Understanding bison: The .output File
bison -v creates a file with extension .output, containing the following interesting details:
- [Problems: Unused terminal symbols, conflicts]
- Grammar: Numbered rewriting rules (rule number 0 is $accept: start symbol $end)
- Terminals: Numbered terminal symbols (numbers are ASCII codes or >256)
- Nonterminals, with numbers of rules where they appear
- States, with the following information for each state:
  - Rewriting rules (. shows current position)
  - Terminal symbols (or $default) and the action (shift, reduce, goto) if this symbol is the next symbol in the input
  - Nonterminal symbols: goal state of transition after reduction
Understanding bison: Debugging
#define YYDEBUG 1 switches on debugging (the trace is printed when the variable yydebug is also set to a nonzero value at run time)
The output shows how bison works:
- A (pushdown) stack is used to store:
  - States (of an automaton)
  - Already read terminals and reduced nonterminals
- The automaton and the next input token decide the action to be taken
- There are three possible actions:
  - shift: Read a token and put it on the stack (together with a state)
  - reduce: Convert some tokens and/or nonterminals on the stack to a single nonterminal using a rewriting rule (a reduce action is always followed by a goto to another state)
  - accept: Stop processing and accept the input
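The stack and the three actions can be sketched as a small hand-written shift-reduce parser in C for the example grammar E → E '+' T | T, T → integer. This is an illustrative sketch, not the code that bison generates: the state numbers 0 to 5, the rule numbers, and the names lr_parse and next_token are assumptions made for this example, and the end of the input is accepted directly in state 2.

```c
#include <assert.h>
#include <ctype.h>

enum { MAX_STACK = 16, MAX_RULES = 64 };
enum token { TOK_INT, TOK_PLUS, TOK_END, TOK_BAD };

/* Read the next token from *p and advance *p; an integer's value goes to *val. */
static enum token next_token(const char **p, int *val)
{
    while (isspace((unsigned char)**p)) (*p)++;
    if (**p == '\0') return TOK_END;
    if (**p == '+') { (*p)++; return TOK_PLUS; }
    if (isdigit((unsigned char)**p)) {
        *val = 0;
        while (isdigit((unsigned char)**p)) *val = *val * 10 + (*(*p)++ - '0');
        return TOK_INT;
    }
    return TOK_BAD;
}

/* Parse input, store the sum in *result, and record the applied rules in
   rules[] (1: E -> E '+' T, 2: E -> T, 3: T -> integer).
   Returns the number of reductions, or -1 on a syntax error. */
int lr_parse(const char *input, int *result, int rules[MAX_RULES])
{
    int state[MAX_STACK], value[MAX_STACK];  /* parallel stacks: states and attributes */
    int top = 0, nrules = 0, val = 0;
    enum token tok = next_token(&input, &val);

    state[0] = 0;
    value[0] = 0;
    for (;;) {
        int rule, v;
        switch (state[top]) {
        case 0:                              /* start of input */
        case 4:                              /* after E '+' */
            if (tok != TOK_INT) return -1;
            assert(top + 1 < MAX_STACK);
            state[++top] = 1;                /* shift integer, go to state 1 */
            value[top] = val;
            tok = next_token(&input, &val);
            continue;
        case 2:                              /* an E is complete */
            if (tok == TOK_END) { *result = value[top]; return nrules; }  /* accept */
            if (tok != TOK_PLUS) return -1;
            assert(top + 1 < MAX_STACK);
            state[++top] = 4;                /* shift '+', go to state 4 */
            value[top] = 0;
            tok = next_token(&input, &val);
            continue;
        case 1:  rule = 3; break;            /* T -> integer . */
        case 3:  rule = 2; break;            /* E -> T . */
        default: rule = 1; break;            /* state 5: E -> E '+' T . */
        }
        if (nrules == MAX_RULES) return -1;
        rules[nrules++] = rule;
        /* reduce: pop the right hand side and compute the new attribute */
        if (rule == 1) { v = value[top - 2] + value[top]; top -= 3; }
        else           { v = value[top];                  top -= 1; }
        /* goto: the uncovered state and the new nonterminal select the next state */
        if (rule == 3) state[top + 1] = (state[top] == 0) ? 3 : 5;  /* on T */
        else           state[top + 1] = 2;                          /* on E */
        value[++top] = v;
    }
}
```

For the input 5 + 7 + 3, the recorded rule numbers are 3, 2, 3, 1, 3, 1. Read backwards, they are the steps of the rightmost derivation of the input, which is what "rightmost derivation in reverse order" means for LR parsing.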
Conflicts and Ambiguous Grammars
- When running bison, it may show some conflicts:
  - shift/reduce conflicts: Both shift and reduce are possible
  - reduce/reduce conflicts: There is more than one way to reduce
- bison just chooses one of the alternatives (shift for a shift/reduce conflict, the rule that appears first in the grammar for a reduce/reduce conflict):
  - If this is the right choice, we may be fine (but we may want to fix the grammar anyway)
  - If this is the wrong choice, we have to fix the grammar
- Grammar example: E → E '-' E | integer
- For 5 - 3 - 7, this grammar allows two interpretations: (5-3) - 7 and 5 - (3-7)
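That the two interpretations really differ can be checked by evaluating both groupings. The following C sketch is only an illustration, and the function names subtract_left and subtract_right are made up for this example: the first groups a chain of subtractions from the left, the second from the right.

```c
#include <assert.h>
#include <stddef.h>

/* ((a[0] - a[1]) - a[2]) - ...   grouping of a left associative '-' */
int subtract_left(const int *a, size_t n)
{
    assert(n > 0);
    int acc = a[0];
    for (size_t i = 1; i < n; i++) acc -= a[i];
    return acc;
}

/* a[0] - (a[1] - (a[2] - ...))   grouping of a right associative '-' */
int subtract_right(const int *a, size_t n)
{
    assert(n > 0);
    int acc = a[n - 1];
    for (size_t i = n - 1; i-- > 0; ) acc = a[i] - acc;
    return acc;
}
```

For 5, 3, 7 the left grouping gives (5-3) - 7 = -5, while the right grouping gives 5 - (3-7) = 9, so the choice made for the conflict changes the result of the calculation.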
Grammar of bison Rewriting Rules
rewritingRule → nonterminalSymbol ":" rightHandList ";"
rightHandList → rightHand | rightHand "|" rightHandList
rightHand → symbolList "{" CFragment "}"
symbolList → symbol | symbol symbolList
symbol → nonterminalSymbol | terminalSymbol
How to Combine flex and bison
- In the .y file, list all token types:
  %token NUM PLUS ASTERISK ...
- In the .y file, define the type of the attributes:
  #define YYSTYPE int
- In the .lex file, define one or more rules for each token type
- In the .y file, define the rewriting rules of the grammar
- In the .y file, write the program fragments to calculate attribute values
- Process (with flex/bison), compile, and test
Advantages and Problems of Bottom-Up Parsing
- Advantages:
  - No fear of left recursion
  - Wider range of grammars
  - Automatic creation of parser
- Problems:
  - Very hard to create parser by hand
  - Ambiguity needs attention
Homework
Deadline: June 23, 2016 (Thursday in two weeks), 19:00
Prepare questions so that you can ask them in next week's lecture!
Expand the simple calculator of calc.y to a calculator for complex numbers. Imaginary numbers are expressed as 5i, complex numbers as [realPart, imaginaryPart]. Design your grammar so that inside [], real number calculations are allowed, but imaginary or complex numbers (e.g. 5i) are disallowed. Example of input: test.in
Express priorities and associativity directly in the grammar (%left, %right,... are forbidden).
Where to submit: Box in front of room O-529 (building O, 5th floor)
Submit printouts of the files complex.lex and complex.y on A4 paper, using BOTH sides (↓↓, not ↓↑); NO cover page; staple in the top left corner if there is more than one page; use a non-proportional font and no wrapping lines; write your name (kanji and kana) and student number in a comment at the top right of the first page.
Hints for Homework
- Start from the calc files, but change the file names (including in the makefile)
- Change YYSTYPE so that it can represent a complex number (in both .lex and .y)
- Define additional tokens in .y
- Write rules for additional tokens in .lex
- Convert processing of numbers in .lex so that it works with floating point numbers
- If you have a shift/reduce or reduce/reduce conflict:
  - Check the .output file
  - Check using different inputs
- Expand your grammar little by little, always carefully testing
- Build up your own test file, and expand it together with expansions to
the grammar
- Create tests that check important aspects of the grammar (priorities,
associativity, errors)
- Save test outputs and use them for automatic comparison
Glossary
- unary (operator): 単項 (演算子)
- leftmost derivation: 最左導出
- rightmost derivation: 最右導出
- reverse order: 逆順
- lookahead: 先読み
- non-proportional font: 等幅のフォント