Principles of top-down parsing
(Implementation of Top-Down Parsing)
8th lecture, June 7, 2019
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture8.html
Martin J. Dürst
© 2005-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Leftovers, summary, and homework for last lecture
- Top-down parsing
- Recursive descent parsing
- Implementation of recursive descent parsing
- How to deal with various problems in a grammar:
- Limitation of number of operations
- Priority
- Left/right associativity
- Left recursion
Leftovers from Last Lecture
Summary of Last Lecture
- flex homework: Lexical analysis/regular expressions for C
- Regular expression for C comments
- Grammars for various languages
- How to construct a grammar
- Results of parsing: Parse tree and abstract syntax tree
Comment about Grammars Collected Last Week
- Many students submitted descriptions of syntax, not grammars
- These students received fewer points
Last Week's Homework 1
(Removed for administrative reasons)
Last Week's Homework 2
(Removed for administrative reasons)
Last Week's Homework 3
(Removed for administrative reasons)
About Ambiguous Grammars
- Grammars that may produce more than one parse tree for the same input are
called ambiguous grammars
- Sometimes, it may seem okay to allow an ambiguous grammar (e.g.
mathematically, a+(b+c) = (a+b)+c). But this does not hold when overflow
occurs, and it does not hold for non-associative operators (e.g.
a-(b-c) ≠ (a-b)-c).
- Some grammars can be changed to remove ambiguity
- There is no general algorithm to remove ambiguity; there is also no
algorithm to decide whether removal is possible for a given grammar
- Ambiguous grammars are acceptable when they only define which sentences
belong to a language, but they are not suited for a programming language or
a data format, where each input must have exactly one structure
- Whether a grammar is ambiguous and whether its language requires a
nondeterministic pushdown automaton for recognition are separate
problems (example: palindromes have an unambiguous grammar, but recognizing
them requires a nondeterministic pushdown automaton)
General Top-Down Parsing
- Create parse tree starting with the start symbol of the grammar
- Expand parse tree depth-first from the left
- If there is a choice (of rewriting rules), try each one in turn
- Once the parse tree reaches a terminal symbol, compare with input
- If there is a match, continue
- If there is no match, give up and try another choice
(backtracking)
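The steps above can be sketched in C as follows. This is only an illustration under assumptions: the grammar S → 'a' S 'b' | 'a' 'b' and the names match_char, parse_S, and accepts are made up for this example and do not come from the lecture's program files. Each function returns the input position after a successful match, or -1 if there is no match.

```c
/* Hypothetical grammar for illustration: S -> 'a' S 'b' | 'a' 'b' */

/* Terminal symbol: compare with the input at position pos.
   Returns the position after the match, or -1 if there is no match. */
static int match_char(const char *input, int pos, char expected) {
    if (pos >= 0 && input[pos] == expected) return pos + 1;
    return -1;
}

/* Non-terminal S: try each rewriting rule in turn, starting at pos. */
static int parse_S(const char *input, int pos) {
    int p;

    /* First choice: S -> 'a' S 'b' */
    p = match_char(input, pos, 'a');
    if (p >= 0) p = parse_S(input, p);
    if (p >= 0) p = match_char(input, p, 'b');
    if (p >= 0) return p;

    /* No match: give up, go back to pos, and try S -> 'a' 'b' */
    p = match_char(input, pos, 'a');
    if (p >= 0) p = match_char(input, p, 'b');
    return p;
}

/* The input is accepted if S matches and nothing is left over. */
int accepts(const char *input) {
    int end = parse_S(input, 0);
    return end >= 0 && input[end] == '\0';
}
```

For the input "aabb", the inner call of parse_S first tries the first rule, fails at the first 'b', and only succeeds after backtracking to the second rule.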
Main Points of Backtracking
Backtracking tries all possible pathways (similar to finding the exit of a
labyrinth without a map)
Backtracking may be very slow, but this can be improved:
- Change the grammar so that backtracking is reduced or eliminated
(ideally, the next token should be enough to select a single rewriting
rule)
- Use lookahead (check some more tokens) to eliminate some
choices
- Remember intermediate results (packrat parser)
- Similar to making marks in a labyrinth
- Example implementation: treetop (for Ruby)
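Remembering intermediate results can be sketched as follows. This is a minimal illustration, not how treetop works internally: the grammar S → B 'x' | B 'y', B → '(' B ')' | 'b' and all names (memo_B, work_count, packrat_accepts, ...) are assumptions made for this example. The result of parsing B at each input position is stored, so the second choice for S does not have to parse B again.

```c
#include <string.h>

#define MAX_LEN 64
#define UNKNOWN (-2)   /* B has not been tried at this position yet */
#define FAIL    (-1)   /* B does not match at this position */

static const char *input;
static int memo_B[MAX_LEN + 1];  /* remembered results for B, per position */
static int work_count;           /* how often B was actually computed */

/* B -> '(' B ')' | 'b'; returns the position after B, or FAIL. */
static int parse_B(int pos) {
    int p = FAIL;
    if (memo_B[pos] != UNKNOWN) return memo_B[pos];  /* mark found: reuse */
    work_count++;
    if (input[pos] == '(') {
        p = parse_B(pos + 1);
        if (p != FAIL) p = (input[p] == ')') ? p + 1 : FAIL;
    } else if (input[pos] == 'b') {
        p = pos + 1;
    }
    memo_B[pos] = p;                                 /* leave a mark */
    return p;
}

/* S -> B 'x' | B 'y' */
static int parse_S(int pos) {
    int p = parse_B(pos);
    if (p != FAIL && input[p] == 'x') return p + 1;
    p = parse_B(pos);   /* after backtracking, B at pos comes from the memo */
    if (p != FAIL && input[p] == 'y') return p + 1;
    return FAIL;
}

int packrat_accepts(const char *text) {
    size_t len = strlen(text);
    size_t i;
    int end;
    if (len > MAX_LEN) return 0;
    input = text;
    work_count = 0;
    for (i = 0; i <= len; i++) memo_B[i] = UNKNOWN;
    end = parse_S(0);
    return end != FAIL && input[end] == '\0';
}

int packrat_work_count(void) { return work_count; }
```

For the input "((b))y", B is computed once at each of the positions 0, 1, and 2. Without the memo table, the second choice for S would repeat all three computations.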
Recursive Descent Parsing
- Implementation of top-down parsing, easy to write by hand
- Create a function for each non-terminal symbol of the grammar
- In the function, proceed along the right side of the rewriting rule(s):
- For a terminal symbol, compare with input
- For a non-terminal symbol, call the corresponding function
- Use branching (if, ...) for a choice (|) in the grammar
- Use repetition (while, ...) for repetition in BNF
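- Reason for name: The grammar rules are recursive, so the functions call
themselves (or each other) recursively while descending through the parse tree
Example: A → variable '=' A | integer (assignment expression)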
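The assignment expression example could be implemented roughly as follows. This sketch is not the lecture's parser1.c. It assumes that variables are single lowercase letters, that integers are unsigned digit sequences, and that the parser reads characters directly instead of tokens. The names parse_A, evaluate_assignment, and variable_value are made up for this example.

```c
#include <ctype.h>

static const char *cursor;   /* next unread character */
static int variables[26];    /* values of the variables a..z */
static int syntax_error;

static void skip_spaces(void) {
    while (*cursor == ' ') cursor++;
}

/* One function for the non-terminal A:  A -> variable '=' A | integer */
static int parse_A(void) {
    skip_spaces();
    if (islower((unsigned char)*cursor)) {     /* choice 1: variable '=' A */
        int index = *cursor - 'a';
        int value;
        cursor++;
        skip_spaces();
        if (*cursor != '=') { syntax_error = 1; return 0; }
        cursor++;
        value = parse_A();                      /* recursive call for A */
        if (!syntax_error) variables[index] = value;
        return value;
    }
    if (isdigit((unsigned char)*cursor)) {      /* choice 2: integer */
        int value = 0;
        while (isdigit((unsigned char)*cursor)) {
            value = value * 10 + (*cursor - '0');
            cursor++;
        }
        return value;
    }
    syntax_error = 1;                           /* neither choice applies */
    return 0;
}

/* Returns the value of the expression, or -1 on a syntax error. */
int evaluate_assignment(const char *text) {
    int value;
    cursor = text;
    syntax_error = 0;
    value = parse_A();
    skip_spaces();
    if (syntax_error || *cursor != '\0') return -1;
    return value;
}

int variable_value(char name) { return variables[name - 'a']; }
```

For example, evaluate_assignment("x = y = 42") returns 42 and stores 42 in both x and y. The recursion on the right side of the rule makes the assignment right-associative, which is what is wanted here.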
Recursive Descent Parsing: Simple Hand-Written Parser
Program files: scanner.h, scanner.c, parser1.c
How to compile and run: gcc scanner.c parser1.c && ./a (on systems where gcc names the executable a.out, run ./a.out instead)
Details of Recursive Descent Parsing: Lexical Analysis
(see scanner.c)
- Invariant:
- The next character is always in nextChar (one-character lookahead)
- As soon as a character is processed, the next character is read into
nextChar
- nextChar is a global variable (can be changed to a function parameter)
- How to use from parser:
- Initialize with initScanner
- Read tokens with getNextToken
- Implementation of getNextToken:
- One-character tokens: Direct decision
- Multiple-character tokens: Decide on first character, read the rest
with a dedicated function
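A scanner following these points might look like the sketch below. The actual scanner.h and scanner.c are not reproduced here, so everything beyond the names nextChar, initScanner, and getNextToken is an assumption: the Token type, the token kinds, the helper functions readChar and readInteger, and the choice of reading from a string instead of standard input.

```c
#include <ctype.h>

typedef enum {
    TOKEN_INTEGER, TOKEN_PLUS, TOKEN_MINUS, TOKEN_END, TOKEN_ERROR
} TokenType;

typedef struct {
    TokenType type;
    int value;              /* only meaningful for TOKEN_INTEGER */
} Token;

static const char *source;  /* assumed input: a string */
static int nextChar;        /* invariant: the next unprocessed character */

/* Restore the invariant after a character has been processed. */
static void readChar(void) {
    nextChar = (unsigned char)*source;
    if (nextChar != '\0') source++;
}

void initScanner(const char *text) {
    source = text;
    readChar();             /* establish the invariant */
}

/* Dedicated function for a multiple-character token. */
static Token readInteger(void) {
    Token token = { TOKEN_INTEGER, 0 };
    while (isdigit(nextChar)) {
        token.value = token.value * 10 + (nextChar - '0');
        readChar();
    }
    return token;
}

Token getNextToken(void) {
    Token token = { TOKEN_ERROR, 0 };
    while (nextChar == ' ' || nextChar == '\n') readChar();
    if (isdigit(nextChar)) return readInteger();  /* decide on first char */
    switch (nextChar) {                           /* one-character tokens */
    case '+':  token.type = TOKEN_PLUS;  break;
    case '-':  token.type = TOKEN_MINUS; break;
    case '\0': token.type = TOKEN_END;   return token;
    default:   break;                             /* unknown character */
    }
    readChar();
    return token;
}
```

After initScanner("12 - 3"), successive calls of getNextToken return an integer token with value 12, a minus token, an integer token with value 3, and then the end token.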
Details of Recursive Descent Parsing: Parsing
(see parser1.c)
- Invariant (same as for lexical analysis):
- The next token to be looked at is always in nextToken (one-token lookahead)
- As soon as a token is processed, the next token is read into nextToken
- nextToken is a global variable, but this can be changed to a
function parameter
- Overall usage:
- Initialize using initScanner and getNextToken
- Call the function corresponding to the start symbol of the grammar
(e.g. Expression())
- Further process the returned value (abstract syntax tree or result of
evaluation)
Details of Recursive Descent Parsing: Non-Terminal Symbols
- Create a function for each non-terminal symbol of the grammar
- In the function, deal with all rewriting rules for this non-terminal
symbol
- For each non-terminal on the right-hand side, call the corresponding
function
- For each terminal on the right-hand side, compare with nextToken
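The following sketch shows this pattern for a small grammar with two non-terminals. It is an illustration under assumptions and not parser1.c: the grammar (Pair → '(' Value ',' Value ')', Value → digit | Pair) is made up, the helper function expect and the entry point sumOfPair are hypothetical, and getNextToken here is a simplified stand-in in which every non-space character is one token.

```c
#include <ctype.h>

static const char *rest;   /* unread input */
static char nextToken;     /* invariant: always the next unprocessed token */
static int failed;         /* set to 1 on the first syntax error */

/* Simplified stand-in for a scanner: each non-space character is a token. */
static void getNextToken(void) {
    while (*rest == ' ') rest++;
    nextToken = *rest;
    if (*rest != '\0') rest++;
}

/* Terminal symbol: compare with nextToken, then read the next token. */
static void expect(char terminal) {
    if (nextToken == terminal) getNextToken();
    else failed = 1;
}

static int Pair(void);

/* Value -> digit | Pair */
static int Value(void) {
    if (isdigit((unsigned char)nextToken)) {
        int value = nextToken - '0';
        getNextToken();
        return value;
    }
    if (nextToken == '(') return Pair();  /* non-terminal: call its function */
    failed = 1;
    return 0;
}

/* Pair -> '(' Value ',' Value ')' */
static int Pair(void) {
    int left, right;
    expect('(');
    left = Value();
    expect(',');
    right = Value();
    expect(')');
    return left + right;   /* result of evaluation: sum of all digits */
}

/* Returns the sum of all digits in the pair, or -1 on a syntax error. */
int sumOfPair(const char *text) {
    int sum;
    rest = text;
    failed = 0;
    getNextToken();        /* establish the invariant */
    sum = Pair();          /* function for the start symbol */
    if (failed || nextToken != '\0') return -1;
    return sum;
}
```

For example, sumOfPair("(1, (2, 3))") returns 6, and sumOfPair("(1, 2") returns -1 because the closing parenthesis is missing.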
How to Deal with Left Recursion
Example of left recursion:
E → E '-' integer | integer
Wrong solution (removes the left recursion, but changes associativity from left to right):
E → integer '-' E | integer
Correct solution:
E → integer Econtinued
Econtinued → '-' integer Econtinued | ε
In (E)BNF:
E → integer {'-' integer}
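A function written directly from E → E '-' integer would call itself before consuming any input and therefore never terminate. The sketch below compares the two alternatives above: the (E)BNF form implemented with a loop, and the right-recursive rule. It is a simplified illustration with made-up names (parse_E_loop, parse_E_right, evaluate_left, evaluate_right), it reads characters directly instead of tokens, and it omits error handling.

```c
#include <ctype.h>

static const char *cursor;   /* next unread character */

static void skip_spaces(void) {
    while (*cursor == ' ') cursor++;
}

/* Reads an unsigned integer (0 if none is present; no error handling). */
static int parse_integer(void) {
    int value = 0;
    skip_spaces();
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    skip_spaces();
    return value;
}

/* E -> integer {'-' integer}
   The loop accumulates from the left, so '-' stays left-associative. */
static int parse_E_loop(void) {
    int result = parse_integer();
    while (*cursor == '-') {
        cursor++;
        result = result - parse_integer();
    }
    return result;
}

/* E -> integer '-' E | integer
   No left recursion, but '-' becomes right-associative. */
static int parse_E_right(void) {
    int left = parse_integer();
    if (*cursor == '-') {
        cursor++;
        return left - parse_E_right();
    }
    return left;
}

int evaluate_left(const char *text)  { cursor = text; return parse_E_loop(); }
int evaluate_right(const char *text) { cursor = text; return parse_E_right(); }
```

For "10 - 3 - 2", evaluate_left computes (10 - 3) - 2 = 5, while evaluate_right computes 10 - (3 - 2) = 9, which is not the usual meaning of subtraction.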
Differences between Grammars and Regular Expressions
Grammar:
- Multiple rules
- Non-terminal symbols, derivation from left-hand side to right-hand
side
- For simple grammars, *, (), and | are not available
Regular Expression:
- Limited to one single rule
- No non-terminal symbols, only right-hand side
- For practical regular expressions, lots of metacharacters/functionality
besides *, (), and | available
A simple regular expression corresponds to a single rewriting rule in a
(BNF, ...) grammar
Homework
Deadline: June 13, 2019 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 double-sided printout of parser program. Stapled in upper left if
more than one page, no cover page, no wrapping lines, legible font size,
non-proportional font, portrait (not landscape), formatted (indents,...) for
easy visibility, name (kanji and kana) and student number as a comment at the
top
Collaboration: The same rules as for Computer Practice I (計算機実習 I) apply
- Expand the top-down parser of parser1.c to correctly deal with the four
basic arithmetic operations. (scanner.h/c do not change, so there is no need
to submit them.)
- (bonus problem) Add more operations to the top-down parser, and/or deal
with parentheses. (If you solve this problem, also submit the scanner.h/c
files, but only one parser.c file for both problems.)
- Bring your notebook computer to the next lecture. Check again that flex,
bison, make, and gcc are installed.
Glossary
- ambiguous grammar: 曖昧な文法
- recursive descent parsing: 再帰的下向き構文解析
- depth-first: 深さ優先
- lookahead: 先読み
- backtracking: バックトラック
- labyrinth: 迷路
- right associative: 右結合
- invariant: 不変条件
- left recursion: 左再帰