Principles of top-down parsing
(Implementation of top-down parsing)
8th lecture, May 27, 2016
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture8.html
Martin J. Dürst
© 2005-16 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Remainders and summary from last lecture
- Top-down parsing
- Recursive descent parsing
- Implementation of recursive descent parsing
- How to deal with various problems in a grammar:
  - Limitation of number of operations
  - Priority
  - Left/right associativity
  - Left recursion
Remainders from Last Lecture
Summary of Last Lecture
flex
homework: Lexical analysis/regular expressions for C
- Regular expression for C comments
- Grammars for various languages
- How to construct a grammar
- Results of parsing: Parse tree and abstract syntax tree
General Top-Down Parsing
- Create parse tree starting with the start symbol of the grammar
- Expand parse tree depth-first from the left
- If there is a choice (of rewriting rules), try each one in turn
- Once the parse tree reaches a terminal symbol, compare with input
  - If there is a match, continue
  - If there is no match, give up and try another choice (backtracking)
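The procedure above can be sketched in C as follows. This is a minimal illustration, not code from the lecture: the grammar S → 'a' S 'b' | 'a' 'b' and the names match, parse_S, and accepts are assumptions chosen because both alternatives start with the same terminal, so a single character of lookahead cannot pick one. Each function commits to the first alternative that succeeds, which is enough for this grammar; a fully general backtracking parser would also have to revisit choices made in earlier calls.

```c
#include <stddef.h>

/* Hypothetical grammar for illustration (not from the lecture):
 *   S -> 'a' S 'b' | 'a' 'b'
 */

/* Terminal symbol: compare with the input, advance on a match. */
static int match(const char *in, size_t *pos, char expected) {
    if (in[*pos] == expected) {
        (*pos)++;
        return 1;
    }
    return 0;
}

/* Non-terminal S: returns 1 and advances *pos on success,
 * returns 0 and leaves *pos unchanged on failure. */
static int parse_S(const char *in, size_t *pos) {
    size_t saved = *pos;

    /* First choice: S -> 'a' S 'b' */
    if (match(in, pos, 'a') && parse_S(in, pos) && match(in, pos, 'b'))
        return 1;
    *pos = saved;                 /* no match: backtrack */

    /* Second choice: S -> 'a' 'b' */
    if (match(in, pos, 'a') && match(in, pos, 'b'))
        return 1;
    *pos = saved;                 /* no choice left: report failure */
    return 0;
}

/* The whole input must be derived from the start symbol S. */
int accepts(const char *in) {
    size_t pos = 0;
    return parse_S(in, &pos) && in[pos] == '\0';
}
```

For the input "aabb", the inner call first tries 'a' S 'b', fails on the nested S, resets the position, and then succeeds with 'a' 'b'.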
Main Points of Backtracking
Backtracking may be very slow, but this can be improved:
- Change the grammar so that backtracking is reduced or eliminated (ideally, the next token should be enough to select a single rewriting rule)
- Remember intermediate results (packrat parser)
  Example implementation: treetop (for Ruby)
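As a rough sketch of what "remembering intermediate results" can look like, the C code below memoizes the result of each non-terminal at each input position. It is not treetop's API (treetop is a Ruby library); the grammar E → T '+' E | T, T → '(' E ')' | digit and all names (parse_E, parse_T, memo_E, memo_T, packrat_accepts, MAX_LEN) are assumptions made for illustration. Without the tables, T would be parsed twice at the same position whenever the first alternative of E fails, and this repeats at every nesting level.

```c
#include <string.h>

#define MAX_LEN 256
#define UNKNOWN (-2)   /* no result stored yet */
#define FAIL    (-1)   /* rule does not match at this position */

static const char *input;
static int memo_E[MAX_LEN + 1];   /* end position of E starting at index i */
static int memo_T[MAX_LEN + 1];   /* end position of T starting at index i */

static int parse_E(int pos);

/* T -> '(' E ')' | digit */
static int parse_T(int pos) {
    if (memo_T[pos] != UNKNOWN)
        return memo_T[pos];              /* reuse the earlier result */
    int result = FAIL;
    if (input[pos] == '(') {
        int end = parse_E(pos + 1);
        if (end != FAIL && input[end] == ')')
            result = end + 1;
    } else if (input[pos] >= '0' && input[pos] <= '9') {
        result = pos + 1;
    }
    memo_T[pos] = result;
    return result;
}

/* E -> T '+' E | T */
static int parse_E(int pos) {
    if (memo_E[pos] != UNKNOWN)
        return memo_E[pos];
    int result = FAIL;
    int t = parse_T(pos);                /* first choice: T '+' E */
    if (t != FAIL && input[t] == '+') {
        int e = parse_E(t + 1);
        if (e != FAIL)
            result = e;
    }
    if (result == FAIL)
        result = parse_T(pos);           /* second choice: T, answered from memo_T */
    memo_E[pos] = result;
    return result;
}

/* Returns 1 if the whole string can be derived from E, 0 otherwise. */
int packrat_accepts(const char *s) {
    size_t n = strlen(s);
    if (n > MAX_LEN)
        return 0;                        /* sketch only handles short inputs */
    input = s;
    for (size_t i = 0; i <= n; i++)
        memo_E[i] = memo_T[i] = UNKNOWN;
    return parse_E(0) == (int)n;
}
```

The tables cost memory proportional to the input length times the number of non-terminals, which is the usual trade-off of this approach.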
Recursive Descent Parsing
- Create a function for each non-terminal symbol of the grammar
- In the function, proceed along the right-hand side of the rewriting rule(s):
  - For a terminal symbol, compare with input
  - For a non-terminal symbol, call the corresponding function
- Use branching (if, ...) for a choice (|) in the grammar
- Use repetition (while, ...) for repetition in BNF
- Works well for hand-written parsers
- Reason for name: Recursive grammar rules
  Example: A → variable '=' A | integer (assignment expression)
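The assignment-expression rule above could be turned into a function roughly as follows. This is a simplified sketch, not the lecture's parser1.c: it works directly on characters instead of tokens, treats a variable as a single lowercase letter, and the names p, A, and is_assignment are assumptions.

```c
#include <ctype.h>

static const char *p;   /* hypothetical cursor into the input string */

/* A -> variable '=' A | integer */
static int A(void) {
    if (islower((unsigned char)*p)) {        /* choice 1: variable '=' A */
        p++;                                 /* terminal: variable */
        if (*p != '=')
            return 0;
        p++;                                 /* terminal: '=' */
        return A();                          /* non-terminal: recursive call */
    }
    if (isdigit((unsigned char)*p)) {        /* choice 2: integer */
        while (isdigit((unsigned char)*p))
            p++;
        return 1;
    }
    return 0;                                /* neither choice applies */
}

/* Returns 1 if s is a complete assignment expression, 0 otherwise. */
int is_assignment(const char *s) {
    p = s;
    return A() && *p == '\0';
}
```

The if/else on the next character plays the role of the choice (|), and the recursive call to A() mirrors the recursive rule, which is where the technique gets its name.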
Recursive Descent Parsing: Simple Hand-Written Parser
Program files: scanner.h, scanner.c, parser1.c
How to compile: gcc scanner.c parser1.c && ./a
Details of Recursive Descent Parsing: Lexical Analysis
(see scanner.c)
- Invariant:
  - The next character to be looked at is always in nextChar (one-character lookahead)
  - As soon as a character is processed, the next character is read into nextChar
- nextChar is a global variable, but this can be changed to a function parameter
- How to use from parser:
  - Initialize with initScanner
  - Read tokens with getNextToken
- Implementation of getNextToken:
  - One-character tokens: Direct decision
  - Multiple-character tokens: Decide on first character, read the rest with a dedicated function
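The points above might look roughly like this in code. The sketch is not the lecture's scanner.c: the names nextChar, initScanner, and getNextToken come from the slides, but their signatures, the Token and TokenType definitions, the helper functions readChar and scanInteger, and reading from a string instead of standard input are all assumptions.

```c
#include <ctype.h>

/* Hypothetical token set; the lecture's scanner.h may differ. */
typedef enum { TOK_INT, TOK_PLUS, TOK_MINUS, TOK_EOF, TOK_ERROR } TokenType;
typedef struct { TokenType type; int value; } Token;

static const char *src;   /* remaining input (assumption: a string) */
static int nextChar;      /* invariant: the next unprocessed character */

/* Read one character into nextChar; stays at '\0' at the end of input. */
static void readChar(void) {
    nextChar = *src ? (unsigned char)*src++ : '\0';
}

/* Establish the invariant before the first token is requested. */
void initScanner(const char *text) {
    src = text;
    readChar();
}

/* Multiple-character token: called with the first digit in nextChar. */
static Token scanInteger(void) {
    Token t = { TOK_INT, 0 };
    while (isdigit(nextChar)) {
        t.value = t.value * 10 + (nextChar - '0');
        readChar();
    }
    return t;
}

Token getNextToken(void) {
    Token t = { TOK_ERROR, 0 };
    while (nextChar == ' ')
        readChar();                           /* skip blanks */
    if (isdigit(nextChar))
        return scanInteger();                 /* decided on first character */
    switch (nextChar) {
    case '\0': t.type = TOK_EOF; return t;    /* do not read past the end */
    case '+':  t.type = TOK_PLUS;  break;     /* one-character tokens */
    case '-':  t.type = TOK_MINUS; break;
    default:   break;                         /* unknown character: TOK_ERROR */
    }
    readChar();                               /* restore the invariant */
    return t;
}
```

After initScanner("12 - 3"), successive calls to getNextToken yield an integer token with value 12, a minus token, an integer token with value 3, and then the end-of-input token.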
Details of Recursive Descent Parsing: Parsing
(see parser1.c)
- Invariant (same as for lexical analysis):
  - The next token to be looked at is always in nextToken (one-token lookahead)
  - As soon as a token is processed, the next token is read into nextToken
- nextToken is a global variable, but this can be changed to a function parameter
- Overall usage:
  - Initialize using initScanner and getNextToken
  - Call the function corresponding to the start symbol of the grammar (e.g. Expression())
  - Further process the returned value (abstract syntax tree or result of evaluation)
Details of Recursive Descent Parsing: Non-Terminal Symbols
- Create a function for each non-terminal symbol of the grammar
- In the function, deal with all rewriting rules for this non-terminal symbol
- For each non-terminal on the right-hand side, call the corresponding function
- For each terminal on the right-hand side, compare with nextToken
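To make these rules concrete, here is a small self-contained sketch with one function per non-terminal and a global nextToken. It is not the lecture's parser1.c: the grammar Expression → Term {'+' Term}, Term → integer | '(' Expression ')', the token definitions, the stand-in getNextToken, and the helpers expect and evaluate are assumptions for illustration, and the parser returns the result of evaluation rather than a syntax tree.

```c
#include <ctype.h>

/* Hypothetical token representation; not the lecture's scanner.h. */
typedef enum { T_INT, T_PLUS, T_LPAREN, T_RPAREN, T_EOF, T_ERROR } TokenType;
typedef struct { TokenType type; int value; } Token;

static const char *src;    /* remaining input */
static Token nextToken;    /* invariant: the next unprocessed token */
static int failed;         /* set to 1 on the first syntax error */

/* Minimal stand-in scanner. */
static Token getNextToken(void) {
    Token t = { T_ERROR, 0 };
    while (*src == ' ')
        src++;
    if (*src == '\0') { t.type = T_EOF; return t; }
    if (isdigit((unsigned char)*src)) {
        t.type = T_INT;
        while (isdigit((unsigned char)*src))
            t.value = t.value * 10 + (*src++ - '0');
        return t;
    }
    switch (*src++) {
    case '+': t.type = T_PLUS;   break;
    case '(': t.type = T_LPAREN; break;
    case ')': t.type = T_RPAREN; break;
    default:  break;
    }
    return t;
}

/* Terminal on the right-hand side: compare with nextToken, then advance. */
static void expect(TokenType type) {
    if (nextToken.type == type)
        nextToken = getNextToken();
    else
        failed = 1;
}

static int Expression(void);

/* Term -> integer | '(' Expression ')' */
static int Term(void) {
    int value = 0;
    if (nextToken.type == T_INT) {
        value = nextToken.value;
        nextToken = getNextToken();
    } else if (nextToken.type == T_LPAREN) {
        nextToken = getNextToken();
        value = Expression();          /* non-terminal: call its function */
        expect(T_RPAREN);
    } else {
        failed = 1;
    }
    return value;
}

/* Expression -> Term {'+' Term} */
static int Expression(void) {
    int value = Term();
    while (nextToken.type == T_PLUS && !failed) {
        nextToken = getNextToken();
        value += Term();
    }
    return value;
}

/* Returns 1 and stores the value on success, 0 on a syntax error. */
int evaluate(const char *text, int *result) {
    src = text;
    failed = 0;
    nextToken = getNextToken();        /* establish the invariant */
    *result = Expression();            /* start symbol */
    return !failed && nextToken.type == T_EOF;
}
```

Checking for the end-of-input token after Expression() returns rejects inputs such as "1 2", where a valid expression is followed by leftover tokens.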
How to Deal with Left Recursion
Example of left recursion:
E → E '-' integer | integer
Wrong solution (change of associativity):
E → integer '-' E | integer
Correct solution:
E → integer EE
EE → '-' integer EE | ε
In (E)BNF:
E → integer {'-' integer}
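The difference between the wrong and the correct solution shows up as soon as the expression is evaluated. A function written directly from E → E '-' integer would call itself before consuming any input and never terminate, so the sketch below only contrasts the right-recursive rewrite with the (E)BNF loop. It works on characters rather than tokens and does no error checking; the names p, integer, E_right, E_loop, eval_right, and eval_loop are assumptions.

```c
#include <ctype.h>

static const char *p;   /* hypothetical cursor; input is digits and '-' only */

/* Terminal: integer */
static int integer(void) {
    int v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* E -> integer '-' E | integer
 * Right recursive: groups as a - (b - c), which is the wrong associativity. */
static int E_right(void) {
    int left = integer();
    if (*p == '-') {
        p++;
        return left - E_right();
    }
    return left;
}

/* E -> integer {'-' integer}
 * Repetition as a loop: groups as (a - b) - c, i.e. left associative. */
static int E_loop(void) {
    int value = integer();
    while (*p == '-') {
        p++;
        value -= integer();
    }
    return value;
}

int eval_right(const char *s) { p = s; return E_right(); }
int eval_loop(const char *s)  { p = s; return E_loop(); }
```

For "10-3-2", eval_loop returns 5, the value of (10 - 3) - 2, while eval_right returns 9, the value of 10 - (3 - 2).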
Differences between Grammars and Regular Expressions
Grammar:
- Multiple rules
- Non-terminal symbols, derivation from left-hand side to right-hand side
- For simple grammars, *, (), and | are not available
Regular Expression:
- Limited to one single rule
- No non-terminal symbols, only right-hand side
- For practical regular expressions, lots of metacharacters/functionality besides *, (), and | available
A simple regular expression corresponds to a single rewriting rule in a (BNF, ...) grammar
Homework
Deadline: June 2, 2016 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 single page (using both sides is okay; NO cover page, staple in top left corner if more than one page is necessary), printout (no wrapping lines), name (kanji and kana) and student number in comment at the top right
- Expand the top-down parser of parser1.c to correctly deal with the four basic arithmetic operations. (scanner.h/c do not change, so no need to submit them)
- (bonus problem) Add more operations to the top-down parser. (If you solve this problem, also submit the scanner.h/c files, but only one parser.c file for both problems.)
- Bring your notebook computer to the next lecture. Check again that flex, bison, make, and gcc are installed.
Glossary
- recursive descent parsing: 再帰的下向き構文解析
- depth-first: 深さ優先
- backtracking: バックトラック
- ambiguous grammar: 曖昧な文法
- right associative: 右結合
- invariant: 不変条件
- left recursion: 左再帰