Principles of top-down parsing
(Implementation of Top-Down Parsing)
8th lecture, May 26, 2017
Language Theory and Compilers
http://www.sw.it.aoyama.ac.jp/2017/Compiler/lecture8.html
Martin J. Dürst
© 2005-17 Martin J. Dürst 青山学院大学
Today's Schedule
- Remainders, summary, and homework for last lecture
- Top-down parsing
- Recursive descent parsing
- Implementation of recursive descent parsing
- How to deal with various problems in a grammar:
- Limitation of number of operations
- Priority
- Left/right associativity
- Left recursion
Remainders from Last Lecture
Summary of Last Lecture
flex homework: Lexical analysis/regular expressions for C
- Regular expression for C comments
- Grammars for various languages
- How to construct a grammar
- Results of parsing: Parse tree and abstract syntax tree
Last Week's Homework 1
Last Week's Homework 2
Last Week's Homework 3
General Top-Down Parsing
- Create parse tree starting with the start symbol of the grammar
- Expand parse tree depth-first from the left
- If there is a choice (of rewriting rules), try each one in turn
- Once the parse tree reaches a terminal symbol, compare with input
- If there is a match, continue
- If there is no match, give up and try another choice
(backtracking)
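The steps above can be sketched in C as follows. This is only a minimal illustration, not code from the lecture: the grammar (S → A 'c' | A 'd', A → 'a' A | 'a'), the single-character terminals, and the names `match`, `parse_S`, `parse_A`, and `accepts` are all assumptions made for the example. Each function tries its choices in order and restores the input position when a choice fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example grammar (not from the lecture):
 *   S -> A 'c' | A 'd'
 *   A -> 'a' A | 'a'
 */
static const char *input;  /* text being parsed */
static size_t pos;         /* index of the next unread character */

/* Terminal symbol: compare with the input, advance on a match. */
static int match(char expected) {
    if (input[pos] == expected) {
        pos++;
        return 1;
    }
    return 0;
}

static int parse_A(void) {
    size_t saved = pos;                      /* remember where we started */
    if (match('a') && parse_A()) return 1;   /* choice 1: A -> 'a' A */
    pos = saved;                             /* no match: backtrack */
    if (match('a')) return 1;                /* choice 2: A -> 'a' */
    pos = saved;
    return 0;
}

static int parse_S(void) {
    size_t saved = pos;
    if (parse_A() && match('c')) return 1;   /* choice 1: S -> A 'c' */
    pos = saved;                             /* give up, try the other choice */
    if (parse_A() && match('d')) return 1;   /* choice 2: S -> A 'd' */
    pos = saved;
    return 0;
}

/* Returns 1 if the whole text can be derived from S, 0 otherwise. */
int accepts(const char *text) {
    assert(text != NULL);
    input = text;
    pos = 0;
    return parse_S() && input[pos] == '\0';
}
```

For the input "aad", the first choice for S reads "aa" and then fails on 'c', so the parser goes back to position 0 and reads "aa" a second time before matching 'd'. This repeated work is what makes backtracking slow.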
Main Points of Backtracking
Backtracking tries out all alternative pathways (similar to finding the exit of
a labyrinth without a map)
Backtracking may be very slow, but this can be improved:
- Change the grammar so that backtracking is reduced or eliminated
(ideally, the next token should be enough to select a single rewriting
rule)
- Use lookahead (check some more tokens) to eliminate some
choices
- Remember intermediate results (packrat parser)
Example implementation: treetop (for Ruby)
Recursive Descent Parsing
- Create a function for each non-terminal symbol of the grammar
- In the function, proceed along the right side of the rewriting rule(s):
- For a terminal symbol, compare with input
- For a non-terminal symbol, call the corresponding function
- Use branching (if, ...) for a choice (|) in the grammar
- Use repetition (while, ...) for repetition in (E)BNF
- Works well for hand-written parsers
- Reason for name: Recursive grammar rules
Example: A → variable '=' A | integer (assignment expression)
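A possible C sketch for the assignment grammar above, with one function for the non-terminal A. To keep it short, it assumes simplifications that are not part of the lecture: a variable is a single lowercase letter (ASCII assumed), an integer is a single digit, there is no white space, and the names `parse_A`, `evaluate`, `src`, `vars`, and `ok` are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *src;  /* remaining input; *src is the lookahead character */
static int vars[26];     /* values of the variables a..z (ASCII assumed) */
static int ok;           /* set to 0 when a syntax error is found */

/* A -> variable '=' A | integer */
static int parse_A(void) {
    if (islower((unsigned char)*src)) {      /* choice 1: variable '=' A */
        int index = *src++ - 'a';            /* terminal: variable */
        if (*src != '=') {                   /* terminal: '=' */
            ok = 0;
            return 0;
        }
        src++;
        int value = parse_A();               /* non-terminal: call its function */
        if (ok) vars[index] = value;
        return value;
    }
    if (isdigit((unsigned char)*src)) {      /* choice 2: integer */
        return *src++ - '0';
    }
    ok = 0;                                  /* neither choice applies */
    return 0;
}

/* Parses text; returns 1 on success and stores the value in *result. */
int evaluate(const char *text, int *result) {
    assert(text != NULL && result != NULL);
    src = text;
    ok = 1;
    *result = parse_A();
    return ok && *src == '\0';
}
```

Because the rule is right-recursive, the recursive call returns before the assignment happens, so in "a=b=7" the variable b is assigned first and a second.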
Recursive Descent Parsing: Simple Hand-Written Parser
Program files: scanner.h, scanner.c, parser1.c
How to compile: gcc scanner.c parser1.c && ./a
Details of Recursive Descent Parsing: Lexical Analysis
(see scanner.c)
- Invariant:
- The next character to be looked at is always in nextChar (one-character lookahead)
- As soon as a character is processed, the next character is read into nextChar
- nextChar is a global variable, but this can be changed to a function parameter
- How to use from parser:
- Initialize with initScanner
- Read tokens with getNextToken
- Implementation of getNextToken:
- One-character tokens: Direct decision
- Multiple-character tokens: Decide on first character, read the rest
with a dedicated function
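The points above can be sketched as follows. This is not the course's scanner.c: only the names nextChar, initScanner, and getNextToken come from the slide, while the signatures, the token types, the helper functions `readChar` and `readInteger`, and reading from a string instead of a file are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token representation (the real scanner.h may differ). */
typedef enum { TOK_INT, TOK_PLUS, TOK_MINUS, TOK_END, TOK_ERROR } TokenType;

typedef struct {
    TokenType type;
    int value;               /* only meaningful for TOK_INT */
} Token;

static const char *source;   /* assumption: the input is held in a string */
static int nextChar;         /* invariant: the next unprocessed character */

/* Reads the next character into nextChar; stays at the end of the input. */
static void readChar(void) {
    nextChar = (unsigned char)*source;
    if (nextChar != '\0') source++;
}

void initScanner(const char *text) {
    assert(text != NULL);
    source = text;
    readChar();                          /* establish the invariant */
}

/* Multiple-character token: the first digit is already in nextChar. */
static Token readInteger(void) {
    Token token = { TOK_INT, 0 };
    while (isdigit(nextChar)) {
        token.value = token.value * 10 + (nextChar - '0');
        readChar();
    }
    return token;
}

Token getNextToken(void) {
    Token token = { TOK_ERROR, 0 };
    while (nextChar == ' ' || nextChar == '\t' || nextChar == '\n')
        readChar();                      /* skip white space */
    if (isdigit(nextChar))
        return readInteger();            /* decided on the first character */
    switch (nextChar) {                  /* one-character tokens */
    case '+':  token.type = TOK_PLUS;  break;
    case '-':  token.type = TOK_MINUS; break;
    case '\0': token.type = TOK_END;   return token;
    default:   break;                    /* unknown character: TOK_ERROR */
    }
    readChar();                          /* character processed: read the next */
    return token;
}
```

After `initScanner("12 - 3")`, successive calls to `getNextToken()` yield an integer token with value 12, a minus token, an integer token with value 3, and then the end token. Overflow of very long integers is not checked in this sketch.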
Details of Recursive Descent Parsing: Parsing
(see parser1.c)
- Invariant (same as for lexical analysis):
- The next token to be looked at is always in nextToken (one-token lookahead)
- As soon as a token is processed, the next token is read into nextToken
- nextToken is a global variable, but this can be changed to a function parameter
- Overall usage:
- Initialize using initScanner and getNextToken
- Call the function corresponding to the start symbol of the grammar (e.g. Expression())
- Further process the returned value (abstract syntax tree or result of
evaluation)
Details of Recursive Descent Parsing: Non-Terminal Symbols
- Create a function for each non-terminal symbol of the grammar
- In the function, deal with all rewriting rules for this non-terminal
symbol
- For each non-terminal on the right-hand side, call the corresponding
function
- For each terminal on the right-hand side, compare with nextToken
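A compact sketch of how these rules and the nextToken invariant might look in C. It is not parser1.c. The grammar (Expression → Number '+' Expression | Number, Number → integer), the tiny stand-in scanner, the variables `nextValue` and `errors`, and the wrapper `parse` are assumptions for illustration; only the names nextToken, getNextToken, and Expression appear on the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_INT, TOK_PLUS, TOK_END, TOK_ERROR } TokenType;

static const char *source;   /* assumption: input held in a string */
static TokenType nextToken;  /* invariant: the next unprocessed token */
static int nextValue;        /* value of nextToken when it is TOK_INT */
static int errors;           /* number of syntax errors found */

/* Minimal stand-in scanner: reads one token into nextToken. */
static void getNextToken(void) {
    while (*source == ' ') source++;
    if (isdigit((unsigned char)*source)) {
        nextValue = 0;
        while (isdigit((unsigned char)*source))
            nextValue = nextValue * 10 + (*source++ - '0');
        nextToken = TOK_INT;
    } else if (*source == '+') {
        source++;
        nextToken = TOK_PLUS;
    } else if (*source == '\0') {
        nextToken = TOK_END;
    } else {
        source++;
        nextToken = TOK_ERROR;
    }
}

/* Number -> integer   (terminal: compare with nextToken) */
static int Number(void) {
    if (nextToken != TOK_INT) {
        errors++;
        return 0;
    }
    int value = nextValue;
    getNextToken();                      /* token processed: read the next */
    return value;
}

/* Expression -> Number '+' Expression | Number */
static int Expression(void) {
    int value = Number();                /* both rules start with Number */
    if (nextToken == TOK_PLUS) {         /* first rule continues with '+' */
        getNextToken();
        value += Expression();           /* non-terminal: call its function */
    }
    return value;                        /* second rule ends here */
}

/* Returns 1 if text is a valid expression; its value is stored in *result. */
int parse(const char *text, int *result) {
    assert(text != NULL && result != NULL);
    source = text;
    errors = 0;
    getNextToken();                      /* initialize the lookahead */
    *result = Expression();              /* function of the start symbol */
    if (nextToken != TOK_END) errors++;  /* input left over */
    return errors == 0;
}
```

Here the returned value is the result of evaluation. A parser that builds an abstract syntax tree would return a tree node from each function instead.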
How to Deal with Left Recursion
Example of left recursion:
E → E '-' integer | integer
Wrong solution (change of associativity):
E → integer '-' E | integer
Correct solution:
E → integer EE
EE → '-' integer EE | ε
In (E)BNF:
E → integer {'-' integer}
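The difference between the wrong and the correct solution shows up as soon as the expression is evaluated. The sketch below is an illustration under simplifying assumptions (input in a string, no white space, no error handling); the names `E_loop`, `E_right`, `eval_left`, and `eval_right` are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;   /* hypothetical cursor into the input string */

/* Terminal "integer": reads an unsigned decimal number. */
static int integer(void) {
    int value = 0;
    while (isdigit((unsigned char)*p))
        value = value * 10 + (*p++ - '0');
    return value;
}

/* E -> integer {'-' integer}
 * The repetition becomes a loop, which keeps left associativity. */
static int E_loop(void) {
    int value = integer();
    while (*p == '-') {
        p++;                          /* consume '-' */
        value = value - integer();    /* groups to the left */
    }
    return value;
}

/* E -> integer '-' E | integer
 * Accepts the same input, but groups to the right. */
static int E_right(void) {
    int value = integer();
    if (*p == '-') {
        p++;                          /* consume '-' */
        value = value - E_right();    /* groups to the right */
    }
    return value;
}

int eval_left(const char *text) {
    assert(text != NULL);
    p = text;
    return E_loop();
}

int eval_right(const char *text) {
    assert(text != NULL);
    p = text;
    return E_right();
}
```

For "10-3-2", `eval_left` computes (10-3)-2 = 5, while `eval_right` computes 10-(3-2) = 9. Both accept exactly the same input, so the error from changing the associativity only becomes visible in the result.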
Differences between Grammars and Regular Expressions
Grammar:
- Multiple rules
- Non-terminal symbols, derivation from left-hand side to right-hand
side
- For simple grammars, *, (), and | are not available
Regular Expression:
- Limited to one single rule
- No non-terminal symbols, only right-hand side
- For practical regular expressions, lots of metacharacters/functionality
besides *, (), and | available
A simple regular expression corresponds to a single rewriting rule in a
(BNF, ...) grammar
Homework
Deadline: June 8, 2017 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 double-sided printout of parser program. Stapled in upper left if
more than one page, no cover page, no wrapping lines, legible font size,
non-proportional font, portrait (not landscape), formatted (indents,...) for
easy visibility, name (kanji and kana) and student number as a comment at the
top right
Collaboration: The same rules as for Computer Practice I (計算機実習 I)
apply
- Expand the top-down parser of parser1.c to correctly deal with the four basic arithmetic operations. (scanner.h/c do not change, so no need to submit them)
- (bonus problem) Add more operations to the top-down parser, and/or deal with parentheses. (If you solve this problem, also submit the scanner.h/c files, but only one parser.c file for both problems.)
- Bring your notebook computer to the next lecture. Check again that flex, bison, make, and gcc are installed.
Glossary
- recursive descent parsing: 再帰的下向き構文解析
- depth-first: 深さ優先
- lookahead: 先読み
- backtracking: バックトラック
- labyrinth: 迷路
- ambiguous grammar: 曖昧な文法
- right associative: 右結合
- invariant: 不変条件
- left recursion: 左再帰