Principles of top-down parsing
(Implementation of Top-Down Parsing)
8th lecture, June 7, 2019 
Language Theory and Compilers 
http://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture8.html
Martin J. Dürst

© 2005-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Leftovers, summary, and homework for last lecture
- Top-down parsing
- Recursive descent parsing
- Implementation of recursive descent parsing
- How to deal with various problems in a grammar:
  - Limitation of number of operations
  - Priority
  - Left/right associativity
  - Left recursion
 
 
Leftovers from Last Lecture
 
Summary of Last Lecture
  - flex homework: Lexical analysis/regular expressions for C
- Regular expression for C comments
- Grammars for various languages
- How to construct a grammar
- Results of parsing: Parse tree and abstract syntax tree
 
Comment about Grammars Collected Last Week
  - Many students submitted descriptions of syntax, not grammars
- These students received fewer points
 
Last Week's Homework 1
(Removed due to circumstances)
 
Last Week's Homework 2
(Removed due to circumstances)
 
Last Week's Homework 3
(Removed due to circumstances)
 
About Ambiguous Grammars
  - Grammars that may produce more than one parse tree for the same input are
    called ambiguous grammars
- Sometimes, it may seem okay to allow an ambiguous grammar (e.g.
    mathematically, a+(b+c) = (a+b)+c). But this breaks down when overflow
    can occur, and it does not work for non-associative operators (e.g.
    a-(b-c) ≠ (a-b)-c).
- Some grammars can be changed to remove ambiguity
- There is no general algorithm to remove ambiguity; there is also no
    algorithm to decide whether removal is possible for a given grammar
- Ambiguous grammars are okay when only defining the syntax of a language,
    but are not suited for a programming language or a data format
- Whether a grammar is ambiguous and whether its language requires a
    nondeterministic pushdown automaton for recognition are separate
    problems
    (example: palindromes have an unambiguous grammar, but recognizing them
    requires a nondeterministic pushdown automaton)
 
General Top-Down Parsing
- Create parse tree starting with the start symbol of the grammar
- Expand parse tree depth-first from the left
- If there is a choice (of rewriting rules), try each one in turn
- Once the parse tree reaches a terminal symbol, compare with input
  - If there is a match, continue
  - If there is no match, give up and try another choice (backtracking)
 
 
Main Points of Backtracking
Backtracking tries all possible pathways (similar to finding the exit of a
labyrinth without a map)
Backtracking may be very slow, but this can be improved:
- Change the grammar so that backtracking is reduced or eliminated
  (ideally, the next token should be enough to select a single rewriting
  rule)
- Use lookahead (check some more tokens) to eliminate some choices
- Remember intermediate results (packrat parser)
  - Similar to making marks in a labyrinth
  - Example implementation: treetop (for Ruby)
 
 
Recursive Descent Parsing
- Implementation of top-down parsing, easy to write by hand
- Create a function for each non-terminal symbol of the grammar
- In the function, proceed along the right side of the rewriting rule(s):
  - For a terminal symbol, compare with input
  - For a non-terminal symbol, call the corresponding function
  - Use branching (if, ...) for a choice (|) in the grammar
  - Use repetition (while, ...) for repetition in (E)BNF
- Reason for name: Recursive grammar rules
  Example: A → variable '=' A | integer (assignment expression)
 
 
Recursive Descent Parsing: Simple Hand-Written Parser
Program files: scanner.h, scanner.c, parser1.c
How to compile: gcc scanner.c parser1.c && ./a
 
Details of Recursive Descent Parsing: Lexical Analysis
(see scanner.c)
- Invariant:
  - The next character is always in nextChar (one-character lookahead)
  - As soon as a character is processed, the next character is read into
    nextChar
- nextChar is a global variable (can be changed to a function parameter)
- How to use from parser:
  - Initialize with initScanner
  - Read tokens with getNextToken
- Implementation of getNextToken:
  - One-character tokens: Direct decision
  - Multiple-character tokens: Decide on first character, read the rest
    with a dedicated function
 
 
Details of Recursive Descent Parsing: Parsing
(see parser1.c)
- Invariant (same as for lexical analysis):
  - The next token to be looked at is always in nextToken (one-token
    lookahead)
  - As soon as a token is processed, the next token is read into
    nextToken
- nextToken is a global variable, but this can be changed to a
  function parameter
- Overall usage:
  - Initialize using initScanner and getNextToken
  - Call the function corresponding to the start symbol of the grammar
    (e.g. Expression())
  - Further process the returned value (abstract syntax tree or result of
    evaluation)
 
 
Details of Recursive Descent Parsing: Non-Terminal Symbols
  - Create a function for each non-terminal symbol of the grammar
- In the function, deal with all rewriting rules for this non-terminal
    symbol
- For each non-terminal on the right-hand side, call the corresponding
    function
- For each terminal on the right-hand side, compare with
    nextToken
 
How to Deal with Left Recursion
Example of left recursion:
E → E '-' integer | integer
Wrong solution (change of associativity):
E → integer '-' E | integer
Correct solution:
E → integer Econtinued
Econtinued → '-' integer Econtinued | ε
In (E)BNF:
E → integer {'-' integer}
 
Differences between Grammars and Regular Expressions
Grammar:
  - Multiple rules
- Non-terminal symbols, derivation from left-hand side to right-hand
  side
- For simple grammars, *, (), and | are not available
Regular Expression:
  - Limited to one single rule
- No non-terminal symbols, only right-hand side
- For practical regular expressions, lots of metacharacters/functionality
    besides *, (), and | available
A simple regular expression corresponds to a single rewriting rule in a
(BNF, ...) grammar
 
Homework
Deadline: June 13, 2019 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 double-sided printout of parser program. Stapled in upper left if
more than one page, no cover page, no wrapping lines, legible font size,
non-proportional font, portrait (not landscape), formatted (indents,...) for
easy visibility, name (kanji and kana) and student number as a comment at the
top
Collaboration: The same rules as for Computer Practice I (計算機実習 I)
apply
- Expand the top-down parser of parser1.c
  to correctly deal with the four basic arithmetic operations.
  (scanner.h/c do not change, so no need to submit them)
- (bonus problem) Add more operations to the top-down parser, and/or deal
  with parentheses.
  (If you solve this problem, also submit the scanner.h/c files, but only
  one parser.c file for both problems.)
- Bring your notebook computer to the next lecture. Check again that
  flex, bison, make, and gcc are installed.
 
Glossary
- ambiguous grammar: 曖昧な文法
- recursive descent parsing: 再帰的下向き構文解析
- depth-first: 深さ優先
- lookahead: 先読み
- backtracking: バックトラック
- labyrinth: 迷路
- right associative: 右結合
- invariant: 不変条件
- left recursion: 左再帰