Linear Grammars and Regular Expressions

(Regular Grammars and Regular Expressions)

4th lecture, April 29, 2022

Language Theory and Compilers

https://www.sw.it.aoyama.ac.jp/2022/Compiler/lecture4.html

Martin J. Dürst

AGU

© 2005-22 Martin J. Dürst 青山学院大学

 

Today's Schedule

 

Schedule for Next Few Weeks

 

Leftovers

 

Last Week's Homework 4

Check the versions of flex, bison, gcc, make, and m4 that you installed (no need to submit, but contact me by mail if you have a problem)

 

Last Week's Homework 1

Draw a state transition diagram for a finite state automaton that recognizes all inputs that (at the same time)

 

Last Week's Homework 2

Draw the state transition diagram for the NFA in the state transition table below

State   ε      0      1
→S      {P}    {Q}    {E}
E       {}     {P}    {P, T}
P       {}     {T}    {}
*Q      {}     {}     {S, P}
T       {P}    {}     {Q}

 

Last Week's Homework 3

Create the state transition table of the DFA that is equivalent to the NFA in homework 2 (do not rename states).

[removed]

 

Plan for this Lecture

All of these are equivalent, and define/accept regular languages

 

Example of Right Linear Grammar

State transition diagram of the finite state automaton

A → fB | gA

B → gA | fC | f

C → gA | fC | f

 

Right Linear Grammars and FSAs

Right linear grammars and NFAs correspond as follows (not considering ε transitions):

FSA                                         Right Linear Grammar
states                                      nonterminal symbols
start state                                 start symbol
transitions moving to an accepting state    constant rules (e.g. A → c)
all transitions                             right linear rules (e.g. A → gB)

There is a similar correspondence for left linear grammars (imagine reading the input backwards)
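The correspondence above can be sketched in Python, using the right linear grammar from the example slide. This is only an illustration under some assumptions: the names `GRAMMAR`, `grammar_to_nfa`, `accepts`, and the extra accepting state `ACCEPT` are made up here, and constant rules are mapped to transitions into that single additional accepting state.

```python
# Right linear grammar from the example slide:
#   A → fB | gA,   B → gA | fC | f,   C → gA | fC | f
# Each alternative is (terminal, nonterminal); None marks a constant rule.
GRAMMAR = {
    "A": [("f", "B"), ("g", "A")],
    "B": [("g", "A"), ("f", "C"), ("f", None)],
    "C": [("g", "A"), ("f", "C"), ("f", None)],
}
ACCEPT = "ACCEPT"  # hypothetical name for the added accepting state


def grammar_to_nfa(grammar):
    """Return NFA transitions as {(state, symbol): set of next states}."""
    delta = {}
    for lhs, alternatives in grammar.items():
        for terminal, nonterminal in alternatives:
            # constant rule -> transition into the accepting state
            target = ACCEPT if nonterminal is None else nonterminal
            delta.setdefault((lhs, terminal), set()).add(target)
    return delta


def accepts(delta, start, word):
    """Simulate the NFA by tracking the set of currently possible states."""
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((s, symbol), set()) for s in current))
    return ACCEPT in current


NFA = grammar_to_nfa(GRAMMAR)
print(accepts(NFA, "A", "gff"))  # True
print(accepts(NFA, "A", "ffg"))  # False
```

Nonterminals become states and the start symbol becomes the start state, so the simulation starts in `"A"`.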

 

From NFA to Right Linear Grammar

 

From Right Linear Grammar to NFA

 

Linear Grammar Definitions

Simple Rewriting Rules
Rule Shape    Name
A → gB        right linear rule (nonterminal on the right)
A → Bg        left linear rule (nonterminal on the left)
A → c         constant rule

Right linear grammar = right linear rules + constant rules

Left linear grammar = left linear rules + constant rules

(in both cases, a special rule S → ε is allowed)

Both left linear grammars and right linear grammars are regular grammars

(A grammar that contains both left linear rules and right linear rules is called a linear grammar. A linear grammar is not a regular grammar, but a kind of context-free grammar.)

 

Today's Outlook

Summary up to now:

Challenge: Find a more compact representation for regular languages.

 

Example of Regular Expression

To find DFA and NFA in a document, use the regular expression (D|N)FA (also written /(D|N)FA/).

(The slashes // are the delimiters for regular expressions in Ruby, Perl, JavaScript, ...)
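As a small illustration, the same search can be tried with Python's `re` module. Python has no `//` delimiters, so the expression is written as a raw string; the sample text is made up for this sketch.

```python
import re

# Sample text (an assumption for this example)
text = "An NFA can be converted to a DFA; both are FSAs."

# (D|N)FA: either D or N, followed by FA
pattern = re.compile(r"(D|N)FA")
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['NFA', 'DFA']
```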

 

Properties of Regular Expressions

 

Theoretical Regular Expression: Syntax

 

Examples of Regular Expressions

 

Notation of Regular Expressions

 

Formal Definition of Theoretical Regular Expressions

L(r) denotes the language defined by regular expression r.

Theoretical Regular Expressions over Alphabet Σ
Priority     Regular Expression    Condition                       Defined Language        Notes
-            ε, a                  a ∈ Σ                           {ε} or {a}              literals
very high    (r)                   r is a regular expression       L((r)) = L(r)           grouping
high         r*                    r is a regular expression       L(r*) = (L(r))*         Kleene closure
low          rs                    r, s are regular expressions    L(rs) = L(r)L(s)        concatenation
very low     r|s                   r, s are regular expressions    L(r|s) = L(r) ∪ L(s)    set union
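The definitions in the table can be read as a recursive computation. The following sketch enumerates L(r) up to a maximum word length, since L(r) itself may be infinite. The nested-tuple encoding of regular expressions and the function name `language` are assumptions made for this example. Grouping needs no node of its own, because the nesting of the tuples already expresses it.

```python
def language(r, max_len):
    """Return all words of L(r) with length <= max_len (max_len >= 0).

    r is a nested tuple (an assumed encoding, not from the lecture):
      ("eps",), ("sym", a), ("star", r), ("cat", r, s), ("alt", r, s)
    """
    kind = r[0]
    if kind == "eps":                     # L(ε) = {ε}
        return {""}
    if kind == "sym":                     # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                     # L(r|s) = L(r) ∪ L(s)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":                     # L(rs) = L(r)L(s)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                    # L(r*) = (L(r))*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                   # append one more word of L(r) per round
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


# a|b*c, written out with explicit structure
expr = ("alt", ("sym", "a"), ("cat", ("star", ("sym", "b")), ("sym", "c")))
print(sorted(language(expr, 3)))  # ['a', 'bbc', 'bc', 'c']
```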

 

Grammar for Theoretical Regular Expressions

 

Caution: Priority

Make sure you get priorities right!

Expression 1    Matches 1                   Expression 2    Matches 2
ab|c            ab, c                       a(b|c)          ab, ac
abc*            ab, abc, abcc,...           (abc)*          ε, abc, abcabc,...
a|b|c*          a, b, ε, c, cc,...          (a|b|c)*        ε, a, b, c, aa, ab, ac, ba,...
ab|c*|d         ab, ε, c, cc, ccc,..., d    a(b|c)*d        ad, abd, acd, abbd, abcd, acbd, accd,...
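Some rows of the table above can be checked by experiment. The sketch below uses Python's `re.fullmatch`, which requires the whole word to match, so it behaves like a theoretical regular expression for these simple cases. The helper name `matches` and the word list are assumptions for this example.

```python
import re


def matches(pattern, words):
    """Return the words that the pattern matches as a whole."""
    return [w for w in words if re.fullmatch(pattern, w)]


words = ["", "ab", "ac", "c", "abc", "abcc", "abcabc"]
print(matches(r"ab|c", words))    # ['ab', 'c']
print(matches(r"a(b|c)", words))  # ['ab', 'ac']
print(matches(r"abc*", words))    # ['ab', 'abc', 'abcc']
print(matches(r"(abc)*", words))  # ['', 'abc', 'abcabc']
```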

 

Regular Expression to NFA: Symbols, Concatenation

The NFA for a symbol a has two states and one arrow:

The NFA for the regular expression rs connects the accepting state of r with the start state of s through an ε transition.

 

Regular Expression to NFA: Alternative

The NFA for r|s is constructed from the NFAs for r and s as follows:

ε transitions connect the overall start state to the start states of r and s, and the accepting states of r and s to the overall accepting state.

The additional ε connections are necessary to clearly commit to either r or s.

 

Regular Expression to NFA: Repetition

The NFA for r* is constructed as follows:

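ε transitions connect the overall start state to the start state of r, the accepting state of r to the overall accepting state, the overall start state to the overall accepting state, and the accepting state of r back to the start state of r (backwards!).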

 

Conversion: Regular Expression to NFA

 

Example of Conversion

Regular expression: a|b*c

In some cases, some of the ε transitions may be eliminated, or the NFA may otherwise be simplified.
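The constructions for symbols, concatenation, alternative, and repetition can be combined into a small Python sketch that builds the full NFA for a|b*c (with all ε transitions) and simulates it. The representation of an NFA fragment as a `(start, accept, edges)` tuple and all function names are assumptions chosen for this illustration. No simplification is attempted.

```python
import itertools

_ids = itertools.count()  # source of fresh state numbers

# An NFA fragment is (start, accept, edges); edges is a list of
# (source, label, target), where label None stands for ε.


def symbol(a):
    s, t = next(_ids), next(_ids)  # two states, one arrow
    return s, t, [(s, a, t)]


def concat(r, s):
    # ε from the accepting state of r to the start state of s
    return r[0], s[1], r[2] + s[2] + [(r[1], None, s[0])]


def alternative(r, s):
    start, accept = next(_ids), next(_ids)
    extra = [(start, None, r[0]), (start, None, s[0]),
             (r[1], None, accept), (s[1], None, accept)]
    return start, accept, r[2] + s[2] + extra


def repetition(r):
    start, accept = next(_ids), next(_ids)
    extra = [(start, None, r[0]), (r[1], None, accept),
             (start, None, accept), (r[1], None, r[0])]  # last edge goes backwards
    return start, accept, r[2] + extra


def eps_closure(states, edges):
    """All states reachable from `states` using only ε transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(nfa, word):
    start, accept, edges = nfa
    current = eps_closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current


# a|b*c: concatenation binds tighter than alternative
nfa = alternative(symbol("a"), concat(repetition(symbol("b")), symbol("c")))
print(accepts(nfa, "bbc"))  # True
print(accepts(nfa, "ab"))   # False
```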

 

Conversion: FSA to Regular Expression

Algorithmic conversion is possible, but complicated

General procedure:

  1. Create regular expressions for getting from state A to state B directly for all pairs of states
  2. Select a single state, and create all regular expressions that pass through this intermediate state
  3. Repeat step 2, increasing the number of intermediate states
  4. Simplify intermediate regular expressions as much as possible (they can get quite complex)

Once a human understands which language the FSA accepts, it is often easier to write a regular expression for that language directly.
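One common way to realize the general procedure above is the textbook dynamic-programming scheme sketched below: start from the direct expressions for all pairs of states, then allow one more intermediate state per round. The function names, the use of `None` for "no path", and the example automaton (a DFA for words over {0, 1} with an even number of 1s) are assumptions for this illustration. Step 4 (simplification) is left out, so the resulting expression is correct but far from compact.

```python
import re


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"


def concat(r, s):
    if r is None or s is None:  # no path on one side means no path at all
        return None
    return r + s


def star(r):
    return "" if r is None or r == "" else f"(?:{r})*"


def fsa_to_regex(states, delta, start, accepting):
    """delta maps (state, symbol) to a set of target states."""
    # Step 1: direct expressions for all pairs of states
    R = {(p, q): None for p in states for q in states}
    for (p, a), targets in delta.items():
        for q in targets:
            R[p, q] = union(R[p, q], re.escape(a))
    for p in states:
        R[p, p] = union(R[p, p], "")  # staying in place reads ε
    # Steps 2 and 3: allow one more intermediate state k per round
    for k in states:
        R = {(p, q): union(R[p, q],
                           concat(concat(R[p, k], star(R[k, k])), R[k, q]))
             for p in states for q in states}
    result = None
    for q in accepting:
        result = union(result, R[start, q])
    return result


# Assumed example: E = even number of 1s so far, O = odd number
states = ["E", "O"]
delta = {("E", "0"): {"E"}, ("E", "1"): {"O"},
         ("O", "0"): {"O"}, ("O", "1"): {"E"}}
regex = fsa_to_regex(states, delta, "E", {"E"})
print(bool(re.fullmatch(regex, "0110")))  # True
print(bool(re.fullmatch(regex, "10")))    # False
```

Even for this two-state automaton the generated expression is long, which illustrates why the intermediate results need simplification in practice.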

 

Applications of Regular Expressions

 

Practical Regular Expressions: Syntax

Practical regular expressions have many additional functions and shortcut notations
(the corresponding theoretical regular expressions or simpler constructs are given in parentheses)

 

Theoretical and Practical Regular Expressions: Usage Differences

 

Use of Practical Regular Expressions

 

Notes on Practical Regular Expressions

 

Theoretical vs. Practical Regular Expressions

                          Theoretical         Practical
Meta-characters           * | ( )             |*+?()[]{}.\^$
ε                         yes                 no
character classes ([])    no                  yes
+, ?, {} quantifiers      no                  yes
^, $ anchors              no                  yes
match where               full word           part of a string
implementation            NFA→DFA             backtracking,...
descriptive power         regular language    more than regular language
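The rows "match where" and "descriptive power" can be illustrated with Python's `re` module. The backreference `\1` is used here as one well-known example of a practical feature that goes beyond regular languages; it is an addition for this sketch and is not taken from the table.

```python
import re

# Theoretical view: the expression has to describe the full word.
whole = re.fullmatch(r"(D|N)FA", "an NFA")
print(whole)                # None

# Practical view: a match for part of the string is enough.
part = re.search(r"(D|N)FA", "an NFA")
print(part.group(0))        # NFA

# The anchors ^ and $ tie the match to the start and end of the string.
anchored = re.search(r"^(D|N)FA$", "an NFA")
print(anchored)             # None

# \1 must repeat exactly what group 1 matched, so this accepts words
# of the form ww, which is not a regular language.
doubled = re.compile(r"([ab]+)\1")
print(bool(doubled.fullmatch("abab")))  # True
print(bool(doubled.fullmatch("abba")))  # False
```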

 

Summary of this Lecture

 

Homework Submission

Deadline: May 2, 2022 (Monday!), 19:00

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

Where to submit: Box in front of room O-529 (building O, 5th floor)

 

Homework

  1. Construct the state transition diagram for the NFA corresponding to the following grammar
    S → xB | yB | yC, A → xC | z | yS, B → zD | zC | xB | y, C → yA | aD | z
  2. Convert the automaton defined by the following transition table to a right linear grammar

    State    0    1
    →T       G    M
    *G       K    L
    H        M    G
    K        H    -
    *L       M    T
    M        L    G
  3. Construct the state transition diagram for the regular expression rp|h*s
    Write down two versions:
    1. The result of the full procedure (with all ε transitions)
    2. A version that is as simple as possible
  4. Bring your notebook PC with you next week (May 6).
    Make sure you can use flex, bison, gcc, make, diff, and m4 (no need to submit)

 

Glossary

regular expression
正規表現
minimization
最小化
partition
分割
isomorphic
同型 (同形) の
(left/right) linear rule
(左・右) 線形規則
constant rule
定数規則
(left/right) linear grammar
(左・右) 線形文法
delimiter
区切り文字
alternative
選択肢
repetition
繰返し
meta-character
メタ文字
priority
優先度
theoretical regular expressions
理論的 (な) 正規表現
practical regular expressions
実用的 (な) 正規表現
first-class object
第一級オブジェクト、ファーストクラスオブジェクト
notation(al)
表記 (上の)
arbitrary
任意
leftmost
できるだけ左