Importance, Definition, and Classification of Formal Languages

(形式言語の重要性、定義、分類)

2nd lecture, April 14, 2017

Language Theory and Compiler

http://www.sw.it.aoyama.ac.jp/2017/Compiler/lecture2.html

Martin J. Dürst

AGU

© 2006-17 Martin J. Dürst 青山学院大学

Today's Schedule

 

Example Answers for Homework

 

Course Contents


          | Theory                    | Compilers                     | Other applications
Front end | language theory, automata | lexical analysis, parsing     | regular expressions, text/data formats
Back end  |                           | optimization, code generation |

 

Importance of Formal Language Theory

 

Terms used for Natural Languages and Formal Languages

Field | | Smallest Unit | Sequence | Set | Classification
natural language | Japanese | (単)語 | 文、文書 | (自然)言語 | (大)語族、語族、語派、語群
natural language | English | word | sentence, text | (natural) language | language macrofamily, family, group, ...
formal language | Japanese | 記号 (文字など) | 語 | (形式)言語 | 言語 (族)
formal language | English | symbol (letter, ...) | word | (formal) language | language type, ...

 

Basic Terms

Terms for formal languages:

 

Definition of Word

 

Concatenation Operation for Words

 

Properties of Concatenation

 

Definition of Language

A language over Σ is a set of words over Σ

Examples of languages over Σ = {a, b, c}:

 

More Examples of Languages

Operations on Languages

Operations on languages are combinations of operations on sets and operations on words.

  1. Set union of languages
  2. Set intersection of languages
  3. Set difference of languages
  4. Concatenation operation for languages: For languages A and B, their concatenation AB is the set { wv | w ∈ A, v ∈ B }

    As for words, we write L² for LL, ...

  5. Kleene closure: Concatenating the same language 0 or more times

    written L*; L* = L⁰ ∪ L¹ ∪ L² ∪ ... (the union of Lⁱ over all i ≥ 0, where L⁰ = {ε})

    Example: L = {a, b} => L* = {ε, a, b, aa, ab, ba, bb, aaa, ...}
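
The concatenation and Kleene closure operations can be sketched in Python as follows. This is only an illustration: words are modelled as Python strings, languages as sets of strings, and the function names (`concat`, `power`, `kleene_up_to`) are made up for this sketch. Because L* is infinite whenever L contains a non-empty word, the sketch only computes the union of the powers up to a chosen bound.

```python
from itertools import product

def concat(A, B):
    """Concatenation AB = { wv | w in A, v in B }."""
    return {w + v for w, v in product(A, B)}

def power(L, n):
    """L^n: L concatenated with itself n times; L^0 contains only the empty word."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def kleene_up_to(L, max_power):
    """Finite part of L*: the union of L^0, L^1, ..., L^max_power."""
    words = set()
    for i in range(max_power + 1):
        words |= power(L, i)
    return words

L = {"a", "b"}
# Sort by length first, then alphabetically, to match the slide's listing.
print(sorted(kleene_up_to(L, 2), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The empty string `''` plays the role of ε here.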

 

Main Problems in Formal Language Theory

 

Languages and Automata and Grammars

 

Table of Formal Language Types

(Chomsky hierarchy)

文法 | grammar | Type | language type | automaton
句構造文法 | phrase structure grammar (psg) | 0 | phrase structure language | Turing machine
文脈依存文法 | context-sensitive grammar (csg) | 1 | context-sensitive language | linear-bounded automaton
文脈自由文法 | context-free grammar (cfg) | 2 | context-free language | push-down automaton
正規文法 | regular grammar (rg) | 3 | regular language | finite state automaton

 

Types of Automata

Automata types are distinguished by the restrictions on their "external memory":

0. The external memory is a tape of unlimited length: Turing machine

1. The external memory is a tape of limited length (bounded by a linear function of the input length): linear-bounded automaton

2. The external memory is a stack where only the top can be accessed: push-down automaton

3. There is no external memory: finite state automaton
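
To make the difference between "no external memory" and "a stack" concrete, here is a small Python sketch. Both example languages (words over {a, b} with an even number of a's, and { aⁿbⁿ | n ≥ 1 }) and both function names are chosen for this illustration only; they do not come from the lecture.

```python
def accepts_even_a(word):
    """Finite state automaton: the only memory is the current state."""
    transitions = {
        ("even", "a"): "odd",
        ("odd", "a"): "even",
        ("even", "b"): "even",
        ("odd", "b"): "odd",
    }
    state = "even"
    for symbol in word:
        if (state, symbol) not in transitions:
            return False  # symbol outside the alphabet {a, b}
        state = transitions[(state, symbol)]
    return state == "even"

def accepts_anbn(word):
    """Push-down style recognizer for a^n b^n (n >= 1): a stack counts the a's."""
    stack = []
    seen_b = False  # finite control: have we started reading b's?
    for symbol in word:
        if symbol == "a" and not seen_b:
            stack.append("a")      # push one marker per a
        elif symbol == "b" and stack:
            seen_b = True
            stack.pop()            # each b removes one marker
        else:
            return False
    return seen_b and not stack

print(accepts_even_a("abab"), accepts_even_a("ab"))   # True False
print(accepts_anbn("aabb"), accepts_anbn("aab"))      # True False
```

The first function needs only a fixed number of states, however long the input is. The second has to remember how many a's it has read, which can be arbitrarily many, so it relies on the stack.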

 

Example of a Grammar for a Formal Language

Example of derivation of a word from the grammar:

S → a S o → a a S o o → a a A o o → a a y a o o

S ⇒ a a y a o o

(single steps in a derivation are written with →, the overall result with ⇒)

 

Definition of Grammar

A grammar is defined as a quadruple (N, Σ, P, S)
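
As a data structure, the quadruple could be sketched in Python as below. This assumes the usual reading of the components (N: nonterminal symbols, Σ: terminal symbols, P: rewriting rules, S: start symbol). The class name, the field names, the single-character symbols, and the example grammar are all assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    productions: tuple        # P: pairs (left-hand side, right-hand side)
    start: str                # S

    def is_well_formed(self):
        """Check the basic consistency conditions between the four components."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals                   # S is a nonterminal
            and not (self.nonterminals & self.terminals)      # N and Σ are disjoint
            and all(
                any(symbol in self.nonterminals for symbol in lhs)  # lhs has a nonterminal
                and set(lhs) <= symbols
                and set(rhs) <= symbols
                for lhs, rhs in self.productions
            )
        )

# Hypothetical example grammar: S → aSb, S → ab
g = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", "aSb"), ("S", "ab")),
    start="S",
)
print(g.is_well_formed())  # True
```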

 

Rewriting Rule

(also: production rule)

 

Derivation

(derivation)

Example of Grammar and Derivation

Grammar:

  1. S → aba
  2. S → aDTa
  3. T → CDTa
  4. T → CDa
  5. DC → CD
  6. aC → aa
  7. Da → ba
  8. Db → bb

Example of derivation:
S →(2) aDTa →(4) aDCDaa →(5) aCDDaa →(7) aCDbaa →(8) aCbbaa →(6) aabbaa

(the numbers indicate which rewriting rule is applied, and the underlined parts indicate where it is applied; neither is required, e.g. for homework)
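
The example derivation can be replayed mechanically. The Python sketch below reads each rule as "left-hand side → right-hand side" and stores the eight rules as string pairs. As a simplification of this sketch (not something the slide prescribes), a rule is always applied at the leftmost place where its left-hand side occurs. The names `RULES`, `apply_rule`, and `derive` are made up for the illustration.

```python
RULES = {
    1: ("S", "aba"),
    2: ("S", "aDTa"),
    3: ("T", "CDTa"),
    4: ("T", "CDa"),
    5: ("DC", "CD"),
    6: ("aC", "aa"),
    7: ("Da", "ba"),
    8: ("Db", "bb"),
}

def apply_rule(form, number):
    """Rewrite the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = RULES[number]
    position = form.find(lhs)
    if position < 0:
        raise ValueError(f"rule {number} ({lhs} -> {rhs}) does not apply to {form!r}")
    return form[:position] + rhs + form[position + len(lhs):]

def derive(start, rule_numbers):
    """Return every intermediate form of the derivation, starting with `start`."""
    forms = [start]
    for number in rule_numbers:
        forms.append(apply_rule(forms[-1], number))
    return forms

print(" -> ".join(derive("S", [2, 4, 5, 7, 8, 6])))
# S -> aDTa -> aDCDaa -> aCDDaa -> aCDbaa -> aCbbaa -> aabbaa
```

In general a left-hand side may occur at several positions, and choosing a different position can lead to a different derivation; the leftmost choice merely happens to reproduce the steps shown above.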

 

Types of Grammars

Grammar types are distinguished by restrictions on rewriting rules:

0. No restrictions: Phrase structure grammar, (Chomsky) type 0 grammar

1. αAβ → αγβ, where α and β are sequences of 0 or more (non)terminals, and γ is a sequence of 1 or more (non)terminals:
Context-sensitive grammar, (Chomsky) type 1 grammar

2. A → γ, where γ is a sequence of 1 or more (non)terminals:
Context-free grammar, (Chomsky) type 2 grammar

3. A → aB or A → a (alternative: A → Ba or A → a):
Regular grammar, (Chomsky) type 3 grammar

(for all types, S → ε is also allowed)
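
The restrictions above can be turned into a rough rule checker. The Python sketch below assumes that upper case letters are nonterminals and lower case letters are terminals, writes each rule as a pair of strings, and represents ε as the empty string. For type 3 it checks only the form A → aB / A → a, not the alternative form. All function names are made up for this illustration, and the result is the most restrictive type whose rule form every rule satisfies, judged purely by the shape of the rules.

```python
def is_regular(lhs, rhs):
    """A -> aB or A -> a."""
    if not (len(lhs) == 1 and lhs.isupper()):
        return False
    if len(rhs) == 1:
        return rhs.islower()
    return len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper()

def is_context_free(lhs, rhs):
    """A -> γ, with γ of length 1 or more."""
    return len(lhs) == 1 and lhs.isupper() and len(rhs) >= 1

def is_context_sensitive(lhs, rhs):
    """αAβ -> αγβ, with γ of length 1 or more; try each nonterminal in lhs as A."""
    for i, symbol in enumerate(lhs):
        if not symbol.isupper():
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        if (len(rhs) > len(alpha) + len(beta)   # leaves room for a non-empty γ
                and rhs.startswith(alpha)
                and rhs.endswith(beta)):
            return True
    return False

def chomsky_type(rules, start="S"):
    """Most restrictive type (3, 2, 1, or 0) whose rule form all rules satisfy."""
    # S -> ε is allowed for all types, so it is left out of the check.
    rules = [(lhs, rhs) for lhs, rhs in rules if not (lhs == start and rhs == "")]
    for type_number, check in ((3, is_regular), (2, is_context_free), (1, is_context_sensitive)):
        if all(check(lhs, rhs) for lhs, rhs in rules):
            return type_number
    return 0

print(chomsky_type([("S", "aS"), ("S", "b")]))        # 3
print(chomsky_type([("S", "aSb"), ("S", "ab")]))      # 2
print(chomsky_type([("S", "aAb"), ("aA", "aab")]))    # 1
print(chomsky_type([("S", "AB"), ("AB", "BA")]))      # 0
```

The last example is type 0 under this literal reading because AB → BA swaps two symbols and cannot be written as αAβ → αγβ with a single rewritten nonterminal.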

 

Homework

Deadline: April 20, 2017 (Thursday), 19:00

Where to submit: Box in front of room O-529 (building O, 5th floor)

Format: A4 single page (using both sides is okay; NO cover page), easily readable handwriting (NO printouts), name (kanji and kana) and student number at the top right

  1. For the language L = { a, cb, ac }, list the 10 shortest words of L*
    Additional problem (solution voluntary): List all words of L* of length 4
  2. Using the grammar from the slide "Example of Grammar and Derivation", find 3 words (different from each other and from aabbaa). Give the full derivation for each word (rule numbers and underlines not needed). Guess and explain what language this grammar defines (Hint: If your guess is not simple, maybe you have made a mistake in the derivations).
    Additional problem (solution voluntary): Prove your guess
  3. (no need to submit, but bring your notebook PC with you to the next lecture if you have any problems)
    Install Cygwin on your notebook computer (detailed instructions with images). Make sure that you select/install gcc, flex, bison, diff, make, and m4. If you have an earlier Cygwin installation, make sure to check/update.

 

Glossary

word
語
derivation
導出
classification
分類
symbol
記号
empty word
空語
alphabet
アルファベット
(word/language) over Σ
Σ 上の (語・言語)
concatenation (operation)
連結 (演算)
associativity
結合性 (結合律が成り立つこと)
neutral element
単位元
commutativity
可換性
prefectural government (building)
県庁
keyword
予約語
well-formed formula
整論理式
Kleene closure
クリーン閉包
rule
規則
type of language
言語族
Chomsky hierarchy
チョムスキー階層
phrase structure language
句構造言語
context-sensitive language
文脈依存言語
context-free language
文脈自由言語
regular language
正規言語
Turing machine
チューリング機械
linear-bounded automaton
線形束縛オートマトン
push-down automaton
プッシュダウンオートマトン
finite state automaton
有限オートマトン
external memory
外部メモリ
nonterminal symbol
非終端記号
upper case (letter)
大文字
lower case (letter)
小文字
terminal symbol
終端記号
rewriting rule/production rule
書き換え規則・生成規則
initial/start symbol
初期記号・開始記号
quadruple
四字組
left-hand side
左辺
right-hand side
右辺
subsequence
部分列