Course Overview
Overall Compiler Structure
(授業の概要;
コンパイラ全体の仕組み)
Language Theory and Compilers
(言語理論とコンパイラ)
1st lecture, April 8, 2022 / on demand
https://www.sw.it.aoyama.ac.jp/2022/Compiler/lecture1.html
© 2005-22 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Self-introduction
- About this course
- Difficulty of processing input
- Compiler structure
- Lexical analysis and parsing
- Example of an automaton
- Example of a formal language grammar
Self-Introduction
TA: YU JINSONG (于 津松、M1)
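Position of this Course
- 3rd year, first (spring) semester
- Second course group (elective-required, EJ course)
- Courses this one builds on:
- Introduction to Computers
- Mathematics for Information Science
- Computer Practice (programming)
Course Goals
- Understanding the relationship between (language) theory and applications
- Applications using tools
- Compilers
- Document processing
- Analysis of input
- Design of data formats
- Design and implementation of small languages
(domain-specific languages, etc.)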
The Importance of Compilers
Blog by Steve Yegge
Summary:
- If you don't know how compilers work, then you don't know how computers
work.
- If you're not 100% sure whether you know how compilers work, then you
don't know how they work.
ACM Turing Award
On March 30, 2021, ACM announced that the 2020 Turing Award ("Nobel Prize for Computer
Science") was awarded to Alfred V. Aho and Jeffrey D. Ullman
for their work on programming language implementation and their highly
influential books.
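How the Course Proceeds
Grading
(approximate guideline)
- In-class mini tests: 20%
- Exercises and reports: 35%
- Final exam: 45%
Collaboration with Others
For homework, reports, and similar assignments:
- Work on them on your own
- As far as possible, post questions on Moodle
- You may discuss with people you know, but exchanging answers or parts of answers is prohibited
Anyone who does not follow these rules will receive 0 points for part or all of the submission!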
Course Schedule
Schedule
(https://www.sw.it.aoyama.ac.jp/2022/Compiler)
Books/References
(https://www.sw.it.aoyama.ac.jp/2022/Compiler/biblio.html)
(The course covers both language theory and compilers, but each reference book concentrates on only one of the two)
Course Contents
| | Theory | Compilers | Other applications |
| Front end | language theory, automata (2, 3, 6, 12) | lexical analysis (4, 5), parsing (7-10) | regular expressions, text/data formats (4) |
| Back end | | optimization, code generation (13, 14) | |
(numbers indicate numbers of lectures where topic is discussed)
- Concentrate on input, not output
- Use various representations from theory and applications
Example of Difference between Input and Output
| Character itself (internal representation) | HTML/XML escaping (external representation) |
| ' | &apos; |
| " | &quot; |
| < | &lt; |
| > | &gt; |
| & | &amp; |
Which direction is more difficult?
Input: HTML escaping → characters
("AT&amp;T, 3&lt;5" → "AT&T, 3<5")
Output: Characters → HTML escaping
("AT&T, 3<5" → "AT&amp;T, 3&lt;5")
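The two directions can be compared with a minimal Python sketch. The function names `escape` and `unescape` and the lookup tables are illustrative assumptions, not part of the lecture material. Escaping (output) is a plain table lookup for each character, whereas unescaping (input) has to scan ahead and decide what to do with malformed text.

```python
# Assumed table of the five predefined HTML/XML escapes.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}
# Reverse table: entity name without "&" and ";" -> character, e.g. "amp" -> "&".
UNESCAPES = {escaped[1:-1]: char for char, escaped in ESCAPES.items()}


def escape(text):
    """Output direction: every character is handled independently."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape(text):
    """Input direction: must find entity boundaries and reject bad input."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            end = text.find(";", i)
            name = text[i + 1:end] if end != -1 else None
            if name in UNESCAPES:
                out.append(UNESCAPES[name])
                i = end + 1
                continue
            raise ValueError(f"bad or unterminated escape at position {i}")
        out.append(text[i])
        i += 1
    return "".join(out)


print(escape("AT&T, 3<5"))              # AT&amp;T, 3&lt;5
print(unescape("AT&amp;T, 3&lt;5"))     # AT&T, 3<5
```

In this sketch `escape` cannot fail, while `unescape` needs an error path for text such as a bare `&` that is not a valid escape.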
Difficulties for Input
- Not structured
(just a sequence of bytes/characters)
- Anything goes (including errors)
- Deciding whether some input is correct or not
is a model for computation in general
- Deciding whether input is correct
is equivalent to recognition
The Function of a Compiler
Bridge between software and hardware
- Input: Program that can be understood
by humans
- Language: High-level program language
- Medium: source (file/program)
- Output: Program that can be executed
by a machine
- Language: assembly language,
machine language
- Medium: object code, machine code
Example Compiler Input/Output
Input fragment:
sum += price * 25;
Output (assembly language):
LOAD R1, price ; load from price into R1 (register 1)
CONST R2, 25 ; put constant 25 into R2 (register 2)
MUL R1, R1, R2 ; put the product of R1 and R2 into R1
LOAD R2, sum ; load from sum into R2
ADD R2, R1, R2 ; put the sum of R1 and R2 into R2
STORE sum, R2 ; store the contents of R2 into sum
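To check that these six instructions really compute `sum += price * 25`, they can be run on a toy simulator. The sketch below is only an assumption about how such a register machine might behave: the tuple encoding of instructions, the `run` function, and the starting values of `sum` and `price` are all hypothetical.

```python
def run(program, memory):
    """Execute a list of (opcode, operands...) tuples on a toy register machine."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD Rx, var: copy a variable into a register
            regs[args[0]] = memory[args[1]]
        elif op == "CONST":     # CONST Rx, n: put a constant into a register
            regs[args[0]] = args[1]
        elif op == "MUL":       # MUL Rd, Ra, Rb: Rd = Ra * Rb
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "ADD":       # ADD Rd, Ra, Rb: Rd = Ra + Rb
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":     # STORE var, Rx: copy a register into a variable
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


PROGRAM = [
    ("LOAD", "R1", "price"),
    ("CONST", "R2", 25),
    ("MUL", "R1", "R1", "R2"),
    ("LOAD", "R2", "sum"),
    ("ADD", "R2", "R1", "R2"),
    ("STORE", "sum", "R2"),
]

memory = run(PROGRAM, {"sum": 100, "price": 4})
print(memory["sum"])  # 200, i.e. 100 + 4 * 25
```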
Logical Structure of a Compiler
- [preprocessor]
1. Lexical analysis
2. Parsing (syntax analysis)
3. Semantic analysis
4. Optimization (or 5.)
5. Code generation (or 4.)
- [assembler]
- [linker, loader]
Compiler Types and Related Software
- One-pass compiler
- x-pass compiler (x between 1 and 70, as in IBM's PL/1 compiler of the 1970s)
- Cross-compiler (e.g. compiling on PC for smartphone)
- Dynamic/just-in-time (JIT) compiler
- Preprocessor (runs before the compiler)
- Interpreter (e.g. for Ruby, Python, Perl,...)
Example of Lexical Analysis
Fragment of input program:
sum += price * 25;
This is a sequence of characters:
| s | u | m |   | + | = |   | p | r | i | c | e |   | * |   | 2 | 5 | ; | \n |
Corresponding output (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Overview of Lexical Analysis
- Convert a sequence of characters
to a sequence of tokens
- Several characters become one token
(e.g. s u m
→ id("sum"))
- Similar to finding words in a natural language text
Lexical Analysis Details
- Tokens have types
Examples: id, int, asterisk, semicolon
- Tokens can have attributes
Examples: id (identifier, with name),
int (integer, with value)
- Many token names are derived
from the symbol shape
(example: * is asterisk, not mult,
because it can also be used for pointers)
- White space is used during lexical analysis,
but then discarded
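The points above can be sketched as a small lexer in Python. This is a minimal illustration under stated assumptions: the token names follow the slides, but the regular expressions, the tuple representation of tokens, and the `tokenize` function are hypothetical and cover only the characters in the example fragment.

```python
import re

# Token types and the patterns assumed to recognize them.
TOKEN_SPEC = [
    ("int",       r"\d+"),
    ("id",        r"[A-Za-z_]\w*"),
    ("plusequal", r"\+="),
    ("asterisk",  r"\*"),
    ("semicolon", r";"),
    ("space",     r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert a sequence of characters into a list of token tuples."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue                          # white space separates tokens, then is discarded
        if kind == "int":
            tokens.append((kind, int(text)))  # attribute: integer value
        elif kind == "id":
            tokens.append((kind, text))       # attribute: identifier name
        else:
            tokens.append((kind,))            # no attribute needed
    return tokens


print(tokenize("sum += price * 25;\n"))
# [('id', 'sum'), ('plusequal',), ('id', 'price'), ('asterisk',), ('int', 25), ('semicolon',)]
```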
Example of Parsing
Program fragment: sum += price * 25;
Input (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Corresponding output (syntax tree):
Details of Syntax Trees
- Tokens become nodes in the syntax tree
- For expressions:
- Operators become internal nodes
(and represent the operation and its result)
- Operands (including subexpressions) become children
- The structure of the tree shows
priority and associativity of operations
(operations at the bottom are evaluated first)
- Abstract constructs (e.g. statement,...)
are added as parents
- Parentheses and other separators (",", ";", ...)
are used during parsing, but then discarded
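How a token sequence turns into a tree can be sketched with a small hand-written parser. Everything here is an assumption made for illustration: the tuple form of tokens and tree nodes, the function `parse_statement`, and the extra `plus` token (not in the lecture example), which is included only to show how priority appears in the tree shape.

```python
def parse_statement(tokens):
    """Parse: id plusequal expression semicolon. Returns a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        # Leaves of the tree: identifiers and integers.
        return take("id") if peek() == "id" else take("int")

    def product():
        # Multiplication binds tighter, so it ends up lower in the tree.
        node = operand()
        while peek() == "asterisk":
            take("asterisk")
            node = ("asterisk", node, operand())
        return node

    def expression():
        node = product()
        while peek() == "plus":
            take("plus")
            node = ("plus", node, product())
        return node

    target = take("id")
    take("plusequal")
    value = expression()
    take("semicolon")                 # used for parsing, not stored in the tree
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after semicolon")
    return ("plusequal", target, value)


tokens = [("id", "sum"), ("plusequal",), ("id", "price"),
          ("asterisk",), ("int", 25), ("semicolon",)]
print(parse_statement(tokens))
# ('plusequal', ('id', 'sum'), ('asterisk', ('id', 'price'), ('int', 25)))
```

The operator is the first element of each tuple (the internal node) and its operands are the remaining elements (the children); the semicolon does not appear in the result.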
More Examples
price = pretax / 100 * (108 - discount);
score = theory * 2 - errors / 2 + practice * 3;
if (a > 5)
b = 15;
Example of an Automaton
Very simple automatic vending machine:
- Input: 50 Yen coins
- Output: Water bottle (price: 150 Yen)
State transition diagram:
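The diagram itself is not reproduced in this text. One plausible reading, given only as an assumption, has three states that record how much money has been inserted so far; the third coin releases the bottle and returns the machine to the start. The state names, the table `TRANSITIONS`, and the function `run_machine` in the Python sketch below are hypothetical.

```python
# (current state, input) -> (next state, output or None)
TRANSITIONS = {
    ("0 Yen", "50"): ("50 Yen", None),
    ("50 Yen", "50"): ("100 Yen", None),
    ("100 Yen", "50"): ("0 Yen", "water bottle"),
}


def run_machine(inputs, state="0 Yen"):
    """Feed a sequence of inputs to the automaton; return final state and outputs."""
    outputs = []
    for symbol in inputs:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"no transition from {state!r} on input {symbol!r}")
        state, output = TRANSITIONS[(state, symbol)]
        if output is not None:
            outputs.append(output)
    return state, outputs


print(run_machine(["50", "50", "50", "50"]))
# ('50 Yen', ['water bottle'])
```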
Example Language: Commands
- Example Commands:
- Eat bread!
- Read books!
- Play music!
- Stay home!
- Structure of a command:
Verb Noun '!'
Grammar of Commands
Command → Verb Noun '!'
Verb → eat
Verb → read
Verb → play | stay
Noun → bread
Noun → music | books | home
How to produce a command:
Start with Command, and use grammar rules to replace concepts with words
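This replacement process can be sketched in Python by storing the grammar as a dictionary and expanding every concept in all possible ways. The dictionary layout and the function `expand` are assumptions made for illustration; the rules themselves are those listed above.

```python
from itertools import product

# Each concept maps to its alternative right-hand sides.
GRAMMAR = {
    "Command": [["Verb", "Noun", "'!'"]],
    "Verb": [["eat"], ["read"], ["play"], ["stay"]],
    "Noun": [["bread"], ["music"], ["books"], ["home"]],
}


def expand(symbol):
    """Return every word sequence that can be produced from the given symbol."""
    if symbol not in GRAMMAR:                 # a word: nothing left to replace
        return [[symbol.strip("'")]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


commands = [" ".join(words[:-1]) + words[-1] for words in expand("Command")]
print(len(commands))   # 16
print(commands[0])     # eat bread!
```

With four verbs and four nouns the grammar produces 16 commands in total.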
Language and Grammar
- Language is defined on two layers:
- Structure (syntax)
- Meaning (semantics)
- Grammar defines (restricts) the structure of a language
- The grammar allows sentences such as
"stay bread" or "read home",
but semantically, they do not make sense
- Both natural languages and programming languages have two layers
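The difference between the two layers can be seen in a minimal recognizer for the command language, sketched here in Python. The function `is_command` is hypothetical, and treating upper and lower case as equivalent is an assumption made so that capitalized commands such as "Eat bread!" are accepted. The recognizer checks structure only, so it accepts commands that make no sense.

```python
VERBS = {"eat", "read", "play", "stay"}
NOUNS = {"bread", "music", "books", "home"}


def is_command(text):
    """Check only the structure: Verb Noun '!'. Meaning is not checked."""
    if not text.endswith("!"):
        return False
    words = text[:-1].split()
    return (len(words) == 2
            and words[0].lower() in VERBS
            and words[1].lower() in NOUNS)


print(is_command("Eat bread!"))    # True
print(is_command("stay bread!"))   # True: syntactically fine, semantically odd
print(is_command("bread eat!"))    # False: wrong structure
```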
Homework Submission / 宿題提出
Deadline: April 14, 2022 (Thursday), 18:40
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 single page (using both sides is okay; NO cover page), easily
readable handwriting (NO printouts), name (kanji and kana) and student number
at the top right
Problem: For the one-line C program fragment below, based on the examples
given in this lecture, write down:
- the result of lexical analysis
- the result of parsing
- the output of the compiler (in assembly language; comments are not
needed; use SUB for subtraction, and DIV for division)
grade = english - absent * 5 + math / 3;
Schedule From Now On
April 14 (Thursday), 18:40: Homework deadline, box in front of O-529
April 15 (Friday), 11:00-12:30: Second lecture, face-to-face, E-202
Glossary
- lexical analysis: 字句解析
- parsing, syntax analysis: 構文解析
- automaton: オートマトン
- formal language: 形式言語
- grammar: 文法
- executive summary: 役員 (時間がない人) のための要約
- front end: フロントエンド
- back end: バックエンド
- optimization: 最適化
- code generation: コード生成
- regular expression: 正規表現
- text format: 文書形式
- data format: データ形式
- internal representation: 内部表現
- external representation: 外部表現
- (e.g. face) recognition: (顔) 認識
- high-level program language: 高級プログラム言語
- source (file/program): ソース (ファイル・プログラム)、原始プログラム
- object code: 目的プログラム
- machine code: 実行プログラム
- assembly language: アセンブリ言語
- register: レジスタ
- preprocessor: プリプロセッサ
- semantic analysis: 意味解析
- assembler: アセンブラ (アセンブリ言語を処理するソフト)
- linker: リンカ
- loader: ローダ
- one-pass compiler: ワンパス・コンパイラ
- x-pass compiler: x-パス・コンパイラ
- cross-compiler: クロスコンパイラ
- dynamic/just-in-time (JIT) compiler: 動的コンパイラ
- interpreter: インタプリタ、通訳系
- token: トークン、記号、符
- natural language: 自然言語
- attribute: 属性
- identifier (発音: アイデンティファイア): 識別子
- syntax tree: 構文木
- operator: 演算子
- operand: 被演算子
- expression: 式
- subexpression: 部分式
- statement (of a program): 文
- separators: 区切り記号
- automatic vending machine: 自動販売機
- state transition diagram: 状態遷移図
- structure: 構造
- syntax: 構文
- semantics: 意味 (論)
- command: 命令 (文)
- verb: 動詞
- noun: 名詞