Course Overview
Overall Compiler Structure
(授業の概要;
コンパイラ全体の仕組み)
Language Theory and Compilers
(言語理論とコンパイラ)
1st lecture, April 8, 2016
http://www.sw.it.aoyama.ac.jp/2016/Compiler/lecture1.html
Martin J. Dürst
(テュールスト
マーティン ヤコブ)
duerst@it.aoyama.ac.jp
Building O, Room 529
© 2006-16 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Self-introduction
- About this course
- Difficulty of processing input
- Compiler structure
- Lexical analysis and parsing
- Example of an automaton
- Example of a formal language grammar
Self-Introduction
Teaching Assistant: Kota Kariyado (苅宿 航太、M1)
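Position of This Course in the Curriculum
- Third year, first semester
- Second subject group (required elective, JE subject)
- Prerequisite subjects:
- Introduction to Computers
- Information Mathematics
- Computer Practice (programming)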
Grading
Minitests during lectures: 20%, exercise assignments: 30%, final exam: 50%
(approximate guideline)
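Course Goals
- Understanding the relationship between (language) theory and applications
- Applications using tools
- Compilers
- Document processing
- Input analysis
- Data format design
- Design and implementation of small languages (e.g. domain-specific languages)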
The Importance of Compilers
Steve Yegge's blog
Executive Summary:
- If you don't know how compilers work, then you don't know how computers
work.
- If you're not 100% sure whether you know how compilers work, then you
don't know how they work.
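How the Course Is Run
- Sit toward the front
- Handouts are distributed in class and published on the web
- Writing notes on the handouts is important
- Some exercises are done on notebook computers
- Homework, reports, minitests, final exam
- Not attending is a big loss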
Course Schedule
Schedule
Reference Books
(The course covers both language theory and compilers, but each reference book concentrates on only one of the two.)
Course Contents
 | Theory | Compilers | Other applications
Front end | language theory, automata | lexical analysis, parsing | regular expressions, text/data formats
Back end | | optimization, code generation |
- Concentrate on input, not output
- Use various representations from theory to application
Example of Difference between Input and Output
(Computer Practice I, problems 041A and 04C1)
Character itself (internal representation) | Escaping in HTML/XML (external representation)
' | &apos;
" | &quot;
< | &lt;
> | &gt;
& | &amp;
→: output
←: input
Which direction is more difficult?
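The two directions can be tried out in a few lines of Python. This is only an illustrative sketch, not part of the course material: `escape` is a hypothetical helper written for this example, while the input direction relies on `html.unescape` from Python's standard library. Output needs one fixed replacement per character; input has to recognize several spellings of the same character.

```python
import html

# Output direction: one fixed replacement per special character.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}


def escape(text):
    """Internal representation -> external (HTML/XML) representation."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape(text):
    """External representation -> internal representation.

    The same character may arrive as a named reference, a decimal
    reference, or a hexadecimal reference, so the reader must accept all.
    """
    return html.unescape(text)


if __name__ == "__main__":
    print(escape('a < b & "c"'))          # a &lt; b &amp; &quot;c&quot;
    print(unescape("&lt; &#60; &#x3C;"))  # < < <
```

Even in this tiny case, the input side has more variants to deal with than the output side.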
Difficulties for Input
- Not structured (just a sequence of bytes/characters)
- Anything goes (including errors)
- Deciding whether some input is correct or not is a model for computation
in general
- Deciding whether input is correct is equivalent to recognition
The Function of a Compiler
Bridge between software and hardware
- Input: Program that can be understood by humans
- Language: High-level program language
- Medium: source (file/program)
- Output: Program that can be understood by a machine
- Language: assembly language, machine language
- Medium: object code, machine code
Example Compiler Input/Output
Input fragment:
sum += price * 25;
Output (assembly language):
LOAD R1, price ; load from price into R1 (register 1)
CONST R2, 25 ; set constant 25 into R2 (register 2)
MUL R1, R1, R2 ; set the product of R1 and R2 into R1
LOAD R2, sum ; load from sum into R2
ADD R2, R1, R2 ; set the sum of R1 and R2 into R2
STORE sum, R2 ; store the contents of R2 into sum
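To check that the six instructions really compute sum += price * 25, they can be run on a toy register machine. The simulator below is a sketch under assumptions: the instruction set is only what appears in the listing, the destination is taken to be the first operand (as the comments suggest), and the names `run` and `PROGRAM` are invented for this illustration.

```python
def run(program, memory):
    """Execute (opcode, operands...) tuples on a tiny register machine."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD Rx, variable
            regs[args[0]] = memory[args[1]]
        elif op == "CONST":     # CONST Rx, value
            regs[args[0]] = args[1]
        elif op == "MUL":       # MUL Rdest, Ra, Rb
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "ADD":       # ADD Rdest, Ra, Rb
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":     # STORE variable, Rx
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


PROGRAM = [
    ("LOAD", "R1", "price"),
    ("CONST", "R2", 25),
    ("MUL", "R1", "R1", "R2"),
    ("LOAD", "R2", "sum"),
    ("ADD", "R2", "R1", "R2"),
    ("STORE", "sum", "R2"),
]

if __name__ == "__main__":
    # With sum = 100 and price = 4, sum becomes 100 + 4 * 25 = 200.
    print(run(PROGRAM, {"sum": 100, "price": 4}))
```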
Logical Structure of a Compiler
- [Preprocessor]
- Lexical analysis
- Parsing (syntax analysis)
- Semantic analysis
- Optimization (may instead come after code generation)
- Code generation (may instead come before optimization)
- [assembler]
- [linker, loader]
Compiler Types and Related Software
- One pass compiler
- X-pass compiler (x between 1 and 70 (IBM's PL/1 compiler in the 1970s))
- Cross-compiler (e.g. compiling on PC for smartphone)
- Dynamic/just-in-time (JIT) compiler
- Preprocessor (runs before the compiler)
- Interpreter (e.g. for Ruby)
Example of Lexical Analysis
Fragment of input program (sequence of characters):
s | u | m | ␣ | + | = | ␣ | p | r | i | c | e | ␣ | * | ␣ | 2 | 5 | ; | \n
(␣ marks a space character)
Output (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Overview of Lexical Analysis
- Convert a sequence of characters to a sequence of tokens
- Several characters become one token
- Similar to finding words in a natural language text
- Tokens can have attributes
Examples: id (identifier, with name), int (integer, with value)
- Many token names are derived from the symbol shape
(example: * is asterisk, not mult, because it can also be
used for pointers)
- White space is used during lexical analysis, but then discarded
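The points above can be sketched with regular expressions. This is a minimal illustration, not the method used in the course: `TOKEN_SPEC` and `tokenize` are names chosen for this example, tokens are represented as Python tuples, and only the token kinds needed for the sample fragment are defined.

```python
import re

# One (token name, pattern) pair per kind of token.
TOKEN_SPEC = [
    ("int",       r"\d+"),
    ("id",        r"[A-Za-z_]\w*"),
    ("plusequal", r"\+="),
    ("asterisk",  r"\*"),
    ("semicolon", r";"),
    ("space",     r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Convert a sequence of characters into a list of tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = match.lastgroup, match.group()
        if kind == "id":
            tokens.append(("id", value))        # attribute: the name
        elif kind == "int":
            tokens.append(("int", int(value)))  # attribute: the value
        elif kind != "space":                   # white space is discarded
            tokens.append((kind,))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("sum += price * 25;\n"))
```

For the sample fragment this yields the same six tokens as in the example: id("sum"), plusequal, id("price"), asterisk, int(25), semicolon.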
Example of Parsing
Input (sequence of tokens):
id("sum"), plusequal, id("price"), asterisk, int(25), semicolon
Output (syntax tree):
Remarks about Syntax Trees
- Each token becomes a node in the tree
- For expressions:
- Operators become parents (and represent the operation)
- Operands (and subexpressions) become children
- The structure of the tree shows priority and associativity of
operations
(operations at the bottom are evaluated first)
- Abstract constructs (e.g. statement,...) are added as parents
- Parentheses and other separators (",", ";", ...) are used during parsing, but then discarded
One More Example
price = pretax / 100 * (108 - discount);
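Both parsing examples can be reproduced with a small recursive-descent parser. This is a sketch under several assumptions: the lecture does not prescribe a parsing method at this point, the token names equal, plus, minus, slash, lparen and rparen are invented by analogy with plusequal and asterisk, and syntax trees are written as nested tuples of the form (operator, left, right).

```python
class Parser:
    """Parse: statement -> id (equal | plusequal) expr semicolon."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, *kinds):
        if self.peek() not in kinds:
            raise SyntaxError(f"expected one of {kinds}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def statement(self):
        target = self.take("id")
        op = self.take("equal", "plusequal")
        value = self.expr()
        self.take("semicolon")              # used for parsing, then discarded
        return ("statement", (op[0], target, value))

    def expr(self):                          # lower priority: + and -
        node = self.term()
        while self.peek() in ("plus", "minus"):
            op = self.take("plus", "minus")
            node = (op[0], node, self.term())   # left-associative
        return node

    def term(self):                          # higher priority: * and /
        node = self.factor()
        while self.peek() in ("asterisk", "slash"):
            op = self.take("asterisk", "slash")
            node = (op[0], node, self.factor())
        return node

    def factor(self):
        if self.peek() == "lparen":
            self.take("lparen")              # parentheses only steer parsing
            node = self.expr()
            self.take("rparen")
            return node
        return self.take("id", "int")


if __name__ == "__main__":
    tokens = [("id", "price"), ("equal",), ("id", "pretax"), ("slash",),
              ("int", 100), ("asterisk",), ("lparen",), ("int", 108),
              ("minus",), ("id", "discount"), ("rparen",), ("semicolon",)]
    print(Parser(tokens).statement())
```

In the resulting tree, the subtraction sits below the multiplication, so it is evaluated first, and neither the parentheses nor the semicolon appear as nodes.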
Example of an Automaton
Very simple automatic vending machine:
- Input: 50 Yen coins
- Output: Water bottle (price: 150 Yen)
state transition diagram
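The vending machine can also be written down as a transition table. The sketch below is one possible reading, not a copy of the diagram: it assumes three states named after the amount inserted so far (0, 50, 100 Yen) and attaches the output to the transition taken on the third coin. `TRANSITIONS` and `run_machine` are names made up for this example.

```python
# (current state, input coin) -> (next state, output or None)
TRANSITIONS = {
    (0, 50): (50, None),
    (50, 50): (100, None),
    (100, 50): (0, "water bottle"),   # 150 Yen reached: dispense, start over
}


def run_machine(coins):
    """Feed a sequence of coins to the automaton; return final state and outputs."""
    state = 0
    outputs = []
    for coin in coins:
        key = (state, coin)
        if key not in TRANSITIONS:
            raise ValueError(f"coin not accepted: {coin}")
        state, output = TRANSITIONS[key]
        if output is not None:
            outputs.append(output)
    return state, outputs


if __name__ == "__main__":
    # Four coins: one bottle, and 50 Yen remain toward the next one.
    print(run_machine([50, 50, 50, 50]))   # (50, ['water bottle'])
```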
Language and Grammar
- Language is defined on two layers:
- Structure (syntax)
- Meaning (semantics)
- Grammar defines (restricts) the structure of a language
- Example (simple imperative sentences):
eat bread, read books, play music
- Grammar for imperative sentences:
ImperativeSentence → Verb Noun
- The grammar allows sentences such as
"read bread" or "play books",
but semantically, they are problematic
Grammar of Imperative Sentences
ImperativeSentence → Verb Noun
Verb → eat
Verb → read
Verb → play
Noun → bread
Noun → music
Noun → books
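Because this grammar is small and not recursive, all sentences it allows can simply be enumerated. The following sketch encodes the seven rules as a Python dictionary (the names `GRAMMAR` and `generate` are chosen for this illustration) and shows that the grammar accepts "read bread" although it is semantically odd.

```python
from itertools import product

# Each nonterminal maps to the right-hand sides of its rules.
GRAMMAR = {
    "ImperativeSentence": [["Verb", "Noun"]],
    "Verb": [["eat"], ["read"], ["play"]],
    "Noun": [["bread"], ["music"], ["books"]],
}


def generate(symbol):
    """Return every word sequence derivable from symbol.

    Only works for non-recursive grammars such as this one.
    """
    if symbol not in GRAMMAR:               # terminal symbol: a word
        return [[symbol]]
    sentences = []
    for rule in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(generate(s) for s in rule)):
            sentences.append([word for part in parts for word in part])
    return sentences


SENTENCES = [" ".join(words) for words in generate("ImperativeSentence")]

if __name__ == "__main__":
    print(len(SENTENCES))               # 9 (3 verbs x 3 nouns)
    print("read bread" in SENTENCES)    # True: syntactically fine
    print("bread eat" in SENTENCES)     # False: not allowed by the grammar
```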
Homework / 宿題
Deadline: April 14, 2016 (Thursday), 19:00
Where to submit: Box in front of room O-529 (building O, 5th floor)
Format: A4 single page (using both sides is okay; NO cover page), easily
readable handwriting (NO printouts), name (kanji and kana) and student number
at the top right
Problem: For the one-line C program fragment below, write down the results
of lexical analysis and parsing, and the output of the compiler, based on the
examples given in this lecture.
Comments are not needed for assembly language; use SUB for subtraction, and DIV for division.
grade = math + english/2 - absent*4;
Glossary
- lexical analysis: 字句解析
- parsing, syntax analysis: 構文解析
- automaton: オートマトン
- formal language: 形式言語
- grammar: 文法
- executive summary: 役員 (時間がない人) のための要約
- front end: フロントエンド
- back end: バックエンド
- optimization: 最適化
- code generation: コード生成
- regular expression: 正規表現
- text format: 文書形式
- data format: データ形式
- internal representation: 内部表現
- external representation: 外部表現
- (e.g. face) recognition: (顔) 認識
- high-level program language: 高級プログラム言語
- source (file/program): ソース (ファイル・プログラム)、原始プログラム
- object code: 目的プログラム
- machine code: 実行プログラム
- assembly language: アセンブリ言語
- register: レジスタ
- preprocessor: プリプロセッサ
- semantic analysis: 意味解析
- assembler: アセンブラ (アセンブリ言語を処理するソフト)
- linker: リンカ
- loader: ローダ
- one pass compiler: ワンパス・コンパイラ
- x-pass compiler: x-パス・コンパイラ
- cross-compiler: クロスコンパイラ
- dynamic/just-in-time (JIT) compiler: 動的コンパイラ
- interpreter: インタプリタ、通訳系
- token: トークン、記号、符
- natural language: 自然言語
- attribute: 属性
- identifier (pronunciation: アイデンティファイア): 識別子
- syntax tree: 構文木
- statement (of a program): 文
- separators: 区切り記号
- automatic vending machine: 自動販売機
- state transition diagram: 状態遷移図
- structure: 構造
- syntax: 構文
- semantics: 意味 (論)
- imperative sentence: 命令文
- verb: 動詞
- noun: 名詞