Execution Environment: Garbage Collection, Virtual Machines
(実行環境、ゴミ集め、仮想計算機、動的コンパイル: execution environment, garbage collection, virtual machines, dynamic compilation)
15th lecture, July 26, 2019
Language Theory and Compilers
https://www.sw.it.aoyama.ac.jp/2019/Compiler/lecture15.html
Martin J. Dürst
© 2005-19 Martin J. Dürst 青山学院大学
Today's Schedule
- Remaining schedule
- Summary of last time
- Execution environment
- Garbage collection
- Virtual machines
- Dynamic compilation
Remaining Schedule
- July 26: Execution environment: virtual machines, garbage collection,...
- August 2: 11:10-12:35: Term final exam
Logical Structure of a Compiler
- [preprocessor]
- Lexical analysis
- Parsing (syntax analysis)
- Semantic analysis
- Code generation
- Optimization (code generation and optimization may also be done in the opposite order)
- [assembler]
- [linker, loader]
Relocatable Program
Checking Relocatable Programs
- On Unix/Linux/cygwin, use the nm (name) command (install it if not available)
- Example: nm relocatable.o >relocatable.nm
- Meaning of main symbol types:
- lower case: Local to file
- UPPER case: Global across files
- B/b: BSS: Uninitialized data (uninitialized global and static variables)
- D/d: Data: Initialized data
- R/r: Read-only data (e.g. constants)
- T/t: Text: Code (program "text")
- U: Undefined (used in this file, but defined in another file)
- Can also be used on executables
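To make the symbol type letters above concrete, here is a minimal sketch that interprets single lines of nm output. The function name classify_nm_line and the sample lines are hypothetical, and only the letters listed on this slide are handled; anything else is reported as "other".

```python
# Symbol types from the slide; other letters are reported as "other".
SECTIONS = {
    "B": "BSS (uninitialized data)",
    "D": "initialized data",
    "R": "read-only data",
    "T": "text (code)",
    "U": "undefined (defined in another file)",
}


def classify_nm_line(line):
    """Return (name, scope, section) for one line of nm output."""
    fields = line.split()
    # Defined symbols print as "address type name"; undefined symbols
    # have no address, so only the last two fields are used.
    sym_type, name = fields[-2], fields[-1]
    section = SECTIONS.get(sym_type.upper(), "other")
    if sym_type == "U":
        scope = "global (reference to another file)"
    elif sym_type.isupper():
        scope = "global"
    else:
        scope = "local to file"
    return name, scope, section


# Hypothetical nm output lines, invented for illustration
for line in ["0000000000000000 T main",
             "0000000000000004 b counter",
             "                 U printf"]:
    print(classify_nm_line(line))
```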
Execution Environment
Everything that has to be provided together with the compiler:
- Processing of command-line arguments
- Processing of environment variables
- Exception processing
- Functions/libraries for input/output,...
- Memory management
Traditional Dynamic Memory Management
- Dynamic memory in C:
- Obtain "raw" memory with malloc
- Give memory back with free
- Dynamic memory in C++:
- Create a new object with new (includes size calculation, memory allocation, and calling the constructor)
- Finalize an object with delete (calls the destructor and frees the memory)
- Problem:
- Memory is allocated and freed at different places in the program
- Memory may be given back too early (dangling pointers), too late, or never (memory leak)
- Solution: Garbage collection
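The problems with manual memory management can be illustrated with a toy heap in the style of malloc/free. This is a hypothetical model (the class ToyHeap and its integer block handles are assumptions): unlike real C, where using freed memory is undefined behavior, it detects a use after free, and it can list blocks that were never freed.

```python
class ToyHeap:
    """Hypothetical manual heap: blocks must be freed explicitly."""

    def __init__(self):
        self.next_handle = 0
        self.live = {}                  # handle -> stored value

    def malloc(self, value):
        handle = self.next_handle
        self.next_handle += 1
        self.live[handle] = value
        return handle

    def free(self, handle):
        if handle not in self.live:
            raise RuntimeError(f"invalid or double free of block {handle}")
        del self.live[handle]

    def load(self, handle):
        # In C this would be undefined behavior; here it is detected.
        if handle not in self.live:
            raise RuntimeError(f"use after free of block {handle}")
        return self.live[handle]

    def leaks(self):
        """Blocks that are still allocated (e.g. at program exit)."""
        return sorted(self.live)


heap = ToyHeap()
config = heap.malloc("config")
buffer = heap.malloc("buffer")
heap.free(config)            # freed too early ...
try:
    heap.load(config)        # ... because it is still used here
except RuntimeError as error:
    message = str(error)
print(message)
print(heap.leaks())          # "buffer" (block 1) is never freed: a leak
```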
Garbage Collection (GC)
- Manual dynamic memory management is tedious and error-prone
- Allocating memory when creating new objects is easy (new,...)
- Knowing when to free them is difficult
- Solution: Automatically collect unused (unusable) memory
- This is called garbage collection or GC
How Garbage Collection Works
- In programming languages that do not use raw pointers, the type of each data item is fixed and known at run time
- This makes it possible to find all references to other objects
- The memory of an object that is not referenced (directly or indirectly from the program's variables) is not usable
⇒ It can be collected and reused
- GC demo: garbage.rb (use the task manager to observe memory usage during execution)
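A small way to observe collection of an unreferenced object, similar in spirit to the garbage.rb demo but in Python: a weak reference (weakref.ref) does not keep its target alive, so it shows whether the target has been reclaimed. The class Node is hypothetical. In CPython the object is freed as soon as its last reference disappears; other implementations may only free it at the gc.collect() call.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None               # reference to another Node, or None


head = Node("a")
head.next = Node("b")                  # "b" is reachable only through "a"
probe = weakref.ref(head.next)         # a weak reference does not keep "b" alive

print(probe() is not None)             # "b" is still referenced
head.next = None                       # remove the only (strong) reference to "b"
gc.collect()                           # make sure a collection has happened
print(probe() is None)                 # "b" was unreachable and has been reclaimed
```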
Advantages and Disadvantages of GC
- Advantages:
- Programmers do not have to think about memory management
- Programmer productivity increases
- Memory leaks are (mostly) avoided
- Disadvantages:
- Takes (processing) time
- May not be able to run in parallel with the program
⇒ program may seem to stop for a while
- Currently, most programming languages (except C, C++, Objective-C,...) use GC
Methods for GC
- Reference count(ing) GC
- Mark and sweep GC
- Copying GC
- Generational GC
- Combinations of the above methods,...
Reference Count(ing) GC
- For each object, count the number of references to it (reference count)
- If a reference is copied (assigned), increment the count
- If a reference is overwritten or disappears (e.g. a local variable goes out of scope), decrement the count
- If the count becomes 0, the memory can be reused (and the counts of all objects it references are decremented)
- Advantage: Overhead is distributed over normal execution (no long pauses for GC)
- Disadvantages:
- Overhead during normal operation
- Objects with cyclic references cannot be garbage collected
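The counting rules above can be simulated in a few lines. This is a toy model, not how any particular language implements it: RcObject, inc, dec, assign_field, and assign_var are hypothetical names, each object has a single reference slot, and a dict stands in for the program's variables. The second half shows why a cycle is never freed.

```python
class RcObject:
    """Toy object with an explicit reference count and one reference slot."""

    def __init__(self, name, freed):
        self.name = name
        self.refcount = 0
        self.field = None            # one outgoing reference
        self.freed = freed           # shared list recording freed objects


def inc(obj):
    if obj is not None:
        obj.refcount += 1


def dec(obj):
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:            # no references left: reuse the memory
        obj.freed.append(obj.name)
        dec(obj.field)               # its own outgoing reference disappears
        obj.field = None


def assign_field(owner, target):
    """owner.field = target, maintaining reference counts."""
    inc(target)                      # increment first: target may be the old value
    dec(owner.field)
    owner.field = target


def assign_var(env, name, target):
    """env[name] = target, where env models the program's variables."""
    inc(target)
    dec(env.get(name))
    env[name] = target


freed = []
env = {}
assign_var(env, "x", RcObject("A", freed))
assign_field(env["x"], RcObject("B", freed))   # A -> B
assign_var(env, "x", None)       # A's count drops to 0; freeing A also frees B
print(freed)

freed_cycle = []
env2 = {}
# p and q are Python names used only for inspection, not part of the model
p, q = RcObject("P", freed_cycle), RcObject("Q", freed_cycle)
assign_var(env2, "p", p)
assign_var(env2, "q", q)
assign_field(p, q)               # P -> Q
assign_field(q, p)               # Q -> P: a cycle
assign_var(env2, "p", None)
assign_var(env2, "q", None)
print(freed_cycle, p.refcount, q.refcount)   # unreachable, but counts stay at 1
```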
Mark and Sweep GC
- Add a flag to each object
- Initially, set all flags to off (not referenced)
- Recursively follow references from the roots (global variables and the stack), and set the flags of all reached objects to on
- Collect the memory of all objects whose flag is still off
- Advantage: Cyclic references correctly collected
- Problem: Long pauses in execution
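The three steps of mark and sweep can be sketched on a toy heap. Obj, mark, and collect are hypothetical names; marking uses an explicit stack instead of recursion (same result, but long reference chains cannot overflow the call stack), and the sweep only reports the unmarked objects instead of putting them on a free list. The example includes an unreachable cycle, which reference counting could not collect.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)       # outgoing references to other objects
        self.marked = False          # the mark flag


def mark(roots):
    """Set the flag of every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def collect(heap, roots):
    """Mark and sweep; return (surviving objects, names of collected ones)."""
    for obj in heap:                 # 1. initially, all flags are off
        obj.marked = False
    mark(roots)                      # 2. mark all objects reachable from the roots
    # 3. sweep: unmarked objects would be returned to the free list
    survivors = [o for o in heap if o.marked]
    collected = [o.name for o in heap if not o.marked]
    return survivors, collected


c = Obj("C")
b = Obj("B", [c])
a = Obj("A", [b])
p = Obj("P")
q = Obj("Q", [p])
p.refs.append(q)                     # P and Q reference each other (a cycle)
heap = [a, b, c, p, q]
survivors, collected = collect(heap, roots=[a])
print([o.name for o in survivors], collected)
```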
Copying GC
(usually used together with mark and sweep)
- Prepare two areas for dynamic memory
- Use one area for program execution
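- When this area is full, recursively follow references from the roots and copy all reachable objects to the second area (references need to be rewritten to the new locations)
- Then swap the roles of the two areas; everything left behind in the first area is garbage
- Advantage: Compaction of dynamic memory
- Less/no memory fragmentation
- More locality of access → faster
- Problems: More memory needed (two areas), long pauses during execution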
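A minimal copying collector can be written in the style of Cheney's algorithm, which follows references breadth-first and needs no recursion. In this toy model (copy_collect and the dict layout are assumptions), each area is a Python list, an address is a list index, and the forward table plays the role of the forwarding pointers that stop an object from being copied twice.

```python
def copy_collect(from_space, roots):
    """Cheney-style copying collection (toy model).

    from_space: list of objects; each object is a dict with a "name" and a
                list "refs" of addresses (indexes into from_space)
    roots:      addresses held by the program
    Returns (to_space, new_roots)."""
    to_space = []
    forward = {}                     # old address -> new address

    def copy(addr):
        if addr not in forward:      # copy each object only once
            obj = from_space[addr]
            forward[addr] = len(to_space)
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[addr]

    new_roots = [copy(r) for r in roots]
    scan = 0
    # Objects at index >= scan still hold old (from-space) addresses,
    # which are rewritten here; this copies their targets as needed.
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [copy(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots


from_space = [
    {"name": "garbage1", "refs": []},
    {"name": "A", "refs": [3]},
    {"name": "garbage2", "refs": [0]},
    {"name": "B", "refs": [1]},      # A and B reference each other
]
to_space, roots = copy_collect(from_space, roots=[1])
print([o["name"] for o in to_space], to_space[0]["refs"], to_space[1]["refs"], roots)
```

The surviving objects end up next to each other at the start of the new area, which is the compaction effect mentioned above.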
Generational GC
- Some objects live long, but many live only a short time
- Split dynamic memory into two or more generations
- Garbage-collect young(er) generation(s) often
- Promote objects in young(er) generation(s) to an older generation after they survive GC a certain number of times
- Garbage-collect old(er) generation(s) only rarely
- Problem: References from older to younger generations need special care (write barriers)
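The interplay of promotion and write barriers can be sketched with two generations. Everything here is a hypothetical toy model: the class names, the promotion threshold PROMOTE_AFTER, and the "remembered set" of old objects that may point to young ones. Only the frequent minor GC of the young generation is shown; old objects are never traced during it.

```python
PROMOTE_AFTER = 2   # assumed threshold: promote after surviving 2 minor GCs


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []           # outgoing references
        self.old = False         # False: young generation, True: old generation
        self.survived = 0        # number of minor GCs survived so far


class GenHeap:
    """Toy two-generation heap; only minor (young-generation) GC is modeled."""

    def __init__(self):
        self.young, self.old = [], []
        self.remembered = set()  # old objects that may reference young ones

    def new(self, name):
        obj = Obj(name)
        self.young.append(obj)   # new objects always start out young
        return obj

    def write_ref(self, src, dst):
        """src.refs.append(dst), with a write barrier."""
        src.refs.append(dst)
        if src.old and not dst.old:
            self.remembered.add(src)   # record the old -> young reference

    def minor_gc(self, roots):
        """Collect the young generation only; return names of freed objects."""
        # Start from the program roots and from the remembered old objects.
        stack = [r for r in roots if not r.old]
        for old_obj in self.remembered:
            stack.extend(old_obj.refs)
        live = set()
        while stack:
            obj = stack.pop()
            if obj.old or obj in live:
                continue         # old objects are not traced in a minor GC
            live.add(obj)
            stack.extend(obj.refs)

        freed = [o.name for o in self.young if o not in live]
        survivors = [o for o in self.young if o in live]
        self.young = []
        for obj in survivors:
            obj.survived += 1
            if obj.survived >= PROMOTE_AFTER:
                obj.old = True   # promotion to the old generation
                self.old.append(obj)
            else:
                self.young.append(obj)
        # Newly promoted objects may now be old objects pointing to young ones.
        self.remembered = {o for o in self.old if any(not r.old for r in o.refs)}
        return freed


heap = GenHeap()
root = heap.new("root")
for _ in range(PROMOTE_AFTER):
    heap.minor_gc([root])        # "root" survives and is finally promoted
tmp = heap.new("tmp")            # nothing will refer to this object
kid = heap.new("kid")
heap.write_ref(root, kid)        # old -> young: caught by the write barrier
freed = heap.minor_gc([root])
print(freed)
```

Without the write barrier, "kid" would be freed by mistake, because the minor GC does not look inside the old object "root".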
Virtual Machine
- Goal: Increase portability, simplify compiler implementation
- Mechanism:
- Compiler produces code for a machine that does not exist physically
- This code is interpreted and executed by a program (interpreter or emulator)
- Examples: Pascal (P-code), JVM (Java Virtual Machine), Ruby 1.9~ (YARV), ...
- Macintosh: emulating Motorola 680x0 (1984-95) on IBM PowerPC, emulating PowerPC (1995-2006) on Intel i386 (2006-)
- Problem: Slower than machine code (in general, between 3 and 10 times slower)
- Similar, but without emulating the processor: cygwin, wine (provide the interface and libraries of another OS), VMware, VirtualBox (virtualize the hardware, but the code runs directly on the real processor),...
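The core of a virtual machine is a fetch-decode-execute loop over instructions for a machine that does not exist. The following sketch uses a hypothetical stack-based instruction set (not that of the JVM, YARV, or any other real VM) and runs a small program computing 5!.

```python
import operator

# Hypothetical instruction set for this sketch (not that of any real VM):
#   ("push", n)          push the constant n
#   ("load", i)          push the value of local variable i
#   ("store", i)         pop a value into local variable i
#   ("add",) ("sub",) ("mul",)   pop two values, push the result
#   ("jump", t)          continue at instruction index t
#   ("jump_if_zero", t)  pop a value; continue at index t if it is 0
#   ("print",)           pop a value and append it to the output
BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run(code, n_locals=4):
    stack, local_vars, output = [], [0] * n_locals, []
    pc = 0                               # program counter
    while pc < len(code):
        op, *args = code[pc]             # fetch and decode
        pc += 1
        if op == "push":                 # execute
            stack.append(args[0])
        elif op == "load":
            stack.append(local_vars[args[0]])
        elif op == "store":
            local_vars[args[0]] = stack.pop()
        elif op in BINARY_OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        elif op == "jump":
            pc = args[0]
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = args[0]
        elif op == "print":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return output


# Computes 5! with local 0 = n and local 1 = result.
FACTORIAL_5 = [
    ("push", 5), ("store", 0),                          # 0-1:   n = 5
    ("push", 1), ("store", 1),                          # 2-3:   result = 1
    ("load", 0), ("jump_if_zero", 15),                  # 4-5:   while n != 0:
    ("load", 1), ("load", 0), ("mul",), ("store", 1),   # 6-9:   result = result * n
    ("load", 0), ("push", 1), ("sub",), ("store", 0),   # 10-13: n = n - 1
    ("jump", 4),                                        # 14:    back to the loop test
    ("load", 1), ("print",),                            # 15-16: print result
]

print(run(FACTORIAL_5))
```

Every instruction goes through the dispatch in the loop, which is one reason why interpreted virtual machine code is slower than machine code.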
Dynamic Compilation
- Goal: Improve efficiency of virtual machines and emulators
- Mechanism (during program execution):
- Check execution count of functions and blocks
⇒ Compile frequently used functions/blocks to machine code
- Check for frequently used parameter values (e.g. 0, 1) or types (for object-oriented and dynamic languages)
⇒ Recompile to optimize frequent special cases
- Problems:
- Requires advanced implementation techniques
- Compilation overhead may be greater than the time saved
- Examples: Java HotSpot, LLVM (used by Apple,...), Chrome V8 (for JavaScript)
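The counting part of dynamic compilation can be simulated as follows. Real systems such as HotSpot or V8 generate machine code; as a stand-in, this sketch translates a function written in a hypothetical stack bytecode into Python source once it has been called HOT_THRESHOLD times (an assumed value) and compiles that with Python's built-in compile(), so later calls skip the interpreter. Specialization on frequent values or types is not shown.

```python
HOT_THRESHOLD = 3   # assumed: compile a function after 3 interpreted calls


def interpret(code, x):
    """Interpret straight-line stack code; "arg" pushes the argument x."""
    stack = []
    for op, *args in code:
        if op == "arg":
            stack.append(x)
        elif op == "push":
            stack.append(args[0])
        else:                                  # "add" or "mul"
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
    return stack.pop()


def translate(code):
    """'Compile' the same stack code into a Python function."""
    stack = []
    for op, *args in code:
        if op == "arg":
            stack.append("x")
        elif op == "push":
            stack.append(repr(args[0]))
        else:                                  # "add" or "mul"
            right, left = stack.pop(), stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({left} {symbol} {right})")
    source = f"lambda x: {stack.pop()}"
    return eval(compile(source, "<jit>", "eval")), source


class HotFunction:
    """Interpreted at first; compiled once it has been called often enough."""

    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None   # compiled version, once available
        self.source = None     # generated source, kept for inspection

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)            # fast path: compiled code
        self.calls += 1                        # count executions
        if self.calls >= HOT_THRESHOLD:        # the function has become "hot"
            self.compiled, self.source = translate(self.code)
        return interpret(self.code, x)


# x * x + 1 as stack code
square_plus_one = HotFunction([("arg",), ("arg",), ("mul",), ("push", 1), ("add",)])
print([square_plus_one(i) for i in range(5)])
print(square_plus_one.source)
```

Since the generated source comes only from the bytecode itself, using eval here is safe; for a tiny function like this one, the translation step may well cost more than it saves, which is the overhead problem mentioned above.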
Summary
- In most programs, analysis of input is more difficult than production of output
- Acceptance/rejection of input can be modelled by formal language theory
- Formal language theory is the basis for the front end of compilers: lexical analysis and parsing
- Formal language theory and knowledge about lexical analysis and parsing are useful in many other areas
- Design and implementation of programming languages is still a very interesting and important area of research
Glossary
- relocatable program: 再配置可能プログラム
- environment variable: 環境変数
- exception: 例外
- garbage collection: ゴミ集め
- reference count(ing) GC: 参照カウント GC
- mark and sweep GC: 印掃式 GC
- copying GC: 複写式 GC
- memory compaction: メモリ・コンパクション
- memory fragmentation: 断片化 (フラグメンテーション)
- generational GC: 世代別 GC
- generation: 世代
- virtual machine: 仮想計算機、仮想マシーン
- portability: 移植性
- emulate: エミュレート