Dictionaries and their Implementation:
Binary Trees, ...
(辞書とその実装: 二分木など)
Data Structures and Algorithms
8th lecture, November 5, 2015
http://www.sw.it.aoyama.ac.jp/2015/DA/lecture8.html
Martin J. Dürst
© 2009-15 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Summary of last lecture, homework
- Sorting algorithms faster than O(n log n)
- The dictionary ADT
- Binary trees and their traversal methods
- Binary search tree
- Balanced tree
Summary of Last Lecture
- Quicksort is a very efficient algorithm for sorting, and a good example
to learn about algorithms and their implementation
- In the worst case, quicksort is O(n²); on average, O(n log n)
- Quicksort is a good example for the use of average time complexity and
randomized algorithms
- Implementing quicksort requires attention to many details
- Animation of many sorting algorithms: sort.svg
- Sorting based on pairwise comparison is Θ(n log n)
Report: Manual Sorting
Deadline: November 4th, 2015 (Wednesday), 19:00.
Problem: Propose and describe an algorithm/algorithms for
manual sorting, for the following two cases:
- One person sorts 6000 pages
- 20 people together sort 60000 pages
Each page is a sheet of paper of size A4, where a 10-digit number is printed
in big letters.
The goal is to sort the pages by increasing number. There is no knowledge
about how the numbers are distributed.
You can use the same algorithm for both cases, or a different algorithm.
Details:
- Describe the algorithm(s) in detail, so that e.g. your friends who don't
understand computers can execute them.
- Describe the equipment/space that you need.
- Calculate the overall time needed for each case.
- Analyse the time complexity (O()) of the algorithm(s).
- Comment on the relationship to other algorithms you know, and on the
special needs of manual (as opposed to computer) execution.
- If you use any Web pages, books, ..., list them as references at the end
of your report
Caution: Use IRIs (e.g. http://ja.wikipedia.org/wiki/情報), not URLs
(e.g. http://ja.wikipedia.org/wiki/%E6%83%85%E5%A0%B1)
Problems Seen in Reports
- 218341.368 seconds (⇒about 61 hours)
- 6·10¹⁰·10³·10¹⁰ (units? way too big)
- O(60000) (how many seconds could this be?)
- Calculation of actual time backwards from big-O notation
(1 second/operation, n=6000, O(n²) ⇒ 3600000 seconds?)
- An O(n) algorithm (example: "5 seconds per page")
- For 20 people, having only one person work at the end of the algorithm
- For humans, binary sorting is constraining (sorting into 3~10 parts is better)
- Using bubble sort (868 days without including breaks or sleep)
- Prepare 10¹⁰ boxes (problem: space, cost, walking distance)
- Forgetting time for preparation, cleanup, breaks,...
- Submitting just a program
- Report too short
Sorting Faster than O(n log n)
- All sorting algorithms studied so far assume an arbitrary distribution of
values
- Decisions are made by comparisons of values;
the depth of the decision tree is Ω(n log n)
- If there is some knowledge about the value distribution, improvements are
possible
- Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
- Radix sort, bin sort
Bin Sort
(also called bucket sort)
Example: Sorting by student number
- Separate data into 10 parts using most significant digit
- Apply recursively to less significant digits
- To manage memory, split separation into two phases
(one_digit_stable_sort in 8binradix.rb)
- Counting number of items in each part
- Moving data items
- Complexity is O(nk), where k is the number of digits
- Implementation in Ruby: bin_sort in 8binradix.rb
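The recursive separation described above can be sketched as follows. This is a minimal illustration for fixed-width decimal keys, not the bin_sort implementation from 8binradix.rb (in particular, it allocates fresh bins instead of using the two-phase counting/moving scheme).

```ruby
# Bin sort sketch: separate by the most significant digit into 10 bins,
# then recurse on each bin with the next digit.
def bin_sort(items, digits)
  return items if items.size <= 1 || digits.zero?
  divisor = 10**(digits - 1)             # isolates the current digit
  bins = Array.new(10) { [] }            # one bin per digit value 0..9
  items.each { |x| bins[(x / divisor) % 10] << x }
  bins.flat_map { |bin| bin_sort(bin, digits - 1) }
end

p bin_sort([170, 45, 75, 90, 802, 24, 2, 66], 3)
# => [2, 24, 45, 66, 75, 90, 170, 802]
```

Each element is touched once per digit, giving the O(nk) bound above.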
Radix Sort
- Sort once for each digit, starting with the least significant digit
- No need to partition data
- A stable sorting method is necessary
- Complexity is O(nk), where k is the number of digits
- Implementation in Ruby: radix_sort in 8binradix.rb
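A sketch of the least-significant-digit strategy described above (illustrative only, not the radix_sort implementation from 8binradix.rb). Collecting elements into bins in array order and concatenating the bins is itself a stable pass, which is exactly the property the method requires.

```ruby
# LSD radix sort sketch: one stable pass per digit, starting with the
# least significant digit; no recursive partitioning is needed.
def radix_sort(items, digits)
  divisor = 1
  digits.times do
    bins = Array.new(10) { [] }
    items.each { |x| bins[(x / divisor) % 10] << x }  # stable per-digit pass
    items = bins.flatten
    divisor *= 10
  end
  items
end

p radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 3)
# => [2, 24, 45, 66, 75, 90, 170, 802]
```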
Parallelization of Sorting
- These days, individual computers do not get much faster; instead, they get
smaller and more numerous
- Parallelization becomes important!
- For some tasks, parallelization can be very difficult
- For some tasks, parallelization can be quite easy
- Many sorting algorithms are easy to parallelize:
- Bubble sort
- Merge sort
- Quick sort
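Merge sort, for example, parallelizes naturally because its two recursive calls are independent. The sketch below shows the structure using Ruby threads; note that in standard Ruby (MRI) the global VM lock prevents a real speedup for CPU-bound work, so this illustrates the decomposition, not the performance gain.

```ruby
# Merge two already-sorted arrays into one sorted array.
def merge(a, b)
  result = []
  until a.empty? || b.empty?
    result << (a.first <= b.first ? a.shift : b.shift)
  end
  result + a + b
end

# Sort the two halves concurrently, then merge the results.
def parallel_merge_sort(items)
  return items if items.size <= 1
  mid = items.size / 2
  left  = Thread.new { parallel_merge_sort(items[0...mid]) }
  right = Thread.new { parallel_merge_sort(items[mid..-1]) }
  merge(left.value, right.value)         # Thread#value waits for the result
end

p parallel_merge_sort([5, 3, 8, 1, 9, 2])   # => [1, 2, 3, 5, 8, 9]
```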
The Dictionary ADT
(caution: Not exactly the same as a (book) dictionary)
- For each data item, there is:
- A key: Used to identify the data item, e.g. during
search
- A value: All the information besides the key (may be
empty)
- Operations
- Insert
- Delete
- Search/find
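The three operations can be tried directly with Ruby's built-in Hash, which implements the dictionary ADT (using hashing). The names and numbers below are purely illustrative.

```ruby
phone_book = {}                       # empty dictionary
phone_book["Suzuki"] = "03-1234"      # insert: key "Suzuki", value "03-1234"
phone_book["Tanaka"] = "03-5678"
p phone_book["Suzuki"]                # search/find => "03-1234"
phone_book.delete("Tanaka")           # delete
p phone_book.key?("Tanaka")           # => false
```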
Simple Dictionary Implementations
- Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
- Unordered array/linear list: Search is O(n)
- Ideally, search/insertion/deletion should all be O(log n)
- Binary search tree
- Balanced tree
- Hashing
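A quick sketch of why the sorted array falls short: search is O(log n) by binary search, but insertion must shift all later elements, which is O(n).

```ruby
# Binary search in a sorted array: O(log n) comparisons.
def binary_search(sorted, key)
  lo, hi = 0, sorted.size - 1
  while lo <= hi
    mid = (lo + hi) / 2
    case key <=> sorted[mid]
    when 0  then return mid       # found: return the index
    when -1 then hi = mid - 1     # key is smaller: search left half
    else         lo = mid + 1     # key is greater: search right half
    end
  end
  nil                             # not found
end

a = [2, 5, 8, 12, 16]
p binary_search(a, 12)   # => 3
a.insert(2, 7)           # insertion is O(n): later elements are shifted
p a                      # => [2, 5, 7, 8, 12, 16]
```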
Binary Tree
- Graph: Consisting of nodes and edges
- Tree: The root does not have any parent; all other nodes have exactly one
parent
- Binary tree: Each node has ≦2 children
Traversal Methods for Binary Trees
- Depth first
- Preorder
- Inorder
- Postorder
- Breadth first
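The four traversal orders can be sketched as follows, using a minimal Struct rather than the node class from 8bintree.rb. The depth-first variants differ only in where the node's own key is emitted relative to its subtrees; breadth-first uses a queue instead of recursion.

```ruby
Node = Struct.new(:key, :left, :right)

#       4
#      / \
#     2   6
#    / \
#   1   3
tree = Node.new(4, Node.new(2, Node.new(1), Node.new(3)), Node.new(6))

def preorder(n)   # node, then left subtree, then right subtree
  n ? [n.key] + preorder(n.left) + preorder(n.right) : []
end

def inorder(n)    # left subtree, then node, then right subtree
  n ? inorder(n.left) + [n.key] + inorder(n.right) : []
end

def postorder(n)  # left subtree, then right subtree, then node
  n ? postorder(n.left) + postorder(n.right) + [n.key] : []
end

def breadth_first(root)  # visit level by level, using a queue
  result, queue = [], [root]
  until queue.empty?
    n = queue.shift
    next if n.nil?
    result << n.key
    queue << n.left << n.right
  end
  result
end

p preorder(tree)       # => [4, 2, 1, 3, 6]
p inorder(tree)        # => [1, 2, 3, 4, 6]
p postorder(tree)      # => [1, 3, 2, 6, 4]
p breadth_first(tree)  # => [4, 2, 6, 1, 3]
```

Note that for a binary search tree (next slide), inorder traversal visits the keys in sorted order.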
Binary Search Tree
(definition/invariants)
- Each node contains one data item
- For any node with key k:
- All the keys in the left subtree will be <k (or ≦k)
- All the keys in the right subtree will be >k (or ≧k)
- What to do with multiple identical keys is
implementation-dependent
Search in a Search Tree
- Start searching from the root node
- If the current node, compared to the search key, is
- The same: Return the data item at the current node
- Greater: Search in the left subtree
- Smaller: Search in the right subtree
- The empty node: Give up searching
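The search procedure above can be sketched as follows; Node here is a minimal Struct (with nil for empty nodes), not the NilNode-based implementation in 8bintree.rb.

```ruby
Node = Struct.new(:key, :value, :left, :right)

def bst_search(node, key)
  return nil if node.nil?                  # empty node: give up
  case key <=> node.key
  when 0  then node.value                  # same: return the data item
  when -1 then bst_search(node.left, key)  # node key greater: go left
  else         bst_search(node.right, key) # node key smaller: go right
  end
end

root = Node.new(8, "eight",
                Node.new(3, "three"),
                Node.new(10, "ten"))
p bst_search(root, 10)   # => "ten"
p bst_search(root, 7)    # => nil
```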
Insertion into a Search Tree
- Start insertion from the root node
- If the current node, compared to the inserted key, is
- Greater: Insert item into left subtree
- Smaller: Insert item into right subtree
- Empty: Insert item here as a new node (with two empty nodes as children)
- The same: Give up/insert into right subtree, ... (implementation dependent)
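A sketch of the insertion procedure; each call returns the (possibly new) subtree root. Duplicates go to the right here, which is one of the implementation-dependent choices mentioned above. Node is again a minimal Struct, not the class from 8bintree.rb.

```ruby
Node = Struct.new(:key, :left, :right)

def bst_insert(node, key)
  return Node.new(key) if node.nil?        # empty: insert as a new node here
  if key < node.key
    node.left = bst_insert(node.left, key)   # greater node key: go left
  else
    node.right = bst_insert(node.right, key) # smaller (or equal): go right
  end
  node
end

root = nil
[8, 3, 10, 1, 6].each { |k| root = bst_insert(root, k) }
p root.left.key         # => 3
p root.left.right.key   # => 6
```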
Deletion from a Search Tree
- Find the node to delete (using search)
- If the number of (real, non-empty) children is
- 0: Delete current node (and replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with the smallest node in the right subtree
(or the largest node in the left subtree)
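The three deletion cases can be sketched as follows (a self-contained illustration, not the code from 8bintree.rb). In the two-child case, the node's key is replaced by the smallest key of the right subtree (the in-order successor), which is then deleted from that subtree.

```ruby
Node = Struct.new(:key, :left, :right)

def bst_insert(node, key)
  return Node.new(key) if node.nil?
  if key < node.key
    node.left = bst_insert(node.left, key)
  else
    node.right = bst_insert(node.right, key)
  end
  node
end

def min_key(node)                          # leftmost key of a subtree
  node = node.left until node.left.nil?
  node.key
end

def bst_delete(node, key)
  return nil if node.nil?
  if key < node.key
    node.left = bst_delete(node.left, key)
  elsif key > node.key
    node.right = bst_delete(node.right, key)
  else
    return node.right if node.left.nil?    # 0 or 1 child: use other child
    return node.left if node.right.nil?    # 1 child: use left child
    node.key = min_key(node.right)         # 2 children: take successor key,
    node.right = bst_delete(node.right, node.key)  # then delete it below
  end
  node
end

root = nil
[8, 3, 10, 1, 6, 9].each { |k| root = bst_insert(root, k) }
root = bst_delete(root, 8)                 # delete a node with two children
p root.key   # => 9
```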
Implementation of Search Tree
- Share a single special node (NilNode) everywhere there is no child node
- Pseudocode/implementation in Ruby: 8bintree.rb
Evaluation of Simple Search Tree
- The execution speed depends on the height of the tree
- The best height is O(log n)
- The worst height is O(n)
- The average height is O(log n)
(assuming that all input orders have the same probability)
Balanced Trees
- In the worst case, the shape of a general search tree is the same as the
shape of a linear list
- Because the order of insertions/deletions cannot be influenced,
using randomization (as for pivot selection in quicksort) is impossible
- Keeping the tree completely balanced after every insertion/deletion takes
too much time
Solution: A tree that is to some degree (but not perfectly) balanced
Top-down 2-3-4 Tree
- The number of children for each node is 2, 3, or 4
- If a node has k children, it stores k-1 keys and
data items
(if the number of children is 2, then this is the same as for a binary
search tree)
- The keys stored in a node are the separators for the subtrees
- The tree is of uniform height
- In the lowest layer of the tree, the nodes have no children
Summary
Homework
(no need to submit)
- Calculate the minimum and maximum height of a binary search tree with
n data items
- Using various examples, think about how to insert items into a 2-3-4 tree
and propose an algorithm
Glossary
- bin sort
- ビンソート
- most significant digit
- 最上位の桁
- radix sort
- 基数整列
- balanced tree
- 平衡木
- traversal method
- 辿り方
- binary search tree
- 二分探索木
- hashing
- ハッシュ法
- binary tree
- 二分木
- parallelization
- 並列化
- depth first
- 深さ優先
- preorder
- 行きがけ順
- inorder
- 通りがけ順
- postorder
- 帰りがけ順
- breadth first
- 幅優先
- key
- キー、鍵
- implementation-dependent
- 実装依存
- top-down 2-3-4 tree
- トップダウン 2-3-4 木