Dictionaries and their Implementation:
Binary Trees, ...
(辞書とその実装: 二分木など)
Data Structures and Algorithms
8th lecture, November 17, 2016
http://www.sw.it.aoyama.ac.jp/2016/DA/lecture8.html
Martin J. Dürst
© 2009-16 Martin J. Dürst, Aoyama Gakuin University (青山学院大学)
Today's Schedule
- Summary of last lecture
- Sorting algorithms faster than O(n log n)
- The dictionary ADT
- Binary trees and their traversal methods
- Binary search tree
- Balanced tree
Summary of Last Lecture
- Quicksort is a very efficient algorithm for sorting, and a good example
to learn about algorithms and their implementation
- In the worst case, quicksort is O(n²);
on average, O(n log n)
- Quicksort is a good example for the use of average time complexity and
randomized algorithms
- Implementing quicksort requires attention to many details
- Animation of many sorting algorithms: sort.svg
- Sorting based on pairwise comparison is Ω(n log n)
Sorting Faster than O(n log n)
- All sorting algorithms studied so far assume an arbitrary distribution of
values
- Decisions are made by comparisons of values;
the decision tree needs at least n! leaves, so its depth is Ω(n log n)
- If there is some knowledge about the value distribution, improvements are
possible
- Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
- Radix sort, bin sort
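The extreme example above can be sketched in a few lines of Ruby. This is only an illustration, assuming the input contains each integer from 1 to n exactly once; the method name `place_directly` is made up for this example.

```ruby
# Assumption: values holds every integer from 1 to n exactly once.
# Each value v is written directly to its final position, index v - 1,
# so the array is sorted in a single O(n) pass without any comparison.
def place_directly(values)
  result = Array.new(values.length)
  values.each { |v| result[v - 1] = v }
  result
end

p place_directly([3, 1, 5, 2, 4])   # prints [1, 2, 3, 4, 5]
```

In practice the integers would be keys of larger records, and the whole record would be moved to the predicted position.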
Bin Sort
(also called bucket sort)
Example: Sorting by student number
- Separate data into 10 parts using most significant digit
- Apply recursively to less significant digits
- To manage memory, split separation into two phases
(one_digit_stable_sort
in 8binradix.rb)
- Counting number of items in each part
- Moving data items
- Complexity is O(nk), where k is the number of digits
- Implementation in Ruby:
conceptual_bin_sort
in 8binradix.rb
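As a rough illustration of the recursive idea above (not the code from 8binradix.rb), the following sketch separates non-negative integers into 10 bins by their most significant digit and then recurses on the remaining digits. The name `msd_bin_sort` and the `digits` parameter are assumptions for this example. It uses a separate Ruby array for each bin, so it skips the two-phase memory management mentioned above.

```ruby
# Sorts non-negative integers with at most `digits` decimal digits.
# Hypothetical helper, written only to illustrate the bin sort idea.
def msd_bin_sort(values, digits)
  return values if digits == 0 || values.length <= 1

  divisor = 10**(digits - 1)            # selects the current (most significant) digit
  bins = Array.new(10) { [] }
  values.each { |v| bins[(v / divisor) % 10] << v }

  # Sort each bin by the less significant digits, then concatenate bins 0..9.
  bins.flat_map { |bin| msd_bin_sort(bin, digits - 1) }
end

p msd_bin_sort([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```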
Radix Sort
- Sort once for each digit, starting with the least significant digit
- No need to partition data
- A stable sorting method is necessary
- Complexity is O(nk), where k is the number of digits
- Implementation in Ruby:
radix_sort
in 8binradix.rb
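The following is a minimal sketch of radix sort for non-negative integers, not the implementation from 8binradix.rb. The names `stable_sort_by_digit` and `lsd_radix_sort` are assumptions for this example. The per-digit sort uses the two phases described for bin sort (counting, then moving), and it is stable because items with the same digit keep their input order.

```ruby
# Stable sort by a single decimal digit; divisor (1, 10, 100, ...) selects the digit.
def stable_sort_by_digit(values, divisor)
  counts = Array.new(10, 0)
  values.each { |v| counts[(v / divisor) % 10] += 1 }   # phase 1: count items per digit

  starts = Array.new(10, 0)                             # first output index for each digit
  (1..9).each { |d| starts[d] = starts[d - 1] + counts[d - 1] }

  result = Array.new(values.length)
  values.each do |v|                                    # phase 2: move items
    d = (v / divisor) % 10
    result[starts[d]] = v
    starts[d] += 1
  end
  result
end

# Sorts once per digit, starting with the least significant digit.
def lsd_radix_sort(values, digits)
  divisor = 1
  digits.times do
    values = stable_sort_by_digit(values, divisor)
    divisor *= 10
  end
  values
end

p lsd_radix_sort([329, 457, 657, 839, 436, 720, 355], 3)
# prints [329, 355, 436, 457, 657, 720, 839]
```

Each of the k passes is O(n), which gives the O(nk) total stated above.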
Parallelization of Sorting
- These days, computers no longer get much faster; instead, they get smaller and
more numerous
- Parallelization becomes important!
- For some tasks, parallelization can be very difficult
- For some tasks, parallelization can be quite easy
- Many sorting algorithms are easy to parallelize:
- Bubble sort
- Merge sort
- Quick sort
The Dictionary ADT
(caution: Not exactly the same as a (book) dictionary)
- For each data item, there is:
- A key: Used to identify the data item, e.g. during
search
- A value: All the information besides the key (may be
empty)
- Operations
- Search/find
- Insert
- Delete
Simple Dictionary Implementations
- Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
- Unordered array/linear list: Search is O(n)
- Ideally, search/insertion/deletion should all be O(log n) or even O(1)
- Binary search tree (this week)
- Balanced tree (next week)
- Hashing (in two weeks)
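To make the trade-off of the sorted array concrete, here is a minimal sketch of a dictionary with the three operations search, insert, and delete. The class name `SortedArrayDictionary` and its method names are assumptions for this example. Search uses binary search, while insertion and deletion have to shift array elements.

```ruby
class SortedArrayDictionary
  def initialize
    @entries = []                 # [key, value] pairs, kept sorted by key
  end

  # O(log n): binary search for the key
  def find(key)
    entry = @entries[lower_bound(key)]
    entry && entry[0] == key ? entry[1] : nil
  end

  # O(n): later entries are shifted to make room
  def insert(key, value)
    i = lower_bound(key)
    if @entries[i] && @entries[i][0] == key
      @entries[i][1] = value      # this sketch overwrites an existing key
    else
      @entries.insert(i, [key, value])
    end
  end

  # O(n): later entries are shifted to close the gap
  def delete(key)
    i = lower_bound(key)
    @entries.delete_at(i) if @entries[i] && @entries[i][0] == key
  end

  private

  # Index of the first entry whose key is >= the given key
  def lower_bound(key)
    low = 0
    high = @entries.length
    while low < high
      mid = (low + high) / 2
      if @entries[mid][0] < key
        low = mid + 1
      else
        high = mid
      end
    end
    low
  end
end

dictionary = SortedArrayDictionary.new
dictionary.insert(20, "b")
dictionary.insert(10, "a")
p dictionary.find(10)   # prints "a"
p dictionary.find(15)   # prints nil
```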
Binary Tree
- Graph: Consisting of nodes and edges
- Tree: The root does not have any parent; all other nodes have exactly one
parent
- Binary tree: Each node has ≦2 children
Traversal Methods for Binary Trees
- Depth first
- Preorder
- Inorder
- Postorder
- Breadth first
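The four traversal orders can be compared on a small tree. The sketch below is only an illustration; the struct `TNode` and the method names are assumptions for this example, and an absent child is represented by `nil`.

```ruby
TNode = Struct.new(:key, :left, :right)

# Preorder: node first, then left subtree, then right subtree
def preorder(node, out = [])
  return out if node.nil?
  out << node.key
  preorder(node.left, out)
  preorder(node.right, out)
end

# Inorder: left subtree, then node, then right subtree
def inorder(node, out = [])
  return out if node.nil?
  inorder(node.left, out)
  out << node.key
  inorder(node.right, out)
end

# Postorder: left subtree, then right subtree, then node
def postorder(node, out = [])
  return out if node.nil?
  postorder(node.left, out)
  postorder(node.right, out)
  out << node.key
end

# Breadth first: level by level, using a queue
def breadth_first(root)
  out = []
  queue = root.nil? ? [] : [root]
  until queue.empty?
    node = queue.shift
    out << node.key
    queue << node.left if node.left
    queue << node.right if node.right
  end
  out
end

#       4
#     2   6
#    1 3 5 7
tree = TNode.new(4,
                 TNode.new(2, TNode.new(1), TNode.new(3)),
                 TNode.new(6, TNode.new(5), TNode.new(7)))

p preorder(tree)       # prints [4, 2, 1, 3, 6, 5, 7]
p inorder(tree)        # prints [1, 2, 3, 4, 5, 6, 7]
p postorder(tree)      # prints [1, 3, 2, 5, 7, 6, 4]
p breadth_first(tree)  # prints [4, 2, 6, 1, 3, 5, 7]
```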
Binary Search Tree
(definition/invariants)
- Each node contains one data item
- For any node with key k:
- All the keys in the left subtree will be <k (or ≦k)
- All the keys in the right subtree will be >k (or ≧k)
- What to do with multiple identical keys is
implementation-dependent
Search in a Search Tree
- Start searching from the root node
- If the search key, compared to the current node
- Is the same: Return the data item at the current node
- Is smaller: Search in left subtree
- Is greater: Search the right subtree
- If the current node is the empty node: Give up searching (the key is not in the tree)
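The search steps above can be sketched as a short recursive method. This is an illustration, not the code from 8bintree.rb: the struct `BSTNode` and the method `bst_search` are assumed names, and the empty node is represented by `nil` for brevity.

```ruby
BSTNode = Struct.new(:key, :value, :left, :right)

def bst_search(node, key)
  return nil if node.nil?            # empty node: give up
  if key == node.key
    node.value                       # same: return the data item
  elsif key < node.key
    bst_search(node.left, key)       # smaller: search the left subtree
  else
    bst_search(node.right, key)      # greater: search the right subtree
  end
end

root = BSTNode.new(5, "five",
                   BSTNode.new(3, "three"),
                   BSTNode.new(8, "eight"))
p bst_search(root, 8)   # prints "eight"
p bst_search(root, 4)   # prints nil
```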
Insertion into a Search Tree
- Start insertion from the root node
- If the inserted key, compared to the current node
- Is smaller: Insert item into left subtree
- Is greater: Insert item into right subtree
- Is the same: Give up/insert into right subtree, ... (implementation
dependent)
- If the current node is empty: Insert item here as a new node (with two
empty nodes as children)
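A minimal sketch of the insertion steps above follows. The names `BSTNode` and `bst_insert` are assumptions for this example, the empty node is represented by `nil`, and for an identical key this sketch overwrites the stored value, which is just one of the implementation-dependent choices.

```ruby
BSTNode = Struct.new(:key, :value, :left, :right)

# Returns the root of the subtree after insertion.
def bst_insert(node, key, value)
  return BSTNode.new(key, value) if node.nil?   # empty node: insert here
  if key < node.key
    node.left = bst_insert(node.left, key, value)
  elsif key > node.key
    node.right = bst_insert(node.right, key, value)
  else
    node.value = value                          # same key: overwrite (assumption)
  end
  node
end

root = nil
[5, 3, 8, 1, 4].each { |key| root = bst_insert(root, key, key.to_s) }
p root.left.right.key   # prints 4
```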
Deletion from a Search Tree
- Find the node to delete (using search)
- If the number of (real, non-empty) children is
- 0: Delete current node (replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with smallest node in the right subtree
(or largest node in left subtree), then delete that node from its old position
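The three deletion cases can be sketched as follows. This is an illustration under the same assumptions as before: `BSTNode`, `bst_min`, `bst_delete`, and `bst_keys` are made-up names, the empty node is `nil`, and for two children the sketch always uses the smallest node of the right subtree.

```ruby
BSTNode = Struct.new(:key, :value, :left, :right)

# Smallest node of a non-empty subtree: keep going left.
def bst_min(node)
  node = node.left while node.left
  node
end

# Returns the root of the subtree after deleting key (if present).
def bst_delete(node, key)
  return nil if node.nil?
  if key < node.key
    node.left = bst_delete(node.left, key)
  elsif key > node.key
    node.right = bst_delete(node.right, key)
  elsif node.left.nil?
    return node.right                  # 0 children, or only a right child
  elsif node.right.nil?
    return node.left                   # only a left child
  else
    successor = bst_min(node.right)    # 2 children: smallest node on the right
    node.key = successor.key
    node.value = successor.value
    node.right = bst_delete(node.right, successor.key)
  end
  node
end

# Keys in inorder, used here only to inspect the result.
def bst_keys(node)
  node.nil? ? [] : bst_keys(node.left) + [node.key] + bst_keys(node.right)
end

root = BSTNode.new(5, "e",
                   BSTNode.new(3, "c", BSTNode.new(1, "a"), BSTNode.new(4, "d")),
                   BSTNode.new(8, "h", BSTNode.new(7, "g"), BSTNode.new(9, "i")))
root = bst_delete(root, 5)
p root.key         # prints 7
p bst_keys(root)   # prints [1, 3, 4, 7, 8, 9]
```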
Implementation of Search Tree
- Share a single special node (NilNode) everywhere there is no
child node
- Pseudocode/implementation in Ruby: 8bintree.rb
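One possible way to use a shared special node is sketched below. It is not taken from 8bintree.rb: the classes `EmptyNode` and `TreeNode` and the constant `EMPTY` are assumptions for this example, with `EMPTY` playing the role of the shared NilNode. Because the empty node answers the same methods as a real node, the real node needs no checks for missing children.

```ruby
# Hypothetical sentinel class; one instance is shared by the whole tree.
class EmptyNode
  def find(_key)
    nil                          # nothing is stored below an empty node
  end

  def insert(key, value)
    TreeNode.new(key, value)     # the empty node is replaced by a real node
  end
end

EMPTY = EmptyNode.new

class TreeNode
  def initialize(key, value)
    @key = key
    @value = value
    @left = EMPTY                # both children start as the shared empty node
    @right = EMPTY
  end

  def find(key)
    return @value if key == @key
    (key < @key ? @left : @right).find(key)
  end

  def insert(key, value)
    if key < @key
      @left = @left.insert(key, value)
    elsif key > @key
      @right = @right.insert(key, value)
    else
      @value = value             # same key: overwrite (assumption)
    end
    self
  end
end

root = EMPTY
[[5, "five"], [3, "three"], [8, "eight"]].each do |key, value|
  root = root.insert(key, value)
end
p root.find(3)   # prints "three"
p root.find(7)   # prints nil
```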
Evaluation of Simple Search Tree
- The execution speed depends on the height of the tree
- The best height is O(log n)
- The worst height is O(n)
- The average height is O(log n)
(assuming that all input orders have the same probability)
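A small experiment shows how strongly the height depends on the insertion order. The names `HNode`, `insert_key`, `height`, and `build` are assumptions for this sketch, and height is counted here as the number of nodes on the longest path from the root.

```ruby
HNode = Struct.new(:key, :left, :right)

def insert_key(node, key)
  return HNode.new(key) if node.nil?
  if key < node.key
    node.left = insert_key(node.left, key)
  else
    node.right = insert_key(node.right, key)
  end
  node
end

# Number of nodes on the longest path from this node downwards
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Inserts the keys one by one into an initially empty tree
def build(keys)
  keys.reduce(nil) { |root, key| insert_key(root, key) }
end

p height(build((1..100).to_a))           # prints 100: sorted input gives a linear list
p height(build([4, 2, 6, 1, 3, 5, 7]))   # prints 3: a fortunate insertion order
p height(build((1..100).to_a.shuffle))   # varies from run to run
```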
Balanced Trees
- In the worst case, the shape of a general search tree is the same as the
shape of a linear list
- Because the order of insertions/deletions cannot be changed,
using randomization (as used for quicksort) to select the dividing item is
impossible
- Keeping the tree perfectly balanced (a complete binary tree) makes
insertion/deletion take too much time
Solution: A tree that is to some degree (but not perfectly) balanced
Top-down 2-3-4 Tree
- The number of children for each node is 2, 3, or 4
- If a node has k children, it stores k-1 keys and
data items
(if the number of children is 2, then this is the same as for a binary
search tree)
- The keys stored in a node are the separators for the subtrees
- The tree is of uniform height
- In the lowest layer of the tree, the nodes have no children
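To illustrate how the keys of a node act as separators, here is a minimal sketch of search only in such a tree. The struct `Node234` and the method `search_234` are assumptions for this example; a node in the lowest layer is represented with an empty `children` array.

```ruby
# keys and values have the same length; children is empty in the lowest
# layer, otherwise it has keys.length + 1 entries.
Node234 = Struct.new(:keys, :values, :children)

def search_234(node, key)
  i = 0
  i += 1 while i < node.keys.length && key > node.keys[i]   # skip smaller keys
  return node.values[i] if i < node.keys.length && key == node.keys[i]
  return nil if node.children.empty?    # lowest layer: the key is not in the tree
  search_234(node.children[i], key)     # descend between separators i-1 and i
end

leaf_a = Node234.new([3, 7], ["three", "seven"], [])
leaf_b = Node234.new([15], ["fifteen"], [])
leaf_c = Node234.new([25, 30, 35], ["twenty-five", "thirty", "thirty-five"], [])
root = Node234.new([10, 20], ["ten", "twenty"], [leaf_a, leaf_b, leaf_c])

p search_234(root, 30)   # prints "thirty"
p search_234(root, 12)   # prints nil
```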
Summary
- Bin sort and radix sort are O(nk)
- A dictionary is an ADT storing values that can be found
using keys
- A binary search tree is a way to implement a dictionary
- Operations on binary search trees are O(log n) on average, but O(n) in the
worst case
- This problem can be addressed using balanced trees
Homework
(no need to submit)
- Calculate the minimum and maximum height of a binary search tree with
n data items
- Using various examples, think about how to insert items into a 2-3-4 tree
and propose an algorithm
Glossary
- bin sort
- ビンソート
- most significant digit
- 最上位の桁
- radix sort
- 基数整列
- balanced tree
- 平衡木
- traversal method
- 辿り方
- binary search tree
- 二分探索木
- hashing
- ハッシュ法
- binary tree
- 二分木
- parallelization
- 並列化
- depth first
- 深さ優先
- preorder
- 行きがけ順
- inorder
- 通りがけ順
- postorder
- 帰りがけ順
- breadth first
- 幅優先
- key
- キー、鍵
- implementation-dependent
- 実装依存
- top-down 2-3-4 tree
- トップダウン 2-3-4 木
- uniform height
- 一定の高さ