Dictionaries and their Implementation:
Binary Trees
(辞書とその実装: 二分木など)
Data Structures and Algorithms
8th lecture, November 8, 2018
http://www.sw.it.aoyama.ac.jp/2018/DA/lecture8.html
Martin J. Dürst
© 2009-18 Martin J. Dürst 青山学院大学
Today's Schedule
- Leftovers and summary of last lecture
- Sorting algorithms faster than O(n log n)
- The dictionary ADT
- Binary trees and their traversal methods
- Binary search trees
- Balanced trees
Leftovers of Last Lecture
Summary of Last Lecture
- Quicksort is a very efficient algorithm for sorting
- In the worst case, quicksort is O(n²); on average, O(n log n)
- Quicksort is a good example for the use of average time
complexity and randomized algorithms
- Implementing quicksort requires careful attention to many details
- Animation of many sorting algorithms: sort.svg
- Sorting based on pairwise comparison is Ω(n log n)
Sorting Faster than O(n log n)
- All sorting algorithms studied so far assume an arbitrary distribution of
values
- Decisions are made by binary comparisons of values
→Depth of the decision tree is Ω(n log n)
- If there is some knowledge about the value distribution, improvements are
possible
- Extreme example: Integers from 1 to n
→Final place of data can be predicted exactly
→O(n)
- Radix sort, bin sort
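The extreme example above can be sketched in a few lines of Ruby. This is only an illustration under a strong assumption: the keys are exactly the integers 1 to n, each appearing once, and each record is a hypothetical `[key, value]` pair. The method name `place_by_key` is made up for this sketch.

```ruby
# Sketch: sort records whose keys are exactly the integers 1..n (each once).
# The final index of a record is key - 1, so no comparisons are needed: O(n).
def place_by_key(records)
  result = Array.new(records.length)
  records.each do |record|
    key = record[0]            # assumption: a record is a [key, value] pair
    result[key - 1] = record   # the final place is known directly
  end
  result
end

students = [[3, "Sato"], [1, "Abe"], [2, "Ito"]]
sorted = place_by_key(students)
```

Each record is touched once, which is where the O(n) comes from. The sketch breaks as soon as keys are missing or duplicated, which is why bin sort and radix sort are needed for the general case.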
Bin Sort
(also called bucket sort)
Example: Sorting by student number
- Separate data into 10 parts using most significant digit
- Apply recursively to less significant digits
- To manage memory, split separation into two phases
(one_digit_stable_sort in 8binradix.rb)
- Calculate size of each part
- Move data items
- Complexity is O(n k), where k is the number of digits
- Implementation in Ruby: conceptual_bin_sort in 8binradix.rb
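The idea of bin sort can be sketched as follows. This is not the code from 8binradix.rb; the method name `bin_sort_sketch` is hypothetical, and the sketch assumes non-negative integers with at most `digits` decimal digits. It uses separate arrays for the bins instead of the two-phase memory management mentioned above.

```ruby
# Conceptual sketch only; not the code from 8binradix.rb.
# Sorts non-negative integers that have at most `digits` decimal digits.
def bin_sort_sketch(numbers, digits)
  return numbers if digits == 0 || numbers.length <= 1

  divisor = 10**(digits - 1)            # selects the current (most significant) digit
  bins = Array.new(10) { [] }           # one bin per digit value 0..9
  numbers.each do |number|
    bins[(number / divisor) % 10] << number
  end
  # Apply recursively to the less significant digits, then concatenate.
  bins.flat_map { |bin| bin_sort_sketch(bin, digits - 1) }
end

sorted = bin_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# sorted is [329, 355, 436, 457, 657, 720, 839]
```

Each level of the recursion looks at every item once, and there are at most k levels, which matches the O(n k) bound.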
Radix Sort
- Sort once for each digit, starting with the least significant
digit
- No need to partition data
- A stable sorting method is necessary
- Complexity is O(n k), where k is the number of digits
- Implementation in Ruby: radix_sort in 8binradix.rb
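A minimal sketch of radix sort, again not taken from 8binradix.rb. The names `stable_sort_by_digit` and `radix_sort_sketch` are hypothetical, and the keys are assumed to be non-negative integers with at most `digits` decimal digits. The per-digit pass uses the two phases described for bin sort (calculate the size of each part, then move the items).

```ruby
# Stable sort of `numbers` by one decimal digit (divisor = 1, 10, 100, ...).
def stable_sort_by_digit(numbers, divisor)
  # Phase 1: calculate the size of each of the 10 parts.
  counts = Array.new(10, 0)
  numbers.each { |number| counts[(number / divisor) % 10] += 1 }

  # Turn the sizes into the start index of each part.
  starts = Array.new(10, 0)
  (1...10).each { |d| starts[d] = starts[d - 1] + counts[d - 1] }

  # Phase 2: move the items; scanning left to right keeps the sort stable.
  result = Array.new(numbers.length)
  numbers.each do |number|
    digit = (number / divisor) % 10
    result[starts[digit]] = number
    starts[digit] += 1
  end
  result
end

# Sort once per digit, starting with the least significant digit.
def radix_sort_sketch(numbers, digits)
  divisor = 1
  digits.times do
    numbers = stable_sort_by_digit(numbers, divisor)
    divisor *= 10
  end
  numbers
end

sorted = radix_sort_sketch([329, 457, 657, 839, 436, 720, 355], 3)
# sorted is [329, 355, 436, 457, 657, 720, 839]
```

Stability matters here: when a later pass finds two items with the same digit, it must keep the order established by the earlier passes on the less significant digits.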
Bin Sort vs. Radix Sort
| | Bin Sort | Radix Sort |
|---|---|---|
| Complexity | O(n k) | O(n k) |
| First digit sorted | most significant digit | least significant digit |
| Last digit sorted | least significant digit | most significant digit |
| Direction | → | ← |
| Stable sort needed | No | Yes |
| Data partitioning needed | Yes | No |
Parallelization of Sorting
- Recently, individual computers are no longer getting much faster; instead,
they are getting smaller and more numerous
- Parallelization becomes important!
- For some tasks, parallelization can be very difficult
- For some tasks, parallelization can be quite easy
- Many sorting algorithms are easy to parallelize:
- Bubble sort
- Merge sort
- Quick sort
The Dictionary ADT
(caution: Not the same as a (book) dictionary)
- For each data item, there is:
- A key: Used to identify the data item, e.g. during
search
- A value: All the information besides the key (may be
empty)
- Operations
- Search/find
- Insert
- Delete
Simple Dictionary Implementations
- Sorted array: Search is O(log n) (binary search), insertion/deletion is O(n)
- Unordered array/linear list: Search is O(n)
- Ideally, search/insertion/deletion should all be O(log n) or even O(1)
- Binary search tree (this week)
- Balanced tree (next week)
- Hashing (in two weeks)
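To make the dictionary operations and their costs concrete, here is a sketch of the sorted-array implementation. The class name `SortedArrayDictionary` and its method names are assumptions for illustration; the sketch stores at most one value per key and relies on Ruby's built-in `Array#bsearch_index` for the binary search.

```ruby
# Sketch of a dictionary backed by a sorted array of [key, value] pairs.
class SortedArrayDictionary
  def initialize
    @pairs = []   # kept sorted by key at all times
  end

  # Search: O(log n) thanks to binary search.
  def find(key)
    pair = @pairs[lower_bound(key)]
    pair && pair[0] == key ? pair[1] : nil
  end

  # Insert: O(n), because later pairs have to be shifted.
  def insert(key, value)
    index = lower_bound(key)
    if @pairs[index] && @pairs[index][0] == key
      @pairs[index][1] = value            # assumption: an existing key is overwritten
    else
      @pairs.insert(index, [key, value])
    end
  end

  # Delete: O(n), again because of shifting.
  def delete(key)
    index = lower_bound(key)
    @pairs.delete_at(index) if @pairs[index] && @pairs[index][0] == key
  end

  private

  # Index of the first pair whose key is >= key (or the array length).
  def lower_bound(key)
    @pairs.bsearch_index { |pair| pair[0] >= key } || @pairs.length
  end
end

dict = SortedArrayDictionary.new
dict.insert(15, "fifteen")
dict.insert(7, "seven")
dict.insert(42, "forty-two")
dict.delete(7)
```

After these calls, `dict.find(15)` returns `"fifteen"` and `dict.find(7)` returns `nil`. The binary search tree discussed next aims to keep the fast search while avoiding the O(n) shifting.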
Binary Tree
- Graph: Consisting of nodes and edges
- Tree: The root does not have any parent; all other
nodes have exactly one parent
- Binary tree: Each node has ≦2 children
Traversal Methods for Binary Trees
- Depth first
- Preorder
- Inorder
- Postorder
- Breadth first
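The four traversal methods can be sketched as follows. The `Node` struct and the method names are assumptions for this example, and an absent child is represented by `nil`.

```ruby
Node = Struct.new(:key, :left, :right)

# Preorder: node first, then left subtree, then right subtree.
def preorder(node, out = [])
  return out if node.nil?
  out << node.key
  preorder(node.left, out)
  preorder(node.right, out)
end

# Inorder: left subtree, then node, then right subtree.
def inorder(node, out = [])
  return out if node.nil?
  inorder(node.left, out)
  out << node.key
  inorder(node.right, out)
end

# Postorder: left subtree, then right subtree, then node.
def postorder(node, out = [])
  return out if node.nil?
  postorder(node.left, out)
  postorder(node.right, out)
  out << node.key
end

# Breadth first: level by level, using a queue instead of recursion.
def breadth_first(root)
  out = []
  queue = root.nil? ? [] : [root]
  until queue.empty?
    node = queue.shift
    out << node.key
    queue << node.left if node.left
    queue << node.right if node.right
  end
  out
end

#        4
#      2   6
#     1 3 5 7
tree = Node.new(4,
                Node.new(2, Node.new(1), Node.new(3)),
                Node.new(6, Node.new(5), Node.new(7)))

preorder(tree)       # [4, 2, 1, 3, 6, 5, 7]
inorder(tree)        # [1, 2, 3, 4, 5, 6, 7]
postorder(tree)      # [1, 3, 2, 5, 7, 6, 4]
breadth_first(tree)  # [4, 2, 6, 1, 3, 5, 7]
```

The three depth-first variants differ only in where the current node is visited relative to the two recursive calls.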
Binary Search Tree: Invariants
- Binary tree
- Each node contains one data item
- For any node with key k:
- All the keys in the left subtree will be <k (or ≦k)
- All the keys in the right subtree will be >k (or ≧k)
- What to do with multiple identical keys is
implementation-dependent
Search in a Search Tree
- Start searching from the root node
- If the search key, compared to the current node
- Is the same: Return the data item at the current node
- Is smaller: Search in left subtree (recursion)
- Is greater: Search the right subtree (recursion)
- If the current node is empty: Terminate search (not found!)
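The search steps above can be sketched as a short recursive method. The `Node` struct and the method name `search` are assumptions for this example, and the empty node is represented by `nil` rather than by a special node object.

```ruby
Node = Struct.new(:key, :value, :left, :right)

# Returns the value stored under `key`, or nil if the key is not in the tree.
def search(node, key)
  return nil if node.nil?        # empty node: terminate, not found
  if key == node.key
    node.value                   # same: return the data item
  elsif key < node.key
    search(node.left, key)       # smaller: search in the left subtree
  else
    search(node.right, key)      # greater: search in the right subtree
  end
end

root = Node.new(50, "fifty",
                Node.new(30, "thirty"),
                Node.new(70, "seventy"))

search(root, 70)   # "seventy"
search(root, 40)   # nil
```

Each step descends one level, so the number of comparisons is bounded by the height of the tree.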
Insertion into a Search Tree
- Start insertion from the root node
- If the inserted key, compared to the current node
- Is smaller: Insert item into left subtree (recursion)
- Is greater: Insert item into right subtree (recursion)
- Is the same: Give up/insert into right subtree, ...
(implementation-dependent)
- If the current node is empty: Insert item here as a new node (with two
empty nodes as children)
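A minimal sketch of the insertion steps above. The names `Node`, `insert`, and `inorder_keys` are assumptions, the empty node is represented by `nil`, and for an identical key this sketch overwrites the stored value, which is just one of the implementation-dependent choices.

```ruby
Node = Struct.new(:key, :value, :left, :right)

# Inserts into the subtree rooted at `node` and returns the subtree's root.
def insert(node, key, value)
  return Node.new(key, value) if node.nil?   # empty: insert here as a new node
  if key < node.key
    node.left = insert(node.left, key, value)
  elsif key > node.key
    node.right = insert(node.right, key, value)
  else
    node.value = value   # same key: overwrite (an assumption of this sketch)
  end
  node
end

# Helper for checking the result: keys in sorted (inorder) order.
def inorder_keys(node)
  return [] if node.nil?
  inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
end

root = nil
[50, 30, 70, 20, 40].each { |k| root = insert(root, k, k.to_s) }

inorder_keys(root)   # [20, 30, 40, 50, 70]
```

Returning the subtree root from each call avoids a special case for the empty tree: the caller simply stores whatever comes back.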
Deletion from a Search Tree
- Find the node to delete (same as search)
- If the number of (real, non-empty) children is
- 0: Delete current node (replace with empty node)
- 1: Replace current node with child
- 2: Replace current node with smallest node in the right subtree
(or largest node in left subtree)
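The three deletion cases can be sketched as follows. The names are assumptions for this example, the empty node is `nil`, and in the two-children case the sketch copies the key and value of the smallest node in the right subtree into the current node and then deletes that smallest node.

```ruby
Node = Struct.new(:key, :value, :left, :right)

# Leftmost (smallest) node of a non-empty subtree.
def smallest(node)
  node = node.left while node.left
  node
end

# Deletes `key` from the subtree rooted at `node`; returns the new subtree root.
def delete(node, key)
  return nil if node.nil?                  # key not found: nothing to do
  if key < node.key
    node.left = delete(node.left, key)
  elsif key > node.key
    node.right = delete(node.right, key)
  elsif node.left.nil? && node.right.nil?
    return nil                             # 0 children: replace with empty node
  elsif node.left.nil?
    return node.right                      # 1 child: replace with that child
  elsif node.right.nil?
    return node.left
  else
    successor = smallest(node.right)       # 2 children
    node.key = successor.key
    node.value = successor.value
    node.right = delete(node.right, successor.key)
  end
  node
end

def inorder_keys(node)
  return [] if node.nil?
  inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
end

root = Node.new(50, "a",
                Node.new(30, "b", Node.new(20, "c"), Node.new(40, "d")),
                Node.new(70, "e", Node.new(60, "f"), Node.new(80, "g")))

root = delete(root, 50)   # two children: 60 moves up to the root
root = delete(root, 20)   # no children
root = delete(root, 30)   # one child: 40 takes its place
```

The smallest node of the right subtree never has a left child, so removing it always falls into the 0-children or 1-child case.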
Implementation of Search Tree
- Share a single special node (NilNode) everywhere there is no child node
- Pseudocode/implementation in Ruby: 8bintree.rb
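One way the shared special node could look is sketched below. This is a hypothetical illustration, and 8bintree.rb may be organized quite differently; the class names `NilNodeSketch` and `TreeNodeSketch` and the constant `NIL_NODE` are invented for this example.

```ruby
# Hypothetical sketch; the actual 8bintree.rb may be organized differently.
# The empty node answers the same messages as a real node.
class NilNodeSketch
  def find(_key)
    nil                               # searching an empty subtree: not found
  end

  def insert(key, value)
    TreeNodeSketch.new(key, value)    # inserting here creates a real node
  end

  def height
    0
  end
end

NIL_NODE = NilNodeSketch.new          # the single instance shared everywhere

class TreeNodeSketch
  attr_reader :key, :value

  def initialize(key, value)
    @key = key
    @value = value
    @left = NIL_NODE                  # both children start out empty
    @right = NIL_NODE
  end

  def find(key)
    if key == @key
      @value
    elsif key < @key
      @left.find(key)
    else
      @right.find(key)
    end
  end

  # Returns the root of this subtree after insertion.
  def insert(key, value)
    if key < @key
      @left = @left.insert(key, value)
    elsif key > @key
      @right = @right.insert(key, value)
    else
      @value = value                  # same key: overwrite (an assumption)
    end
    self
  end

  def height
    1 + [@left.height, @right.height].max
  end
end

root = NIL_NODE
[50, 30, 70].each { |k| root = root.insert(k, k.to_s) }
```

With this design, the methods of a real node never need to test for an absent child; the empty case lives in one place, and only one empty-node object is ever allocated.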
Evaluation of Simple Search Tree
- Execution speed depends on the height of the tree
- Best height is O(log n)
- Worst height is O(n)
- Average height is O(log n)
(assuming that all input orders have the same probability)
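The dependence of the height on the insertion order can be observed with a small experiment. The names below are assumptions, the empty node is `nil`, and the height counts nodes on the longest path from the root, so an empty tree has height 0.

```ruby
Node = Struct.new(:key, :left, :right)

def insert(node, key)
  return Node.new(key) if node.nil?
  if key < node.key
    node.left = insert(node.left, key)
  else
    node.right = insert(node.right, key)
  end
  node
end

# Number of nodes on the longest path from this node down to a leaf.
def height(node)
  node.nil? ? 0 : 1 + [height(node.left), height(node.right)].max
end

# Builds a tree by inserting the keys in the given order.
def build(keys)
  keys.reduce(nil) { |root, key| insert(root, key) }
end

n = 200
sorted_height = height(build((1..n).to_a))
shuffled_height = height(build((1..n).to_a.shuffle(random: Random.new(1))))
```

Inserting the keys in sorted order gives a height of exactly n (here 200), because every new key goes to the right of the previous one. A shuffled order usually gives a height that is a small multiple of log₂ n; the exact value depends on the shuffle.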
Balanced Trees
- In the worst case, the shape of a general search tree is the same as the
shape of a linear list
- Because the order of insertions/deletions cannot be changed,
using randomization (as used for quicksort) to select the dividing item is
impossible
- Keeping the tree a complete binary tree at all times would make
insertion/deletion take too much time
Solution: A tree that is to some degree (but not perfectly) balanced
Top-down 2-3-4 Tree
(definition/invariants)
- The number of children for each node is 2, 3, or 4
- If a node has k children, it stores k-1 keys and
data items
(if the number of children is 2, then this is the same as for a binary
search tree)
- The keys stored in a node are the separators for the subtrees
- The tree is of uniform height
- In the lowest layer of the tree, the nodes have no children
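The invariants above can be made concrete with a small sketch that checks them and searches in such a tree. The node layout `Node234` and the method names are assumptions for this example; insertion is deliberately left out.

```ruby
# Hypothetical node layout: `keys` is sorted; `children` is empty in the
# lowest layer, otherwise it has keys.length + 1 entries.
Node234 = Struct.new(:keys, :values, :children)

# Search: find the first key >= the search key; either it matches,
# or the subtree to the left of it is the only place the key can be.
def search234(node, key)
  index = node.keys.index { |k| key <= k } || node.keys.length
  return node.values[index] if node.keys[index] == key
  return nil if node.children.empty?        # lowest layer: not found
  search234(node.children[index], key)
end

# Returns the (uniform) height of the subtree; raises if an invariant fails.
def check234(node)
  raise "a node must store 1 to 3 keys" unless (1..3).cover?(node.keys.length)
  return 1 if node.children.empty?
  unless node.children.length == node.keys.length + 1
    raise "k children require k-1 keys"
  end
  heights = node.children.map { |child| check234(child) }
  raise "tree is not of uniform height" unless heights.uniq.length == 1
  heights.first + 1
end

# Helper for building nodes of the lowest layer.
leaf = ->(*keys) { Node234.new(keys, keys.map(&:to_s), []) }

#            [20, 40]
#   [5, 10]    [30]    [50, 60, 70]
root = Node234.new([20, 40], ["20", "40"],
                   [leaf.(5, 10), leaf.(30), leaf.(50, 60, 70)])

check234(root)        # 2
search234(root, 60)   # "60"
search234(root, 35)   # nil
```

The keys 20 and 40 in the root act as separators: keys below 20 are in the first subtree, keys between 20 and 40 in the second, and keys above 40 in the third.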
Summary
- Bin sort and radix sort are O(n k)
- A dictionary is an ADT storing values that can be found
using keys
- A binary search tree is a way to implement a dictionary
- Operations on binary search trees are O(log n) on average, but O(n) in the
worst case
- This problem can be addressed using balanced trees
Homework
(no need to submit)
- Calculate the minimum and maximum height of a binary search tree with
n data items
- Calculate the minimum and maximum height of a 2-3-4 tree with
n data items
- Using various examples, think about how to insert items into a 2-3-4 tree
and propose an algorithm
Glossary
- bin sort
- ビンソート
- most significant digit
- 最上位の桁
- radix sort
- 基数整列
- least significant digit
- 最下位の桁
- balanced tree
- 平衡木
- traversal method
- 辿り方
- binary search tree
- 二分探索木
- hashing
- ハッシュ法
- binary tree
- 二分木
- parallelization
- 並列化
- depth first
- 深さ優先
- preorder
- 行きがけ順
- inorder
- 通りがけ順
- postorder
- 帰りがけ順
- breadth first
- 幅優先
- key
- キー、鍵
- implementation-dependent
- 実装依存
- top-down 2-3-4 tree
- トップダウン 2-3-4 木
- uniform height
- 一定の高さ