Heaps
(ヒープ)
Data Structures and Algorithms
5th lecture, October 15, 2015
http://www.sw.it.aoyama.ac.jp/2015/DA/lecture5.html
Martin J. Dürst
© 2009-15 Martin J. Dürst 青山学院大学
Today's Schedule
- Summary of last lecture, homework
- Priority queue as an ADT
- Efficient implementation of priority queue
- Complete binary tree
- Heap
- Heap sort
- How to use irb
Summary of Last Lecture
- The order (of growth)/(asymptotic) time complexity of an algorithm can be
calculated from the number of the most frequent basic operations
- Calculation can use a summation or a recurrence (relation)
- The big-O notation compactly expresses the inherent efficiency
of an algorithm
- An abstract data type (ADT) combines data and the operations on
this data
- Stack and queue are typical examples of ADTs
- Each ADT can be implemented in different ways
- Depending on implementation, the time complexity of each operation of an
ADT can change
Last Week's Homework 1
[Removed because this is material from the previous academic year]
Last Week's Homework 2
[Removed because this is material from the previous academic year]
Last Week's Homework 3
[Removed because this is material from the previous academic year]
Priority Queue
- Example from IT:
- Queue for process management, ...
- Operations:
- Creation: new, init
- Check for emptiness: empty?
- Insert additional item: add,...
- Return and remove item with highest priority:
getNext/delMax/...
- Return item with highest priority (without removal):
findMax/peekAtNext/...
Simple Implementations
Time complexity of each operation
Implementation | Array (ordered) | Array (unordered) | Linked list (ordered) | Linked list (unordered)
insert | O(n) | O(1) | O(n) | O(1)
getNext | O(1) | O(n) | O(1) | O(n)
findMax | O(1) | O(n) | O(1) | O(n)
Time complexity for each operation differs for different implementations.
But there is always an operation that needs O(n).
Is it possible to improve this?
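To make the trade-off concrete, here is a minimal Ruby sketch of the "Array (unordered)" column. The class name UnorderedArrayPQ and the method names are assumptions chosen for illustration; they do not come from the lecture material. Insertion only appends, while finding or removing the highest-priority item has to scan the whole array.

```ruby
# Hypothetical class (not from the lecture files): a priority queue
# stored in an unordered array, where larger values mean higher priority.
class UnorderedArrayPQ
  def initialize
    @items = []
  end

  def empty?
    @items.empty?
  end

  # O(1) (amortized): append at the end, no ordering is maintained.
  def insert(item)
    @items.push(item)
    self
  end

  # O(n): every item has to be inspected.
  def find_max
    @items.max
  end

  # O(n): find the maximum, then remove it from the array.
  def get_next
    return nil if empty?
    index = @items.index(@items.max)
    @items.delete_at(index)
  end
end

pq = UnorderedArrayPQ.new
pq.insert(3).insert(7).insert(5)
p pq.get_next   # 7
p pq.find_max   # 5
```

An ordered array shifts the cost the other way: the scan disappears, but insertion has to move up to n items to keep the order.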
Ideas for Improving Priority Queue Implementation
Tree Structure
Problem: Balance
Complete Binary Tree
Definition based on tree structure:
- Almost all internal nodes (all except possibly one) have 2
children
- All tree layers except the lowermost are full
- The lowermost tree layer is filled from the left
Implementing a Complete Binary Tree with an Array
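One common layout, and the one suggested by the irb output later in this lecture (an empty heap shows @array=[nil]), leaves index 0 unused and stores the root at index 1. Under that assumption, parent and child positions can be computed by simple arithmetic. The helper names below are hypothetical.

```ruby
# Assumption: 1-based positions, index 0 of the array is unused (nil).
# Hypothetical helper names, for illustration only.
def parent_index(i)
  i / 2          # integer division
end

def left_child_index(i)
  2 * i
end

def right_child_index(i)
  2 * i + 1
end

tree = [nil, 7, 3, 5]   # root 7, left child 3, right child 5
p tree[parent_index(3)]       # 7
p tree[left_child_index(1)]   # 3
p tree[right_child_index(1)]  # 5
```

Because a complete binary tree has no gaps, the array needs no pointers and wastes no space apart from the unused index 0.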
Heap
A heap is a
- Complete binary tree where
- Each parent always has higher priority than its children
⇒ The root always has the highest priority
We need the following operations for implementing a heap:
- Addition and removal of data items
- Restoration of invariants
Invariant
- A condition that is always maintained in a data structure or algorithm
(especially in a loop)
- Very important for data structures
- Can be used in proofs (properties of data structures, correctness of
algorithms, ...)
- After an operation on (change to) a data structure, it may be necessary
to restore invariants
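As an illustration, the heap invariant (no child has higher priority than its parent) can be written as a small check. This is a sketch under assumptions: the helper name is hypothetical, index 0 of the array is unused, larger values mean higher priority, and equal priorities between parent and child are allowed.

```ruby
# Hypothetical helper: returns true if array[1..size] satisfies the
# heap invariant. Index 0 is assumed to be unused.
def heap_invariant_holds?(array, size)
  # Every node except the root must not exceed its parent at i / 2.
  (2..size).all? { |i| array[i / 2] >= array[i] }
end

p heap_invariant_holds?([nil, 7, 3, 5], 3)  # true
p heap_invariant_holds?([nil, 3, 7, 5], 3)  # false
```

A check like this is useful in tests: call it after every operation to confirm that the invariant has been restored.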
Restoring Heap Invariants
If the priority at a given node is too high: Use heapify_up
- Compare priority with parent
- If parent priority is lower, exchange with parent
- Continue until parent priority is higher
If the priority at a given node is too low: Use heapify_down
- Compare priority with both children
- If necessary, exchange with child with higher priority
- Continue at exchanged child until exchange becomes unnecessary
Implementation: 5heap.rb
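The two procedures above can be sketched as stand-alone Ruby functions. This is not the content of 5heap.rb, which may be organized differently. The sketch assumes an array with unused index 0, larger values meaning higher priority, and an explicit size parameter for heapify_down.

```ruby
# Sketch only; signatures are assumptions, not taken from 5heap.rb.

# Move the item at position i up while it beats its parent.
def heapify_up(array, i)
  while i > 1 && array[i / 2] < array[i]
    array[i / 2], array[i] = array[i], array[i / 2]
    i /= 2
  end
  array
end

# Move the item at position i down while a child beats it.
def heapify_down(array, size, i)
  loop do
    largest = i
    left = 2 * i
    right = 2 * i + 1
    largest = left if left <= size && array[left] > array[largest]
    largest = right if right <= size && array[right] > array[largest]
    break if largest == i   # neither child has higher priority
    array[i], array[largest] = array[largest], array[i]
    i = largest
  end
  array
end

p heapify_up([nil, 5, 3, 7], 3)       # [nil, 7, 3, 5]
p heapify_down([nil, 1, 7, 5], 3, 1)  # [nil, 7, 1, 5]
```

Both functions follow a single path between the root and a leaf, so the number of exchanges is bounded by the height of the tree.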
Implementing a Priority Queue with a Heap
Time complexity of each operation
Implementation | Heap (implemented as an Array)
insert | O(log n)
findMax | O(1)
getNext | O(log n)
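The following sketch shows how the three operations in the table might look on top of an array. The class name SketchHeap and the method names find_max and get_next are assumptions for illustration. The sketch only mirrors the @array and @size layout visible in the irb output later in this lecture and does not reproduce 5heap.rb.

```ruby
# Hypothetical class, not the lecture's Heap class.
# Larger values mean higher priority; index 0 is unused.
class SketchHeap
  def initialize
    @array = [nil]
    @size = 0
  end

  def empty?
    @size == 0
  end

  # O(1): the highest-priority item is always at the root.
  def find_max
    @array[1]
  end

  # O(log n): add at the end, then move up along one path.
  def insert(item)
    @size += 1
    @array[@size] = item
    i = @size
    while i > 1 && @array[i / 2] < @array[i]
      @array[i / 2], @array[i] = @array[i], @array[i / 2]
      i /= 2
    end
    self
  end

  # O(log n): move the last item to the root, then move it down.
  def get_next
    return nil if empty?
    top = @array[1]
    @array[1] = @array[@size]
    @array.pop
    @size -= 1
    i = 1
    loop do
      largest = i
      [2 * i, 2 * i + 1].each do |child|
        largest = child if child <= @size && @array[child] > @array[largest]
      end
      break if largest == i
      @array[i], @array[largest] = @array[largest], @array[i]
      i = largest
    end
    top
  end
end

h = SketchHeap.new
h.insert(3).insert(5).insert(7)
p h.find_max  # 7
p h.get_next  # 7
p h.get_next  # 5
```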
Heap Sort
- Use priority queue to sort by (decreasing) priority
- Create a heap from all the items to be sorted
- Remove items from heap one-by-one: They will be ordered by
(decreasing) priority
- Implementation optimization:
Use space at the end of the array to store removed items
⇒ The items will end up in the array in increasing order
- Time complexity: O(n log n)
- Addition and removal of items is O(log n) for each item
- To sort n items, the total complexity is O(n log n)
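A minimal sketch of heap sort with the optimization described above: each removed maximum is stored in the array slot that the shrinking heap has just freed. The function names and signatures are assumptions. The sketch works on a copy with an unused index 0 so that the parent/child arithmetic stays 1-based, and it builds the heap by calling heapify_down on every internal node, which is one of several possible ways to create the heap.

```ruby
# Sketch under stated assumptions; not taken from the lecture files.
def heapify_down(array, size, i)
  loop do
    largest = i
    [2 * i, 2 * i + 1].each do |child|
      largest = child if child <= size && array[child] > array[largest]
    end
    break if largest == i
    array[i], array[largest] = array[largest], array[i]
    i = largest
  end
end

def heap_sort(items)
  array = [nil] + items   # copy with unused index 0
  size = items.length
  # Create a heap from all the items.
  (size / 2).downto(1) { |i| heapify_down(array, size, i) }
  # Repeatedly move the current maximum to the end of the heap area.
  size.downto(2) do |last|
    array[1], array[last] = array[last], array[1]
    heapify_down(array, last - 1, 1)
  end
  array.drop(1)           # items are now in increasing order
end

p heap_sort([5, 1, 4, 2, 3])  # [1, 2, 3, 4, 5]
```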
How to use irb
irb: Interactive Ruby, a 'command prompt' for Ruby
Example usage:
C:\Algorithms>irb
irb(main):001:0> load './5heap'
=> true
irb(main):002:0> h = Heap.new
=> #<Heap:0x2833d60 @array=[nil], @size=0>
irb(main):003:0> h.insert 3
=> #<Heap:0x2833d60 @array=[nil, 3], @size=1>
irb(main):004:0> h.insert(5).insert(7)
=> #<Heap:0x2833d60 @array=[nil, 7, 3, 5], @size=3>
...
Other Kinds of Heaps
- Priority queues can be used as components in many different
algorithms
- Often, two priority queues need to be joined
- With the 'usual' heap, joining is O(n)
- With a binomial queue, joining is O(log n)
- With a Fibonacci heap, joining can be improved to O(1)
Ideas to Improve Implementation of Priority Queue
- Started with two simple implementations
- Advantages and disadvantages for each implementation
- New idea: Combining both implementations/finding a balance between the
two implementations
- Not completely ordered, but also not completely unordered
→ Partially ordered, just to the extent necessary to find highest
priority item
Conceptual Layers
- Application: Heap sort
- ADT: Priority queue
- Conceptual data structure: Heap
- Actual data structure: Complete binary tree
- Internal implementation: Array
Summary
- A priority queue is an important ADT
- Implementing a priority queue with an array or a linked list is not
efficient
- In a heap, each parent has higher priority than its children
- In a heap, the highest priority item is at the root of a complete binary
tree
- A heap is an efficient implementation of a priority queue
- Many data structures are defined using invariants
- A heap can be used for sorting, using heap sort
Report: Manual Sorting
Deadline: November 3rd, 2015 (Wednesday), 19:00.
Where to submit: Box in front of room O-529 (building O,
5th floor)
Format:
- A4, double-sided 4 pages (2 sheets of paper, stapled in
upper left corner; NO cover page)
- Easily readable handwriting (NO printouts)
- Name (kanji and kana), student number, course name and report name at the
top of the front page
Problem: Propose and describe an algorithm/algorithms for
manual sorting, for the following two cases:
- One person sorts 6000 pages
- 20 people together sort 60000 pages
Each page is a sheet of paper of size A4, where a 10-digit number is printed
in big letters.
The goal is to sort the pages by increasing number. There is no knowledge
about how the numbers are distributed.
You can use the same algorithm for both cases, or a different algorithm.
Details:
- Describe the algorithm(s) in detail, so that e.g. your friends who don't
understand computers can execute them.
- Describe the equipment/space that you need.
- Calculate the overall time needed for each case.
- Analyse the time complexity (O()) of the algorithm(s).
- Comment on the relationship to other algorithms you know, and on the
special needs of manual (as opposed to computer) execution.
- If you use any Web pages, books, ..., list them as references at the end
of your report
Caution: Use IRIs (e.g. http://ja.wikipedia.org/wiki/情報), not URLs
(e.g. http://ja.wikipedia.org/wiki/%E6%83%85%E5%A0%B1)
Homework
(for next week, no need to submit)
- Implement joining two (normal) heaps.
- Think about the time complexity of creating a heap:
heapify_all will be called n/2 times and may take up to O(log n) each time.
Therefore, one guess for the overall time complexity is O(n log n).
However, this upper bound can be improved by careful analysis.
- Find five different applications of sorting.
- Bring a small pair of scissors (to cut paper) to the next lecture.
Glossary
- priority queue
- 順位キュー、優先順位キュー、優先順位付き待ち行列
- complete binary tree
- 完全二分木
- heap
- ヒープ
- internal node
- 内部節
- restoration
- 修復
- invariant
- 不変条件
- sort
- 整列、ソート
- decreasing (order)
- 降順
- increasing (order)
- 昇順
- join
- 合併
- binomial queue
- 2項キュー、2 項ヒープ