Divide and Conquer, Mergesort
(分割統治法、マージソート)
Data Structures and Algorithms
6th lecture, October 25, 2018
http://www.sw.it.aoyama.ac.jp/2018/DA/lecture6.html
Martin J. Dürst
© 2009-18 Martin J. Dürst 青山学院大学
Today's Schedule
- Leftovers, summary of last lecture, homework
- The importance of sorting
- Simple sorting algorithms: Bubble sort, selection sort, insertion
sort
- Loops in Ruby
- Divide and conquer
- Merge sort
- Summary
Leftovers from Last Lecture
Summary of Last Lecture
- A priority queue is an important ADT
- Implementing a priority queue with an array or a linked list is not
efficient
- In a heap, each parent has higher priority than its children
- In a heap, the highest priority item is at the root of a complete
binary tree
- A heap is an efficient implementation of a priority queue
- Many data structures are defined using invariants
- The operations heapify_up and heapify_down are used to restore heap
invariants
- A heap can be used for sorting, using heap sort
Last Week's Homework
(no need to submit, but bring the sorting cards)
- Complete the report (deadline October 24, 2018 (Wednesday), 19:00)
- Cut the sorting cards, and bring them
with you to the next lecture
- Shuffle the sorting cards, and try to find a fast way to sort them. Play
against others (who is fastest?).
- Find five different applications of sorting (no need to submit)
- Implement joining two (normal) heaps (no need to submit)
- Add the items of the smaller heap to the bigger heap (6heapmerge.rb)
- Use a binomial heap (binomial queue)
- Think about the time complexity of creating a heap:
heapify_down will be called n/2 times and may take up to O(log n) each time.
Therefore, one guess for the overall time complexity is O(n log n).
However, this upper bound can be improved by careful analysis.
(no need to submit)
Report: Manual Sorting: Problems Seen
- 218341.368 seconds (⇒about 61 hours)
- 61010·103·1010 (units? way too big)
- O(40000) (how many seconds could this be?)
- Calculation of actual time backwards from big-O notation:
1 second/operation, n=5000, O(n^2) ⇒ 25'000'000 seconds?
- An O(n) algorithm (example: "5 seconds per page")
- For 12 people, having only one person work towards the end of the
algorithm
- For humans, binary sorting is constraining (sorting into 3 to 10 parts is better)
- Using bubble sort (868 days without including breaks or sleep)
- Prepare 10^10 boxes (problem: space, cost, distance for walking)
- Forgetting time for preparation, cleanup, breaks,...
- Submitting just a program
- Report too short
Homework 3: Computational Complexity of heapify_all
Derivation
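\begin{align*}
\sum_{i=0}^{\infty} \frac{i}{2^i}
&= \frac{0}{1} + \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \frac{5}{32} + \frac{6}{64} + \frac{7}{128} + \frac{8}{256} + \frac{9}{512} + \dots \\
&< \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \frac{4}{8} + \frac{8}{32} + \frac{8}{32} + \frac{8}{32} + \frac{8}{32} + \frac{16}{512} + \dots \\
&= \frac{1}{2} + \frac{1 \cdot 2}{4} + \frac{2 \cdot 4}{8} + \frac{4 \cdot 8}{32} + \frac{8 \cdot 16}{512} + \frac{16 \cdot 32}{131072} + \dots \\
&= \frac{1}{2} + \frac{2^1}{2^2} + \frac{2^3}{2^3} + \frac{2^5}{2^5} + \frac{2^7}{2^9} + \frac{2^9}{2^{17}} + \frac{2^{11}}{2^{33}} + \frac{2^{13}}{2^{65}} + \dots \\
&= \frac{1}{2} + \sum_{k=0}^{\infty} 2^{(1+2k)-(2^k+1)} \\
&= \frac{1}{2} + \sum_{k=0}^{\infty} 2^{2k-2^k} \\
&< 3.254
\end{align*}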
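The bound can be checked numerically. The following is a minimal sketch, not part of the lecture materials; the method name heapify_series and its parameter limit are hypothetical. It adds up the first terms of the series i/2^i. The partial sums should stay well below 3.254 (the series is known to converge to 2, so the bound derived above is valid but not tight).

```ruby
# Partial sum of i / 2**i for i = 0..limit.
# heapify_series and limit are illustrative names, not from the lecture.
# Range#sum with a block needs Ruby 2.4 or later.
def heapify_series(limit)
  (0..limit).sum { |i| i.to_f / 2**i }
end

[5, 10, 20, 50].each do |limit|
  puts "limit=#{limit}: #{heapify_series(limit)}"
end
```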
Importance of Sorting
- Make output easy to understand and check (search by humans)
- Group related items together
- Preparation for search (example: binary search, index in databases,
...)
- Use as component in more complicated algorithms
Simple Sorting Algorithms
- Bubble sort
- Selection sort
- Insertion sort
Bubble Sort
- Compare neighboring items, exchange if not in order
- Pass through the data from start to end, repeatedly
- The number of passes needed to fully order the data is O(n)
- The number of comparisons (and potential exchanges) in each pass is O(n)
- Time complexity is O(n^2)
Possible improvements:
- Alternately pass back and forth
- Remember the place of the last exchange to limit the range of
exchanges
- Work in parallel
Pseudocode/example implementation: 6sort.rb
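The file 6sort.rb is not reproduced here. As a rough illustration of the steps above, a basic bubble sort might look like the following sketch (without any of the listed improvements; the method name bubble_sort is an assumption and may differ from the lecture's code).

```ruby
# Bubble sort sketch: returns a sorted copy, leaves the input unchanged.
def bubble_sort(array)
  a = array.dup
  # n-1 passes are enough to fully order n items
  (a.length - 1).times do
    # compare each pair of neighbors, exchange if not in order
    0.upto(a.length - 2) do |i|
      a[i], a[i + 1] = a[i + 1], a[i] if a[i] > a[i + 1]
    end
  end
  a
end

p bubble_sort([5, 2, 4, 1, 3])
```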
Various Ways to Loop in Ruby
- Looping a fixed number of times
- Looping with an index
- Many others, ...
Looping a Fixed Number of Times
Syntax:
number.times do
  # some work
end
Example:
(length-1).times do
  # bubble
end
Looping with an Index
Syntax:
first.upto(last) do |index|
  # some work using index
end
Example:
0.upto(length-2) do |i|
# select
end
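To see what the two loop forms do, they can be wrapped in small methods. This is only a sketch; the method names count_passes and collect_indices are hypothetical, and length stands for the length of the array to be sorted.

```ruby
# Counts how often the body of (length-1).times runs.
def count_passes(length)
  passes = 0
  (length - 1).times do
    passes += 1
  end
  passes
end

# Collects the index values produced by 0.upto(length-2).
def collect_indices(length)
  indices = []
  0.upto(length - 2) do |i|
    indices << i
  end
  indices
end

puts count_passes(5)
p collect_indices(5)
```

Note that upto includes both end points, so 0.upto(length-2) visits length-1 index values.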
Selection Sort
- Start with an unsorted array
- Find the smallest element, and exchange it with the first element
- Continue finding the smallest and exchanging it with the first element of
the rest of the array
- The area at the start of the array that is fully sorted will get larger
and larger
- Number of exchanges: O(n)
- Work needed to find smallest element: O(n)
- Overall time complexity: O(n^2)
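The steps above can be sketched in Ruby as follows. This is an illustrative version, not necessarily the one in the lecture's example file; the method name selection_sort is an assumption.

```ruby
# Selection sort sketch: returns a sorted copy of the array.
def selection_sort(array)
  a = array.dup
  0.upto(a.length - 2) do |i|
    # find the index of the smallest element in the unsorted rest a[i..]
    min = i
    (i + 1).upto(a.length - 1) do |j|
      min = j if a[j] < a[min]
    end
    # one exchange per position, so O(n) exchanges overall
    a[i], a[min] = a[min], a[i]
  end
  a
end

p selection_sort([5, 2, 4, 1, 3])
```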
Details of Time Complexity for Selection Sort
- The number of comparisons to find the minimum of n elements is
n-1
- The size of the unsorted area initially is n elements, at the end 2 elements
- $\sum_{i=2}^{n} (n-i+1) = (n-1) + (n-2) + \dots + 2 + 1 = n(n-1)/2 = O(n^2)$
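The formula can be checked by counting. The sketch below (method name selection_sort_comparisons is hypothetical) runs only the loop structure of a selection sort on n elements and counts one comparison per inner iteration; the result should equal n(n-1)/2.

```ruby
# Counts the comparisons a selection sort makes on n elements
# by running the same nested loops without any data.
def selection_sort_comparisons(n)
  comparisons = 0
  0.upto(n - 2) do |i|
    (i + 1).upto(n - 1) do
      comparisons += 1
    end
  end
  comparisons
end

[10, 100, 1000].each do |n|
  puts "n=#{n}: #{selection_sort_comparisons(n)} comparisons, formula #{n * (n - 1) / 2}"
end
```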
Insertion Sort
- Start with an unsorted array
- View the first element of the array as sorted (sorted area of length
1)
- Take the second element of the array and insert it at the right place into
the sorted area
→ sorted area of length 2
- Continue with the following elements, making the sorted area longer and
longer
- To insert an element into the already sorted area,
move any elements greater than the new element to the right by one
- The (worst-case) time complexity is O(n^2)
- Insertion sort is fast if the data is already (almost) sorted
- Insertion sort can be used if data items are added into an already sorted
array
Improvement: Using a sentinel: Add a first data item that is guaranteed to
be smaller than any real data items. This saves one index check.
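A minimal Ruby sketch of insertion sort, written without a sentinel (the method name insertion_sort is an assumption). The test j >= 0 in the inner loop is the index check that a sentinel would make unnecessary.

```ruby
# Insertion sort sketch: returns a sorted copy of the array.
def insertion_sort(array)
  a = array.dup
  1.upto(a.length - 1) do |i|
    item = a[i]          # next element to insert into the sorted area a[0...i]
    j = i - 1
    # move elements greater than item one place to the right
    while j >= 0 && a[j] > item
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = item
  end
  a
end

p insertion_sort([5, 2, 4, 1, 3])
```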
Details of Time Complexity for Insertion Sort
- The number of elements to be inserted is n
- The maximum number of comparisons/moves when inserting data item number
i is i-1
- $\sum_{i=2}^{n} (i-1) = 1 + 2 + \dots + (n-2) + (n-1) = n(n-1)/2 = O(n^2)$
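The worst case occurs when the input is in reverse order. As a hedged illustration (the method name insertion_sort_moves is hypothetical), the following sketch counts the moves made by an insertion sort; for reversed input of length n the count should match n(n-1)/2, and for already sorted input it should be 0.

```ruby
# Counts how many elements an insertion sort moves to the right.
def insertion_sort_moves(array)
  a = array.dup
  moves = 0
  1.upto(a.length - 1) do |i|
    item = a[i]
    j = i - 1
    while j >= 0 && a[j] > item
      a[j + 1] = a[j]
      j -= 1
      moves += 1
    end
    a[j + 1] = item
  end
  moves
end

n = 10
puts insertion_sort_moves((1..n).to_a.reverse)  # worst case: reversed input
puts insertion_sort_moves((1..n).to_a)          # best case: already sorted
```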
Comparison: Selection Sort vs. Insertion Sort
| | Selection Sort | Insertion Sort |
|---|---|---|
| handling first item | O(n) | O(1) |
| handling last item | O(1) | O(n) |
| initial area | perfectly sorted | sorted, but some items still missing |
| rest of data | greater than any items in sorted area | any size possible |
| advantage | only O(n) exchanges | fast if (almost) sorted |
| disadvantage | always same speed | may get slower if many moves needed |
Divide and Conquer
(Latin: divide et impera)
- Term of military strategy and tactics
- Problem solving method:
Solve a problem by dividing it into smaller problems
- Important principle for programming in general
(e.g. split a bigger program into various functions)
- Important design principle for algorithms and data structures
Merge Sort (without recursion)
- Split the items to be sorted into two halves
- Separately sort each half
- Combine the two halves by merging them
Merge
- Two-way merge and multi-way merge
- Create one sorted sequence from two or more sorted sequences
- Repeatedly select the smaller/smallest item from the input sequences
- When only one sequence is left, copy the rest of the items
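A two-way merge as described above might be sketched like this (the method name merge and its parameters left and right are assumptions; both inputs must already be sorted).

```ruby
# Two-way merge sketch: creates one sorted array from two sorted arrays.
def merge(left, right)
  result = []
  i = 0
  j = 0
  # repeatedly select the smaller of the two front items
  while i < left.length && j < right.length
    if left[i] <= right[j]
      result << left[i]
      i += 1
    else
      result << right[j]
      j += 1
    end
  end
  # only one sequence is left: copy the rest of its items
  result.concat(left[i..-1]).concat(right[j..-1])
end

p merge([1, 4, 7], [2, 3, 9])
```

Taking the item from left when both are equal keeps equal items in their original order.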
Merge Sort
- Recursively split the items to be sorted into two halves
- Parts with only 1 item are sorted by definition
- Combine the parts (in the reverse order of splitting them) by merging
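The recursive version can be sketched compactly in Ruby. This is one possible formulation under stated assumptions, not the lecture's implementation; the method name merge_sort is hypothetical, and the merging step is written inline using Array#shift on the two sorted halves.

```ruby
# Recursive merge sort sketch: returns a sorted array.
def merge_sort(array)
  # parts with at most 1 item are sorted by definition
  return array if array.length <= 1

  # split into two halves (index calculation only) and sort each recursively
  middle = array.length / 2
  left = merge_sort(array[0...middle])
  right = merge_sort(array[middle..-1])

  # merge: repeatedly take the smaller front item
  merged = []
  until left.empty? || right.empty?
    merged << (left.first <= right.first ? left.shift : right.shift)
  end
  # one side is empty now; append whatever is left of the other
  merged + left + right
end

p merge_sort([5, 2, 4, 1, 3])
```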
Time Complexity of Merge Sort
- Split is possible in O(1) time (index
calculation only)
- Merging n items takes O(n) time
- Recurrence:
$M(1) = 0$
$M(n) = 1 + 2 M(n/2) + n$ (*)
- Discovering a pattern by repeated substitution:
\begin{align*}
M(n) &= 1 + 2 M(n/2) + n \\
&= 1 + 2 (1 + 2 M(n/2/2) + n/2) + n \\
&= 1 + 2 + 4 M(n/4) + n + n \\
&= 1 + 2 + 4 (1 + 2 M(n/4/2) + n/4) + n + n \\
&= 1 + 2 + 4 + 8 M(n/8) + n + n + n \\
&= 2^k - 1 + 2^k M(n/2^k) + k n
\end{align*}
- Using $M(1) = 0$: $n/2^k = 1 \Rightarrow k = \log_2 n$
- $M(n) = n - 1 + n \log_2 n$
- Asymptotic time complexity: O(n log n)
(*) more exactly, $M(\lceil n/2 \rceil) + M(\lfloor n/2 \rfloor)$ rather than $2 M(n/2)$
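The closed form can be compared with the recurrence for small inputs. The sketch below assumes n is a power of 2, so that n/2 is always exact; the method names merge_cost and closed_form are hypothetical.

```ruby
# M(n) evaluated directly from the recurrence (n must be a power of 2).
def merge_cost(n)
  return 0 if n == 1
  1 + 2 * merge_cost(n / 2) + n
end

# Closed form n - 1 + n * log2(n), again for powers of 2 only.
def closed_form(n)
  n - 1 + n * Math.log2(n).round
end

[1, 2, 4, 8, 1024].each do |n|
  puts "n=#{n}: recurrence #{merge_cost(n)}, closed form #{closed_form(n)}"
end
```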
Properties of Merge Sort
- Merging means copying all elements
⇒ We need twice the memory of the original data
- Merge sort is better suited for external memory than for internal
memory
- External memory:
- Punchcards
- Magnetic tapes
- Hard disks (HD)
- Solid state drives (SSD)
Summary
- Simple sorting algorithms:
- Bubble sort (easiest to implement)
- Selection sort (only O(n) data exchanges)
- Insertion sort (fast when already (almost) sorted)
- Simple sorting algorithms are all O(n^2)
- Merge sort is based on divide and conquer
- Merge sort is O(n log n) (same as heap sort)
Homework for Next Time
- Using the sorting cards, play with your
friends to see which algorithms may be faster.
(Example: Two players, one player uses selection sort, one player uses
insertion sort, who wins?)
Glossary
- bubble sort
- バブル整列法、バブルソート
- selection sort
- 選択整列法、選択ソート
- insertion sort
- 挿入整列法、挿入ソート
- sentinel
- 番兵
- index
- 指数
- divide and conquer
- 分割統治法
- military strategy
- 軍事戦略
- tactics
- 戦術
- design principle
- 設計方針
- merge sort
- マージソート
- merge
- 併合
- 2-way merge
- 2 ウェイ併合
- multiway merge
- マルチウェイ併合
- external memory
- 外部メモリ
- internal memory
- 内部メモリ
- punchcard
- パンチカード
- magnetic tape
- 磁気テープ
- hard disk
- ハードディスク