Divide and Conquer, Mergesort
(分割統治法、マージソート)
Data Structures and Algorithms
6th lecture, November 3, 2016
http://www.sw.it.aoyama.ac.jp/2016/DA/lecture6.html
Martin J. Dürst
© 2009-16 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Summary of last lecture, leftovers, homework
- The importance of sorting
- Simple sorting algorithms: Bubble sort, selection sort, insertion
sort
- Loops in Ruby
- Divide and conquer
- Merge sort
- Summary
Summary of Last Lecture
- A priority queue is an important ADT
- Implementing a priority queue with an array or a linked list is not
efficient
- In a heap, each parent has higher priority than its children
- In a heap, the highest priority item is at the root of a complete
binary tree
- A heap is an efficient implementation of a priority queue
- Many data structures are defined using invariants
- The operations heapify_up and heapify_down are used to restore heap
invariants
- A heap can be used for sorting, using heap sort
Leftovers from Last Lecture
How to use irb; other kinds of heaps; homework (except report)
Report: Manual Sorting
Deadline: November 9, 2016 (Wednesday), 19:00.
Now is a good time to ask questions about this report!
Importance of Sorting
- In most cases of information processing, sorting is needed before
output
- As a preparation for search (example: binary search, index in databases,
...)
- To group related items together
- As a component in more complicated algorithms
Simple Sorting Algorithms
- Bubble sort
- Selection sort
- Insertion sort
Bubble Sort
- Compare neighboring items, exchange them if they are out of order
- Pass through the data from start to end
- The number of passes needed to fully order the data is O(n)
- The number of comparisons (and potential exchanges) in each pass is O(n)
- Time complexity is O(n²)
Possible improvements:
- Alternately pass back and forth through the data
- Remember the position of the last exchange to limit the range of the next pass
Pseudocode/example implementation: 6sort.rb
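The following is a minimal Ruby sketch of bubble sort following the steps above; it is not the course's 6sort.rb, and the method name and test data are only illustrative:

def bubble_sort(array)
  a = array.dup                        # work on a copy
  (a.length - 1).times do              # n-1 passes over the data
    0.upto(a.length - 2) do |i|
      # exchange neighboring items that are out of order
      a[i], a[i + 1] = a[i + 1], a[i] if a[i] > a[i + 1]
    end
  end
  a
end

p bubble_sort([5, 2, 9, 1, 7])         # => [1, 2, 5, 7, 9]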
Various Ways to Loop in Ruby
- Looping a fixed number of times
- Looping with an index
- Many others, ...
Looping a Fixed Number of Times
Syntax:
number.times do
# some work
end
Example:
(length-1).times do
# bubble
end
Looping with an Index
Syntax:
start.upto(limit) do |index|
# some work using index
end
Example:
0.upto(length-2) do |i|
# select
end
Selection Sort
- Start with an unsorted array
- Find the smallest element, and exchange it with the first element
- Continue finding the smallest and exchanging it with the first element of
the rest of the array
- The area at the start of the array that is fully sorted will get larger
and larger
- Number of exchanges: O(n)
- Work needed to find smallest element: O(n)
- Overall time complexity: O(n²)
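A minimal Ruby sketch of selection sort along these lines (illustrative only, not the course's reference implementation):

def selection_sort(array)
  a = array.dup
  0.upto(a.length - 2) do |i|          # i is the start of the unsorted area
    min = i
    (i + 1).upto(a.length - 1) do |j|
      min = j if a[j] < a[min]         # find the smallest remaining element
    end
    a[i], a[min] = a[min], a[i]        # one exchange per pass: O(n) exchanges overall
  end
  a
end

p selection_sort([5, 2, 9, 1, 7])      # => [1, 2, 5, 7, 9]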
Details of Time Complexity for Selection Sort
- The number of comparisons to find the minimum of n elements is
n-1
- The size of the unsorted area initially is n elements, at the end 2 elements
- ∑i=2..n (n-i+1) = (n-1) + (n-2) + ... + 2 + 1 = n · (n-1) / 2 = O(n²)
Insertion Sort
- Start with an unsorted array
- View the first element of the array as sorted (sorted area of length
1)
- Take the second element of the array and insert it at the right place in
to the sorted area
→sorted area of length 2
- Continue with the following elements, making the sorted area longer and
longer
- To insert an element into the already sorted area,
move any elements greater than the new element to the right by one
- The (worst-case) time complexity is O(n²)
- Insertion sort is fast if the data is already (almost) sorted
- Insertion sort can be used if data items are added into an already sorted
array
Improvement: Using a sentinel: Add a first data item that is guaranteed to
be smaller than any real data items. This saves one index check.
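A minimal Ruby sketch of insertion sort as described above (without the sentinel improvement; names and test data are illustrative):

def insertion_sort(array)
  a = array.dup
  1.upto(a.length - 1) do |i|          # elements before index i are already sorted
    item = a[i]
    j = i - 1
    while j >= 0 && a[j] > item        # move larger elements one place to the right
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = item                    # insert the new element at the right place
  end
  a
end

p insertion_sort([5, 2, 9, 1, 7])      # => [1, 2, 5, 7, 9]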
Details of Time Complexity for Insertion Sort
- The number of elements to be inserted is n
- The maximum number of comparisons/moves when inserting data item number i is i-1
- ∑i=2..n (i-1) = 1 + 2 + ... + (n-2) + (n-1) = n · (n-1) / 2 = O(n²)
Comparison between Selection Sort and Insertion Sort
|                     | Selection Sort                        | Insertion Sort                       |
| handling first item | O(n)                                  | O(1)                                 |
| handling last item  | O(1)                                  | O(n)                                 |
| initial area        | perfectly sorted                      | sorted, but some items still missing |
| rest of data        | greater than any items in sorted area | any size possible                    |
| advantage           | only O(n) exchanges                   | fast if (almost) sorted              |
| disadvantage        | always same speed                     | may get slower if many moves needed  |
Divide and Conquer
(Latin: divide et impera)
- Term of military strategy and tactics
- Problem solving method:
Solve a problem by dividing it into smaller problems
- Important principle for programming in general
- Important design principle for algorithms and data structures
Merge Sort (without recursion)
- Split the items to be sorted into two halves
- Separately sort each half
- Combine the two halves by merging them
Merge
- Two-way merge and multi-way merge
- Create one sorted sequence from two or more sorted sequences
- Repeatedly select the smaller/smallest item from among the input
sequences
- When only one sequence is left, copy the rest of the items
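A minimal Ruby sketch of a two-way merge of two sorted arrays (illustrative only):

def merge(left, right)
  left = left.dup                      # avoid modifying the caller's arrays
  right = right.dup
  result = []
  until left.empty? || right.empty?
    # repeatedly take the smaller of the two front items
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right                # copy the rest of the remaining sequence
end

p merge([1, 4, 7], [2, 3, 9])          # => [1, 2, 3, 4, 7, 9]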
Merge Sort
- Recursively split the items to be sorted into two halves
- Parts with only 1 item are sorted by definition
- Combine the parts (in the reverse order of splitting them) by merging
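A minimal Ruby sketch of recursive merge sort, reusing the merge method sketched above (illustrative only):

def merge_sort(array)
  return array.dup if array.length <= 1    # 1 item is sorted by definition
  middle = array.length / 2
  left  = merge_sort(array[0...middle])    # recursively sort each half
  right = merge_sort(array[middle..-1])
  merge(left, right)                       # combine by merging
end

p merge_sort([5, 2, 9, 1, 7, 3])           # => [1, 2, 3, 5, 7, 9]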
Time Complexity of Merge Sort
- Split is possible in O(1) time (index calculation only)
- Merging n items takes O(n) time
- Recurrence:
  M(n) = 1 + 2 M(n/2) + n
  (more exactly, M(⌈n/2⌉) + M(⌊n/2⌋) rather than 2 M(n/2))
  M(1) = 0
- Discovering a pattern by repeated substitution:
  M(n) = 1 + 2 M(n/2) + n =
  = 1 + 2 (1 + 2 M(n/4) + n/2) + n =
  = 1 + 2 + 4 M(n/4) + n + n =
  = 1 + 2 + 4 (1 + 2 M(n/8) + n/4) + n + n =
  = 1 + 2 + 4 + 8 M(n/8) + n + n + n =
  = ... (after k substitutions) =
  = 2ᵏ - 1 + 2ᵏ M(n/2ᵏ) + kn
- Using M(1) = 0: n/2ᵏ = 1 ⇒ k = log₂ n
- M(n) = n - 1 + n log₂ n
- Asymptotic time complexity: O(n log n)
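For n a power of two, the closed form can be checked against the recurrence with a few lines of Ruby (a quick sanity check, not part of the lecture material):

def m_rec(n)                           # the recurrence, for n a power of two
  n == 1 ? 0 : 1 + 2 * m_rec(n / 2) + n
end

[1, 2, 4, 8, 16, 32].each do |n|
  closed = n - 1 + n * Math.log2(n)    # the closed form derived above
  puts "n=#{n}: recurrence=#{m_rec(n)}, closed form=#{closed}"
end
# both values agree: 0, 3, 11, 31, 79, 191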
Properties of Merge Sort
- Because merging two arrays means copying all elements, we need twice as
much memory as the original data
- Merge sort is better suited for external memory than for internal
memory
- External memory:
- Punchcards
- Magnetic tapes
- Hard disks
Summary
- Simple sorting algorithms:
- Bubble sort
- Selection sort
- Insertion sort
- Simple sorting algorithms are all O(n²)
- Merge sort is based on divide and conquer
- Merge sort is O(n log n) (same as heap
sort)
Preparation for Next Time
- Using the sorting cards, play with your
friends to see which algorithms may be faster.
(Example: Two players, one player uses selection sort, one player uses
insertion sort, who wins?)
- Work on Report: Manual Sorting
Glossary
- bubble sort
- バブル整列法、バブルソート
- selection sort
- 選択整列法、選択ソート
- insertion sort
- 挿入整列法、挿入ソート
- sentinel
- 番兵
- index
- 指数
- divide and conquer
- 分割統治法
- military strategy
- 軍事戦略
- tactics
- 戦術
- design principle
- 設計方針
- merge sort
- マージソート
- merge
- 併合
- 2-way merge
- 2 ウェイ併合
- multiway merge
- マルチウェイ併合
- external memory
- 外部メモリ
- internal memory
- 内部メモリ
- punchcard
- パンチカード
- magnetic tape
- 磁気テープ
- hard disk
- ハードディスク