Quicksort, Average Time Complexity
(クイックソート、平均計算量)
Data Structures and Algorithms
7th lecture, November 2, 2017
http://www.sw.it.aoyama.ac.jp/2017/DA/lecture7.html
Martin J. Dürst
© 2009-17 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Summary of last lecture
- About the Manual Sorting report
- Quicksort:
Concept, implementation, optimizations
- Average time complexity
- Sorting in C, Ruby, ...
- Comparing sorting algorithms using animation
Summary of Last Lecture
- Sorting is a very important operation for Information Technology
- Simple sorting algorithms are all O(n²)
- Merge sort (O(n log n)) needs double memory
- Heap sort (O(n log n)) uses a large number of comparisons and exchanges
Report: Manual Sorting: Problems Seen
- 218341.368 seconds (⇒ about 61 hours)
- 61010·10³·10¹⁰ (units? way too big)
- O(60000) (how many seconds would this be?)
- Calculation of actual time backwards from big-O notation (1 second/operation, n=5000, O(n²) ⇒ 25'000'000 seconds?)
- An O(n) algorithm (example: "5 seconds per page")
- For 20 people, having only one person work at the end of the
algorithm
- For humans, binary sorting is constraining (sorting
into 3~10 parts is better)
- Using bubble sort (868 days without including breaks
or sleep)
- Prepare 10¹⁰ boxes (problem: space, cost, walking distance)
- Forgetting time for preparation, cleanup, breaks,...
- Submitting just a program
- Report too short
Today's Goals
Using quicksort as an example, understand
- Different ways to use divide-and-conquer for sorting
- Move from algorithmic concept to efficient implementation
- Average time complexity
History of Quicksort
- Invented by C. A. R. Hoare in 1959
- Researched in great detail
- Extremely widely used
Reviewing Divide and Conquer
- Heap sort: The highest priority item of the overall tree is the highest
priority item of the two subtrees
- Merge sort: Split into equal-length parts, recurse, merge
- Quicksort: Use an arbitrary boundary element to
partition the data, recursively
Basic Workings of Quicksort
- Select one element as the partitioning element
(pivot)
- Partition the elements so that:
- Elements smaller than the pivot go to the left, and
- Elements larger than the pivot go to the right
- Apply quicksort recursively to the data on the left and right sides of
the pivot
Ruby pseudocode/implementation: conceptual_quick_sort
in 7qsort.rb
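The concept can be sketched directly in Ruby. This is a minimal sketch of the idea only; the actual conceptual_quick_sort in 7qsort.rb may differ in details such as pivot choice:

```ruby
# Conceptual quicksort: pick a pivot, partition into smaller/larger
# parts, sort each part recursively. Not in-place; sketch only.
def conceptual_quicksort(array)
  return array if array.length <= 1        # empty or single element: done
  pivot, *rest = array                     # here: first element as pivot
  smaller = rest.select { |x| x < pivot }  # goes to the left of the pivot
  larger  = rest.select { |x| x >= pivot } # goes to the right of the pivot
  conceptual_quicksort(smaller) + [pivot] + conceptual_quicksort(larger)
end
```

Note that this version groups elements equal to the pivot with the larger side; how equal keys are handled matters for the optimizations discussed later.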
Quicksort Implementation Core
1. Use e.g. the rightmost element as the pivot
2. Starting from the right, find an element smaller than the pivot
3. Starting from the left, find an element larger than the pivot
4. Exchange the elements found in steps 2. and 3.
5. Repeat steps 2.-4. until no further exchanges are needed
6. Exchange the pivot with the element in the middle
7. Recurse on both sides
Ruby pseudocode/implementation: simple_quick_sort
in 7qsort.rb
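One way to realize these steps in place is sketched below; simple_quick_sort in 7qsort.rb may differ in details such as scan direction and loop structure:

```ruby
# In-place quicksort with the rightmost element as pivot.
def quicksort!(a, left = 0, right = a.length - 1)
  return a if left >= right
  pivot = a[right]                          # step 1: rightmost element
  i, j = left, right - 1
  loop do
    i += 1 while a[i] < pivot               # scan for an element >= pivot
    j -= 1 while j > left && a[j] > pivot   # scan for an element <= pivot
    break if i >= j                         # scans crossed: partition done
    a[i], a[j] = a[j], a[i]                 # exchange the two elements found
    i += 1
    j -= 1
  end
  a[i], a[right] = a[right], a[i]           # pivot goes between the parts
  quicksort!(a, left, i - 1)                # recurse on both sides
  quicksort!(a, i + 1, right)
  a
end
```

The pivot itself acts as a stopper for the left-to-right scan, so `i` can never run past the end of the part.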
Worst Case Complexity
- What happens if the largest (or the smallest) element is always chosen as the pivot?
- The time complexity is Q_W(n) = n + Q_W(n-1) = Σ_{1≤i≤n} i ⇒ O(n²)
- This is the worst case complexity (worst case running time) of quicksort
- This complexity is the same as that of the simple sorting algorithms
- This worst case can easily happen if the input is already sorted
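The worst case can be observed by counting comparisons on already-sorted input. The sketch below uses a simple list-based quicksort with a first-element pivot (not the code from 7qsort.rb); for this variant the count on sorted input is exactly Σ i:

```ruby
# Count comparisons made by a simple quicksort on already sorted input.
def counting_quicksort(a, counter)
  return a if a.length <= 1
  pivot, *rest = a                          # first element as pivot
  smaller, larger = rest.partition do |x|
    counter[0] += 1                         # one comparison per element
    x < pivot
  end
  counting_quicksort(smaller, counter) +
    [pivot] + counting_quicksort(larger, counter)
end

counter = [0]
counting_quicksort((1..100).to_a, counter)  # already sorted: worst case
counter[0]  # => 4950, i.e. 100·99/2 comparisons
```

Each level removes only the pivot, so the parts shrink by one element at a time, giving the quadratic sum.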
Best Case Complexity
- Q_B(n) = n + 1 + 2 Q_B(n/2)
- Q_B(1) = 0
- Same recurrence as merge sort
- ⇒ O(n log n)
- Unclear whether this is relevant
For most algorithms (but there are exceptions):
- Worst case complexity is very important
- Best case complexity is mostly irrelevant
Average Complexity
Calculating Q_A
[1] Q_A(n) = n + 1 + 1/n Σ_{1≤k≤n} (Q_A(k-1) + Q_A(n-k))
[2] Q_A(0) + Q_A(1) + ... + Q_A(n-1) = Q_A(n-1) + Q_A(n-2) + ... + Q_A(0)
[3] Q_A(n) = n + 1 + 2/n Σ_{1≤k≤n} Q_A(k-1)   [use [2] in [1]]
[4] n Q_A(n) = n (n+1) + 2 Σ_{1≤k≤n} Q_A(k-1)   [multiply [3] by n]
[5] (n-1) Q_A(n-1) = (n-1) n + 2 Σ_{1≤k≤n-1} Q_A(k-1)   [[4], with n replaced by n-1]
Calculating Q_A (continued)
[6] n Q_A(n) - (n-1) Q_A(n-1) = n (n+1) - (n-1) n + 2 Q_A(n-1)   [[4]-[5]]
[7] n Q_A(n) = (n+1) Q_A(n-1) + 2n   [simplifying [6]]
[8] Q_A(n)/(n+1) = Q_A(n-1)/n + 2/(n+1)   [dividing [7] by n (n+1)]
Q_A(n)/(n+1) = Q_A(n-1)/n + 2/(n+1)   [repeatedly expand the right side using [8]]
= Q_A(n-2)/(n-1) + 2/n + 2/(n+1)
= Q_A(n-3)/(n-2) + 2/(n-1) + 2/n + 2/(n+1)
= ...
= Q_A(2)/3 + Σ_{3≤k≤n} 2/(k+1)
Q_A(n)/(n+1) ≈ Σ_{1≤k≤n} 2/k ≈ 2 ∫_1^n 1/x dx = 2 ln n   [approximating the sum by an integral]
Result of Calculating Q_A
Q_A(n) ≈ 2n ln n ≈ 1.39 n log₂ n
⇒ O(n log n)
⇒ On average, quicksort uses about 1.39 times as many comparisons as an optimal decision tree
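The 2n ln n result can be checked empirically by counting comparisons on random inputs. The sketch below uses a simple list-based quicksort (not the code from 7qsort.rb); for this variant the exact average is known to be 2(n+1)H_n − 4n comparisons, so at moderate n the linear lower-order term is still clearly visible and the ratio to n log₂ n is below 1.39:

```ruby
# Empirically estimate the average number of comparisons of quicksort.
def counting_quicksort(a, counter)
  return a if a.length <= 1
  pivot, *rest = a                          # first element as pivot
  smaller, larger = rest.partition do |x|
    counter[0] += 1                         # one comparison per element
    x < pivot
  end
  counting_quicksort(smaller, counter) +
    [pivot] + counting_quicksort(larger, counter)
end

n = 2000
trials = 20
total = 0
trials.times do
  counter = [0]
  counting_quicksort((1..n).to_a.shuffle, counter)
  total += counter[0]
end
average = total.to_f / trials               # ≈ 2 n ln n for large n
```

Because the standard deviation is only about 0.65n, even 20 trials give an average very close to the exact expectation.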
Distribution around Average
- A good average complexity is not enough if the worst case is frequently
reached
- For Q_A, it can be shown that the standard deviation is about 0.65 n
- This means that the probability of large deviations from the average becomes extremely small very quickly
Complexity of Sorting
Question: What is the complexity of sorting (as a problem)?
- Many good sorting algorithms are O(n log n)
- The basic operations for sorting are comparison and movement
- For n data items, the number of different sorting orders
(permutations) is n!
- With each comparison, in the best case, we can halve the number of possible sorting orders
- The minimum number of comparisons necessary for sorting is therefore log₂(n!) ≈ n log₂ n, which is Θ(n log n)
- Sorting using pairwise comparison is Ω(n log n)
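The bound log₂(n!) can be evaluated numerically without ever computing the huge number n! itself, by summing logarithms (a small sketch, not from the lecture code):

```ruby
# log2(n!) computed as a sum of logarithms (avoids huge intermediate values).
n = 1000
log2_factorial = (1..n).sum { |i| Math.log2(i) }  # = log2(1 · 2 · ... · n)
n_log2_n = n * Math.log2(n)
# By Stirling's formula, log2(n!) = n log2 n - n/ln 2 + O(log n),
# so log2(n!) is somewhat smaller than n log2 n but of the same order.
```

For n = 1000 the sum is roughly 8529 versus n log₂ n ≈ 9966, illustrating that the two agree up to a lower-order linear term.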
Pivot Selection
- The efficiency of quicksort strongly depends on the selection of the
pivot
- Some solutions:
- Select rightmost element
(dangerous!)
- Use the median of three values
- Use the value at a random location
(this is an example of a randomized algorithm)
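Median-of-three selection can be sketched as follows (the method name is mine, not necessarily the one used in 7qsort.rb):

```ruby
# Order a[left], a[mid], a[right] so that the median of the three
# values ends up at position mid; return mid for use as pivot index.
def median_of_three!(a, left, right)
  mid = (left + right) / 2
  a[left], a[mid]   = a[mid],   a[left] if a[mid]   < a[left]
  a[left], a[right] = a[right], a[left] if a[right] < a[left]
  a[mid],  a[right] = a[right], a[mid]  if a[right] < a[mid]
  mid                     # a[left] <= a[mid] <= a[right] now holds
end
```

Besides giving a better pivot on average, this defeats the sorted-input worst case, because for sorted input the middle element is the exact median.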
Implementation Improvements
- Comparison of indices
→ Use a sentinel to remove one comparison
- Stack overflow for deep recursion
→ When splitting into two, use recursion for the smaller part, and tail
recursion or a loop for the larger part
- Low efficiency of quicksort for short arrays/parts
→ For parts smaller than a given size, change to a simple sort
algorithm
→ With insertion sort, it is possible to do this in one go at the very
end
(this needs care when testing)
→ Quicksort gets about 10% faster if the switch is made at an array size of about 10
- Duplicate keys
→ Split in three rather than two
Ruby pseudocode/implementation (excluding split in three):
quick_sort
in 7qsort.rb
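Two of the improvements above — looping instead of recursing on the larger part, and switching to a simple sort for small parts — can be combined as sketched below. This is my own sketch; quick_sort in 7qsort.rb may differ, e.g. by running insertion sort once over the whole array at the very end instead of per part:

```ruby
CUTOFF = 10  # switch to insertion sort below this part size

def insertion_sort!(a, left, right)
  (left + 1).upto(right) do |i|
    x = a[i]
    j = i - 1
    while j >= left && a[j] > x   # shift larger elements to the right
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = x
  end
end

def hybrid_quicksort!(a, left = 0, right = a.length - 1)
  while right - left >= CUTOFF
    pivot = a[right]                        # rightmost element as pivot
    i, j = left, right - 1
    loop do
      i += 1 while a[i] < pivot
      j -= 1 while j > left && a[j] > pivot
      break if i >= j
      a[i], a[j] = a[j], a[i]
      i += 1
      j -= 1
    end
    a[i], a[right] = a[right], a[i]         # pivot into its final position
    if i - left < right - i                 # recurse on the smaller part,
      hybrid_quicksort!(a, left, i - 1)     # loop on the larger part
      left = i + 1                          # (bounds the recursion depth)
    else
      hybrid_quicksort!(a, i + 1, right)
      right = i - 1
    end
  end
  insertion_sort!(a, left, right)           # finish the short remainder
  a
end
```

Because each recursive call handles the smaller part, the recursion depth stays in O(log n) even for bad pivots.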
Comparing Sorting Algorithms using Animation
- Uses Web technology: SVG
(2D vector graphics) and JavaScript
- Uses special library (Narrative JavaScript)
for timing adjustments
- Comparisons are shown in
yellow (except for insertion sort), exchanges in
blue
Watch animation: sort.svg
Stable Sorting
- Definition: A sorting algorithm is stable if it retains the
original order for two data items with the same key value
- Used for sorting with multiple criteria (e.g. sort by year and
prefecture):
- First, sort using the lower priority criterion (e.g. prefecture)
- Then, sort using the higher priority criterion (e.g. year)
- The simple sorting algorithms and merge sort can easily be made
stable
- Heap sort and quicksort are not stable
→ Solution 1: Sort multiple criteria together
→ Solution 2: Use the original position as a lower priority criterion
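Solution 2 can be expressed directly with Ruby's sort_by, by appending the original position as the lowest-priority criterion (the example data is mine):

```ruby
records = [["Tokyo", 2015], ["Osaka", 2014],
           ["Tokyo", 2014], ["Osaka", 2015]]
# Sort by year; records with equal years keep their original order,
# because the original index is appended as a lower-priority sort key.
stable = records.each_with_index
                .sort_by { |(_pref, year), index| [year, index] }
                .map(&:first)
# => [["Osaka", 2014], ["Tokyo", 2014], ["Tokyo", 2015], ["Osaka", 2015]]
```

This makes the result stable regardless of whether the underlying sort algorithm is.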
Sorting in C and Ruby
- Sorting is provided as a library function or method
- Implementation is often based on quicksort
- Comparison of data items depends on type of data and purpose of
sorting
→ Use comparison function as a function argument
- If comparison is slow
→ Precompute a value that can be used for sorting
- If exchange of data items is slow (e.g. very large data items)
→ Sort/exchange references (pointers) only
C's qsort
Function
void qsort(
void *base, // start of array
size_t nel, // number of elements in array
size_t width, // element size
int (*compar)( // comparison function
const void *,
const void *)
);
Ruby's Array#sort
(Klass#method denotes the instance method method of class Klass)
array.sort uses <=> for comparison
array.sort { |a, b| a.length <=> b.length }
This example sorts (e.g. strings) by length
The code block (between { and }) is used as the comparison function
Ruby's <=> Operator
(also called spaceship operator, similar to strcmp in C)
Relationship between a and b | Return value of a <=> b
a < b                        | -1 (or another integer smaller than 0)
a = b                        | 0
a > b                        | +1 (or another integer greater than 0)
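A few concrete values of <=> (for Integer and String operands the result is exactly -1, 0, or +1; incomparable operands give nil):

```ruby
p(3 <=> 5)           # => -1  (3 < 5)
p(5 <=> 5)           # => 0   (equal)
p(7 <=> 5)           # => 1   (7 > 5)
p("abc" <=> "abd")   # => -1  (lexicographic comparison, like strcmp)
p(3 <=> "three")     # => nil (incomparable values)
```

Defining <=> (and including Comparable) is all a class needs for its instances to be sortable.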
Ruby's Array#sort_by
array.sort_by { |str| str.length }
or array.sort_by(&:length)
(sorting strings by length)
array.sort_by { |stu| [stu.year, stu.prefecture] }
(sorting students by year and prefecture)
This calculates the value of the sort criterion for each array element in advance
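Both uses can be tried out as below; the example data and the Struct-based stand-in for a student class are mine:

```ruby
words = ["banana", "fig", "apple", "kiwi"]
by_length = words.sort_by { |w| w.length }   # same as words.sort_by(&:length)
# => ["fig", "kiwi", "apple", "banana"]

Student = Struct.new(:year, :prefecture, :name)
students = [Student.new(2, "Tokyo", "Aoki"),
            Student.new(1, "Osaka", "Baba"),
            Student.new(1, "Tokyo", "Chiba")]
# The [year, prefecture] arrays are computed once per element, then
# compared elementwise with <=> (year first, prefecture as tie-breaker).
by_year_pref = students.sort_by { |s| [s.year, s.prefecture] }
# names in order: Baba, Chiba, Aoki
```

Precomputing the criterion once per element is exactly the "if comparison is slow" optimization mentioned above.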
Summary
- Quicksort is another application of divide and conquer
- Quicksort is a very famous algorithm, and a good example to learn about
algorithms and their implementation
- Quicksort has been carefully researched and widely implemented and
used
- Quicksort is a classical example of the importance of average time
complexity
- Quicksort is our first example of a randomized algorithm
- Sorting based on pairwise comparison is Θ(n log
n)
Preparation for Next Time
- Think about inputs for which conceptual_quick_sort will fail
- Watch the animations carefully (>20 times) to deepen your
understanding of sorting algorithms
Glossary
- quicksort
- クイックソート
- partition
- 分割
- partitioning element (pivot)
- 分割要素
- worst case complexity (running time)
- 最悪時の計算量
- best case complexity (running time)
- 最善時の計算量
- average complexity (running time)
- 平均計算量
- standard deviation
- 標準偏差
- randomized algorithm
- ランダム化アルゴリズム
- median
- 中央値
- decision tree
- 決定木
- tail recursion
- 末尾再帰
- in one go
- 一括
- stable sorting
- 安定な整列法
- criterion (plural criteria)
- 基準
- block
- ブロック