Quicksort, Average Time Complexity
(クイックソート、平均計算量)
Data Structures and Algorithms
7th lecture, November 14, 2019
http://www.sw.it.aoyama.ac.jp/2019/DA/lecture7.html
Martin J. Dürst
© 2009-19 Martin J. Dürst, Aoyama Gakuin University
Today's Schedule
- Leftovers, summary of last lecture
- Quicksort: concept, implementation, optimizations
- Average time complexity
- Sorting in C, Ruby, ...
- Comparing sorting algorithms using animation
Leftovers of Last Lecture
Summary of Last Lecture
- Sorting is a very important operation for Information Technology
- Simple sorting algorithms are all O(n²)
- Merge sort is O(n log n) but needs double the memory
- Heap sort is O(n log n) but uses a large number of comparisons and exchanges
Report: Manual Sorting: Problems Seen
- 218341.368 seconds (⇒ about 61 hours)
- 61010·10³·10¹⁰ (units? way too big)
- O(40000) (how many seconds would this be?)
- Calculation of actual time backwards from big-O notation: 1 second/operation, n = 5000, O(n²) ⇒ 25'000'000 seconds?
- An O(n) algorithm (example: "5 seconds per page")
- For 12 people, having only one person work towards the end of the algorithm
- For humans, binary sorting is constraining (sorting into 3~10 parts is better)
- Using bubble sort (868 days, not including breaks or sleep)
- Preparing 10¹⁰ boxes (problems: space, cost, walking distance)
- Forgetting time for preparation, cleanup, breaks, ...
- Submitting just a program
- Report too short
Today's Goals
Using quicksort as an example, understand
- Different ways to use divide-and-conquer for sorting
- How to move from an algorithmic concept to an efficient implementation
- Average time complexity
History of Quicksort
- Invented by C. A. R. Hoare in 1959
- Researched in great detail
- Extremely widely used
Reviewing Divide and Conquer
- Heap sort: The highest priority item of the overall tree is the highest priority item of the two subtrees
- Merge sort: Split into equal-length parts, recurse, merge
- Quicksort: Use an arbitrary boundary element to partition the data, then recurse
Basic Workings of Quicksort
- Select one element as the partitioning element (pivot)
- Split the elements so that:
  - elements smaller than the pivot go to the left, and
  - elements larger than the pivot go to the right
- Apply quicksort recursively to the data on the left and right sides of the pivot

Ruby pseudocode/implementation: conceptual_quick_sort in 7qsort.rb (a sketch of the idea follows below)
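Since 7qsort.rb itself is not reproduced in these notes, here is a minimal sketch of the concept as a not-in-place Ruby method (an assumption; the actual conceptual_quick_sort may differ):

def conceptual_quick_sort(array)
  return array if array.length <= 1         # length 0 or 1: already sorted
  pivot = array.last                        # select the last element as pivot
  rest = array[0...-1]
  smaller = rest.select { |x| x <  pivot }  # goes to the left of the pivot
  larger  = rest.select { |x| x >= pivot }  # goes to the right of the pivot
  conceptual_quick_sort(smaller) + [pivot] + conceptual_quick_sort(larger)
end

p conceptual_quick_sort([3, 1, 4, 1, 5, 9, 2, 6])  # => [1, 1, 2, 3, 4, 5, 6, 9]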
Comparison of Mergesort and Quicksort
Both algorithms use the same split-recurse-merge pattern, but there are
important differences:
|                | Mergesort  | Quicksort          |
| split          | equal size | size unpredictable |
| work done      | on merge   | on split           |
| no work needed | on split   | on merge           |
Quicksort Implementation Core
1. Use e.g. the rightmost element as the pivot
2. Starting from the left, find an element larger than the pivot
3. Starting from the right, find an element smaller than the pivot
4. Exchange the elements found in steps 2 and 3
5. Repeat steps 2-4 until no further exchanges are needed
6. Exchange the pivot with the element in the middle
7. Recurse on both sides
Ruby pseudocode/implementation: simple_quick_sort in 7qsort.rb (an in-place sketch follows below)
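A sketch of the in-place core along the steps above, assuming the same rightmost-pivot choice (the actual simple_quick_sort in 7qsort.rb may differ in detail):

def simple_quick_sort(array, left = 0, right = array.length - 1)
  return array if left >= right
  pivot = array[right]                       # step 1: rightmost element as pivot
  i = left - 1
  j = right
  loop do
    i += 1                                   # step 2: from the left, find an
    i += 1 while array[i] < pivot            #   element not smaller than the pivot
    j -= 1                                   # step 3: from the right, find an
    j -= 1 while j > left && array[j] > pivot  # element not larger than the pivot
    break if i >= j                          # step 5: stop when the scans cross
    array[i], array[j] = array[j], array[i]  # step 4: exchange
  end
  array[i], array[right] = array[right], array[i]  # step 6: pivot to the middle
  simple_quick_sort(array, left, i - 1)            # step 7: recurse on both sides
  simple_quick_sort(array, i + 1, right)
  array
end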
Worst Case Complexity
- What happens if the largest (or the smallest) element is always chosen as the pivot?
- The time complexity is Q_W(n) = n + Q_W(n−1) = Σ_{1≤i≤n} i ⇒ O(n²) (expanded below)
- This is the worst case complexity (worst case running time) of quicksort
- This complexity is the same as the complexity of the simple sorting algorithms
- This worst case can easily happen if the input is already sorted
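In LaTeX form, unrolling the worst-case recurrence makes the quadratic bound explicit:

\begin{align*}
Q_W(n) &= n + Q_W(n-1) = n + (n-1) + Q_W(n-2) = \dots \\
       &= \sum_{i=1}^{n} i = \frac{n(n+1)}{2} \quad\Rightarrow\quad O(n^2)
\end{align*}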
Best Case Complexity
- Q_B(1) = 0
- Q_B(n) = n + 1 + 2 Q_B(n/2)
- Same recurrence as merge sort
- ⇒ O(n log n) (expanded below)
- It is unclear whether the best case is relevant
For most algorithms (but there are exceptions):
- Worst case complexity is very important
- Best case complexity is mostly irrelevant
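Unrolling the best-case recurrence for n a power of two (each of the ≈ log₂ n levels costs about n):

\begin{align*}
Q_B(n) &= (n+1) + 2\,Q_B(n/2) = (n+1) + (n+2) + 4\,Q_B(n/4) = \dots \\
       &\approx n \log_2 n \quad\Rightarrow\quad O(n \log n)
\end{align*}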
Average Complexity
Calculating Q_A (first part)

Q_A(n) = n + 1 + (1/n) Σ_{1≤k≤n} (Q_A(k−1) + Q_A(n−k))

Because Q_A(0) + … + Q_A(n−2) + Q_A(n−1) = Q_A(n−1) + Q_A(n−2) + … + Q_A(0), the two sums are equal, so:

Q_A(n) = n + 1 + (2/n) Σ_{1≤k≤n} Q_A(k−1)

Multiplying by n:

n Q_A(n) = n (n+1) + 2 Σ_{1≤k≤n} Q_A(k−1)

The same equation for n−1:

(n−1) Q_A(n−1) = (n−1) n + 2 Σ_{1≤k≤n−1} Q_A(k−1)
Calculating Q_A (second part)

Subtracting the second equation from the first eliminates the sum:

n Q_A(n) − (n−1) Q_A(n−1) = n (n+1) − (n−1) n + 2 Q_A(n−1)

n Q_A(n) = (n+1) Q_A(n−1) + 2n

Dividing by n (n+1) and expanding the recurrence:

Q_A(n)/(n+1) = Q_A(n−1)/n + 2/(n+1)
             = Q_A(n−2)/(n−1) + 2/n + 2/(n+1)
             = Q_A(n−3)/(n−2) + 2/(n−1) + 2/n + 2/(n+1)
             = …
             = Q_A(2)/3 + Σ_{3≤k≤n} 2/(k+1)

Q_A(n)/(n+1) ≈ Σ_{1≤k≤n} 2/k ≈ 2 ∫_1^n (1/x) dx = 2 ln n
Result of Calculating Q_A

Q_A(n) ≈ 2n ln n ≈ 1.39 n log₂ n ⇒ O(n log n)

⇒ The number of comparisons on average is ~1.39 times the number of comparisons in an optimal decision tree (a numeric check follows below)
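The recurrence from the derivation can also be checked numerically (a sketch; the test size 10'000 is an arbitrary choice):

# Evaluate Q_A(n) = n + 1 + (2/n) * (Q_A(0) + ... + Q_A(n-1)) exactly
# and compare it with the closed-form approximation 1.39 n log2 n.
q = [0.0]                              # Q_A(0) = 0
sum = 0.0                              # running sum Q_A(0) + ... + Q_A(n-1)
(1..10_000).each do |n|
  sum += q[n - 1]
  q[n] = n + 1 + 2.0 * sum / n
end
n = 10_000
printf("Q_A(%d) = %.0f, 1.39 n log2 n = %.0f\n", n, q[n], 1.39 * n * Math.log2(n))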
Distribution around Average
- A good average complexity is not enough if the worst case is frequently reached
- For Q_A, it can be shown that the standard deviation is about 0.65n
- This means that the probability of deviation from the average very quickly gets extremely small
- That means that, assuming a normal distribution:
  - ~68% is within 1.39 n log₂ n ± 0.65n
  - ~95% is within 1.39 n log₂ n ± 1.3n
  - ~99.7% is within 1.39 n log₂ n ± 1.95n
  - ~99.993% is within 1.39 n log₂ n ± 2.6n
  - ~99.999'94% is within 1.39 n log₂ n ± 3.25n
  - ~99.999'999'8% is within 1.39 n log₂ n ± 3.9n
Complexity of Sorting
Question: What is the complexity of sorting (as a problem)?
- Many good sorting algorithms are O(n log n)
- The basic operations for sorting are comparison and movement
- For n data items, the number of different sorting orders (permutations) is n!
- With each comparison, in the best case, we can reduce the number of possible sorting orders by half
- The minimum number of comparisons necessary for sorting is therefore log₂ (n!) ≈ n log₂ n, i.e. O(n log n) (see the bound below)
- Sorting using pairwise comparison is Ω(n log n)
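The step from n! to n log n follows from a simple bound on log₂ (n!) (standard reasoning, not spelled out on the slide):

\begin{align*}
\log_2 (n!) = \sum_{k=1}^{n} \log_2 k &\le n \log_2 n \\
\log_2 (n!) \ge \sum_{k=n/2}^{n} \log_2 k &\ge \frac{n}{2} \log_2 \frac{n}{2}
\qquad\Rightarrow\qquad \log_2 (n!) \in \Theta(n \log n)
\end{align*}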
Pivot Selection
- The efficiency of quicksort strongly depends on the selection of the pivot
- Some solutions:
  - Select the rightmost element (dangerous!)
  - Use the median of three values (see the sketch below)
  - Use the value at a random location (this is an example of a randomized algorithm)
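One way to implement the median-of-three selection in Ruby (a sketch; the helper name and the convention of moving the pivot to the right end are assumptions):

# Sort the sample positions left, mid, right in place, then move the
# median to the right end, where the partitioning code expects the pivot.
def median_of_three!(array, left, right)
  mid = (left + right) / 2
  array[left], array[mid]   = array[mid],   array[left]  if array[mid]   < array[left]
  array[left], array[right] = array[right], array[left]  if array[right] < array[left]
  array[mid],  array[right] = array[right], array[mid]   if array[right] < array[mid]
  array[mid],  array[right] = array[right], array[mid]   # median becomes the pivot
end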
Implementation Improvements
- Comparison of indices
  → Use a sentinel to remove one comparison
- Stack overflow for deep recursion
  → When splitting in two, use recursion for the smaller part, and tail recursion or a loop for the larger part
- Low efficiency of quicksort for short arrays/parts
  → For parts smaller than a given size, change to a simple sorting algorithm
  → With insertion sort, it is possible to do this in one go at the very end (this needs care when testing)
  → Quicksort gets about 10% faster if the change is made at an array size of about 10
- Duplicate keys
  → Split in three parts rather than two

Ruby pseudocode/implementation (excluding the split in three): quick_sort in 7qsort.rb (a sketch of two of these improvements follows below)
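A sketch combining two of the improvements above: recursion only on the smaller part with a loop for the larger part, and a switch to insertion sort for short parts (the cutoff value 10 and the partition helper, returning the pivot's final index as in simple_quick_sort above, are assumptions):

CUTOFF = 10  # below this size, insertion sort is faster than quicksort

def insertion_sort!(array, left, right)
  (left + 1).upto(right) do |i|
    value = array[i]
    j = i - 1
    while j >= left && array[j] > value  # shift larger elements to the right
      array[j + 1] = array[j]
      j -= 1
    end
    array[j + 1] = value
  end
end

def quick_sort(array, left = 0, right = array.length - 1)
  while right - left >= CUTOFF
    i = partition(array, left, right)  # assumed helper: returns pivot index
    if i - left < right - i            # recurse only on the smaller part,
      quick_sort(array, left, i - 1)
      left = i + 1                     # loop (tail recursion) on the larger part
    else
      quick_sort(array, i + 1, right)
      right = i - 1
    end
  end
  insertion_sort!(array, left, right)  # finish the short part directly
  array
end

Note that the slide's variant runs a single insertion sort over the whole array at the very end; the sketch finishes each short part immediately, which is easier to test.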
Comparing Sorting Algorithms using Animation
- Uses Web technology: SVG (2D vector graphics) and JavaScript
- Uses a special library (Narrative JavaScript) for timing adjustments
- Comparisons are shown in yellow (except for insertion sort), exchanges in blue

Watch the animation: sort.svg
Stable Sorting
- Definition: A sorting algorithm is stable if it retains the original order of two data items with the same key value
- Used for sorting with multiple criteria (e.g. sort by year and prefecture):
  - First, sort using the lower priority criterion (e.g. prefecture)
  - Then, sort using the higher priority criterion (e.g. year)
- The simple sorting algorithms and merge sort can easily be made stable
- Heap sort and quicksort are not stable
  → Solution 1: Sort by multiple criteria together
  → Solution 2: Use the original position as a lower priority criterion (see the sketch below)
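Solution 2 expressed in Ruby (a sketch with hypothetical sample data; the original index is appended as the lowest priority criterion, so ties keep their input order even under an unstable sort):

# Sort by year; items with equal year keep their original relative order.
students = [["Kanagawa", 2019], ["Tokyo", 2018], ["Kanagawa", 2018]]
sorted = students.each_with_index
                 .sort_by { |(_pref, year), index| [year, index] }
                 .map { |student, _index| student }
p sorted  # => [["Tokyo", 2018], ["Kanagawa", 2018], ["Kanagawa", 2019]]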
Sorting in C and Ruby
- Sorting is provided as a library function or method
- The implementation is often based on quicksort
- Comparison of data items depends on the type of data and the purpose of sorting
  → Use a comparison function as a function argument
- If comparison is slow
  → Precompute a value that can be used for sorting
- If exchange of data items is slow (e.g. very large data items)
  → Sort/exchange references (pointers) only
C's qsort Function
void qsort(
void *base, // start of array
size_t nel, // number of elements in array
size_t width, // element size
int (*compar)( // comparison function
const void *,
const void *)
);
Ruby's Array#sort
(Klass#method denotes the instance method method of class Klass)

array.sort uses <=> for comparison

array.sort { |a, b| a.length <=> b.length }

This example sorts (e.g. strings) by length. The code block (between { and }) is used as the comparison function.
Ruby's <=> Operator
(also called the spaceship operator; similar to strcmp in C)

| Relationship between a and b | Return value of a <=> b                |
| a < b                        | -1 (or another integer smaller than 0) |
| a = b                        | 0                                      |
| a > b                        | +1 (or another integer greater than 0) |
Ruby's Array#sort_by
array.sort_by { |str| str.length } or array.sort_by(&:length)
(sorting strings by length)

array.sort_by { |stu| [stu.year, stu.prefecture] }
(sorting students by year and prefecture)

This calculates the value of the sort criterion for each array element in advance (see the sketch below).
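The advantage appears when computing the criterion is slow: sort recomputes it on every comparison, while sort_by computes it once per element (a sketch; expensive_key is a hypothetical stand-in for a slow key computation):

require 'digest'

def expensive_key(str)         # hypothetical slow key computation
  Digest::SHA256.hexdigest(str)
end

array = %w[pear apple fig banana]
slow = array.sort    { |a, b| expensive_key(a) <=> expensive_key(b) }  # key computed on every comparison
fast = array.sort_by { |str| expensive_key(str) }                      # key computed once per element
p slow == fast  # => true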
Summary
- Quicksort is another application of divide and conquer
- Quicksort is a very famous algorithm, and a good example to learn about
algorithms and their implementation
- Quicksort has been carefully researched and widely implemented and
used
- Quicksort is a classical example of the importance of average time
complexity
- Quicksort is our first example of a randomized algorithm
- Sorting based on pairwise comparison is Θ(n log n)
Preparation for Next Time
- Think about inputs for which conceptual_quick_sort will fail
- Watch the animations carefully (>20 times) to deepen your understanding of sorting algorithms
Glossary
- quicksort
- クイックソート
- partition
- 分割
- partitioning element (pivot)
- 分割要素
- worst case complexity (running time)
- 最悪時の計算量
- best case complexity (running time)
- 最善時の計算量
- average complexity (running time)
- 平均計算量
- standard deviation
- 標準偏差
- randomized algorithm
- ランダム化アルゴリズム
- median
- 中央値
- decision tree
- 決定木
- tail recursion
- 末尾再帰
- in one go
- 一括
- stable sorting
- 安定な整列法
- criterion (plural criteria)
- 基準
- block
- ブロック