Implementation of 2-3-4 trees is quite complicated
Some memory (in nodes with 2 or 3 children) is unused
Therefore, other balanced trees have been proposed
Red-Black-Trees
Implementation of a 2-3-4 tree with a binary tree
The edges of the original tree are black
Nodes with 3 or 4 children are split into multiple nodes, coloring the
internal edges red
Two consecutive red edges are forbidden
If this invariant is violated, rotations are used for
restoration
If only black edges are counted, the tree is of uniform height
When all edges are considered, the maximum depth of a leaf is at most
twice the minimum depth (O(log n))
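The two invariants above (no two consecutive red edges, uniform height when only black edges are counted) can be checked with a short recursive function. This is a minimal sketch, not a full red-black tree: the `Node` class and the `black_height` function are hypothetical names, and the colour of an edge is assumed to be stored in the child node it leads to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    red: bool = False               # colour of the edge from the parent to this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node], parent_red: bool = False) -> int:
    """Count the black edges on any path from above `node` down to an empty subtree.

    The root is treated as if it had a black incoming edge.
    Raises ValueError if two red edges follow each other or if the
    counts of the left and right subtrees differ.
    """
    if node is None:
        return 0
    if node.red and parent_red:
        raise ValueError(f"two consecutive red edges at key {node.key}")
    left = black_height(node.left, node.red)
    right = black_height(node.right, node.red)
    if left != right:
        raise ValueError(f"black heights differ below key {node.key}")
    return left + (0 if node.red else 1)


# A 2-3-4 node holding the keys 10, 20, 30, split into three binary nodes:
# 20 stays on top, 10 and 30 hang below it on red (internal) edges.
root = Node(20, left=Node(10, red=True), right=Node(30, red=True))
print(black_height(root))  # 1
```

A tree in which a red edge leads directly to another red edge makes the function raise `ValueError`; that is the situation the rotations are meant to repair.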
AVL-Trees
Proposed by Adelson-Velskii and Landis (Адельсон-Вельский and Ландис) in 1962
Oldest (binary) balanced tree
Invariant: At each internal node, the difference between the heights of
the subtrees is 1 or less
The difference between the heights of the left and the right subtrees
(-1, 0, 1) is stored in each internal node and kept up to date
The tree height is bounded by approximately $1.44 \log_2 n$
Searching is slightly faster than for a red-black-tree
Insertion and deletion are slightly more complicated than for a
red-black-tree
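The height bound can be made plausible by counting the fewest nodes an AVL tree of a given height can contain: in the sparsest case, one subtree is one level lower than the other at every node. The sketch below is only an illustration under stated assumptions. `min_nodes` is a hypothetical helper, height is counted in levels (a single node has height 1), and the comparison uses n + 2 inside the logarithm, which is how the bound is usually stated so that it also holds for very small n.

```python
import math


def min_nodes(height: int) -> int:
    """Fewest nodes an AVL tree with `height` levels can have."""
    if height <= 0:
        return 0
    # Sparsest case: min_nodes(h) = min_nodes(h - 1) + min_nodes(h - 2) + 1
    a, b = 0, 1                      # min_nodes(0), min_nodes(1)
    for _ in range(height - 1):
        a, b = b, a + b + 1
    return b


for h in (1, 2, 3, 4, 5, 10, 20):
    n = min_nodes(h)
    bound = 1.44 * math.log2(n + 2)
    print(f"height {h:2d}: at least {n:6d} nodes, 1.44 * log2(n + 2) = {bound:.2f}")
```

For each listed height, the printed bound should be at least as large as the height itself, because these are the sparsest trees the invariant permits.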
Secondary Storage
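|                   | Internal Memory        | External (Secondary) Storage | External (Secondary) Storage |
|-------------------|------------------------|------------------------------|------------------------------|
| Access principle  | random                 | random                       | linear                       |
| Technology        | dynamic RAM            | SSD, HD                      | magnetic tape                |
| Unit of access    | word                   | page/sector                  | record                       |
| Example unit size | 32/64 bits (4/8 bytes) | 512/1024/2048/4096/... bytes | varying                      |
| Access speed      | nanoseconds            | micro/milliseconds           | seconds or minutes           |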
B-Trees
Variant of 2-3-4 trees
Each page is a node in the tree
→ Efficient access to external memory
Maximise the number of keys per page
The minimum number of keys per page is about half of the maximum
Page of a B-Tree
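Why one page per node helps can be seen in a lookup that counts page fetches: each node visited costs one page read, and the search within a page happens in internal memory. This is a minimal in-memory sketch, assuming a hypothetical `Page` class that holds sorted keys and, for non-leaf pages, one more child than keys. A real implementation would read each page from secondary storage.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Page:
    keys: list                                     # sorted keys stored in this page
    children: list = field(default_factory=list)   # len(keys) + 1 pages, empty for a leaf


def search(page, key):
    """Return (found, pages_read) for a lookup that starts at `page`."""
    pages_read = 0
    while page is not None:
        pages_read += 1                      # every visited node is one page fetch
        i = bisect_left(page.keys, key)      # binary search inside the page
        if i < len(page.keys) and page.keys[i] == key:
            return True, pages_read
        # Descend into the child between keys[i - 1] and keys[i], if there is one.
        page = page.children[i] if page.children else None
    return False, pages_read


root = Page([20, 40], [Page([5, 10]), Page([25, 30]), Page([50, 60])])
print(search(root, 30))  # (True, 2)
print(search(root, 7))   # (False, 2)
```

The number of pages read is at most the height of the tree, which is why the number of keys per page is maximised.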
B+ Trees
Starting with a B-tree, all data (except keys) is moved to the lowest layer
of the tree (the leaf pages)
⇒ The number of keys and child nodes per internal node increases
(for practical applications, the size of a key is much smaller than the size of
the data)
⇒ The height of the tree shrinks
⇒ Access to data is faster
(the overall access time is dominated by the number of pages that have to be
fetched from secondary memory)
Internal Page of a B+ Tree
Leaf Page of a B+ Tree
Definition of Variables for B+ Trees
$n$: overall number of data items (example: 50,000)
$L_p$: page size (example: 1024 bytes)
$L_k$: key size (example: 4 bytes)
$L_d$: data size (one item, except key) (example: 240 bytes)
$L_{pp}$: size of page number (page reference) (example: 4 bytes)
$\alpha_{min}$: minimum occupancy (usually 0.5)
Items per Page for B+Trees
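($\lfloor a \rfloor$ is the floor function of $a$, the greatest integer smaller than or equal to $a$:
$\lfloor a \rfloor \in \mathbb{Z} \land \lfloor a \rfloor \le a \land \lnot\exists b: b \in \mathbb{Z} \land \lfloor a \rfloor < b \le a$)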
$d_{max} = \lfloor L_p / (L_k + L_d) \rfloor$ (example: 4)
(maximum number of data items per leaf page)
$d_{min} = \lfloor d_{max} \alpha_{min} \rfloor$ (example: 2)
(minimum number of data items per leaf page)
$k_{max} = \lfloor L_p / (L_k + L_{pp}) \rfloor$ (example: 128)
(maximum number of children per internal node)
$k_{min} = \lfloor k_{max} \alpha_{min} \rfloor$ (example: 64)
(minimum number of children per internal node)
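The four capacities can be recomputed directly from the variable definitions. The following is a small sketch; the function name `page_capacities` and its parameter names are hypothetical and simply mirror the page size, key size, data size, page reference size and minimum occupancy defined earlier.

```python
import math


def page_capacities(page_size, key_size, data_size, ref_size, min_occupancy=0.5):
    """Return (d_max, d_min, k_max, k_min) for a B+ tree with the given sizes in bytes."""
    d_max = page_size // (key_size + data_size)   # data items that fit in a leaf page
    d_min = math.floor(d_max * min_occupancy)     # fewest items a leaf page may hold
    k_max = page_size // (key_size + ref_size)    # children that fit in an internal page
    k_min = math.floor(k_max * min_occupancy)     # fewest children of an internal page
    return d_max, d_min, k_max, k_min


# Example values from the slides: 1024-byte pages, 4-byte keys,
# 240-byte data items, 4-byte page references.
print(page_capacities(1024, 4, 240, 4))  # (4, 2, 128, 64)
```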
Number of Nodes for B+Trees
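($\lceil a \rceil$ is the ceiling function of $a$, the smallest integer greater than or equal to $a$:
$\lceil a \rceil \in \mathbb{Z} \land a \le \lceil a \rceil \land \lnot\exists b: b \in \mathbb{Z} \land a \le b < \lceil a \rceil$)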
$N_{dmax} = \lceil n / d_{min} \rceil$ (example: 25,000)
(maximum number of leaf pages)
$N_{dmin} = \lceil n / d_{max} \rceil$ (example: 12,500)
(minimum number of leaf pages)
$N_{kmax} = \lceil N_{dmax} / k_{min} \rceil + \lceil N_{dmax} / k_{min}^2 \rceil + \dots$
(maximum number of internal nodes; the sum ends with the first term equal to 1, which is the root)
(example: 391 + 7 + 1 = 399; height of B+tree: 4; total number of nodes: 25,399)
$N_{kmin} = \lceil N_{dmin} / k_{max} \rceil + \lceil N_{dmin} / k_{max}^2 \rceil + \dots$
(minimum number of internal nodes)
(example: 98 + 1 = 99; height of B+tree: 3; total number of nodes: 12,599)
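The sums for the number of internal nodes can be evaluated level by level: each level needs as many pages as the level below divided by the fan-out, rounded up, until a single root remains. The following is a sketch with hypothetical function names; it takes the example values from the slides (50,000 items, 2 to 4 items per leaf page, 64 to 128 children per internal node) as given.

```python
def leaf_pages(items, items_per_page):
    """Number of leaf pages needed for `items` data items (ceiling division)."""
    return (items + items_per_page - 1) // items_per_page


def internal_nodes(leaves, fanout):
    """Return (number of internal nodes, tree height) above `leaves` leaf pages."""
    total, height, level = 0, 1, leaves
    while level > 1:
        level = (level + fanout - 1) // fanout   # pages needed one level further up
        total += level
        height += 1
    return total, height


n = 50_000
sparse = leaf_pages(n, 2)    # every leaf page minimally filled
dense = leaf_pages(n, 4)     # every leaf page completely filled
print(sparse, internal_nodes(sparse, 64))    # 25000 (399, 4)
print(dense, internal_nodes(dense, 128))     # 12500 (99, 3)
```

Rounding up level by level gives the same result as the closed-form terms, because nested ceiling divisions by positive integers equal one ceiling division by their product.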
Summary
Balanced search trees are important for efficient implementation of
dictionary ADTs
2-3-4 trees and B(+) trees allow nodes of higher degree than a binary tree,
but keep all leaves at the same depth
Red-black-trees and AVL-trees impose limitations on the variation of the
tree height
Balanced trees allow the basic operations on a dictionary ADT to be
implemented in O(log n) worst-case time
B-trees and B+ trees are extremely important for the implementation of
file systems and databases on secondary storage