Data Structure Checklist: Heap

Heaps are a useful data structure where the element on the top possesses the maximum (for max heap) or minimum (for min heap) value. A heap is essentially a specialized tree with two properties.

  1. It’s a complete binary tree (why?).
  2. In a max heap, for any given pair of nodes with the parent-child relationship, the parent’s value is larger than or equal to the child’s value. Similarly, in a min heap, for any given pair of nodes with the parent-child relationship, the parent’s value is smaller than or equal to the child’s value.

For simplicity, we will just use max heaps as examples for the rest of the article.

Heapify

The process of rearranging an array to a heap is called heapify. To heapify an array, we need to heapify every element of the array by moving each of them to the correct position. Heapify can be done in-place by swapping elements. Depending on the direction of swapping, the heapify of an element can be done in two ways:

  • Bottom-up
    1. Given a node that is not in the correct position of the heap.
    2. Swap the node with its parent if its value is larger.
    3. Repeat Step 2 until it stops.
// a[1] to a[n] is the array storing a heap with a maximum size of n.
// count is the current size of the heap.
// a[index] is the element to apply the bottom-up heapify.
void BottomUpHeapifyElement(int[] a, int index)
{
    for (int i = index; i/2 > 0 && a[i] > a[i/2]; i /= 2)
    {
        swap(a[i], a[i/2]);
    }
}
  • Top-down
    1. Given a node that is not in the correct position of the heap.
    2. If the node’s value is smaller than at least one of its children, swap it with the child that has the largest value.
    3. Repeat Step 2 until it stops.
// a[1] to a[n] is the array storing a heap with a maximum size of n.
// count is the current size of the heap.
// a[index] is the element to apply the top-down heapify.
void TopDownHeapifyElement(int[] a, int index, int count)
{
    int i = index;
    int maxValuePos = i;
    while (true)
    {
        if (i*2 <= count && a[i] < a[i*2])
            maxValueIndex = i*2;
        if (i*2+1 <= count && a[maxValueIndex] < a[i*2+1])
            maxValueIndex = i*2+1;
        if (maxValueIndex == i)
            break;
        swap(a[i], a[maxValueIndex]);
        i = maxValueIndex;
    }
}

Given a complete binary tree that contains \(n\) nodes, its height will not exceed beyond \(\log n\). Therefore, the heapify of one element in an array, either bottom-up or top-down, takes \(O(\log n)\) time. Depending on the initial status, the heapify of a whole array involves different time complexities. For example, there are three typical heapify scenarios:

  • Heap insertion–inserting a new element to a heap and then apply heapify.
  • Heap deletion–deleting an existing element (usually the element in the top position) from a heap and then apply heapify.
  • Heap creation–creating a heap out of an array.

The heap insertion and deletion only need to heapify one element. Therefore, both of them can be done in \(O(\log n)\) time. On the other hand, the heap creation is basically to heapify a whole array which requires linear time. Let’s review each of them in details and also see how bottom-up and top-down approaches are applied in each situation appropriately.

Heap insertion

The insertion of a new element often time breaks the properties of a heap, which then requires a heapify.

  1. Insert the new element as a new node to the left-most empty slot of the heap (that is, append the new element to the end of the array).
  2. Apply a bottom-up heapify to that newly inserted node.
// a[1] to a[n] is the array storing a heap with a maximum size of n.
// count is the current size of the heap.
// data is the new element to insert.
void HeapPush(int[] a, int data)
{
    if (count == n)
        return;
    ++count;
    a[count] = data;
    BottomUpHeapifyElement(a, count);
}

Heap deletion

When deleting an existing element from a heap, often time we are removing the root node from the heap (i.e., popping the largest element from a max heap). Similar to the heap insertion, let’s first try the bottom-up direction.

  1. Remove the node in the top position of the heap.
  2. Compare the left and right children of the original root node and move the one with the larger value to the root position.
  3. Recursively apply Step 2 to the sub-tree whose root was moved in Step 2.

Note that such strategy may end up with a tree that is not a complete binary tree, thus not a heap. Instead, we can apply the top-down direction.

  1. Remove the node in the top position of the heap.
  2. Move the right-most leaf node to the root position (that is, move the last element of the array to the head of it).
  3. Apply a top-down heapify to the new root node.
// a[1] to a[n] is the array storing a heap with a maximum size of n.
// count is the current size of the heap.
void HeapPop(int[] a)
{
    if (count == 0)
        return;
    a[1] = a[count];
    --count;
    TopDownHeapifyElement(a, 1, count);
}

Heap creation

To create a heap out of an array, we can simply start with an empty heap and then insert every element of the array to the heap. As we visited before, every insertion involves a bottom-up heapify of the newly inserted element.

  1. From the first element to the last element of an array, apply the bottom-up heapify to each of them.

As another strategy, a heap creation can also be done with the top-down heapify in a reversed direction. Since the top-down heapify is a no-op for leaf nodes of a tree, we only need to apply it to non-leaf nodes.

  1. From the middle element to the first element of an array (in a reversed direction), apply the top-down heapify to each of them. Note that the middle element of an array (e.g., \(a[{n \over 2}]\)) is the right-most non-leaf node of the corresponding complete binary tree.
// a[1] to a[n] is the array storing a heap with a maximum size of n.
// count is the current size of the heap.
void HeapCreate(int[] a, int count)
{
    for (int i = count/2; i > 0; --i)
    {
        TopDownHeapifyElement(a, i, count);
    }
}

The run time of each top-down heapify depends on the height of the node where the heapify is applied. For example, given a heap with \(n\) nodes, the first (top) level has a height of \(\log n\) while the last (leaf) level has a height of zero. As the height decrements, the number of nodes at each level doubles. With that, the heap creation with the top-down heapify strategy can be done in \(O(n)\) time which is calculated as below:

$$ \begin{aligned} T & = ~~~2^0(\log n) + 2^1(\log n - 1) + 2^2(\log n - 2) + \cdots + 2^{\log n - 1} \\ 2T & = ~~~~~~~~~~~~~~~~~~~~~~~ 2^1(\log n) ~~~~~~~ + 2^2(\log n - 1) + \cdots + 2^{\log n -1 }(2) + 2^{\log n} \\ & \\ T & = 2T - T \\ & \\ & = - 2^0(\log n) + 2^1 ~~~~~~~~~~~~~~~~~~ + 2^2 ~~~~~~~~~~~~~~~~~~~ + \cdots + 2^{\log n - 1} ~~~~~ + 2^{\log n} \\ & = - 2^0(\log n) + {2 (1 - 2^{\log n}) \over (1-2)} \\ & = 2n - \log n - 2 \end{aligned} $$

Why a heap is a complete binary tree

It looks to me that one goal of introducing heaps is to help illustrate arrays. For example, the operations we visited above, like the heap insertion, deletion, and the subsequent heapify, are essentially element movements in-place of arrays, but become more clear under the picture of heaps. When talking about arrays, often time they are referred to compact arrays where no empty slots exist between the first and last elements. Recall that a compact array represents a complete binary tree. Accordingly, being a complete binary tree allows a heap to illustrate a compact array.

Contents