# Sorting (3): Special Linear Time Scenarios

Can sorting be even faster while we’ve mentioned in the previous article that the upper bound of sorting is O(nlogn)? The answer is yes! We can do sorting in linear time if the data set meets certain conditions.

## Bucket sort

Bucket sort is a divide and conquer sorting algorithm. Since it allows linear time performance, we discuss it here. The basic idea of bucket sort is to distribute a data set into several buckets and then sort each bucket individually. Buckets are already ordered. Once the data elements in each bucket are sorted, the whole data set then becomes sorted.

Suppose we have a data set ranging from $$V_{min}$$ to $$V_{max}$$. To evenly distribute it into $$n$$ buckets while ensuring that the whole range of data elements is covered, we calculate the following interval for each bucket.

$$Interval_{data} = V_{max}-V_{min}+1$$

$$Interval_{bucket}=\begin{cases} Interval_{data}/n+1 &\text{if } Interval_{data}/n = 0 \\ Interval_{data}/n+1 &\text{if } Interval_{data} \mod n \ne 0 \\ Interval_{data}/n &\text{otherwise}. \end{cases}$$

Sometimes the distribution of data over buckets is all what we want and there is no need for individual bucket sorts. It can be the case where the number of buckets matches exactly the data range. For example, the score of an assignment/exam could range from 0 to 100, for which 101 buckets would be enough to sort all assignments/exams scores by simply distributing them into the buckets.

Another interesting case is to find the maximum gap between two consecutive elements from an unsorted data set:

Suppose the data set has $$n$$ elements and not all of them are identical (or the maximum gap is just zero). We distribute them into $$n$$ buckets that have an equal-size interval based on the above formula. If values of all data elements are consecutive, each bucket will hold exactly one element and thus the maximum gap is one. Otherwise, at least one bucket must have more than one data element. That is, at least one bucket must be empty. Consequently, the maximum gap must be larger than one bucket interval, which is the gap between two non-empty buckets that contain sequential data elements—that is, the gap between the maximum data element of the first bucket and the minimum data element of the second bucket.

For the case where only distribution is needed, bucket sort simply performs in $$O(n)$$ and is stable. When sorting individual buckets is desired, the performance and stability of bucket sort depends on the sorting algorithm applied to each individual bucket.

Since buckets are independent from each other while being ordered by their scope, bucket sort enables a good foundation for parallelism and external sorting.

## Counting sort

Counting sort is a special case of bucket sort where the buckets match the distinct elements of a data set. One case is the score of assignments/exams mentioned before. For example, we have 101 buckets that match the full range of scores from 0 to 100. We then count the appearance of each score from the data set. Once the counting is done, the sorted result can be generated from a sequential scan of buckets from score 0 to 100 based on their repeating times.

Here are some other examples of counting sort:

Counting sort can be done in a stable manner. For that, we maintain an array $$A$$ with the size of the bucket number. It records accumulative counting results from the buckets. For example, suppose the counting result of each bucket is kept in an array $$C$$, $$A[i]$$ then contains the volume accumulated from $$C$$ to $$C[i]$$. We then scan the data set reversely—from the last data element to the first one. For each data element scanned, its position in the final sorted result can be determined by the corresponding value in $$A$$. After that, we decrement that value by one. Note that we are doing a reverse scan of the data set because $$A[i]$$ includes the range from $$C$$ to $$C[i]$$. In an alternative way, where $$A[i]$$ is the accumulation from $$C$$ to $$C[i-1]$$, we will scan the data set from the beginning to the end and then increment the corresponding value in $$A$$ by one once a data element is positioned to the sorted result.

When the number of buckets is not too large compared to the data set size, above processes all run in linear time. Counting sort is thus a linear-time sorting algorithm. Counting sort will be efficient if the number of buckets is much smaller than the data set size.