Sorting Algorithm

Today, I’m learning sorting algorithm, and taking some notes in this passage.

Selection Sort

I believe Selection Sort is the easiest sorting algorithm to understand, as it simply involves repeatedly finding the smallest elements in the unsorted portion of the array and swapping it with the first unsorted element.

Let’s take a look at the implementation:

Implementation

/* Section Sort */
void section_sort(int *nums, int numsSize) {
  // Outer loop: i represents the first elements' index of the unsorted portion.
  for(int i = 0; i < numsSize - 1; i++) {
    // Find the index of the smallest element in the unsorted portion.
    int index_min = i;

    for(int j = i + 1; j < numsSize; j++) {
      if(nums[index_min] > nums[j]) { index_min = j; }
    }
    // Swap the found minimum element with the first unsorted element.
    int tmp = nums[index_min];
    nums[index_min] = nums[i];
    nums[i] = tmp;
  }
}

Bubble Sort

Bubble Sort offers a slight improvement over Selection Sort because it’s an adaptive sorting algorithm that can be optimized using a flag.

Implementation

/* Bubble Sort */
void bubble_sort(int *nums, int numsSize) {
  // Outer loop: i represents the index of the last unsorted elements.
  for(int i = numsSize - 1; i > 0; i++) {
    // Inner loop: "Bubble" the largest element in the unsorted portion to its correct position.
    for(int j = 0; j < i; j++) {
      if(nums[j] > nums[j + 1]) {
        int tmp = nums[j];
        nums[j] = nums[j + 1];
        nums[j + 1] = tmp;
      }
    }
  }
}

Efficiency Optimization

We can observe that if no swaps occur during an inner loop iteration, the array is already sorted. This allows us to introduce a flag variable to track whether a swap has taken place. If no swaps occur, we can terminate the algorithm early, improving efficiency.

This optimization could decrease the time complexity to O(n) when the raw array is sorted.

/* Bubble Sort with Flag */
void bubble_sort_with_flag(int *nums, int numsSize) {
  for(int i = numsSize - 1; i > 0; i++) {
    // Set the flag variable.
    bool flag = false;
    for(int j = 0; j < i; j++) {
      if(nums[j] > nums[j + 1]) {
        int tmp = nums[j];
        nums[j] = nums[j + 1];
        nums[j + 1] = tmp;
        // Turns the flag to true if swap occurred.
        flag = true;
      }
    }
    if(!flag) { break; }
  }
}

Notice: Even after the optimization, the max time complexity is still O(n^2).

Insertion Sort

Insertion Sort is a practical and efficient sorting algorithm for small datasets due to its relatively low time complexity in such cases.

Implementation

/* Insertion Sort */
void insertion_sort(int *nums, int numsSize) {
  // Outer loop: i represents the first elements of the unsorted portion.
  for(int i = 1; i < numsSize; i++) {
    // Store nums[i] in a temporary variable base.
    int base = nums[i], j = i - 1;
    // Inner loop: shift the elements forward until encountering the correct position for base.
    while(j >= 0 && nums[j] > base) {
      nums[j + 1] = nums[j];
      j--;
    }
    // Place base to the correct position.
    nums[j + 1] = base;
  }
}

Algorithm Characteristics

Insertion Sort is widely used due to it’s simplicity and efficiency in sorting small or nearly sorted datasets. While its worst-case time complexity is O(n^2), it generally performs better than Bubble Sort because it requires fewer element swaps.

Additionally, Insertion Sort is adaptive, meaning it performs particularly well when the input is already partially sorted. In the best case (when the array is already sorted), its time complexity is O(n), making it more efficient than many other simple sorting algorithms in such scenarios.

So, compared to Bubble Sort, Insertion Sort is often faster because it performs fewer operations in practice.

Quick Sort

As the name suggests, Quick Sort is a highly efficient sorting algorithm that leverages the divide-and-conquer strategy. It is widely used due to its excellent average-case time complexity and adaptability to various scenarios.

Implementation

The implementation of quick sort is more complex than the previous algorithms. Its core operation revolves around pivot partitioning.

Pivot Partitioning

When processing an array, we first choose a pivot element and then rearrange the array such that all elements smaller than the pivot appear before it, while all larger elements appear after it.

/* Pivot Partitioning */
void swap(int *nums, int x, int y) {
  int tmp = nums[x];
  nums[x] = nums[y];
  nums[y] = tmp;
}

// Pivot Partitioning
int partition(int *nums, int left, int right) {
  // Choose the leftmost element as the pivot.
  int l = left, r = right;

  while(l < r) {
    // Notice: Here we must place the loop for r first, or l will be left + 1 when the array have been sorted, triggering the swapping between left and left + 1.
    while(l < r && nums[r] >= nums[left]) { r--; }
    while(l < r && nums[l] <= nums[left]) { l++; }
    swap(nums, l, r);
  }
  // Place left in the correct position.
  swap(nums, left, l);
  return l;
}

Recursive Sorting

The partition function only partitions the array once. To fully sort the array, we must apply partitioning recursively.

/* Quick Sort */
void quick_sort(int *nums, int left, int right) {
  // Base case: Stop when the segment only contains 1 or 0 elements.
  if(left <= right) { return; }

  // Partition the array and obtain the pivot index.
  int pivot = partition(nums, left, right);

  // Recursively sort the left and right sub-arrays.
  quick_sort(nums, left, pivot - 1);
  quick_sort(nums, pivot + 1, right);
}

Pivot Optimization

Although Quick Sort is generally efficient, its performance can degrade under certain conditions. For example, when the input array is already sorted in reverse order and we always select the first element of the array. This results in highly unbalanced partitions, reducing Quick Sort to a worst-case O(n^2) complexity, similar to Bubble Sort.

To mitigate this issue, we could improve the pivot selection. A simple yet effective approach is to choose the median of three commonly used elements:

Leftmost element
Middle element
Rightmost element

This strategy increases the likelihood of selecting a pivot that creates balanced partitions, improving the overall efficiency of the algorithm.

/* Quick Sort (with optimization of pivot selection) */
int select_pivot(int *nums, int left, int mid, int right) {
  int l = nums[left], m = nums[mid], r = nums[right];
  // Return the median of the three values.
  if((l <= m && m <= r) || (r <= m && m <= l)) {
    return mid;
  } else if((m <= l && l <= r) || (r <= l && l <= m)) {
    return l;
  } else {
    return r;
  }
}

/* Optimized Partitioning */
int partition(int *nums, int left, int right) {
  // Choose pivot using the median-of-three strategy
  int pivot = select_pivot(nums, left, left + (right - left) / 2);

  // Move the pivot to the leftmost position.
  swap(nums, left, pivot);

  int l = left, r = right;
  while(l < r) {
    while(l < r && nums[r] >= nums[left]) { r--; }
    while(l < r && nums[l] <= nums[left]) { l++; }
    swap(nums, l, r);
  }
  swap(nums, left, l);
  return pivot;
}

// The other part of the algorithm shouldn't change.

Algorithm Characteristics

Quick Sort remains one of the most efficient and widely used sorting algorithms, particularly due to its O(nlog(n)) average-case time complexity. While its worst-case performance can degrade to O(n^2), careful pivot selection strategy maintain its efficiency in practical scenarios.

Advantages
- fast
- in-place
Disadvantages
- unstable
- time complexity will degrade under worst-case input.

Merge Sort

Merge Sort is also a highly efficient algorithm with a consistent time complexity of O(nlong(n)), making it a reliable choice for large datasets. It operates using the divide-and-conquer strategy, recursively breaking down the array into smaller sub-arrays before merging them back in sa sorted order.

Implementation

The core idea of Merge Sort is to repeatedly divide the input array into two smaller subarrays until each subarray contains only a single element. Then , pairs of subarrays are merged back together in sorted order to reconstruct the fully sorted array.

/* Merge Sort */

// Merge two subarrays into a single sorted array.
void merge(int *nums, int left, int mid, int right) {
  // Create a temporary array to store merged result.
  int tmp_size = right - left + 1;
  int *tmp = (int*)malloc(tmp_size * sizeof(int));

  // Define the pointers for both subarrays.
  int l = left, r = right, index_tmp = 0;

  // Merge elements in ascending order.
  while(l <= mid && r <= right) {
    if(nums[l] <= nums[r]) {
      tmp[tmp_index++] = nums[l++];
    } else {
      tmp[tmp_index++] = nums[r++];
    }
  }
  
  // Append remaining elements from the both subarrays.
  while(l <= mid) {
    tmp[tmp_index++] = nums[l++];
  }
  while(r <= right) {
    tmp[tmp_index++] = nums[r++];
  }

  // Copy sorted elements back to the original array.
  for(int i = 0; i < tmp_size; i++) {
    nums[left + i] = tmp[i];
  }
  free(tmp);
}

// Recursively sorts an array using Merge Sort.
void merge_sort(int *nums, int left, int right) {
  // Base case: Stop when the segment contains 1 or 0 elements.
  if(left >= right) { return; }

  // Compute the middle index.
  int mid = left + (right - left) / 2;

  // Recursively call merge_sort to sort the two subarrays.
  merge_sort(nums, left, mid);
  merge_sort(nums, mid + 1, right);

  // The merging step.
  merge(nums, left, mid, right);
}

Algorithm Characteristics

Merge Sort is widely used when need to handling large datasets efficiently, or when stability is a key requirement.

Advantages:
- Constantly Fast: time complexity is O(nlog(n)) whatever the input is.
- Stable
Disadvantages:
- Extra Space Usage: Merge Sort need a tmp array for instant restore.
- more unit-operation: This feature makes it shows lower efficiency in small-database scenarios.