Sorting Algorithms (II)

1. Divide and Conquer Algorithm

Solving a problem this way usually takes three steps: 1. Divide: break the big problem into smaller subproblems of the same type. 2. Conquer: solve these subproblems recursively. 3. Combine: combine the subproblem solutions into a solution for the original problem. (This is why a divide-and-conquer algorithm typically makes multiple recursive calls.) Among sorting algorithms, the representative divide-and-conquer methods are merge sort and quick sort. Both have lower running times than the previously discussed Bubble Sort, Insertion Sort, and Selection Sort.

Merge sort: Worst-case $O(n\lg n)$, Average-case $O(n\lg n)$, Best-case $O(n\lg n)$
Quick sort: Worst-case $O(n^2)$, Average-case $O(n\lg n)$, Best-case $O(n\lg n)$

2. Merge Sort

  • If arr[ ] contains only one element, it is already sorted; return.

  • Divide takes a recursive approach, splitting the problem until there is nothing left to split. In merge sort, we divide the list recursively into two halves until each sub-array has only 1 element: n → n/2 → n/4 → n/8 → … Each divide step takes only $O(1)$ time.

  • Merge combines the smaller lists into new lists in sorted order: conquer the sublists two at a time to produce new sorted sublists, until all elements have been merged into a single sorted array. The merge function runs in $O(n)$ when merging n elements, and merge sort as a whole runs in $O(n\lg n)$ time.
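The merge step described in the last bullet can be sketched as a standalone function (a minimal sketch; the function name and variable names are illustrative, not from the original post):

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list in O(n) time."""
    merged = []
    i = j = 0
    # repeatedly take the smaller front element of the two lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # at most one of the two lists still has leftover elements
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Each element is appended exactly once, which is why merging n total elements costs $O(n)$.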

First, let's see where $O(n\lg n)$ comes from. Picture a tree 🌲: going from top to bottom, the array is halved again and again. The first level holds all n elements in one array; the last level holds n arrays of one element each. But the total number of elements on every level is still n. The total time of merge sort equals the sum of the merge time over all levels. Suppose merging one level takes cn time (c is a constant); with L levels, the total is L·cn. From the properties of a binary tree we know $L = \log_2 n + 1$, so merge sort takes about $cn\log_2 n$ time, which gives $O(n\lg n)$. Now a worked example to help understand merge sort.
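The level-by-level sum above can be written out explicitly (with $c$ the per-element merge cost and $L$ the number of levels):

```latex
T(n) \;=\; \underbrace{cn + cn + \cdots + cn}_{L\ \text{levels}}
     \;=\; cn \cdot L,
\qquad L = \log_2 n + 1
\;\;\Longrightarrow\;\;
T(n) = cn(\log_2 n + 1) = O(n \lg n)
```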

Say we have an unsorted array arr[ ] = [14, 33, 27, 10, 35, 19]. First we divide this array.

Left_arr[ ] = [14, 33, 27] Right_arr[ ] = [10, 35, 19]

Left_sub_arr1 = [14] Left_sub_arr2 = [33, 27] Right_sub_arr1 = [10] Right_sub_arr2 = [35, 19]

After one more divide we are left with single-element sub-arrays: 14 / 33 / 27 / 10 / 35 / 19. Now we merge.

Left_arr_merge1 = [14] Left_arr_merge2 = [27, 33] Right_arr_merge1 = [10] Right_arr_merge2 = [19, 35]

Left_arr = [14, 27, 33] Right_arr = [10, 19, 35]

Sorted_arr = [10, 14, 19, 27, 33, 35]

Here is a Python implementation 🤭

def mergeSort(arr):
    print("original array is", arr)
    if len(arr) > 1:
        mid = len(arr) // 2
        lefthalf = arr[:mid]
        righthalf = arr[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        # merge the two sorted halves back into arr
        i = 0  # index into lefthalf
        j = 0  # index into righthalf
        k = 0  # index into arr
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                arr[k] = lefthalf[i]
                i = i + 1
            else:
                arr[k] = righthalf[j]
                j = j + 1
            k = k + 1

        # copy any remaining elements of lefthalf
        while i < len(lefthalf):
            arr[k] = lefthalf[i]
            i = i + 1
            k = k + 1

        # copy any remaining elements of righthalf
        while j < len(righthalf):
            arr[k] = righthalf[j]
            j = j + 1
            k = k + 1

        print("merging", arr)

3. Quick Sort

The first thing to address is the choice of the pivot value, because the pivot determines the partition, and partition is the core of the quick sort algorithm.

  • Always pick the first element as the pivot {chosen purely for simplicity}
  • Put the pivot at its correct position in the sorted array: move all elements smaller than the pivot before it, and all greater elements after it. {Done in linear time}
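The two bullets above can be sketched as a first-element-pivot partition in Python (a sketch; the function name and index arguments are my own, not from the post):

```python
def partition(arr, start_idx, end_idx):
    """Place arr[start_idx] (the pivot) at its final sorted position.

    Returns the pivot's final index; on return, everything to its left
    is smaller and everything to its right is greater or equal.
    """
    pivot = arr[start_idx]
    i = start_idx  # invariant: arr[start_idx+1 .. i] holds elements < pivot
    for j in range(start_idx + 1, end_idx + 1):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # swap the pivot into place, between the two regions
    arr[start_idx], arr[i] = arr[i], arr[start_idx]
    return i
```

A single left-to-right pass over the range, so the partition runs in linear time, as stated above.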

Now let's look at the pseudocode for quick sort:

quickSort(arr[ ], start_idx, end_idx)
{
    if (start_idx < end_idx)
    {
        pivot = partition(arr, start_idx, end_idx)

        quickSort(arr, start_idx, pivot - 1)

        quickSort(arr, pivot + 1, end_idx)
    }
}
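The pseudocode above can be turned into runnable Python, assuming a first-element-pivot partition helper as described earlier (a sketch, not the original author's code):

```python
def partition(arr, start_idx, end_idx):
    """First-element-pivot partition; returns the pivot's final index."""
    pivot = arr[start_idx]
    i = start_idx  # invariant: arr[start_idx+1 .. i] holds elements < pivot
    for j in range(start_idx + 1, end_idx + 1):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[start_idx], arr[i] = arr[i], arr[start_idx]
    return i

def quickSort(arr, start_idx, end_idx):
    if start_idx < end_idx:
        pivot = partition(arr, start_idx, end_idx)  # pivot's final index
        quickSort(arr, start_idx, pivot - 1)        # sort the left part
        quickSort(arr, pivot + 1, end_idx)          # sort the right part
```

Calling `quickSort(arr, 0, len(arr) - 1)` sorts `arr` in place.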

Now let's analyze the time complexity. Quick sort's worst-case running time is as bad as selection sort's and insertion sort's: $O(n^2)$. Following Khan Academy's algorithms course, the worst case occurs with the most unbalanced partitions — for example, an already-sorted array when the first element is always picked as the pivot. The figure below illustrates this.

[Figure: worst case — most unbalanced partitions]

Likewise, the best case occurs when the partitions are as evenly balanced as possible, as the figure below shows.

[Figure: best case — evenly balanced partitions]

Quick sort's average-case running time, however, is the same as merge sort's: $O(n\lg n)$. In practice, quick sort often outperforms merge sort.

4. Difference between Merge Sort and Quick Sort

In merge sort, the divide step does almost no work; the real work happens in the merge step. That merge step is what gives "merge sort" its name ✅.

In quick sort, the combine step does almost no work; the real work happens in the divide step (comparing elements against the pivot to split the array).

---------------------- End of post ----------------------