11 Sep 12:35 2009

## Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Hello All, I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one. Description ----------- The classical Quicksort algorithm uses the following scheme: 1. Pick an element P, called a pivot, from the array. 2. Reorder the array so that all elements, which are less than the pivot, come before the pivot and all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot element is in its final position. 3. Recursively sort the sub-array of lesser elements and the sub-array of greater elements. The invariant of classical Quicksort is: [ <= p | >= p ] There are several modifications of the schema: [ < p | = p | > p ] or [ = p | < p | > p | = p ] But all of them use*one*pivot. The new Dual-Pivot Quicksort uses*two*pivots elements in this manner: 1. Pick an elements P1, P2, called pivots from the array. 2. Assume that P1 <= P2, otherwise swap it. 3. Reorder the array into three parts: those less than the smaller pivot, those larger than the larger pivot, and in between are those elements between (or equal to) the two pivots. 4. Recursively sort the sub-arrays. The invariant of the Dual-Pivot Quicksort is: [ < P1 | P1 <= & <= P2 } > P2 ] The new Quicksort is faster than current implementation of Quicksort in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort. The full description of the Dual-Pivot Quicksort you can find on my page: http://iaroslavski.narod.ru/quicksort Performance tests ----------------- Here is result of running on different types of input data: Client VM all 85% organ 0..1 0..4 random ascend descend equal equal pipes random 010101 random Dual-Pivot: 16.83 5.31 5.47 0.35 0.68 10.59 1.06 1.02 2.18 Bentley's: 19.77 9.08 10.13 0.63 1.12 13.22 1.63 1.08 2.49 Server VM all 85% organ 0..1 0..4 random ascend descend equal equal pipes random 010101 random Dual-Pivot: 23.94 6.68 6.63 0.43 0.62 17.14 1.42 1.96 3.41 Bentley's: 25.20 10.18 10.32 2.07 1.33 16.72 2.95 1.82 3.39 The a lot of other tests have been run under client and server mode. The most interesting is BentleyBasher test framework. It runs battery of tests for all cases: { 100, 1000, 10000, 1000000 } x { sawtooth, rand, stagger, plateau, shuffle } x { ident, reverse, reverse_front, reverse_back, sort, dither} where 100, ... , 1000000 - array length sawtooth: x[i] =i%m rand: x[i] = rand() % m stagger: x[i] = (i*m + i) % n plateau: x[i] = min(i, m) shuffle: x[i] = rand()%m? (j+=2): (k+=2) ident(x) - a copy of x reverse(x, 0, n) - reversed copy reverse_front(x, 0, n/2) - front half reversed reverse_back(x, n/2, n) - back half reversed sort(x) - an ordered copy dither(x) - add i%5 to x[i] Here is the result of execution: Server VM: http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=html Client VM: http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html Mathematical investigations --------------------------- It is proved that for the Dual-Pivot Quicksort the average number of comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n), whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n) respectively. Full mathematical proof see in attached proof.txt and proof_add.txt files. Theoretical results are also confirmed by experimental counting of the operations. Diff between current and new implementation of Quicksort -------------------------------------------------------- Here is the link to the diff for java.util.Arrays class: http://cr.openjdk.java.net/~alanb/6880672/webrev.00 If you like to look and play with new algorithm, please, take attached class DualPivotQuicksort.java Feedback -------- Also I'd like to share a feedback from Joshua Bloch and Jon Bentley who spent a lot of time investigating this algorithm, who gave me many advices and tips how to make new Quicksort better. -------- Original Message -------- Subject: Re: Integration of new Dual-Pivot Quicksort into JDK 7 Date: Thu, 10 Sep 2009 07:20:11 -0700 From: Joshua Bloch <jjb@...> Jon also says that Vladimir should make every reasonable improvement to the basic method before checking in the code. In his words, "It would be horrible to put the new code into the library, and then have someone else come along and speed it up by another 20% by using standard techniques." I believe it's not unlikely that this code may end up getting ported to many languages and widely deployed in much the manner of Bentley and McIlroy's fine sort (which is nearing 20 successful years in the field). Jon will help Vladimir do this. -------- Original Message -------- Subject: Dual-Pivot Quicksort: Next Steps Date: Wed, 09 Sep 2009 15:02:25 -0400 From: Jon Bentley <jbentley@...> Vladimir, Josh, I*finally*feel like I understand what is going on. Now that I (think that) I see it, it seems straightforward and obvious. Tony Hoare developed Quicksort in the early 1960s. I was very proud to make minor contributions to a particularly clean (binary) quicksort in the mid 1980s, to a relatively straightforward, industrial strength Quicksort with McIlroy in the early 1990s, and then to algorithms and data structures for strings with Sedgewick in the mid 1990s. I think that Vladimir's contributions to Quicksort go way beyond anything that I've ever done, and rank up there with Hoare's original design and Sedgewick's analysis. I feel so privileged to play a very, very minor role in helping Vladimir with the most excellent work! ----------------------------------------------- Let me know, if you have any questions/comments. Thank you, Vladimir

Here is the details between steps (1) and (2) in the case of Dual-Pivot Quicksort: ----------------------------------------------------------------------------- >From the algorithm above, the average number of comparisons C_n as a function of the number of elements may be represented by the equation: (1) C_n = 1 + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 1*i + C_{j-i-1} + 2*(j-~~i~~- 1) + C_{n-j-1} + 2*(n-j-1)} Equation (1) means that total number is the sum of the comparison numbers of all cases of partitions into 3 parts plus number of comparisons for elements from left part (one comparison), center and right parts (2 comparisons). !!! It can be rewritten in other way: !!! (2) C_n = 1 + R*(n-2) + 2/(n(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + C_{j-i-1} + C_{n-j-1}} where R is the average number of comparisons during one iteration. This constant R can be found as average value (1 + 2 + 2) / 3 = 5/3 ~ 1.6666. ----------------------------------------------------------------------------- How it can be rewritten: We should show that 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i + 2*(j-~~i~~- 1) + 2*(n-j-1)} equals to 5/3 * (n-2). Let's consider the double sum: sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i + 2*(j-~~i~~- 1) + 2*(n-j-1)} = = sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {2*n - 4} - sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i} = = 2*(n-2)*sum_{i=0}^{n-2} {n-1-i} - sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i} = = 2*(n-2)*((n-1)^2 - (n-1)*(n-2)/2) - (n-1)*sum_{i=0}^{n-2} {i} + sum_{i=0}^{n-2} {i*i} = ------------------------------------------------------------------ here we use the property: sum_{k=1}{n} {k^2} = n^3/3 + n^2/2 + n/6 ------------------------------------------------------------------ = 2*(n-2)*((n-1)^2 - (n-1)*(n-2)/2) - (n-1)*(n-1)*(n-2)/2 + (n-2)^3/3 + (n-2)^2/2 + (n-2)/6 = = 1/6 * (12*(n-1)^2*(n-2) - 6*(n-1)*(n-2)^2 - 3*(n-1)^2*(n-2) + 2*(n-2)^3 + 3*(n-2)^2 + (n-2)) = = 1/6 * (3*(n-1)*(n-2)*(3*(n-1) - 2*(n-2)) + (n-2)*(2*(n-2)^2 + 3*(n-2) + 1))) = = 1/6 * (3*(n-1)*(n-2)*(n+1) + (n-2)*(2*n^2 - 5*n + 3)) = = 1/6 * (n-2)*(5*n^2 - 5*n) = 5/6 * n*(n-1)*(n-2) Substitute the result into equation (1): (1.1) C_n = 1 + 2/(n*(n-1)) * 5/6 * n*(n-1)*(n-2) + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + C_{j-i-1} + C_{n-j-1}} or (1.2) C_n = 1 + 5/3*(n-2) + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + C_{j-i-1} + C_{n-j-1}} We see that (1.2) is same as (2).

/*** <at> author Vladimir Yaroslavskiy * <at> version 2009.09.10 m765 */ public class DualPivotQuicksort { public static void sort(int[] a) { sort(a, 0, a.length); } public static void sort(int[] a, int fromIndex, int toIndex) { rangeCheck(a.length, fromIndex, toIndex); dualPivotQuicksort(a, fromIndex, toIndex - 1, 3); } private static void rangeCheck(int length, int fromIndex, int toIndex) { if (fromIndex > toIndex) { throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")"); } if (fromIndex < 0) { throw new ArrayIndexOutOfBoundsException(fromIndex); } if (toIndex > length) { throw new ArrayIndexOutOfBoundsException(toIndex); } } private static void swap(int[] a, int i, int j) { int temp = a[i]; a[i] = a[j]; a[j] = temp; } private static void dualPivotQuicksort(int[] a, int left, int right, int div) { int len = right - left; if (len < 27) { // insertion sort for tiny array for (int i = left + 1; i <= right; i++) { for (int j = i; j > left && a[j] < a[j - 1]; j--) { swap(a, j, j - 1); } } return; } int third = len / div; // "medians" int m1 = left + third; int m2 = right - third; if (m1 <= left) { m1 = left + 1; } if (m2 >= right) { m2 = right - 1; } if (a[m1] < a[m2]) { swap(a, m1, left); swap(a, m2, right); } else { swap(a, m1, right); swap(a, m2, left); } // pivots int pivot1 = a[left]; int pivot2 = a[right]; // pointers int less = left + 1; int great = right - 1; // sorting for (int k = less; k <= great; k++) { if (a[k] < pivot1) { swap(a, k, less++); } else if (a[k] > pivot2) { while (k < great && a[great] > pivot2) { great--; } swap(a, k, great--); if (a[k] < pivot1) { swap(a, k, less++); } } } // swaps int dist = great - less; if (dist < 13) { div++; } swap(a, less - 1, left); swap(a, great + 1, right); // subarrays dualPivotQuicksort(a, left, less - 2, div); dualPivotQuicksort(a, great + 2, right, div); // equal elements if (dist > len - 13 && pivot1 != pivot2) { for (int k = less; k <= great; k++) { if (a[k] == pivot1) { swap(a, k, less++); } else if (a[k] == pivot2) { swap(a, k, great--); if (a[k] == pivot1) { swap(a, k, less++); } } } } // subarray if (pivot1 < pivot2) { dualPivotQuicksort(a, less, great, div); } } }

At first consider the classic Quicksort scheme and find the average number of comparisons and swaps for it. We assume that input data is random permutation of n numbers from the range [1..n]. Classic Quicksort ================= 1. Choose a pivot element (take random), 2. Compare each (n-1) elements with the pivot 3. and swap it, if necessary, to have the partitions: [ <= pivot | >= pivot ] 4. Sort recursively left and right parts. >From the algorithm above, the average number of comparisons C_n as a function of the number of elements may be represented by the equation: (1) C_n = (n-1) + 1/n * sum_{k=0}^{n-1} {C_k + C_{n-k-1}} and last sum can be rewritten: (2) C_n = (n-1) + 2/n * sum_{k=0}^{n-1} C_k Write formula (2) for n+1: (3) C_{n+1} = n + 2/(n+1) * sum_{k=0}^{n} C_k Multiply (2) by n and (3) by (n+1) and subtract one from other, we have: (4) (n+1)*C_{n+1} - n*C_n = 2*n + 2*C_n Sorting an array of n elements may be considered as selecting one permutation of the n elements among all possible permutations. The number of possible permutations of n elements is n!, so the task for any sorting algorithm is to determine the one permutation out of n! possibilities. The minimum number of operations (swap and comparisons) for sorting n elements is const*ln(n!). >From the Stirling's formula the approximation of the number of operations is A*n*ln(n) + B*n + C, where A, B and C are constant coefficients. The coefficients B and C are not important for large n. Therefore, the function C_n may be approximated by the equation: (5) C_n = A*n*ln(n) The function C_n is substituted from equation (5) into equation (4), which yields the following equation: (6) (n+1)*A*(n+1)*ln(n+1) - n*A*n*ln(n) = 2*n + 2*A*n*ln(n) Using the properties of logarithms, equation (6) can then be reduced to: (7) n*ln(1+1/n) + 2*ln(1+1/n) + (1/n) * ln(n+1) = 2/A Using a property of logarithm: ln(1 + x) -> x, if x -> 0, and other property: ln(n) / n -> 0, when n -> +oo, equation (7) will be approximated by: (8) 1 + 0 + 0 = 2/A So, the coefficient A is equal to 2 and the average number of comparisons in sorting of n size arrays is (9) C_n = 2*n*ln(n). To find the approximation of the average number of swaps, we use the similar approach as in the case of comparisons. The average number of swaps S_n as a function of the number of elements may be represented by the equation: (10) S_n = 1/2*(n-1) + 2/n * sum_{k=0}^{n-1} S_k We assume that average number of swaps during one iteration is 1/2*(n-1). It means that in average one half of elements is swapped only. Using the same approach, we find that the coefficient A equals to 1. Therefore, the function S_n may be approximated by the equation: (11) S_n = n*ln(n) ------------------------------------------------------------------------------- Now consider the Dual-Pivot Quicksort scheme and find the average number of comparisons and swaps for it. We assume that input data is random permutation of n numbers from the range [1..n]. Dual-Pivot Quicksort ==================== 1. Choose 2 pivot elements pivot1 and pivot2 (take random), 2. pivot1 must be less or equal than pivot2, otherwise they are swapped 3. Compare each (n-2) elements with the pivots 4. and swap it, if necessary, to have the partitions: [ <= p1 | p1 <= & <= p2 | >= p2 ] 5. Sort recursively left, center and right parts. >From the algorithm above, the average number of comparisons C_n as a function of the number of elements may be represented by the equation: (1) C_n = 1 + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 1*i + C_{j-i-1} + 2*(j-i-1) + C_{n-j-1} + 2*(n-j-1)} Equation (1) means that total number is the sum of the comparison numbers of all cases of partitions into 3 parts plus number of comparisons for elements from left part (one comparison), center and right parts (2 comparisons). It can be rewritten in other way: (2) C_n = 1 + R*(n-2) + 2/(n(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + C_{j-i-1} + C_{n-j-1}} where R is the average number of comparisons during one iteration. This constant R can be found as average value (1 + 2 + 2) / 3 = 5/3, which means that elements from left part are required only one comparison, and elements form center and right parts - two comparisons. The double sum in equation (2) can be reduced: (3) C_n = 1 + 5/3*(n-2) + 2/(n*(n-1)) * sum_{k=0}^{n-2} {3 * (n-k-1) * C_k} Denote 1 + 5/3*(n-2) by f_n and multiply to n*(n-1): (4) n*(n-1)*C_n = n*(n-1)*f_n + 6 * sum_{k=0}^{n-2} {(n-k-1) * C_k} Write formula (4) for n+1: (5) n*(n+1)*C_{n+1} = n*(n+1)*f_n + 6 * sum_{k=0}^{n-1} {(n-k) * C_k} Subtract (4) from (5), we have: (6) n*(n+1)*C_{n+1} - n*(n-1)*C_n = n*(n+1)*f_n - n*(n-1)*f_n + 6 * sum_{k=0}^{n-2} C_k + 6*C_{n-1} Denote n*(n+1)*C_{n+1} - n*(n-1)*C_n by X_n and n*(n+1)*f_n - n*(n-1)*f_n by F_n: (7) X_n = F_n + 6 * sum_{k=0}^{n-2} C_k + 6*C_{n-1} Write formula (7) for n+1: (8) X_{n+1} = F_{n+1} + 6 * sum_{k=0}^{n-1} C_k + 6*C_n Subtract (7) from (8), we have: (9) X_{n+1} - X_n = F_{n+1} - F_n + 6*C_n Resolving of F_{n+1} - F_n gives: (10) X_{n+1} - X_n = 2 + 10*n + 6*C_n The function X_n is substituted into equation (10), which yields the following equation: (11) (n+1)*(n+2)C_{n+2} -2*n(n+1)*C_{n+1} + (n*(n-1) - 6)*C_n = 2 + 10*n We will find the function C_n approximated by the equation: (12) C_n = A*n*ln(n) The function C_n is substituted from equation (12) into equation (11), which yields the following equation: (13) (n^3+5*n^2+8*n+4)*ln(n+2) - (2*n^3+4*n^2+2*n)*ln(n+1) + (n^3-n^2-6*n)*ln(n) = (10*n+2) / A Using the properties of logarithms, equation (13) can then be reduced to: (14) n^3*(ln(n+2)-2*ln(n+1)+ln(n)) + n^2*(5*ln(n+2)-4*ln(n+1)-ln(n)) + n*(8*ln(n+2)-2*ln(n+1)-6*ln(n)) + 4*ln(n+2) = (10*n+2) / A Using a property of logarithm: ln(1 + x) -> x, if x -> 0, and other property: ln(n) / n -> 0, when n -> +oo, equation (15) will be approximated by: (15) -1 + 4 + 2 + 0 + 0 = 10 / A So, the coefficient A is equal to 2 and the average number of comparisons in sorting of n size arrays is (9) C_n = 2*n*ln(n). To find the approximation of the average number of swaps, we use the similar approach as in the case of comparisons. The average number of swaps S_n as a function of the number of elements may be represented by the equation: (10) S_n = 4 + 2/3*(n-2) + 2/(n*(n-1)) * sum_{k=0}^{n-2} {(n-k-1)*S_k} We assume that average number of swaps during one iteration is 2/3*(n-2). It means that in average one third of elements is swapped only. Using the same approach, we find that the coefficient A equals to 0.8. Therefore, the function S_n may be approximated by the equation: (11) S_n = 0.8*n*ln(n) ------------------------------------------------------------------------- And as summary: The value of the coefficient A: dual-pivot classic comparison: 2.0 2.0 swap: 0.8 1.0