feat(collection): Replace quickSort with pdqsort for performance and robustness #922

abbashosseinii · 2025-11-04T21:32:46Z

This PR enhances the performance and robustness of the default unstable sort in package:collection by replacing the existing quickSort implementation with a modern, state-of-the-art Pattern-Defeating Quicksort (pdqsort) algorithm. The implementation is a Dart port of the concepts from the canonical C++ version by Orson Peters.

Motivation

The pdqsort algorithm is the default unstable sort in other high-performance ecosystems (like Rust) for good reason. It offers two key advantages over the existing implementation:

Superior Performance on Common Patterns: Through more intelligent pivot selection (median-of-three/ninther), it consistently outperforms the random-pivot strategy on a wide variety of real-world data patterns.
Guaranteed Worst-Case Performance: It includes a heapsort fallback mechanism that activates on pathological inputs, providing a hard guarantee of O(n log n) performance and preventing worst-case O(n²) behavior.

This change is a pure, non-breaking enhancement that makes the standard library sort faster and safer with no API changes or regressions.

Benchmark Results

To validate this enhancement, a comprehensive benchmark was created, comparing the new pdqsort implementation against a direct copy of the existing SDK quickSort across five critical data patterns.

The results demonstrate a consistent and significant performance improvement across all tested scenarios, with no regressions.

Benchmark Results:
pdqsort (Enhancement) vs. SDK quickSort (Baseline)

Data Condition	Baseline (µs)	Enhancement (µs)	Improvement	Winner
Mean
Random	7584	6654	+12.27%	Enhancement 🚀
Sorted	7515	7207	+4.10%	Enhancement 🚀
Reverse Sorted	7598	7341	+3.38%	Enhancement 🚀
Few Unique	1741	1560	+10.40%	Enhancement 🚀
Pathological	8564	8034	+6.19%	Enhancement 🚀
Median
Random	7612	6672	+12.35%	Enhancement 🚀
Sorted	7596	7220	+4.95%	Enhancement 🚀
Reverse Sorted	7598	7363	+3.09%	Enhancement 🚀
Few Unique	1712	1567	+8.47%	Enhancement 🚀
Pathological	8537	8063	+5.55%	Enhancement 🚀

(Results are the average/median time in microseconds to sort 50,000 integers, aggregated over 12 runs.)

I’ve reviewed the contributor guide and applied the relevant portions to this PR.

Contribution guidelines:

See our contributor guide for general expectations for PRs.
Larger or significant changes should be discussed in an issue before creating a PR.
Contributions to our repos should follow the Dart style guide and use dart format.
Most changes should add an entry to the changelog and may need to rev the pubspec package version.
Changes to packages require corresponding tests.

Many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

Note: The Dart team is trialing Gemini Code Assist. Don't take its comments as final Dart team feedback. Use the suggestions if they're helpful; otherwise, wait for a human reviewer.

…robustness

This commit introduces a comprehensive benchmark harness to provide empirical evidence for the proposal to replace the existing quickSort implementation with a more performant and robust pdqsort algorithm. The harness evaluates performance across five critical data patterns to ensure a thorough comparison: - **Random:** Tests performance on typical, unsorted data. - **Sorted & Reverse Sorted:** Tests for handling of common presorted patterns. - **Few Unique:** Tests efficiency on data with high duplication. - **Pathological:** Tests robustness against inputs designed to cause worst-case behavior in quicksorts. The baseline implementation (`quickSortBaseline`) is a direct copy of the existing SDK code to ensure the comparison is fair and accurate. The results from this benchmark will be used to justify the enhancement in the accompanying pull request.

github-actions · 2025-11-05T01:40:58Z

PR Health

License Headers ✔️

// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

Files
no missing headers

All source files should start with a license header.

This check can be disabled by tagging the PR with skip-license-check.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package	Leaked API symbol	Leaking sources

This check can be disabled by tagging the PR with skip-leaking-check.

Breaking changes

⚠️

Package	Change	Current Version	New Version	Needed Version	Looking good?
collection	Breaking	1.19.1	1.20.0-wip	2.0.0 Got "1.20.0-wip" expected >= "2.0.0" (breaking changes)	⚠️

This check can be disabled by tagging the PR with skip-breaking-check.

Coverage

⚠️

File	Coverage
pkgs/collection/benchmark/dataset_generator.dart	💔 Not covered
pkgs/collection/benchmark/sort_benchmark.dart	💔 Not covered
pkgs/collection/lib/src/algorithms.dart	💔 90 % ⬇️ 9 %

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

Changelog Entry ✔️

Package	Changed Files

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

This addresses the failing 'Changelog Entry' CI check by adding the required entry for the quickSort implementation replacement.

lrhn · 2025-11-05T13:58:35Z

Looks pretty convincing. I'll have to look at it a little more to see if it's so universally better that we want to replace the simple quicksort, or if we may want to have both.

I don't know how many people actually use the sort functions in package:collection instead of just using List.sort.
(So should we change List.sort?)

The only nit I noticed in a quick read-through is that I'd avoid calling keyOf more than once per element when it can be avoided. The keyOf operation may not be trivial.

@natebosch WDYT?

pkgs/collection/benchmark/sort_comparison_benchmark.dart

pkgs/collection/lib/src/algorithms.dart

abbashosseinii · 2025-11-05T16:28:16Z

Hi @lrn, thank you for the thoughtful review and feedback! This is a great discussion. Let me address your points directly.

...see if it's so universally better that we want to replace the simple quicksort, or if we may want to have both.

That's the key question. I believe replacing it is the right move because pdqsort is designed to be a "drop-in" replacement that is strictly better than the existing quickSort.

No Regressions: As the benchmarks show, it offers a performance improvement over the baseline quickSort on every tested data pattern. It doesn't trade performance in one area for another.
Safety Net: The heapsort fallback provides a hard O(n log n) guarantee, which the current implementation lacks. This makes the library sort more robust.

For these reasons, I don't see a scenario where a user would knowingly choose the older, less robust implementation, which argues against keeping both.

I don't know how many people actually use the sort functions in package:collection instead of just using List.sort. (So should we change List.sort?)

This was an excellent question, and it prompted me to add List.sort to the benchmark to get a clear answer. The results are fascinating and, I believe, make a strong case for this PR.

Benchmark Results: pdqsort (Enhancement) vs. SDK quickSort (Baseline) vs. List.sort

Data Condition	Baseline (µs)	Enhancement (µs)	List.sort (µs)	Enh vs Base	List.sort vs Base	Best
Mean
Random	7468	6546	6588	+12.34%	+11.78%	Enhancement 🚀
Sorted	7344	7087	2364	+3.51%	+67.81%	List.sort
Reverse Sorted	7351	7138	2769	+2.90%	+62.33%	List.sort
Few Unique	1750	1567	2314	+10.44%	-32.26%	Enhancement 🚀
Pathological	8531	8026	4971	+5.92%	+41.73%	List.sort
Median
Random	7566	6576	6588	+13.08%	+12.93%	Enhancement 🚀
Sorted	7353	7055	2383	+4.05%	+67.59%	List.sort
Reverse Sorted	7325	7154	2776	+2.33%	+62.10%	List.sort
Few Unique	1738	1567	2316	+9.84%	-33.26%	Enhancement 🚀
Pathological	8562	8018	4992	+6.35%	+41.70%	List.sort

Native Code Wins on Structure: As expected, the VM's native C++ List.sort is unbeatable on structured data (sorted, reverse-sorted). It likely has low-level optimizations to detect sorted runs that a pure-Dart implementation can't match.
This pdqsort Wins on Chaos: Most importantly, this new pdqsort implementation is the fastest of all three on chaotic data—both random data and data with few unique values.

This shows that the goal of this PR isn't to replace List.sort, but to make the pure-Dart implementation in package:collection the best it can possibly be. By doing so, we've achieved a sort that is so efficient it can even surpass the native VM's implementation for common "messy" data scenarios.

The only nit I noticed in a quick read-through is that I'd avoid calling keyOf more than once per element when it can be avoided. The keyOf operation may not be trivial.

Excellent point, and thank you for catching that! You are absolutely right. I've just pushed a new commit that caches the keys for the pivot and the currently inspected element in the partitioning and insertion sort loops. This will avoid the redundant calls and improve performance when keyOf is an expensive operation.

Thank you again for the review! Let me know if you have any other questions.

This commit refactors the _pdqSiftDown helper function, which is used as part of the heapsort fallback mechanism within the quickSort implementation. The changes improve both performance and readability: 1- Key Caching: The key of the current largest element is now cached in a local variable (largestKey). This avoids multiple redundant calls to the keyOf function within the loop, reducing overhead. 2- Clearer Logic: The comparison logic has been restructured. Instead of potentially re-evaluating keyOf(elements[start + largest]) after the first comparison, the cached key is used, and the flow for comparing the root with its left and right children is now more explicit and easier to follow. 3- Early Exit: An early break is introduced for leaf nodes (when left >= n), which slightly simplifies the control flow. These changes do not alter the algorithm's behavior but make the implementation more efficient and maintainable.

Updated the comment for the BenchmarkResult class to enhance clarity by rephrasing it from "Represents the final aggregated result of a benchmark" to "The final aggregated result of a benchmark." This change aims to improve the readability of the code documentation.

…ckSort and pdqsort This commit introduces a new dataset generator for creating various data patterns used in benchmarking sorting algorithms. It also implements a comprehensive benchmark harness that compares the performance of the baseline quickSort and the enhanced pdqsort across multiple data conditions, including random, sorted, reverse sorted, few unique, and pathological datasets. The results will help evaluate the effectiveness of the pdqsort enhancement.

pkgs/collection/lib/src/algorithms.dart

pkgs/collection/benchmark/sort_benchmark.dart

lrhn · 2025-11-11T14:03:14Z

I think this looks fine, so LGTM from me.

@natebosch WDYT?

pkgs/collection/benchmark/dataset_generator.dart

lrhn · 2025-11-12T10:20:37Z

pkgs/collection/benchmark/sort_benchmark.dart

+
+  List<int> getNextList() {
+    // Cloning the list is crucial so each run sorts an unsorted list.
+    return List<int>.from(datasets[_iteration++ % datasets.length]);


If you clone the list content anyway, there is no need to have 128 different instances of the same list in the sorted/reversed sets. One should be enough.

(This cloning takes time. It's probably unavoidable, we'd have to allocate a lot of lists ahead of time to avoid that, and the memory pressure starts mattering. It should be the same time for all the benchmarks, so at least it's a consistent overhead.

It should use .toList(), not List.from. That's likely to be more performant. The list knows its own length and the fastest way to copy its elements.

I'd not use % with a variable value. That's also expensive. Probably not remotely measurable compared to sorting 50000 elements, though. Still,
could be:

var dataset = datasets[_iteration]; iteration++; if (iteration == datasets.length) iteration = 0; return dataset.toList();

)

lrhn · 2025-11-12T10:29:01Z

pkgs/collection/benchmark/sort_benchmark.dart

+/// A base class for our sort benchmarks to reduce boilerplate.
+/// Note: We extend `BenchmarkBase` for its structure but will use our own
+/// timing.
+abstract class SortBenchmarkBase extends BenchmarkBase {


You can generate the dataset in a setUp function instead of creating it ahead of time.

That would also allow doing benchmarks with varying sizes of inputs, either all small, all large, or mixed.

natebosch

Overall change looks good. Added a few nitpicks that should be easy to address, I'll handle any others in a followup.

pkgs/collection/benchmark/sort_benchmark.dart

pkgs/collection/lib/src/algorithms.dart

Caches elements and keys in local variables within `_pdqSort3` to reduce `keyOf` calls from a potential 6 down to 3, and to minimize list access during pivot selection.

…llow Effective Dart I removed the leading "get" from method names and updated call sites so the API doesn't start method names with "get". This change follows the Effective Dart guideline: avoid starting a method name with get — see https://dart.dev/effective-dart/design#avoid-starting-a-method-name-with-get

Co-authored-by: Nate Bosch <[email protected]>

…range of values

Refine the list creation to be more explicit and performant. - Use a more descriptive name `sortedBase` instead of `base`. - Create the initial `sortedBase` list with `growable: false` as it is never modified. - Use `List.of` and explicitly set `growable: true` for the final copies to make the intent clear that they are mutable.

Make the list generation in the `_generateReverse` helper more explicit and performant. - The `base` list is now created with `growable: false`, as it's a read-only template. - The final copies are created using `List.of` with an explicit `growable: true` to clearly signal that they are mutable.

Make the list generation in the `_generateSorted` helper more explicit and performant. - The `base` list is now created with `growable: false` as it's a read-only template. - The final copies are created using `List.of` with an explicit `growable: true` to clearly signal that they are mutable.

Move the generation of `sorted`, `reverse`, and `pathological` base data into top-level `final` variables. - **Efficiency:** Ensures these lists are computed only once at startup. `_generatePathological` now correctly reuses `_sortedBase` instead of re-calculating it. - **Clarity:** Distinguishes between the immutable data templates and the mutable copies used for testing. - **Consistency:** Simplifies the deterministic generator functions into concise one-liners.

Refactor the loops that build `_pathologicalBase` to use `i += 2` for a minor performance and clarity improvement.

Improves code readability.

- Fix a bug in the `_pathologicalBase` creation logic for odd sizes. - Make `_reversedBase` generation more declarative using `.reversed.toList()`. - Add clarifying comments for complex calculations.

abbashosseinii added 2 commits November 5, 2025 00:30

feat(collection): Replace quickSort with pdqsort for performance and …

ebe5aba

…robustness

abbashosseinii requested a review from a team as a code owner November 4, 2025 21:32

github-actions bot added the package:collection label Nov 4, 2025

Merge branch 'main' into feature/replace-quicksort-with-pdqsort

63fcfac

docs(collection): add changelog for quickSort pdqsort enhancement

307ec48

This addresses the failing 'Changelog Entry' CI check by adding the required entry for the quickSort implementation replacement.

lrhn reviewed Nov 5, 2025

View reviewed changes

pkgs/collection/benchmark/sort_comparison_benchmark.dart Outdated Show resolved Hide resolved

lrhn reviewed Nov 5, 2025

View reviewed changes

pkgs/collection/benchmark/sort_comparison_benchmark.dart Outdated Show resolved Hide resolved

lrhn reviewed Nov 5, 2025

View reviewed changes

pkgs/collection/lib/src/algorithms.dart Show resolved Hide resolved

abbashosseinii added 3 commits November 5, 2025 20:42

lrhn reviewed Nov 11, 2025

View reviewed changes

pkgs/collection/lib/src/algorithms.dart Outdated Show resolved Hide resolved

lrhn reviewed Nov 11, 2025

View reviewed changes

pkgs/collection/lib/src/algorithms.dart Outdated Show resolved Hide resolved

lrhn reviewed Nov 11, 2025

View reviewed changes

pkgs/collection/benchmark/sort_benchmark.dart Outdated Show resolved Hide resolved

lrhn requested a review from natebosch November 11, 2025 14:03