Performance 18254756429: Improve hash grouping aggregation parallelism #2729
base: master
Conversation
for (auto idx = 0; idx < num_rows; ++idx) {
    col->set_scalar<int64_t>(idx, idx * 1'000'000'000);
}
The hard-coded constant influences memory allocation; for instance, if we decrease it (as shown below), the test passes.
diff --git i/cpp/arcticdb/processing/test/test_clause.cpp w/cpp/arcticdb/processing/test/test_clause.cpp
index 3bbe66a00..f9f5add0a 100644
--- i/cpp/arcticdb/processing/test/test_clause.cpp
+++ w/cpp/arcticdb/processing/test/test_clause.cpp
@@ -84,7 +84,7 @@ TEST(Clause, PartitionHashQuality) {
make_scalar_type(DataType::INT64), 0, AllocationType::DYNAMIC, Sparsity::NOT_PERMITTED
);
for (auto idx = 0; idx < num_rows; ++idx) {
- col->set_scalar<int64_t>(idx, idx * 1'000'000'000);
+ col->set_scalar<int64_t>(idx, idx * 100'000);
}
seg.add_column("grouping", col);
seg.set_row_id(num_rows - 1);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reproduce with gcc 11.2 from conda-forge (also used for the PyPI builds).
Yeah, I just came to the same conclusion. The loop isn't exiting when idx hits 100.
auto is resolving to int, which is equivalent to int32_t here. Changing the auto to int64_t fixes the issue. The idx * 1'000'000'000 multiplication overflows the maximum representable value of int32_t, and signed integer overflow is undefined behaviour, so the optimizer is entitled to assume it never happens and can mangle the loop's exit condition. At first glance it feels like a compiler bug, since both arguments to set_scalar are passed by value, but the overflow occurs before the call.
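A minimal standalone sketch of the same issue, assuming a plain array write in place of Column::set_scalar and num_rows = 100 to mirror the test's shape; only the loop variable's type matters here:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    constexpr std::int64_t num_rows = 100;
    std::int64_t values[100];

    // Buggy form: `auto idx` deduces to int (32-bit). idx * 1'000'000'000
    // overflows int once idx > 2, which is undefined behaviour for signed
    // integers, so an optimizing compiler may assume it never happens and
    // mis-compile the loop's exit condition.
    // for (auto idx = 0; idx < num_rows; ++idx)
    //     values[idx] = idx * 1'000'000'000;

    // Fixed form: make the loop variable, and hence the multiplication, 64-bit.
    for (std::int64_t idx = 0; idx < num_rows; ++idx) {
        values[idx] = idx * 1'000'000'000;
    }

    std::printf("last value: %lld\n", static_cast<long long>(values[num_rows - 1]));
    return 0;
}
```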
Yes, time to refresh an old but gold quiz.

Reference Issues/PRs
18254756429
What does this implement or fix?
Poor quality hash implementations of integral types, including at least some implementations of std::hash, are basically a static cast, e.g. std::hash<int64_t>{}(100) == 100. This is fast, but leads to poor distributions in our bucketing, where we mod the hash by the number of buckets. In particular, performing a grouping hash on a timeseries whose time points are dates results in all of the rows being partitioned into bucket zero, because such timestamps all share a large common factor (a whole number of days in nanoseconds), so hash % num_buckets is zero for every row whenever the bucket count divides that factor. That in turn results in no parallelism in the aggregation clause. Swap to using a consistent hash function across all supported platforms with improved uniformity.