-
Notifications
You must be signed in to change notification settings - Fork 318
Add HashTable methods related to the raw bucket index
#657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I'm still willing to change if other names are preferred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that get_bucket and get_bucket_mut are good, but buckets should be renamed to num_buckets to make it clear it's returning a count and not an iterator.
In terms of find_bucket_index and get_bucket_entry, I left some comments. But if you wanted to merge the smaller set first, I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, not required at all, but since a lot of code using the HashTable API is going to be unsafe anyway, it might be worth adding get_bucket_unchecked and get_bucket_unchecked_mut as well, which don't return Option.
|
Does |
I personally think
Ooh, I like that! We couldn't support the full |
|
Calling out this additional benefit of the
|
|
re: unsafe methods, I guess the main concern is whether the bounds checks can easily be optimised out. While there isn't the same robust codegen testing as the compiler here, maybe it would be worth poking around on godbolt and seeing how |
|
I've added those two unchecked methods.
Using + inc rdi
+ cmp r10, rdi
+ jae .LBB6_11
+ cmp byte ptr [rcx + r10], 0
+ js .LBB6_11
add rax, -8
pop rcx
.cfi_def_cfa_offset 8
ret... where |
|
I guess that another benefit to bumping MSRV would be the ability to add Even without an MSRV bump, I guess you could use the nightly feature as a temporary measure. |
|
I assume you mean adding that in pub fn find_bucket_index(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize> {
match self.raw.find(hash, eq) {
Some(bucket) => Some(unsafe {
let index = self.raw.bucket_index(&bucket);
core::hint::assert_unchecked(index < self.raw.buckets());
core::hint::assert_unchecked(self.raw.is_bucket_full(index));
index
}),
None => None,
}
}It does remove the unwrap in my simple back-to-back test, but it also pessimizes the unchecked case a little, recomputing the final address. Different variations of that assertion had the same effect. + neg r9
+ lea rax, [rax + 8*r9]
add rax, -8
pop rcx
.cfi_def_cfa_offset 8
retIn a real scenario I expect you'd also have other interim code, and who knows if the assertion could still be propagated. I don't think we should bother until there's a real measured benefit, and anyway even the unhinted checked case is still much less than a full hash probe that you would have needed before! |
|
RE MSRV, I recently bumped |
|
Is there any downside to having Separately, we probably want to add a method to find the next non-empty bucket after X. This would address the use cases in #420 and #382. |
i.e. instead of the separate I suppose the downside now is just that the caller would have to use an extra Hmm, if |
|
Note that there is a similar trio in |
|
I went ahead and made that change to try it -- easy enough to drop that commit if we change our minds. :) |
|
Hmm, on second thought it might be better to keep The only missing functionality here is the ability to insert a value just by bucket index, but it's not clear how useful this would be since the only way to get a valid index for this is by re-using the index of a previous insertion of the same key. |
|
Also possibly a bucket variant of |
069761b to
5796895
Compare
I don't think we should support this, as it would let you insert values in places that wouldn't be probed by the same hash, especially if we're not making any guarantees about index stability after removals. You can do something like: let (old, vacant) = table.get_bucket_entry(index).unwrap().remove();
let new_index = vacant.insert(new).bucket_index();... but I think even there we're not promising that the new index will be the same. Supposing we were to shift items to avoid tombstones, then the vacant entry might already represent a different bucket. |
| /// The order in which the iterator yields indices is unspecified | ||
| /// and may change in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would actually be fine to guarantee increasing order here, but the underlying raw::FullBucketsIndices is explicitly disclaiming that.
| /// The order in which the iterator yields indices is unspecified | ||
| /// and may change in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should definitely be unspecified though, since it follows the probe order.
6ae99e1 to
d043b0f
Compare
|
Another possible method is |
`VacantEntry` now stores a `Tag` instead of a full `hash: u64`. This means that `OccupiedEntry` doesn't need to store anything, because it can get that tag from the control byte for `remove -> VacantEntry`. The `get_bucket_entry` method doesn't need a hash argument either. Also, since `OccupiedEntry` is now smaller, `enum Entry` will be the same size as `VacantEntry` by using a niche for the discriminant. (Although this is not _guaranteed_ by the compiler.)
|
Rebased to resolve conflicts with the |
|
FWIW, I was looking into what it would take to implement |
On
HashTable<T, A>:On
OccupiedEntry<'_, T, A>:IterHashBuckets<'_>implementsClone,Debug,Default,Iterator<Item = usize>, andFusedIterator.IterBuckets<'_>implements all those and alsoExactSizeIterator.Closes #613, although I chose different names.