-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fs: support io_uring with tokio::fs::read
#7696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
f4c97c2 to
7e73f17
Compare
tokio/src/fs/read.rs
Outdated
| let mut offset = 0; | ||
|
|
||
| while size_read < size { | ||
| let left_to_read = (size - size_read) as u32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let left_to_read = (size - size_read) as u32; | |
| let left_to_read = u32::try_from(size - size_read).unwrap_or(u32::MAX); |
To properly support files bigger than 4GB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max read size at a time is u32::MAX, we read the rest in other next iterations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, if we know we're reading more than u32::MAX then we can batch 2 read requests to avoid extra syscalls
6237a4c to
636cfb8
Compare
tokio/src/fs/read_uring.rs
Outdated
| // increase read size per Op submission | ||
| // 2. In case of small reads by the kernel, also gradually increase | ||
| // read size per Op submission to read files in lesser cycles | ||
| if size_read_usize <= buf.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf.len (r_buf) is updated at line 119 and it accumulates the old length + size_read_usize.
I think this if check will always pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I'll change the buf.len() to the read_len to detect shorter reads, how else do you think I should detect shorter reads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add something like let requested = read_len; anywhere before read_len -= size_read; and use if size_read_usize < requested { here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks complicated, I will review it incrementally.
| // extra single capacity for the whole size to fit without any reallocation | ||
| let buf = Vec::with_capacity(size_hint.unwrap_or(0).saturating_add(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra single capacity for the whole size to fit without any reallocation
Why doesn't the standard library do it?
// https://doc.rust-lang.org/1.90.0/src/std/fs.rs.html#308
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
fn inner(path: &Path) -> io::Result<Vec<u8>> {
let mut file = File::open(path)?;
let size = file.metadata().map(|m| m.len() as usize).ok();
let mut bytes = Vec::try_with_capacity(size.unwrap_or(0))?;
io::default_read_to_end(&mut file, &mut bytes, size)?;
Ok(bytes)
}
inner(path.as_ref())
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they react on the bytes read rather than how many bytes to read ahead of time. In case of a short read during EOF and try_reserve is called, it allocates more space than needed and also doubles the capacity but if its +1 then there's no need and capacity isn't necessarily grown
tokio/src/fs/read_uring.rs
Outdated
| // TODO: use io uring in the future to obtain metadata | ||
| let size_hint: Option<usize> = file | ||
| .metadata() | ||
| .await | ||
| .map(|m| usize::try_from(m.len()).unwrap_or(usize::MAX)) | ||
| .ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of size_hint is to improve performance by avoiding memory allocation. However, it involves thread pool mechanisms—which may not be more efficient than multiple memory allocations due to the overhead of creating/terminating threads and inter-thread synchronization operations.
We need performance data to demonstrate that this approach is faster than the method without using size_hint.
I asked this because I want to eliminate unnecessary logic as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my question is Why would you think that? Would you think a statx helper will eliminate this problem or it will still involve thread/pool mechanisms? I guess it will also be a future SQE.
I will include some benchmarks and upload them in a separate branch to test this @ADD-SP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two simple benchmarks are enough. One is for small file, another one is for large file.
6e0abae to
8120700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked all of the details in read_uring.rs, but left some comments I've noticed so far.
| // SAFETY: The `len` of the amount to be read and the buffer that is passed | ||
| // should have capacity > len. | ||
| // | ||
| // If `len` read is higher than vector capacity then setting its length by | ||
| // the caller in terms of size_read can be unsound. | ||
| pub(crate) fn read(fd: OwnedFd, mut buf: Vec<u8>, len: u32, offset: u64) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be marked as unsafe if the caller needs to follow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree but there's nothing inherently unsafe in calling this function nor it is unsound.
The safety comment is there since this function writes in uninitialized capacity the caller will try to set_len and if they do that on the basis of size_read and do not have sufficient capacity/len then it will be UB
8120700 to
e6c6ce7
Compare
e6c6ce7 to
b9c3885
Compare
Motivation
We slowly use io uring everywhere here I made the simple change of supporting
fs::readhowever it might not be as simple. Let me know if the unsafe I used is correct or not.We currently use the blocking
std::fs::MetaDatato obtain file size for buffer capacity and extend the length of the vector according to the bytes read in the CQE. This implementation sounds good on paper to me.Later we should implement an internal statx helper, in this PR or a seperate PR to make our uring implementation less painful to use. As this pr failed #7616
I will prefer to include statx helper in the same PR to avoid merging an inefficient read implementation given io uring is about being more efficient in file IO
Solution
Continue adopting io uring
strace on a
tokio::fs::readafter this change