Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Goal of this PR
Expose Reader Offset and OCF Block Status for Concurrent Decoder
Description
This PR enhances the
ReaderandOCF Decoderby exposing internal state information that is critical for advanced processing scenarios, such as progress tracking, concurrently splitting, and debugging.Key Changes
Reader.InputOffset(): Adds a method to theReaderto retrieve the current input offset. This allows consumers to know exactly where in the underlying stream the reader is currently positioned.OCF Decoder.BlockStatus(): Introduces aBlockStatus()method (and corresponding struct) to the OCF Decoder. This provides a snapshot of the current block being processed, including:Current: The index of the current record within the block.Count: The total number of records in the current block.Size: The size (in bytes) of the current block.Offset: The input offset provided by the underlying reader.Motivation
Currently, the
avropackage abstracts away the underlying stream position and block details. While this is fine for simple reading, it limits users who need to:Use Case Example
A data processing pipeline can now use
BlockStatus()to log precise progress or checkpoint processing at specific block offsets, improving reliability and observability.