feat: implement compressed CSV/JSON export functionality #7162
Conversation
Thanks. Question: why was `LazyBufferedWriter` deprecated?
Pull Request Overview
This PR adds compression support for JSON and CSV export functionality in the COPY TO command. The implementation introduces a new CompressedWriter abstraction that wraps async writers with compression support for multiple formats (GZIP, BZIP2, XZ, ZSTD).
Key Changes:
- Added `compressed_writer.rs` module with `CompressedWriter` and the `IntoCompressedWriter` trait
- Refactored the `stream_to_file` function to support compression for both JSON and CSV formats
- Removed `LazyBufferedWriter` and its associated error types, since compression is now handled by `CompressedWriter`
- Added comprehensive test cases for compressed exports in both CSV and JSON formats
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/common/datasource/src/compressed_writer.rs | New module implementing compressed writer wrapper for multiple compression formats |
| src/common/datasource/src/file_format.rs | Refactored stream_to_file to support compression; replaced LazyBufferedWriter with direct buffer handling and compression |
| src/common/datasource/src/file_format/json.rs | Updated stream_to_json to accept JsonFormat parameter and pass compression type to stream_to_file; added compression tests |
| src/common/datasource/src/file_format/csv.rs | Updated stream_to_csv to pass compression type to stream_to_file; added compression tests |
| src/common/datasource/src/buffered_writer.rs | Removed LazyBufferedWriter implementation as it's no longer needed with the new compression approach |
| src/common/datasource/src/error.rs | Removed BufferedWriterClosed error variant that was specific to the old LazyBufferedWriter |
| src/common/datasource/src/lib.rs | Added compressed_writer module export |
| src/common/datasource/src/test_util.rs | Updated test utility to pass JsonFormat parameter to stream_to_json |
| src/operator/src/statement/copy_table_to.rs | Updated to pass JsonFormat to stream_to_json function |
| tests/cases/standalone/common/copy/copy_to_json_compressed.sql | New SQL test cases for compressed JSON exports |
| tests/cases/standalone/common/copy/copy_to_json_compressed.result | Expected results for compressed JSON export tests |
| tests/cases/standalone/common/copy/copy_to_csv_compressed.sql | New SQL test cases for compressed CSV exports |
| tests/cases/standalone/common/copy/copy_to_csv_compressed.result | Expected results for compressed CSV export tests |
As mentioned earlier in #6286, the main reason is that the OpenDAL writer already provides internal buffering during writes.
WenyXu left a comment:
Rest LGTM
- Add CompressedWriter for real-time compression during CSV/JSON export
- Support GZIP, BZIP2, XZ, ZSTD compression formats
- Remove LazyBufferedWriter dependency for simplified architecture
- Implement Encoder -> Compressor -> FileWriter data flow
- Add tests for compressed CSV/JSON export

Signed-off-by: McKnight22 <[email protected]>

- Refactor and extend compressed_writer tests
- Add coverage for Bzip2 and Xz compression

Signed-off-by: McKnight22 <[email protected]>

- Switch to threshold-based chunked flushing
- Avoid unnecessary writes on empty buffers
- Replace direct write_all() calls with the new helper for consistency

Signed-off-by: McKnight22 <[email protected]>

- Add support for reading compressed CSV and JSON in COPY FROM
- Support GZIP, BZIP2, XZ, ZSTD compression formats
- Add tests for compressed CSV/JSON import

Signed-off-by: McKnight22 <[email protected]>

- Fix review comments

Signed-off-by: McKnight22 <[email protected]>

- Move temp_dir out of the loop

Signed-off-by: McKnight22 <[email protected]>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#6286
What's changed and what's your intention?
Summary (mandatory):
This PR introduces support for GZIP, BZIP2, XZ, ZSTD compression in the COPY TO statement for CSV/JSON exports.
Details:
Added CompressionType option to specify file export compression formats: GZIP, BZIP2, XZ, ZSTD.
Deprecated LazyBufferedWriter and simplified the data flow to Encoder -> Compressor -> FileWriter.
Implemented compressed file export functionality only for CSV and JSON.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.