@McKnight22 commented Oct 30, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#6286

What's changed and what's your intention?

Summary (mandatory):
This PR adds support for GZIP, BZIP2, XZ, and ZSTD compression to the COPY TO statement for CSV and JSON exports.

Details:
Added a CompressionType option to select the compression format of exported files: GZIP, BZIP2, XZ, or ZSTD.
Deprecated LazyBufferedWriter and simplified the data flow to Encoder -> Compressor -> FileWriter.
Implemented compressed file export for CSV and JSON only.
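A minimal sketch of how such an option value might be parsed and mapped to a file suffix. The enum below is hypothetical: its variant names, the accepted spellings, and the `file_extension` helper are assumptions for illustration, not GreptimeDB's actual `CompressionType` API.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the export compression option; the real type in
/// GreptimeDB may differ in naming and variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionType {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

impl CompressionType {
    /// Conventional file-name suffix for each format ("" when uncompressed).
    pub fn file_extension(self) -> &'static str {
        match self {
            CompressionType::Uncompressed => "",
            CompressionType::Gzip => "gz",
            CompressionType::Bzip2 => "bz2",
            CompressionType::Xz => "xz",
            CompressionType::Zstd => "zst",
        }
    }
}

impl FromStr for CompressionType {
    type Err = String;

    /// Case-insensitive parsing of the option value, e.g. "gzip" or "ZSTD".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "" | "UNCOMPRESSED" => Ok(CompressionType::Uncompressed),
            "GZIP" | "GZ" => Ok(CompressionType::Gzip),
            "BZIP2" | "BZ2" => Ok(CompressionType::Bzip2),
            "XZ" => Ok(CompressionType::Xz),
            "ZSTD" | "ZST" => Ok(CompressionType::Zstd),
            other => Err(format!("unsupported compression type: {other}")),
        }
    }
}

fn main() {
    let c: CompressionType = "gzip".parse().unwrap();
    println!("{:?} -> .{}", c, c.file_extension());
}
```

Parsing through `FromStr` keeps the set of accepted spellings in one place, so an unsupported value is rejected before any file is opened.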

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@McKnight22 requested a review from a team as a code owner October 30, 2025 03:49
@github-actions bot added the size/M and docs-not-required labels Oct 30, 2025
@killme2008 (Contributor)

Thanks.

Question: Why deprecate LazyBufferedWriter?

@killme2008 requested a review from Copilot October 30, 2025 06:42

Copilot AI left a comment

Pull Request Overview

This PR adds compression support for JSON and CSV export functionality in the COPY TO command. The implementation introduces a new CompressedWriter abstraction that wraps async writers with compression support for multiple formats (GZIP, BZIP2, XZ, ZSTD).

Key Changes:

  • Added a compressed_writer.rs module with CompressedWriter and the IntoCompressedWriter trait
  • Refactored stream_to_file function to support compression for both JSON and CSV formats
  • Removed the LazyBufferedWriter and associated error types as compression is now handled by CompressedWriter
  • Added comprehensive test cases for compressed exports in both CSV and JSON formats
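The Encoder -> Compressor -> FileWriter layering described above can be sketched with synchronous `std::io::Write` layers. This is an illustration under stated assumptions, not the PR's code: the actual CompressedWriter wraps async writers, and `RleCompressor`, `encode_csv`, and the in-memory `Vec<u8>` "file" are hypothetical stand-ins. Run-length encoding replaces GZIP/BZIP2/XZ/ZSTD only so the example needs nothing beyond `std`.

```rust
use std::io::{self, Write};

/// Stand-in compressor: byte-level run-length encoding, chosen only so the
/// sketch runs with `std` alone. A real GZIP/BZIP2/XZ/ZSTD encoder would
/// sit in this position.
struct RleCompressor<W: Write> {
    inner: W,
    /// (byte, count) of the run currently being accumulated.
    run: Option<(u8, u8)>,
}

impl<W: Write> RleCompressor<W> {
    fn new(inner: W) -> Self {
        Self { inner, run: None }
    }

    /// Emits the pending run as a `[count, byte]` pair.
    fn emit_run(&mut self) -> io::Result<()> {
        if let Some((byte, count)) = self.run.take() {
            self.inner.write_all(&[count, byte])?;
        }
        Ok(())
    }

    /// Writes the trailing run and hands back the underlying "file" writer,
    /// analogous to finishing a compression stream before closing the file.
    fn finish(mut self) -> io::Result<W> {
        self.emit_run()?;
        self.inner.flush()?;
        Ok(self.inner)
    }
}

impl<W: Write> Write for RleCompressor<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        for &b in buf {
            let current = self.run;
            match current {
                // Extend the run while the byte repeats and the counter has room.
                Some((byte, count)) if byte == b && count < u8::MAX => {
                    self.run = Some((byte, count + 1));
                }
                // Otherwise close the previous run and start a new one.
                _ => {
                    self.emit_run()?;
                    self.run = Some((b, 1));
                }
            }
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // The pending run is emitted only by `finish`, much as compression
        // encoders write their final block when the stream is finished.
        self.inner.flush()
    }
}

/// Toy CSV encoder: serializes rows into whatever writer it is given.
fn encode_csv<W: Write>(rows: &[(&str, i64)], mut sink: W) -> io::Result<W> {
    writeln!(sink, "host,cpu")?;
    for (host, cpu) in rows {
        writeln!(sink, "{host},{cpu}")?;
    }
    Ok(sink)
}

fn main() -> io::Result<()> {
    // FileWriter stand-in: an in-memory buffer instead of an object-store writer.
    let file: Vec<u8> = Vec::new();
    // Encoder -> Compressor -> FileWriter
    let compressor = encode_csv(&[("a", 1), ("b", 2)], RleCompressor::new(file))?;
    let file = compressor.finish()?;
    println!("wrote {} compressed bytes", file.len());
    Ok(())
}
```

Because each stage only sees a `Write`, swapping the stand-in for a real codec leaves the encoder and the file writer untouched; the extra obligation is calling `finish` so the compressor can emit its trailing data before the file is closed.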

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

  • src/common/datasource/src/compressed_writer.rs: New module implementing a compressed writer wrapper for multiple compression formats
  • src/common/datasource/src/file_format.rs: Refactored stream_to_file to support compression; replaced LazyBufferedWriter with direct buffer handling and compression
  • src/common/datasource/src/file_format/json.rs: Updated stream_to_json to accept a JsonFormat parameter and pass the compression type to stream_to_file; added compression tests
  • src/common/datasource/src/file_format/csv.rs: Updated stream_to_csv to pass the compression type to stream_to_file; added compression tests
  • src/common/datasource/src/buffered_writer.rs: Removed the LazyBufferedWriter implementation, which is no longer needed with the new compression approach
  • src/common/datasource/src/error.rs: Removed the BufferedWriterClosed error variant that was specific to the old LazyBufferedWriter
  • src/common/datasource/src/lib.rs: Added the compressed_writer module export
  • src/common/datasource/src/test_util.rs: Updated the test utility to pass the JsonFormat parameter to stream_to_json
  • src/operator/src/statement/copy_table_to.rs: Updated to pass JsonFormat to the stream_to_json function
  • tests/cases/standalone/common/copy/copy_to_json_compressed.sql: New SQL test cases for compressed JSON exports
  • tests/cases/standalone/common/copy/copy_to_json_compressed.result: Expected results for compressed JSON export tests
  • tests/cases/standalone/common/copy/copy_to_csv_compressed.sql: New SQL test cases for compressed CSV exports
  • tests/cases/standalone/common/copy/copy_to_csv_compressed.result: Expected results for compressed CSV export tests


@WenyXu (Member) commented Oct 30, 2025

> Thanks.
>
> Question: Why deprecate LazyBufferedWriter?

As mentioned earlier in #6286, the main reason is that the OpenDAL writer already buffers data internally during writes, so an additional LazyBufferedWriter layer is redundant.
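A small experiment illustrating the argument, under assumptions: `ChunkedWriter` is a hypothetical stand-in for a writer that already buffers internally (it does not model OpenDAL's actual chunking policy), and `std::io::BufWriter` plays the role of the removed extra buffering layer. Because the inner writer batches by total bytes, the outer buffer does not reduce how often the backing store is written; it only adds a copy.

```rust
use std::io::{self, BufWriter, Write};

/// Backing store that only counts how often it is written to.
#[derive(Default)]
struct Store {
    writes: usize,
}

/// Stand-in for a writer with its own internal buffering. It touches `Store`
/// once per `chunk` bytes, plus once on flush if anything is left over.
struct ChunkedWriter<'a> {
    store: &'a mut Store,
    buf: Vec<u8>,
    chunk: usize,
}

impl Write for ChunkedWriter<'_> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        while self.buf.len() >= self.chunk {
            self.store.writes += 1;
            self.buf.drain(..self.chunk);
        }
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        if !self.buf.is_empty() {
            self.store.writes += 1;
            self.buf.clear();
        }
        Ok(())
    }
}

/// Writes `n` small records (11 bytes each) and flushes.
fn write_rows<W: Write>(w: &mut W, n: usize) -> io::Result<()> {
    for i in 0..n {
        writeln!(w, "row-{i:06}")?;
    }
    w.flush()
}

/// Returns how many writes reach the store, with or without an extra
/// `BufWriter` layered on top of the internally buffered writer.
fn store_writes(n: usize, extra_buffer: bool) -> io::Result<usize> {
    let mut store = Store::default();
    {
        let mut inner = ChunkedWriter { store: &mut store, buf: Vec::new(), chunk: 1024 };
        if extra_buffer {
            let mut outer = BufWriter::with_capacity(8 * 1024, &mut inner);
            write_rows(&mut outer, n)?;
        } else {
            write_rows(&mut inner, n)?;
        }
    }
    Ok(store.writes)
}

fn main() -> io::Result<()> {
    println!("without extra buffer: {}", store_writes(1000, false)?);
    println!("with extra buffer:    {}", store_writes(1000, true)?);
    Ok(())
}
```

Both runs push 11,000 bytes through 1,024-byte chunks, so the store sees ten full-chunk writes plus one final partial write either way.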

@WenyXu self-requested a review October 30, 2025 07:58
@github-actions bot added size/XXL and removed size/M labels Nov 6, 2025
@McKnight22 requested a review from WenyXu November 10, 2025 03:01
@WenyXu requested a review from fengjiachun November 12, 2025 07:10
@github-actions bot added size/XL and removed size/XXL labels Nov 13, 2025
@McKnight22 force-pushed the pr-add_compression_option branch from 617b3c7 to 18bdd41 November 14, 2025 04:05
@WenyXu (Member) left a comment

Rest LGTM

@github-actions bot added the docs-required label and removed the docs-not-required label Nov 14, 2025
- Add CompressedWriter for real-time compression during CSV/JSON export
- Support GZIP, BZIP2, XZ, ZSTD compression formats
- Remove LazyBufferedWriter dependency for simplified architecture
- Implement Encoder -> Compressor -> FileWriter data flow
- Add tests for compressed CSV/JSON export

Signed-off-by: McKnight22 <[email protected]>
- refactor and extend compressed_writer tests
- add coverage for Bzip2 and Xz compression

Signed-off-by: McKnight22 <[email protected]>
- Switch to threshold-based chunked flushing
- Avoid unnecessary writes on empty buffers
- Replace direct write_all() calls with the new helper for consistency

Signed-off-by: McKnight22 <[email protected]>
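A sketch of what a threshold-based flush helper could look like. The names `flush_if_needed`, `CountingWriter`, and `export` are hypothetical; the commit message only states the intended behavior (flush in chunks once a threshold is reached, skip writes for empty buffers, and route all writes through one helper).

```rust
use std::io::{self, Write};

/// Hypothetical helper: write `buf` out only once it reaches `threshold`
/// bytes (or unconditionally when `force` is set at end of stream), and never
/// issue a write for an empty buffer. Returns whether a write happened.
fn flush_if_needed<W: Write>(
    writer: &mut W,
    buf: &mut Vec<u8>,
    threshold: usize,
    force: bool,
) -> io::Result<bool> {
    if buf.is_empty() || (!force && buf.len() < threshold) {
        return Ok(false);
    }
    writer.write_all(buf.as_slice())?;
    buf.clear();
    Ok(true)
}

/// Test sink that records how many write calls reach it.
struct CountingWriter {
    calls: usize,
    data: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, b: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.data.extend_from_slice(b);
        Ok(b.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Encodes each row as a line, routing every write through the helper.
fn export(rows: &[&str], threshold: usize) -> io::Result<CountingWriter> {
    let mut sink = CountingWriter { calls: 0, data: Vec::new() };
    let mut buf = Vec::new();
    for row in rows {
        buf.extend_from_slice(row.as_bytes());
        buf.push(b'\n');
        flush_if_needed(&mut sink, &mut buf, threshold, false)?;
    }
    // End of stream: write whatever is left; a no-op if the buffer is empty.
    flush_if_needed(&mut sink, &mut buf, threshold, true)?;
    Ok(sink)
}

fn main() -> io::Result<()> {
    let sink = export(&["aaaa", "bbbb", "cccc"], 8)?;
    println!("{} writes, {} bytes", sink.calls, sink.data.len());
    Ok(())
}
```

With a threshold of 8 bytes, the three 5-byte rows in `main` reach the sink in two writes: one when the second row grows the buffer to 10 bytes, and one final write for the last row.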
- Add support for reading compressed CSV and JSON in COPY FROM
- Support GZIP, BZIP2, XZ, ZSTD compression formats
- Add tests for compressed CSV/JSON import

Signed-off-by: McKnight22 <[email protected]>
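The commit message does not say how COPY FROM picks a decompressor (for example, from an explicit option or from the file name). As a hypothetical alternative shown for illustration only, the four formats can also be recognized by their leading magic bytes: GZIP `1F 8B`, BZIP2 `BZh`, XZ `FD 37 7A 58 5A 00`, ZSTD `28 B5 2F FD`. The `sniff` function and `Detected` enum are assumptions, not part of the PR.

```rust
/// Formats this sketch can tell apart by their leading magic bytes.
#[derive(Debug, PartialEq, Eq)]
enum Detected {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Unknown,
}

/// Guesses the compression format from the first bytes of a file.
fn sniff(header: &[u8]) -> Detected {
    if header.starts_with(&[0x1F, 0x8B]) {
        Detected::Gzip
    } else if header.starts_with(b"BZh") {
        Detected::Bzip2
    } else if header.starts_with(&[0xFD, b'7', b'z', b'X', b'Z', 0x00]) {
        Detected::Xz
    } else if header.starts_with(&[0x28, 0xB5, 0x2F, 0xFD]) {
        Detected::Zstd
    } else {
        Detected::Unknown
    }
}

fn main() {
    let samples: [&[u8]; 3] = [&[0x1F, 0x8B, 0x08], b"BZh91AY&SY", b"host,cpu\n"];
    for s in samples {
        println!("{:?}", sniff(s));
    }
}
```

Plain CSV or JSON text falls through to `Unknown`, which a reader could treat as uncompressed input.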
- Move temp_dir out of the loop

Signed-off-by: McKnight22 <[email protected]>
@McKnight22 force-pushed the pr-add_compression_option branch from a55ff8c to bc93f58 November 14, 2025 11:38

Labels: docs-required, size/XL


4 participants