Cloud writers for different formats directly on rust side. #19818

@chitralverma

Description

I was working on adding cloud writing functionality to the Scala/Java bindings for polars when I came across the CloudWriter implementation, which already implements std::io::Write. This is great because it means dataframes can be persisted to cloud storage in different formats directly, with the object_store crate doing the work in the background.

I have implemented this in a PR (see tree), and was wondering why we still use fsspec on the py-polars side.

It would be great if someone could check this out and let me know if there are any issues with this approach. If it makes sense, I can raise a PR to persist dataframes to the cloud this way for py-polars as well.

Approach:

  1. Create an instance of dyn std::io::Write
    • If the destination is a cloud URL, return a CloudWriter, used as provided in polars-rs, else
    • return a std::fs::File
  2. Pass this along with options to format writers like ParquetWriter ...
  3. Call finish(...) on the format writer
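
The three steps above can be sketched with the standard library alone. This is only an illustration of the dispatch pattern, not the code from the PR: `StubCloudWriter`, `is_cloud_url`, `open_destination` and `LineWriter` are hypothetical stand-ins for polars' CloudWriter, its URL detection, and a format writer such as ParquetWriter. The list of URL schemes is likewise an assumption made for the example.

```rust
use std::fs::File;
use std::io::{self, Write};

/// Hypothetical stand-in for polars' CloudWriter. It only buffers bytes in
/// memory; a real writer would hand them to the object_store crate.
struct StubCloudWriter {
    buffer: Vec<u8>,
}

impl Write for StubCloudWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.buffer.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Assumed scheme check; the prefixes are illustrative, not polars' own list.
fn is_cloud_url(dest: &str) -> bool {
    ["s3://", "gs://", "az://"].iter().any(|p| dest.starts_with(p))
}

/// Step 1: return a `dyn Write` that is either a cloud writer or a local file.
fn open_destination(dest: &str) -> io::Result<Box<dyn Write>> {
    if is_cloud_url(dest) {
        Ok(Box::new(StubCloudWriter { buffer: Vec::new() }))
    } else {
        Ok(Box::new(File::create(dest)?))
    }
}

/// Hypothetical format writer with the same shape as the real ones:
/// it wraps any `Write`, takes options, and is consumed by `finish`.
struct LineWriter<W: Write> {
    inner: W,
    separator: char,
}

impl<W: Write> LineWriter<W> {
    /// Step 2: pass the writer in; options are set with builder methods.
    fn new(inner: W) -> Self {
        LineWriter { inner, separator: ',' }
    }

    fn with_separator(mut self, separator: char) -> Self {
        self.separator = separator;
        self
    }

    /// Step 3: serialize the rows, flush, and report the bytes written.
    fn finish(mut self, rows: &[Vec<&str>]) -> io::Result<u64> {
        let sep = self.separator.to_string();
        let mut written = 0u64;
        for row in rows {
            let line = row.join(&sep);
            self.inner.write_all(line.as_bytes())?;
            self.inner.write_all(b"\n")?;
            written += line.len() as u64 + 1;
        }
        self.inner.flush()?;
        Ok(written)
    }
}
```

With these definitions, `LineWriter::new(open_destination(dest)?).finish(&rows)` runs the same code path for a local path and for a cloud URL, because the format writer only ever sees a `Write`. That is the property the approach relies on.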

CC: @ritchie46

Labels: enhancement (New feature or an improvement of an existing feature)
