Currently I am using a bag that is reduced to a local data frame; see my SO question/answer: https://stackoverflow.com/questions/64512040/how-to-aggregate-large-number-of-small-csv-files-50k-efficiently-code-size/64517641
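
For illustration, a minimal sketch of the bag approach, assuming the bag is a Dask bag. The file layout (columns `key` and `value`) and the helpers `write_sample_files`, `load_and_reduce` and `merge` are made up for this example and are not taken from the linked answer:

```python
import tempfile
from pathlib import Path

import dask.bag as db
import pandas as pd


def write_sample_files(directory, n_files=20):
    """Create small CSV files (hypothetical layout: columns `key`, `value`)."""
    paths = []
    for i in range(n_files):
        path = Path(directory) / f"part_{i:03d}.csv"
        pd.DataFrame({"key": [0, 1], "value": [i, 2 * i]}).to_csv(path, index=False)
        paths.append(str(path))
    return paths


def load_and_reduce(path):
    """Read one file and reduce it to a small per-key summary (a Series)."""
    return pd.read_csv(path).groupby("key")["value"].sum()


def merge(left, right):
    """Combine two partial summaries; a key missing on one side counts as 0."""
    return left.add(right, fill_value=0)


with tempfile.TemporaryDirectory() as tmp:
    paths = write_sample_files(tmp)
    bag = db.from_sequence(paths, npartitions=4)
    # map runs once per file; fold merges the partial summaries pairwise,
    # first within each partition and then across partitions.
    # The synchronous scheduler only keeps the sketch simple; drop the
    # argument to use the bag's default scheduler.
    totals = bag.map(load_and_reduce).fold(merge).compute(scheduler="synchronous")

# The end result is an ordinary in-memory pandas DataFrame.
result = totals.reset_index()
```

This only works well when each per-file summary is small, because everything ends up in one local frame.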
With a partitioning strategy it should be possible to build a distributed data frame instead, which is needed when the reduction does not shrink the data enough to fit into a local data frame.