-
-
Notifications
You must be signed in to change notification settings - Fork 95
Open
Description
Detailed Description
Currently, Quartz Solar Forecast stores forecast outputs and historical weather data in Parquet and csv files. While Parquet is efficient for storage, users still need to load entire files into memory or write custom aggregation scripts to analyze the data.
Adding DuckDB as a query layer would allow:
- Direct SQL queries on Parquet files without loading all data into memory.
- Fast aggregation, filtering, and joining across multiple Parquet files (e.g., multi-site forecasts or weather data).
- Easier ad hoc analysis and evaluation of model performance.
- Integration with Python via Pandas DataFrames while still leveraging SQL efficiency.
Example use cases:
- Compute hourly average predicted power across all sites.
- Filter forecasts for specific time ranges or locations.
- Join forecast data with historical weather datasets for feature engineering.
Context
This change would benefit:
- Developers and analysts working with large forecast datasets.
- Users who want to perform quick queries without writing custom scripts for every analysis.
- Situations where multiple sites or long time periods generate large Parquet datasets.
- Fast querying and analytics on local or in-memory datasets (especially PV site data, weather predictions, or timeseries CSV).
- Effortless integration with Pandas and Polars DataFrames.
- SQL queries inside Python/Jupyter Notebooks, improving ETL and aggregation operations compared to pure Pandas for larger datasets.
- Easy CSV/Parquet/Arrow file ingestion without complex setup.
- No server or infrastructure overhead, making setup simple.
Possible Implementation
import duckdb
import pandas as pd
# Load CSV or DataFrame directly into DuckDB
df = pd.read_csv('your_data.csv')
con = duckdb.connect()
# Register as DuckDB table and query using SQL
con.register('my_df', df)
result = con.execute('SELECT avg(power_output) FROM my_df WHERE timestamp > ?',
['2024-01-01']).fetchdf()
Metadata
Metadata
Assignees
Labels
No labels