137 changes: 137 additions & 0 deletions docs/docs/concepts.md
---
sidebar_position: 3
---

# Core Concepts

This page explains the key concepts and terminology used in Anomstack.

## Metric Batch

A metric batch is the fundamental unit of configuration in Anomstack. It consists of:

- A configuration file (`config.yaml`)
- A SQL query file (`query.sql`) or Python ingest function
- Optional preprocessing function
- Optional custom parameters that override the shared defaults

Example structure:
```
metrics/
my_metric_batch/
config.yaml
query.sql
preprocess.py (optional)
```
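For example, a minimal `config.yaml` for such a batch might look like the following sketch (the keys mirror the data-source examples later in these docs; the values are placeholders):

```yaml
metric_batch: "my_metric_batch"       # unique name for this batch
db: "duckdb"                          # which data source to query
table_key: "metrics"                  # table where metric values are stored
ingest_cron_schedule: "*/10 * * * *"  # how often to run ingestion
ingest_sql: >
  select
    now() as metric_timestamp,
    'my_metric' as metric_name,
    42.0 as metric_value;
```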

## Jobs

Anomstack runs several types of jobs for each metric batch:

### Ingest Job
- Pulls data from your data source
- Executes your SQL query or Python function
- Stores raw data for processing

### Train Job
- Processes historical data
- Trains anomaly detection models
- Saves trained models to storage

### Score Job
- Applies trained models to new data
- Calculates anomaly scores
- Identifies potential anomalies

### Alert Job
- Evaluates anomaly scores
- Sends notifications via configured channels
- Handles alert throttling and snoozing

### Change Detection Job
- Monitors for significant changes in metrics
- Detects level shifts and trends
- Triggers alerts for important changes

### Plot Job
- Generates visualizations of metrics
- Creates anomaly score plots
- Produces plots for alerts and the dashboard

## Alerts

Alerts are notifications sent when anomalies are detected. They can be configured to:

- Send via email or Slack (see the sketch below)
- Include visualizations
- Use custom templates
- Support different severity levels
- Include LLM-powered analysis
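As a sketch, alert channels are chosen per metric batch with the `alert_methods` key (this key appears in the default configuration shown on the DuckDB page); the threshold key below is a hypothetical illustration, not a confirmed name:

```yaml
alert_methods: "email,slack"  # comma-separated channels, as in the project defaults
alert_threshold: 0.8          # hypothetical key: minimum anomaly score before alerting
```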

## Dashboard

The dashboard provides:

- Real-time metric visualization
- Anomaly score monitoring
- Alert history and management
- Metric configuration interface
- Performance analytics

## Storage

Anomstack uses storage for:

- Trained models
- Configuration files
- Alert history
- Performance metrics
- Dashboard data

Supported storage backends:
- Local filesystem
- Google Cloud Storage (GCS)
- Amazon S3
- Azure Blob Storage (coming soon)
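The model storage location is controlled by `model_path` (the local form appears in the project defaults on the DuckDB page); the cloud URI schemes below are illustrative assumptions, so check the repository for the exact syntax:

```yaml
model_path: "local://./models"           # local filesystem, as in the defaults
# model_path: "gs://your-bucket/models"  # GCS (assumed URI scheme)
# model_path: "s3://your-bucket/models"  # S3 (assumed URI scheme)
```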

## Data Sources

Anomstack supports various data sources:

- Python (direct integration)
- BigQuery
- Snowflake
- ClickHouse
- DuckDB
- SQLite
- MotherDuck
- Turso
- Redshift (coming soon)

## Configuration

Configuration is handled through:

- YAML files for metric batches
- Environment variables
- Command-line arguments
- Dashboard settings

## Scheduling

Jobs can be scheduled using:

- Cron expressions (see the sketch below)
- Dagster schedules
- Manual triggers
- Event-based triggers
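A hedged sketch of cron-based scheduling: `ingest_cron_schedule` is the key used in the data-source examples in these docs, while the per-job keys below are assumed analogues rather than confirmed names:

```yaml
ingest_cron_schedule: "*/10 * * * *"  # ingest every 10 minutes (key used in the examples)
train_cron_schedule: "0 */6 * * *"    # assumed key: retrain every 6 hours
score_cron_schedule: "*/10 * * * *"   # assumed key: score as new data arrives
```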

## LLM Agent

The LLM agent provides:

- AI-powered anomaly analysis
- Natural language explanations
- Automated reporting
- Intelligent alert prioritization
- Historical context analysis
33 changes: 33 additions & 0 deletions docs/docs/configuration/metrics.md
---
sidebar_position: 1
---

# Metrics Configuration

Learn how to configure metrics in Anomstack.

## Configuration File

The `config.yaml` file defines:
- Metric properties
- Data source settings
- Schedule configuration
- Alert thresholds
- Custom parameters

## Properties

Key configuration properties:
- `name`: Metric identifier
- `description`: Metric description
- `source`: Data source configuration
- `schedule`: Execution schedule
- `alerts`: Alert settings

## Examples

Coming soon...
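In the meantime, here is a minimal illustrative config, with key names taken from the data-source pages in these docs (table and column names are placeholders):

```yaml
metric_batch: "sales_metrics"
db: "bigquery"
table_key: "your-project.dataset.table"
ingest_cron_schedule: "0 * * * *"  # hourly
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'hourly_sales' as metric_name,
    sum(amount) as metric_value
  from your_table;
```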

## Best Practices

Coming soon...
53 changes: 53 additions & 0 deletions docs/docs/data-sources/bigquery.md
---
sidebar_position: 2
---

# BigQuery

Anomstack supports Google BigQuery as a data source for your metrics.

## Configuration

Configure BigQuery in your metric batch's `config.yaml`:

```yaml
db: "bigquery"
table_key: "your-project.dataset.table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
select
current_timestamp() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Authentication

You can authenticate with BigQuery in several ways:
- Service account credentials file
- Application Default Credentials
- Environment variables

## Examples

Check out the [BigQuery example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) for a complete working example.

## Best Practices

- Use parameterized queries for better security
- Consider query costs and optimization
- Use appropriate table partitioning (see the sketch below)
- Set up proper IAM permissions
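For example, restricting the ingest query to today's partition keeps scan costs predictable; a sketch assuming a hypothetical `event_date` partition column:

```yaml
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'daily_orders' as metric_name,
    cast(count(*) as float64) as metric_value
  from your_table
  where event_date = current_date();  -- scans only today's partition
```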

## Limitations

- Query execution time limits
- Cost considerations for large queries
- Rate limits and quotas

## Related Links

- [BigQuery Documentation](https://cloud.google.com/bigquery/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery)
53 changes: 53 additions & 0 deletions docs/docs/data-sources/clickhouse.md
---
sidebar_position: 4
---

# ClickHouse

Anomstack supports ClickHouse as a data source for your metrics.

## Configuration

Configure ClickHouse in your metric batch's `config.yaml`:

```yaml
db: "clickhouse"
table_key: "your_database.your_table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
select
now() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Authentication

You can authenticate with ClickHouse using:
- Username and password
- Environment variables
- SSL/TLS certificates

## Examples

Check out the [ClickHouse example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) for a complete working example.

## Best Practices

- Use appropriate table engines
- Consider query optimization (see the sketch below)
- Implement proper access controls
- Use parameterized queries
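As an optimization sketch, aggregating over a bounded time window avoids full-table scans on every run (the `event_time` column is a hypothetical example):

```yaml
ingest_sql: >
  select
    now() as metric_timestamp,
    'orders_last_10m' as metric_name,
    toFloat64(count(*)) as metric_value
  from your_database.your_table
  where event_time >= now() - interval 10 minute;
```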

## Limitations

- Memory usage considerations
- Query timeout limits
- Concurrent query limits

## Related Links

- [ClickHouse Documentation](https://clickhouse.com/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse)
69 changes: 69 additions & 0 deletions docs/docs/data-sources/duckdb.md
---
sidebar_position: 5
---

# DuckDB

Anomstack supports DuckDB as a data source for your metrics. DuckDB is a fast analytical database that can read and write data from various file formats.

## Configuration

Configure DuckDB in your metric batch's `config.yaml`:

```yaml
db: "duckdb"
table_key: "metrics" # Default table to store metrics
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion
ingest_sql: >
select
current_timestamp() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Default Configuration

Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include:

```yaml
db: "duckdb" # Default database type
table_key: "metrics" # Default table name
ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule
model_path: "local://./models" # Default model storage location
alert_methods: "email,slack" # Default alert methods
```

You can override any of these defaults in your metric batch's configuration file.

## Features

DuckDB supports:
- Local file-based databases
- MotherDuck cloud integration
- Reading from various file formats (CSV, Parquet, JSON; see the sketch below)
- SQL queries with Python integration
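For instance, an ingest query can read metric values straight from Parquet files using DuckDB's `read_parquet` (the glob path is a placeholder):

```yaml
ingest_sql: >
  select
    now() as metric_timestamp,
    'parquet_row_count' as metric_name,
    cast(count(*) as double) as metric_value
  from read_parquet('data/events/*.parquet');
```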

## Examples

Check out the [DuckDB example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) for a complete working example.

## Best Practices

- Use appropriate file formats for your data
- Consider query optimization
- Implement proper file permissions
- Use parameterized queries

## Limitations

- Local storage considerations
- Memory usage for large datasets
- Concurrent access limitations

## Related Links

- [DuckDB Documentation](https://duckdb.org/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb)
- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml)