137 changes: 137 additions & 0 deletions docs/docs/concepts.md
---
sidebar_position: 3
---

# Core Concepts

This page explains the key concepts and terminology used in Anomstack.

## Metric Batch

A metric batch is the fundamental unit of configuration in Anomstack. It consists of:

- A configuration file (`config.yaml`)
- A SQL query file (`query.sql`) or Python ingest function
- Optional preprocessing function
- Optional custom parameters that override the shared defaults

Example structure:
```
metrics/
my_metric_batch/
config.yaml
query.sql
preprocess.py (optional)
```
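For example, a minimal `config.yaml` for such a batch might look like the following sketch (the keys mirror the data-source examples later in these docs; the values are placeholders):

```yaml
metric_batch: "my_metric_batch"       # unique name for this batch
db: "duckdb"                          # which data source to query
table_key: "metrics"                  # table where metric values are stored
ingest_cron_schedule: "*/10 * * * *"  # how often to run ingestion
ingest_sql: >
  select
    now() as metric_timestamp,
    'my_metric' as metric_name,
    42.0 as metric_value;
```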

## Jobs

Anomstack runs several types of jobs for each metric batch:

### Ingest Job
- Pulls data from your data source
- Executes your SQL query or Python function
- Stores raw data for processing

### Train Job
- Processes historical data
- Trains anomaly detection models
- Saves trained models to storage

### Score Job
- Applies trained models to new data
- Calculates anomaly scores
- Identifies potential anomalies

### Alert Job
- Evaluates anomaly scores
- Sends notifications via configured channels
- Handles alert throttling and snoozing

### Change Detection Job
- Monitors for significant changes in metrics
- Detects level shifts and trends
- Triggers alerts for important changes

### Plot Job
- Generates visualizations of metrics
- Creates anomaly score plots
- Produces plots for alerts and the dashboard

## Alerts

Alerts are notifications sent when anomalies are detected. They can be configured to:

- Send via email or Slack (see the sketch below)
- Include visualizations
- Use custom templates
- Support different severity levels
- Include LLM-powered analysis
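As a sketch, alert channels are chosen per metric batch with the `alert_methods` key (this key appears in the default configuration shown on the DuckDB page); the threshold key below is a hypothetical illustration, not a confirmed name:

```yaml
alert_methods: "email,slack"  # comma-separated channels, as in the project defaults
alert_threshold: 0.8          # hypothetical key: minimum anomaly score before alerting
```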

## Dashboard

The dashboard provides:

- Real-time metric visualization
- Anomaly score monitoring
- Alert history and management
- Metric configuration interface
- Performance analytics

## Storage

Anomstack uses storage for:

- Trained models
- Configuration files
- Alert history
- Performance metrics
- Dashboard data

Supported storage backends:
- Local filesystem
- Google Cloud Storage (GCS)
- Amazon S3
- Azure Blob Storage (coming soon)
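The model storage location is controlled by `model_path` (the local form appears in the project defaults on the DuckDB page); the cloud URI schemes below are illustrative assumptions, so check the repository for the exact syntax:

```yaml
model_path: "local://./models"           # local filesystem, as in the defaults
# model_path: "gs://your-bucket/models"  # GCS (assumed URI scheme)
# model_path: "s3://your-bucket/models"  # S3 (assumed URI scheme)
```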

## Data Sources

Anomstack supports various data sources:

- Python (direct integration)
- BigQuery
- Snowflake
- ClickHouse
- DuckDB
- SQLite
- MotherDuck
- Turso
- Redshift (coming soon)

## Configuration

Configuration is handled through:

- YAML files for metric batches
- Environment variables
- Command-line arguments
- Dashboard settings

## Scheduling

Jobs can be scheduled using:

- Cron expressions (see the sketch below)
- Dagster schedules
- Manual triggers
- Event-based triggers
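A hedged sketch of cron-based scheduling: `ingest_cron_schedule` is the key used in the data-source examples in these docs, while the per-job keys below are assumed analogues rather than confirmed names:

```yaml
ingest_cron_schedule: "*/10 * * * *"  # ingest every 10 minutes (key used in the examples)
train_cron_schedule: "0 */6 * * *"    # assumed key: retrain every 6 hours
score_cron_schedule: "*/10 * * * *"   # assumed key: score as new data arrives
```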

## LLM Agent

The LLM agent provides:

- AI-powered anomaly analysis
- Natural language explanations
- Automated reporting
- Intelligent alert prioritization
- Historical context analysis
33 changes: 33 additions & 0 deletions docs/docs/configuration/metrics.md
---
sidebar_position: 1
---

# Metrics Configuration

Learn how to configure metrics in Anomstack.

## Configuration File

The `config.yaml` file defines:
- Metric properties
- Data source settings
- Schedule configuration
- Alert thresholds
- Custom parameters

## Properties

Key configuration properties:
- `name`: Metric identifier
- `description`: Metric description
- `source`: Data source configuration
- `schedule`: Execution schedule
- `alerts`: Alert settings

## Examples

Coming soon...
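In the meantime, here is a minimal illustrative config, with key names taken from the data-source pages in these docs (table and column names are placeholders):

```yaml
metric_batch: "sales_metrics"
db: "bigquery"
table_key: "your-project.dataset.table"
ingest_cron_schedule: "0 * * * *"  # hourly
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'hourly_sales' as metric_name,
    sum(amount) as metric_value
  from your_table;
```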

## Best Practices

Coming soon...
53 changes: 53 additions & 0 deletions docs/docs/data-sources/bigquery.md
---
sidebar_position: 2
---

# BigQuery

Anomstack supports Google BigQuery as a data source for your metrics.

## Configuration

Configure BigQuery in your metric batch's `config.yaml`:

```yaml
db: "bigquery"
table_key: "your-project.dataset.table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
select
current_timestamp() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Authentication

You can authenticate with BigQuery in several ways:
- Service account credentials file
- Application Default Credentials
- Environment variables

## Examples

Check out the [BigQuery example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) for a complete working example.

## Best Practices

- Use parameterized queries for better security
- Consider query costs and optimization
- Use appropriate table partitioning (see the sketch below)
- Set up proper IAM permissions
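For example, restricting the ingest query to today's partition keeps scan costs predictable; a sketch assuming a hypothetical `event_date` partition column:

```yaml
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'daily_orders' as metric_name,
    cast(count(*) as float64) as metric_value
  from your_table
  where event_date = current_date();  -- scans only today's partition
```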

## Limitations

- Query execution time limits
- Cost considerations for large queries
- Rate limits and quotas

## Related Links

- [BigQuery Documentation](https://cloud.google.com/bigquery/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery)
53 changes: 53 additions & 0 deletions docs/docs/data-sources/clickhouse.md
---
sidebar_position: 4
---

# ClickHouse

Anomstack supports ClickHouse as a data source for your metrics.

## Configuration

Configure ClickHouse in your metric batch's `config.yaml`:

```yaml
db: "clickhouse"
table_key: "your_database.your_table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
select
now() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Authentication

You can authenticate with ClickHouse using:
- Username and password
- Environment variables
- SSL/TLS certificates

## Examples

Check out the [ClickHouse example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) for a complete working example.

## Best Practices

- Use appropriate table engines
- Consider query optimization (see the sketch below)
- Implement proper access controls
- Use parameterized queries
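As an optimization sketch, aggregating over a bounded time window avoids full-table scans on every run (the `event_time` column is a hypothetical example):

```yaml
ingest_sql: >
  select
    now() as metric_timestamp,
    'orders_last_10m' as metric_name,
    toFloat64(count(*)) as metric_value
  from your_database.your_table
  where event_time >= now() - interval 10 minute;
```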

## Limitations

- Memory usage considerations
- Query timeout limits
- Concurrent query limits

## Related Links

- [ClickHouse Documentation](https://clickhouse.com/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse)
69 changes: 69 additions & 0 deletions docs/docs/data-sources/duckdb.md
---
sidebar_position: 5
---

# DuckDB

Anomstack supports DuckDB as a data source for your metrics. DuckDB is a fast analytical database that can read and write data from various file formats.

## Configuration

Configure DuckDB in your metric batch's `config.yaml`:

```yaml
db: "duckdb"
table_key: "metrics" # Default table to store metrics
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion
ingest_sql: >
select
current_timestamp() as metric_timestamp,
'metric_name' as metric_name,
your_value as metric_value
from your_table;
```

## Default Configuration

Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include:

```yaml
db: "duckdb" # Default database type
table_key: "metrics" # Default table name
ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule
model_path: "local://./models" # Default model storage location
alert_methods: "email,slack" # Default alert methods
```

You can override any of these defaults in your metric batch's configuration file.

## Features

DuckDB supports:
- Local file-based databases
- MotherDuck cloud integration
- Reading from various file formats (CSV, Parquet, JSON; see the sketch below)
- SQL queries with Python integration
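For instance, an ingest query can read metric values straight from Parquet files using DuckDB's `read_parquet` (the glob path is a placeholder):

```yaml
ingest_sql: >
  select
    now() as metric_timestamp,
    'parquet_row_count' as metric_name,
    cast(count(*) as double) as metric_value
  from read_parquet('data/events/*.parquet');
```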

## Examples

Check out the [DuckDB example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) for a complete working example.

## Best Practices

- Use appropriate file formats for your data
- Consider query optimization
- Implement proper file permissions
- Use parameterized queries

## Limitations

- Local storage considerations
- Memory usage for large datasets
- Concurrent access limitations

## Related Links

- [DuckDB Documentation](https://duckdb.org/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb)
- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml)