
Commit 3bc6802

Merge pull request #145 from andrewm4894/improve-docs
Improve docs
2 parents ca508c2 + 2226a68 commit 3bc6802

20 files changed (+1311 -22 lines changed)


docs/docs/concepts.md

Lines changed: 137 additions & 0 deletions
---
sidebar_position: 3
---

# Core Concepts

This page explains the key concepts and terminology used in Anomstack.

## Metric Batch

A metric batch is the fundamental unit of configuration in Anomstack. It consists of:

- A configuration file (`config.yaml`)
- A SQL query file (`query.sql`) or Python ingest function
- Optional preprocessing function
- Optional custom configuration

Example structure:

```
metrics/
  my_metric_batch/
    config.yaml
    query.sql
    preprocess.py  (optional)
```
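
Anomstack's own loader is not shown here, but the layout above can be made concrete with a minimal Python sketch that walks a `metrics/` directory and reports each folder that contains a `config.yaml` plus either a `query.sql` or a Python ingest file. The helper `discover_metric_batches` and the `ingest.py` filename for Python ingest are hypothetical; the real project may discover and name batch files differently.

```python
import tempfile
from pathlib import Path


def discover_metric_batches(root: str) -> dict[str, dict[str, bool]]:
    """Return {batch_name: flags} for each sub-folder of `root` that looks like a batch.

    Hypothetical rule: a folder counts as a batch if it has a config.yaml and
    either a query.sql or an ingest.py (the ingest.py name is an assumption).
    """
    batches: dict[str, dict[str, bool]] = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        has_config = (folder / "config.yaml").is_file()
        has_sql = (folder / "query.sql").is_file()
        has_python = (folder / "ingest.py").is_file()
        if has_config and (has_sql or has_python):
            batches[folder.name] = {
                "sql": has_sql,
                "python_ingest": has_python,
                "preprocess": (folder / "preprocess.py").is_file(),
            }
    return batches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        batch = Path(tmp) / "my_metric_batch"
        batch.mkdir()
        (batch / "config.yaml").write_text('db: "duckdb"\n')
        (batch / "query.sql").write_text("select 1;\n")
        print(discover_metric_batches(tmp))
        # {'my_metric_batch': {'sql': True, 'python_ingest': False, 'preprocess': False}}
```

Treating a folder without an ingest source as "not a batch" is a choice made for this sketch; a stricter loader might raise an error instead so that a half-configured batch is not silently skipped.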

## Jobs

Anomstack runs several types of jobs for each metric batch:

### Ingest Job
- Pulls data from your data source
- Executes your SQL query or Python function
- Stores raw data for processing

### Train Job
- Processes historical data
- Trains anomaly detection models
- Saves trained models to storage

### Score Job
- Applies trained models to new data
- Calculates anomaly scores
- Identifies potential anomalies
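
To illustrate how the Train and Score jobs divide the work, here is a deliberately simple stand-in model. It is not Anomstack's detection algorithm: "training" just records the mean and standard deviation of a metric's history, and "scoring" converts a new value's z-score into a number between 0 and 1 using the normal distribution. The names `ZScoreModel`, `train`, and `score` are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class ZScoreModel:
    """Toy "trained model": summary statistics of the training history."""
    mean: float
    stdev: float


def train(history: list[float]) -> ZScoreModel:
    """Train step: summarise historical values (needs at least two points)."""
    return ZScoreModel(statistics.mean(history), statistics.stdev(history))


def score(model: ZScoreModel, value: float) -> float:
    """Score step: map distance from the training mean to a score in [0, 1].

    erf(z / sqrt(2)) is the share of a normal distribution lying within z
    standard deviations of the mean, so z=0 gives 0.0 and z=3 gives about 0.997.
    """
    if model.stdev == 0:
        return 0.0 if value == model.mean else 1.0
    z = abs(value - model.mean) / model.stdev
    return math.erf(z / math.sqrt(2))


if __name__ == "__main__":
    model = train([10.0, 11.0, 9.0, 10.0, 10.0])
    for new_value in (10.2, 13.0):
        print(new_value, round(score(model, new_value), 3))
```

Keeping training and scoring as separate steps mirrors the job split above: the (comparatively expensive) model fit runs on history, while scoring only needs the saved model and the newest values.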

### Alert Job
- Evaluates anomaly scores
- Sends notifications via configured channels
- Handles alert throttling and snoozing
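
Throttling and snoozing can be sketched as a small piece of state per metric: alert when the score crosses a threshold, then stay quiet for a snooze window. The `AlertGate` class, the 0.8 threshold, and the one-hour window are illustrative assumptions, not Anomstack's configuration keys or defaults.

```python
from datetime import datetime, timedelta


class AlertGate:
    """Hypothetical alert gate: a score threshold plus a per-metric snooze window."""

    def __init__(self, threshold: float = 0.8, snooze: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.snooze = snooze
        self._last_sent: dict[str, datetime] = {}

    def should_alert(self, metric_name: str, score: float, now: datetime) -> bool:
        if score < self.threshold:
            return False  # not anomalous enough to alert on
        last = self._last_sent.get(metric_name)
        if last is not None and now - last < self.snooze:
            return False  # an alert went out recently, so this metric is snoozed
        self._last_sent[metric_name] = now
        return True
```

Tracking the last alert time per metric (rather than globally) means one noisy metric cannot suppress alerts for the rest of the batch.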

### Change Detection Job
- Monitors for significant changes in metrics
- Detects level shifts and trends
- Triggers alerts for important changes
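
One simple way to flag a level shift, shown purely as an illustration (this page does not say which method Anomstack's change detection uses), is to compare the mean of the most recent points with the mean and spread of the points before them. The function name `detect_level_shift`, the window of 5 points, and the threshold of 3 standard deviations are assumptions.

```python
import statistics


def detect_level_shift(values: list[float], recent: int = 5, threshold: float = 3.0) -> bool:
    """Return True if the mean of the last `recent` points moved more than
    `threshold` baseline standard deviations away from the earlier points' mean."""
    if len(values) < recent + 2:
        raise ValueError("need at least recent + 2 points")
    baseline, latest = values[:-recent], values[-recent:]
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(latest) - base_mean)
    if base_std == 0:
        return shift > 0  # perfectly flat baseline: any move counts as a shift
    return shift / base_std > threshold
```

Averaging the recent window makes this react to a sustained move rather than a single spike, which is the distinction between change detection and per-point anomaly scoring.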

### Plot Job
- Generates visualizations of metrics
- Creates anomaly score plots
- Produces plots for alerts and the dashboard

## Alerts

Alerts are notifications sent when anomalies are detected. They can be configured to:

- Send via email or Slack
- Include visualizations
- Use custom templates
- Support different severity levels
- Include LLM-powered analysis

## Dashboard

The dashboard provides:

- Real-time metric visualization
- Anomaly score monitoring
- Alert history and management
- Metric configuration interface
- Performance analytics

## Storage

Anomstack uses storage for:

- Trained models
- Configuration files
- Alert history
- Performance metrics
- Dashboard data

Supported storage backends:
- Local filesystem
- Google Cloud Storage (GCS)
- Amazon S3
- Azure Blob Storage (coming soon)
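
The DuckDB page's defaults set `model_path: "local://./models"`, which suggests the storage backend is picked from a URI-style prefix. The sketch below maps such a prefix to one of the backends listed above. The `gs://` and `s3://` prefixes follow common convention and are assumptions here, as is the helper name `resolve_model_path`.

```python
# Assumed prefix-to-backend mapping; only "local://" appears in these docs.
BACKENDS = {
    "local": "Local filesystem",
    "gs": "Google Cloud Storage (GCS)",
    "s3": "Amazon S3",
}


def resolve_model_path(model_path: str) -> tuple[str, str]:
    """Split a model_path such as "local://./models" into (backend, location)."""
    scheme, sep, location = model_path.partition("://")
    if not sep or scheme not in BACKENDS:
        raise ValueError(f"unsupported model_path: {model_path!r}")
    return BACKENDS[scheme], location


if __name__ == "__main__":
    print(resolve_model_path("local://./models"))  # ('Local filesystem', './models')
```

Azure Blob Storage is deliberately absent from the mapping because the list above marks it as coming soon; an unknown prefix fails loudly instead of falling back to local disk.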

## Data Sources

Anomstack supports the following data sources:

- Python (direct integration)
- BigQuery
- Snowflake
- ClickHouse
- DuckDB
- SQLite
- MotherDuck
- Turso
- Redshift (coming soon)
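
For the Python option, an ingest function needs to produce rows in the same shape the SQL examples on the data source pages produce: `metric_timestamp`, `metric_name`, `metric_value`. The sketch below is hypothetical: the function name `ingest`, returning a list of dicts rather than a DataFrame, and the two example metrics are all assumptions, so check the Python examples in the repository for the exact interface Anomstack expects.

```python
import random
from datetime import datetime, timezone


def ingest() -> list[dict]:
    """Hypothetical Python ingest: emit one long-format row per metric."""
    now = datetime.now(timezone.utc)
    # Stand-in readings; a real ingest would call an API or read a file here.
    readings = {
        "queue_depth": float(random.randint(0, 100)),
        "error_rate": random.random(),
    }
    return [
        {"metric_timestamp": now, "metric_name": name, "metric_value": value}
        for name, value in readings.items()
    ]
```

Returning one row per metric (long format) rather than one column per metric lets a single batch add new metrics without any schema change.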

## Configuration

Configuration is handled through:

- YAML files for metric batches
- Environment variables
- Command-line arguments
- Dashboard settings

## Scheduling

Jobs can be scheduled using:

- Cron expressions
- Dagster schedules
- Manual triggers
- Event-based triggers
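
The cron expressions used in these docs (for example `*/10 * * * *`, every ten minutes) have five fields: minute, hour, day of month, month, and day of week. As an illustration of how such an expression is evaluated, here is a minimal matcher supporting only `*`, `*/N`, single numbers, and comma lists. It is a simplified sketch, not the parser Dagster uses; it ignores ranges, names such as `MON`, and cron's OR rule when both day fields are restricted.

```python
from datetime import datetime

# Lowest legal value of each field; "*/N" steps count from these.
FIELD_MINIMUMS = (0, 0, 1, 1, 0)  # minute, hour, day of month, month, day of week


def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Match one cron field; supports "*", "*/N", "N", and comma lists of these."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) satisfies a five-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    # cron numbers Sunday as 0; isoweekday() is Monday=1 .. Sunday=7, so % 7 maps it.
    values = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(
        _field_matches(f, v, m) for f, v, m in zip(fields, values, FIELD_MINIMUMS)
    )
```

A scheduler built on this would check each minute and launch the job whenever `cron_matches` returns True; production schedulers instead compute the next fire time directly.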

## LLM Agent

The LLM agent provides:

- AI-powered anomaly analysis
- Natural language explanations
- Automated reporting
- Intelligent alert prioritization
- Historical context analysis

docs/docs/configuration/metrics.md

Lines changed: 33 additions & 0 deletions
---
sidebar_position: 1
---

# Metrics Configuration

Learn how to configure metrics in Anomstack.

## Configuration File

The `config.yaml` file defines:
- Metric properties
- Data source settings
- Schedule configuration
- Alert thresholds
- Custom parameters
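
## Properties

Key configuration properties:
- `metric_batch`: Metric batch identifier
- `description`: Metric description
- `db`: Data source (for example `bigquery`, `clickhouse`, or `duckdb`)
- `table_key`: Table where the batch's metrics are stored
- `ingest_sql`: Query returning `metric_timestamp`, `metric_name`, and `metric_value`
- `ingest_cron_schedule`: Ingestion schedule, as a cron expression
- `alert_methods`: Alert channels (for example `email,slack`)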
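
Before a batch is scheduled, its configuration (after defaults are merged in) can be sanity-checked. The minimal sketch below takes its required keys from the BigQuery, ClickHouse, and DuckDB examples, which all set `metric_batch`, `db`, `table_key`, and `ingest_cron_schedule`. The helper `check_batch_config`, the `ingest_fn` key for Python ingest, and the rule that a batch uses exactly one ingest method are hypothetical.

```python
# Keys that every data source example in these docs sets.
REQUIRED_KEYS = {"metric_batch", "db", "table_key", "ingest_cron_schedule"}


def check_batch_config(config: dict) -> list[str]:
    """Return human-readable problems found in a (merged) metric batch config."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    # Assumption: a batch uses exactly one ingest method, SQL or Python.
    # "ingest_fn" is a hypothetical key name for the Python option.
    has_sql = "ingest_sql" in config
    has_fn = "ingest_fn" in config
    if has_sql == has_fn:
        problems.append("set exactly one of ingest_sql or ingest_fn")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report everything wrong with a config file.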

## Examples

Coming soon...

## Best Practices

Coming soon...

docs/docs/data-sources/bigquery.md

Lines changed: 53 additions & 0 deletions
---
sidebar_position: 2
---

# BigQuery

Anomstack supports Google BigQuery as a data source for your metrics.

## Configuration

Configure BigQuery in your metric batch's `config.yaml`:

```yaml
db: "bigquery"
table_key: "your-project.dataset.table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'metric_name' as metric_name,
    your_value as metric_value
  from your_table;
```
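
Whatever the data source, `ingest_sql` is expected to return the three columns shown above. A small check like the following (a hypothetical helper, not part of Anomstack) can catch a query that renames or drops a column before the batch is scheduled. It assumes the query result has already been fetched as a list of dicts.

```python
from datetime import datetime, timezone

REQUIRED_COLUMNS = ("metric_timestamp", "metric_name", "metric_value")


def validate_ingest_rows(rows: list[dict]) -> list[str]:
    """Return a list of problems found in rows returned by an ingest query."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue  # type checks below need all columns present
        if not isinstance(row["metric_timestamp"], datetime):
            problems.append(f"row {i}: metric_timestamp is not a datetime")
        if not isinstance(row["metric_value"], (int, float)):
            problems.append(f"row {i}: metric_value is not numeric")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    good = {"metric_timestamp": now, "metric_name": "metric_name", "metric_value": 1.5}
    bad = {"metric_timestamp": now, "metric_value": "n/a"}
    print(validate_ingest_rows([good, bad]))  # ['row 1: missing metric_name']
```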

## Authentication

You can authenticate with BigQuery in several ways:
- Service account credentials file
- Application Default Credentials
- Environment variables
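
When more than one of these options is available, something has to decide which one wins. The sketch below shows one plausible order: an explicit service account key file, then a key supplied as a JSON string in an environment variable, then Application Default Credentials. The `BIGQUERY_CREDENTIALS_JSON` variable, the `credentials_path` argument, and the precedence itself are assumptions, not documented Anomstack behaviour; `GOOGLE_APPLICATION_CREDENTIALS` is the standard variable that Google's client libraries consult as part of ADC.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def pick_bigquery_auth(
    credentials_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return which credential source to use, in a hypothetical precedence order."""
    env = os.environ if env is None else env
    # 1. An explicit service account key file, if it exists on disk.
    if credentials_path and Path(credentials_path).is_file():
        return "service_account_file"
    # 2. A service account key passed as a JSON string (assumed variable name).
    raw_json = env.get("BIGQUERY_CREDENTIALS_JSON")
    if raw_json:
        json.loads(raw_json)  # fail fast on malformed JSON
        return "service_account_json"
    # 3. Otherwise defer to Application Default Credentials, which Google's
    #    client libraries resolve themselves (GOOGLE_APPLICATION_CREDENTIALS,
    #    gcloud user credentials, or the metadata server).
    return "application_default"
```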

## Examples

Check out the [BigQuery example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) for a complete working example.

## Best Practices

- Use parameterized queries for better security
- Consider query costs and optimization
- Use appropriate table partitioning
- Set up proper IAM permissions

## Limitations

- Query execution time limits
- Cost considerations for large queries
- Rate limits and quotas

## Related Links

- [BigQuery Documentation](https://cloud.google.com/bigquery/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery)
Lines changed: 53 additions & 0 deletions
---
sidebar_position: 4
---

# ClickHouse

Anomstack supports ClickHouse as a data source for your metrics.

## Configuration

Configure ClickHouse in your metric batch's `config.yaml`:

```yaml
db: "clickhouse"
table_key: "your_database.your_table"
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion
ingest_sql: >
  select
    now() as metric_timestamp,
    'metric_name' as metric_name,
    your_value as metric_value
  from your_table;
```

## Authentication

You can authenticate with ClickHouse using:
- Username and password
- Environment variables
- SSL/TLS certificates
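
Keeping credentials out of `config.yaml` usually means reading them from environment variables at connection time. The sketch below collects host, port, username, password, and a TLS flag into a settings dict. Every `CLICKHOUSE_*` variable name and the helper `clickhouse_settings` are hypothetical placeholders, so check the repository for the names Anomstack actually reads; the port defaults are ClickHouse's standard HTTP (8123) and HTTPS (8443) ports, and `default` is ClickHouse's built-in user.

```python
import os
from typing import Mapping, Optional


def clickhouse_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build ClickHouse connection settings from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    secure = env.get("CLICKHOUSE_SECURE", "false").lower() in ("1", "true", "yes")
    default_port = "8443" if secure else "8123"  # ClickHouse HTTPS / HTTP defaults
    password = env.get("CLICKHOUSE_PASSWORD")
    if password is None:
        # Require the variable to be set (it may be empty) so a missing secret is obvious.
        raise KeyError("CLICKHOUSE_PASSWORD is not set")
    return {
        "host": env.get("CLICKHOUSE_HOST", "localhost"),
        "port": int(env.get("CLICKHOUSE_PORT", default_port)),
        "username": env.get("CLICKHOUSE_USER", "default"),
        "password": password,
        "secure": secure,
    }
```

Switching the default port with the TLS flag avoids the common mistake of enabling TLS while still pointing at the plain HTTP port.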

## Examples

Check out the [ClickHouse example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) for a complete working example.

## Best Practices

- Use appropriate table engines
- Consider query optimization
- Implement proper access controls
- Use parameterized queries

## Limitations

- Memory usage considerations
- Query timeout limits
- Concurrent query limits

## Related Links

- [ClickHouse Documentation](https://clickhouse.com/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse)

docs/docs/data-sources/duckdb.md

Lines changed: 69 additions & 0 deletions
---
sidebar_position: 5
---

# DuckDB

Anomstack supports DuckDB as a data source for your metrics. DuckDB is a fast analytical database that can read from and write to various file formats.

## Configuration

Configure DuckDB in your metric batch's `config.yaml`:

```yaml
db: "duckdb"
table_key: "metrics" # Default table to store metrics
metric_batch: "your_metric_batch_name"
ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion
ingest_sql: >
  select
    current_timestamp() as metric_timestamp,
    'metric_name' as metric_name,
    your_value as metric_value
  from your_table;
```

## Default Configuration

Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include:

```yaml
db: "duckdb" # Default database type
table_key: "metrics" # Default table name
ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule
model_path: "local://./models" # Default model storage location
alert_methods: "email,slack" # Default alert methods
```

You can override any of these defaults in your metric batch's configuration file.
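
The override rule amounts to a merge in which any key set in the batch config replaces the value from `defaults.yaml`. Here is a minimal sketch with both YAML files already parsed into dicts, using the default values listed above; `merge_config` is a hypothetical name, and Anomstack's actual loader may do more than a shallow merge.

```python
def merge_config(defaults: dict, batch: dict) -> dict:
    """Shallow merge: any key set in the batch config replaces the default."""
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    merged.update(batch)
    return merged


defaults = {
    "db": "duckdb",
    "table_key": "metrics",
    "ingest_cron_schedule": "*/3 * * * *",
    "model_path": "local://./models",
    "alert_methods": "email,slack",
}
batch = {"metric_batch": "your_metric_batch_name", "ingest_cron_schedule": "*/10 * * * *"}

config = merge_config(defaults, batch)
print(config["ingest_cron_schedule"], config["model_path"])
# */10 * * * * local://./models
```

Copying the defaults before updating matters when many batches share one defaults dict: without the copy, the first batch's overrides would leak into every batch loaded after it.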

## Features

DuckDB supports:
- Local file-based databases
- MotherDuck cloud integration
- Reading from various file formats (CSV, Parquet, JSON)
- SQL queries with Python integration

## Examples

Check out the [DuckDB example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) for a complete working example.

## Best Practices

- Use appropriate file formats for your data
- Consider query optimization
- Implement proper file permissions
- Use parameterized queries

## Limitations

- Local storage considerations
- Memory usage for large datasets
- Concurrent access limitations

## Related Links

- [DuckDB Documentation](https://duckdb.org/docs)
- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb)
- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml)
