diff --git a/docs/docs/concepts.md b/docs/docs/concepts.md new file mode 100644 index 00000000..96097f7c --- /dev/null +++ b/docs/docs/concepts.md @@ -0,0 +1,137 @@ +--- +sidebar_position: 3 +--- + +# Core Concepts + +This page explains the key concepts and terminology used in Anomstack. + +## Metric Batch + +A metric batch is the fundamental unit of configuration in Anomstack. It consists of: + +- A configuration file (`config.yaml`) +- A SQL query file (`query.sql`) or Python ingest function +- Optional preprocessing function +- Optional custom configuration + +Example structure: +``` +metrics/ + my_metric_batch/ + config.yaml + query.sql + preprocess.py (optional) +``` + +## Jobs + +Anomstack runs several types of jobs for each metric batch: + +### Ingest Job +- Pulls data from your data source +- Executes your SQL query or Python function +- Stores raw data for processing + +### Train Job +- Processes historical data +- Trains anomaly detection models +- Saves trained models to storage + +### Score Job +- Applies trained models to new data +- Calculates anomaly scores +- Identifies potential anomalies + +### Alert Job +- Evaluates anomaly scores +- Sends notifications via configured channels +- Handles alert throttling and snoozing + +### Change Detection Job +- Monitors for significant changes in metrics +- Detects level shifts and trends +- Triggers alerts for important changes + +### Plot Job +- Generates visualizations of metrics +- Creates anomaly score plots +- Produces plots for alerts and dashboard + +## Alerts + +Alerts are notifications sent when anomalies are detected. 
They can be configured to: + +- Send via email or Slack +- Include visualizations +- Use custom templates +- Support different severity levels +- Include LLM-powered analysis + +## Dashboard + +The dashboard provides: + +- Real-time metric visualization +- Anomaly score monitoring +- Alert history and management +- Metric configuration interface +- Performance analytics + +## Storage + +Anomstack uses storage for: + +- Trained models +- Configuration files +- Alert history +- Performance metrics +- Dashboard data + +Supported storage backends: +- Local filesystem +- Google Cloud Storage (GCS) +- Amazon S3 +- Azure Blob Storage (coming soon) + +## Data Sources + +Anomstack supports various data sources: + +- Python (direct integration) +- BigQuery +- Snowflake +- ClickHouse +- DuckDB +- SQLite +- MotherDuck +- Turso +- Redshift (coming soon) + +## Configuration + +Configuration is handled through: + +- YAML files for metric batches +- Environment variables +- Command-line arguments +- Dashboard settings + +## Scheduling + +Jobs can be scheduled using: + +- Cron expressions +- Dagster schedules +- Manual triggers +- Event-based triggers + +## LLM Agent + +The LLM agent provides: + +- AI-powered anomaly analysis +- Natural language explanations +- Automated reporting +- Intelligent alert prioritization +- Historical context analysis \ No newline at end of file diff --git a/docs/docs/configuration/metrics.md b/docs/docs/configuration/metrics.md new file mode 100644 index 00000000..6eb13347 --- /dev/null +++ b/docs/docs/configuration/metrics.md @@ -0,0 +1,33 @@ +--- +sidebar_position: 1 +--- + +# Metrics Configuration + +Learn how to configure metrics in Anomstack. 
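Before diving into the individual sections, here is a sketch of what a metric batch `config.yaml` might look like. The keys mirror the properties listed below, but the exact values are illustrative placeholders rather than a definitive schema:

```yaml
name: my_metric                  # metric identifier
description: "Orders per hour"   # metric description
source:
  type: sqlite                   # data source configuration
  query: query.sql
schedule: "0 * * * *"            # execution schedule (hourly)
alerts:
  threshold: 0.8                 # alert settings
```

Any property you omit falls back to the project defaults.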
+ +## Configuration File + +The `config.yaml` file defines: +- Metric properties +- Data source settings +- Schedule configuration +- Alert thresholds +- Custom parameters + +## Properties + +Key configuration properties: +- `name`: Metric identifier +- `description`: Metric description +- `source`: Data source configuration +- `schedule`: Execution schedule +- `alerts`: Alert settings + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/data-sources/bigquery.md b/docs/docs/data-sources/bigquery.md new file mode 100644 index 00000000..cc7b8ab3 --- /dev/null +++ b/docs/docs/data-sources/bigquery.md @@ -0,0 +1,53 @@ +--- +sidebar_position: 2 +--- + +# BigQuery + +Anomstack supports Google BigQuery as a data source for your metrics. + +## Configuration + +Configure BigQuery in your metric batch's `config.yaml`: + +```yaml +db: "bigquery" +table_key: "your-project.dataset.table" +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Authentication + +You can authenticate with BigQuery in several ways: +- Service account credentials file +- Application Default Credentials +- Environment variables + +## Examples + +Check out the [BigQuery example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) for a complete working example. 
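For local development, credentials are typically supplied through the environment. A hedged sketch, assuming a service account key file (the path is a placeholder):

```shell
# Point Google client libraries at a service account key file (path is illustrative)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Alternatively, use Application Default Credentials from the gcloud CLI:
# gcloud auth application-default login
```

Whichever option you choose, the credentials must be visible to the process running the Anomstack jobs.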
+ +## Best Practices + +- Use parameterized queries for better security +- Consider query costs and optimization +- Use appropriate table partitioning +- Set up proper IAM permissions + +## Limitations + +- Query execution time limits +- Cost considerations for large queries +- Rate limits and quotas + +## Related Links + +- [BigQuery Documentation](https://cloud.google.com/bigquery/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) \ No newline at end of file diff --git a/docs/docs/data-sources/clickhouse.md b/docs/docs/data-sources/clickhouse.md new file mode 100644 index 00000000..eb0af3df --- /dev/null +++ b/docs/docs/data-sources/clickhouse.md @@ -0,0 +1,53 @@ +--- +sidebar_position: 4 +--- + +# ClickHouse + +Anomstack supports ClickHouse as a data source for your metrics. + +## Configuration + +Configure ClickHouse in your metric batch's `config.yaml`: + +```yaml +db: "clickhouse" +table_key: "your_database.your_table" +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion +ingest_sql: > + select + now() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Authentication + +You can authenticate with ClickHouse using: +- Username and password +- Environment variables +- SSL/TLS certificates + +## Examples + +Check out the [ClickHouse example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) for a complete working example. 
+ +## Best Practices + +- Use appropriate table engines +- Consider query optimization +- Implement proper access controls +- Use parameterized queries + +## Limitations + +- Memory usage considerations +- Query timeout limits +- Concurrent query limits + +## Related Links + +- [ClickHouse Documentation](https://clickhouse.com/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) \ No newline at end of file diff --git a/docs/docs/data-sources/duckdb.md b/docs/docs/data-sources/duckdb.md new file mode 100644 index 00000000..ef422e1e --- /dev/null +++ b/docs/docs/data-sources/duckdb.md @@ -0,0 +1,69 @@ +--- +sidebar_position: 5 +--- + +# DuckDB + +Anomstack supports DuckDB as a data source for your metrics. DuckDB is a fast analytical database that can read and write data from various file formats. + +## Configuration + +Configure DuckDB in your metric batch's `config.yaml`: + +```yaml +db: "duckdb" +table_key: "metrics" # Default table to store metrics +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. 
+ +## Features + +DuckDB supports: +- Local file-based databases +- MotherDuck cloud integration +- Reading from various file formats (CSV, Parquet, JSON) +- SQL queries with Python integration + +## Examples + +Check out the [DuckDB example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) for a complete working example. + +## Best Practices + +- Use appropriate file formats for your data +- Consider query optimization +- Implement proper file permissions +- Use parameterized queries + +## Limitations + +- Local storage considerations +- Memory usage for large datasets +- Concurrent access limitations + +## Related Links + +- [DuckDB Documentation](https://duckdb.org/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/motherduck.md b/docs/docs/data-sources/motherduck.md new file mode 100644 index 00000000..eb3262ed --- /dev/null +++ b/docs/docs/data-sources/motherduck.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 7 +--- + +# MotherDuck + +Anomstack supports MotherDuck as a data source for your metrics. MotherDuck is a cloud-based version of DuckDB that provides serverless analytics capabilities. + +## Configuration + +Configure MotherDuck in your metric batch's `config.yaml`: + +```yaml +db: "motherduck" +table_key: "your_database.metrics" # Your MotherDuck database and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. 
Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +MotherDuck provides: +- Serverless analytics +- Cloud storage integration +- Real-time collaboration +- DuckDB compatibility + +## Examples + +Check out the [MotherDuck example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/motherduck) for a complete working example. + +## Best Practices + +- Token security +- Query optimization +- Cost management +- Data partitioning + +## Limitations + +- Query timeout limits +- Concurrent query limits +- Storage limitations +- Cost considerations + +## Related Links + +- [MotherDuck Documentation](https://motherduck.com/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/motherduck) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/python.md b/docs/docs/data-sources/python.md new file mode 100644 index 00000000..7e0ea3d1 --- /dev/null +++ b/docs/docs/data-sources/python.md @@ -0,0 +1,153 @@ +--- +sidebar_position: 1 +--- + +# Python + +Anomstack supports Python as a data source for your metrics. This allows you to create custom data ingestion logic using Python's rich ecosystem of libraries. 
+ +## Configuration + +Configure Python in your metric batch's `config.yaml`: + +```yaml +metric_batch: "your_metric_batch_name" +table_key: "your_table_key" +ingest_cron_schedule: "45 6 * * *" # When to run the ingestion +ingest_fn: > + {% include "./path/to/your/python/file.py" %} +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Customizing Default Templates + +Anomstack uses several default templates for preprocessing, SQL queries, and other operations. You can customize these by modifying the files in: + +1. **Python Templates** (`metrics/defaults/python/`): + - `preprocess.py`: Customize how metrics are preprocessed before anomaly detection + - Add your own Python functions for custom processing + +2. **SQL Templates** (`metrics/defaults/sql/`): + - `train.sql`: SQL for training data preparation + - `score.sql`: SQL for scoring data preparation + - `alerts.sql`: SQL for alert generation + - `change.sql`: SQL for change detection + - `plot.sql`: SQL for metric visualization + - `llmalert.sql`: SQL for LLM-based alerts + - `dashboard.sql`: SQL for dashboard data + - `delete.sql`: SQL for data cleanup + - `summary.sql`: SQL for summary reports + +To use custom templates, modify the corresponding files in these directories. The changes will apply to all metric batches unless overridden in specific batch configurations. 
+ +## Example: HackerNews Top Stories + +Here's a complete example that fetches metrics from HackerNews top stories: + +```python +import pandas as pd +import requests + +def ingest(top_n=10) -> pd.DataFrame: + # Hacker News API endpoint for top stories + url = "https://hacker-news.firebaseio.com/v0/topstories.json" + + # Get top story IDs + response = requests.get(url) + story_ids = response.json()[:top_n] + + # Calculate metrics + min_score = float("inf") + max_score = 0 + total_score = 0 + + for story_id in story_ids: + story_url = f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json" + story = requests.get(story_url).json() + score = story.get("score", 0) + + min_score = min(min_score, score) + max_score = max(max_score, score) + total_score += score + + avg_score = total_score / len(story_ids) + + # Create DataFrame with metrics + data = [ + [f"hn_top_{top_n}_min_score", min_score], + [f"hn_top_{top_n}_max_score", max_score], + [f"hn_top_{top_n}_avg_score", avg_score], + [f"hn_top_{top_n}_total_score", total_score], + ] + df = pd.DataFrame(data, columns=["metric_name", "metric_value"]) + df["metric_timestamp"] = pd.Timestamp.utcnow() + + return df +``` + +### How It Works + +1. **Configuration**: + ```yaml + metric_batch: "hn_top_stories_scores" + table_key: "metrics_hackernews" + ingest_cron_schedule: "45 6 * * *" + ingest_fn: > + {% include "./examples/hackernews/hn_top_stories_scores.py" %} + ``` + +2. **Function Definition**: The `ingest()` function takes a `top_n` parameter to specify how many top stories to analyze. + +3. **Data Collection**: + - Fetches top story IDs from HackerNews API + - Retrieves details for each story + - Calculates min, max, average, and total scores + +4. 
**DataFrame Creation**: + - Creates a DataFrame with required columns: `metric_name`, `metric_value`, and `metric_timestamp` + - Each metric is a separate row in the DataFrame + - Timestamps are in UTC + +## Best Practices + +- Return a pandas DataFrame with required columns +- Include proper error handling +- Use type hints for better code clarity +- Document your functions +- Handle API rate limits and timeouts +- Use environment variables for sensitive data + +## Required DataFrame Structure + +Your Python function must return a pandas DataFrame with these columns: +- `metric_name`: String identifier for the metric +- `metric_value`: Numeric value of the metric +- `metric_timestamp`: UTC timestamp of when the metric was collected + +## Limitations + +- Python environment must have required dependencies installed +- Function execution time limits +- Memory usage considerations +- API rate limits for external services + +## Related Links + +- [Example Implementation](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/hackernews) +- [Pandas Documentation](https://pandas.pydata.org/docs/) +- [Python Best Practices](https://docs.python-guide.org/) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) +- [Default Templates](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults) \ No newline at end of file diff --git a/docs/docs/data-sources/redshift.md b/docs/docs/data-sources/redshift.md new file mode 100644 index 00000000..3fd403e9 --- /dev/null +++ b/docs/docs/data-sources/redshift.md @@ -0,0 +1,71 @@ +--- +sidebar_position: 9 +--- + +# Redshift + +Anomstack supports Amazon Redshift as a data source for your metrics. Redshift is a fully managed, petabyte-scale data warehouse service. 
+ +## Configuration + +Configure Redshift in your metric batch's `config.yaml`: + +```yaml +db: "redshift" +table_key: "your_database.schema.metrics" # Your Redshift database, schema, and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +Redshift provides: +- Petabyte-scale data warehouse +- Columnar storage +- Parallel query execution +- Advanced analytics + +## Examples + +Check out the [Redshift example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/redshift) for a complete working example. 
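Because each row returned by `ingest_sql` is one metric observation, a single query can emit several metrics in one batch. A hedged sketch using hypothetical table and column names:

```sql
select
    current_timestamp as metric_timestamp,
    'orders_count' as metric_name,
    count(*)::float as metric_value
from your_schema.orders
union all
select
    current_timestamp,
    'orders_revenue',
    sum(amount)::float
from your_schema.orders;
```

Casting to float keeps `metric_value` numeric regardless of the source column types.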
+ +## Best Practices + +- Query optimization +- Distribution key selection +- Sort key optimization +- Workload management +- Cost optimization + +## Limitations + +- Query timeout limits +- Concurrent query limits +- Storage limitations +- Cost considerations + +## Related Links + +- [Redshift Documentation](https://docs.aws.amazon.com/redshift/) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/redshift) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/snowflake.md b/docs/docs/data-sources/snowflake.md new file mode 100644 index 00000000..7c821e85 --- /dev/null +++ b/docs/docs/data-sources/snowflake.md @@ -0,0 +1,54 @@ +--- +sidebar_position: 3 +--- + +# Snowflake + +Anomstack supports Snowflake as a data source for your metrics. + +## Configuration + +Configure Snowflake in your metric batch's `config.yaml`: + +```yaml +db: "snowflake" +table_key: "YOUR_DATABASE.SCHEMA.TABLE" +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/60 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Authentication + +You can authenticate with Snowflake using: +- Username and password +- Key pair authentication +- OAuth +- Environment variables + +## Examples + +Check out the [Snowflake example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/snowflake) for a complete working example. 
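Besides point-in-time snapshots, an ingest query can emit one row per time bucket, which is useful for backfilling history. A hedged sketch with hypothetical table and column names:

```sql
select
    date_trunc('hour', created_at) as metric_timestamp,
    'hourly_signups' as metric_name,
    count(*) as metric_value
from your_database.schema.signups
where created_at >= dateadd('day', -7, current_timestamp())
group by 1;
```

Restricting the window (here to seven days) keeps warehouse credit consumption predictable.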
+ +## Best Practices + +- Use appropriate warehouse sizing +- Consider query optimization +- Implement proper access controls +- Use parameterized queries + +## Limitations + +- Warehouse credits consumption +- Query timeout limits +- Concurrent query limits + +## Related Links + +- [Snowflake Documentation](https://docs.snowflake.com/) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/snowflake) \ No newline at end of file diff --git a/docs/docs/data-sources/sqlite.md b/docs/docs/data-sources/sqlite.md new file mode 100644 index 00000000..86948135 --- /dev/null +++ b/docs/docs/data-sources/sqlite.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 6 +--- + +# SQLite + +Anomstack supports SQLite as a data source for your metrics. SQLite is a lightweight, file-based database that's perfect for local development and small to medium-sized applications. + +## Configuration + +Configure SQLite in your metric batch's `config.yaml`: + +```yaml +db: "sqlite" +table_key: "metrics" # Default table to store metrics +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + datetime('now') as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. 
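Because SQLite ships with Python, you can sanity-check the shape of an ingest query locally before wiring it into a metric batch. A minimal sketch, using a throwaway in-memory database in place of your real file (table and column names are placeholders):

```python
import sqlite3

# Throwaway in-memory database standing in for your real SQLite file
conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (your_value real)")
conn.executemany("insert into your_table values (?)", [(1.0,), (2.0,), (3.0,)])

# Same shape as the ingest_sql above: one timestamp, name, and value per row
rows = conn.execute(
    """
    select
      datetime('now') as metric_timestamp,
      'my_metric' as metric_name,
      avg(your_value) as metric_value
    from your_table
    """
).fetchall()

metric_timestamp, metric_name, metric_value = rows[0]
print(metric_name, metric_value)  # my_metric 2.0
```

If the query runs here and returns the three expected columns, the same SQL should slot into `ingest_sql` unchanged.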
+ +## Features + +SQLite provides: +- File-based database storage +- Zero configuration +- ACID compliance +- Full SQL support + +## Examples + +Check out the [SQLite example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/sqlite) for a complete working example. + +## Best Practices + +- Regular database backups +- Proper file permissions +- Index optimization +- Query optimization + +## Limitations + +- Concurrent write operations +- File size limitations +- Memory constraints +- Network access limitations + +## Related Links + +- [SQLite Documentation](https://www.sqlite.org/docs.html) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/sqlite) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/turso.md b/docs/docs/data-sources/turso.md new file mode 100644 index 00000000..d7d2bf33 --- /dev/null +++ b/docs/docs/data-sources/turso.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 8 +--- + +# Turso + +Anomstack supports Turso as a data source for your metrics. Turso is a distributed SQLite database that provides global replication and edge computing capabilities. + +## Configuration + +Configure Turso in your metric batch's `config.yaml`: + +```yaml +db: "turso" +table_key: "your_database.metrics" # Your Turso database and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + datetime('now') as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. 
Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +Turso provides: +- Global replication +- Edge computing +- SQLite compatibility +- Real-time sync + +## Examples + +Check out the [Turso example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/turso) for a complete working example. + +## Best Practices + +- Token security +- Query optimization +- Replication strategy +- Data partitioning + +## Limitations + +- Query timeout limits +- Concurrent query limits +- Storage limitations +- Cost considerations + +## Related Links + +- [Turso Documentation](https://turso.tech/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/turso) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/deployment/docker.md b/docs/docs/deployment/docker.md new file mode 100644 index 00000000..055932df --- /dev/null +++ b/docs/docs/deployment/docker.md @@ -0,0 +1,42 @@ +--- +sidebar_position: 1 +--- + +# Docker Deployment + +Deploy Anomstack using Docker for easy setup and management. + +## Prerequisites + +- Docker +- Docker Compose +- Git + +## Quick Start + +1. Clone the repository: +```bash +git clone https://github.com/andrewm4894/anomstack.git +cd anomstack +``` + +2. Start the containers: +```bash +docker compose up -d +``` + +## Configuration + +Configure your Docker deployment through: +- Environment variables +- Volume mounts +- Network settings +- Resource limits + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... 
\ No newline at end of file diff --git a/docs/docs/features/alerts.md b/docs/docs/features/alerts.md new file mode 100644 index 00000000..9c2c1732 --- /dev/null +++ b/docs/docs/features/alerts.md @@ -0,0 +1,38 @@ +--- +sidebar_position: 3 +--- + +# Alerts + +Anomstack provides flexible alerting capabilities to notify you when anomalies are detected. This section explains how to configure and manage alerts. + +## Alert Types + +Anomstack supports multiple alert channels: +- Email alerts +- Slack notifications +- Custom webhooks + +## Configuration + +Configure alerts through: +- Alert thresholds +- Notification templates +- Channel settings +- Throttling rules + +## Alert Management + +Features for managing alerts: +- Alert history +- Snoozing capabilities +- Alert grouping +- Severity levels + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/anomaly-detection.md b/docs/docs/features/anomaly-detection.md new file mode 100644 index 00000000..af24b4b6 --- /dev/null +++ b/docs/docs/features/anomaly-detection.md @@ -0,0 +1,40 @@ +--- +sidebar_position: 2 +--- + +# Anomaly Detection + +Anomstack uses PyOD (Python Outlier Detection) to detect anomalies in your metrics. This section explains how the anomaly detection works and how to configure it. + +## How It Works + +Anomstack's anomaly detection process: +1. Ingests metric data +2. Preprocesses the data +3. Trains detection models +4. Scores new data points +5. Identifies anomalies + +## Configuration + +You can configure anomaly detection through: +- Model selection +- Training parameters +- Scoring thresholds +- Custom preprocessing + +## Models + +Anomstack supports various anomaly detection models: +- Isolation Forest +- Local Outlier Factor +- One-Class SVM +- And more... + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... 
\ No newline at end of file diff --git a/docs/docs/features/dashboard.md b/docs/docs/features/dashboard.md new file mode 100644 index 00000000..72ca5bab --- /dev/null +++ b/docs/docs/features/dashboard.md @@ -0,0 +1,46 @@ +--- +sidebar_position: 4 +--- + +# Dashboard + +The Anomstack dashboard provides a modern interface for monitoring your metrics and anomalies. Built with FastHTML and MonsterUI, it offers a rich user experience. + +## Features + +The dashboard includes: +- Real-time metric visualization +- Anomaly score monitoring +- Alert management +- Configuration interface +- Performance analytics + +## Navigation + +Key sections of the dashboard: +- Home view +- Metric batch view +- Anomaly list view +- Configuration pages +- Settings + +## Customization + +Customize your dashboard through: +- Layout options +- Theme settings +- Widget configuration +- Custom views + +## Examples + +You can explore a live demo of the Anomstack dashboard at [https://anomstack-demo.replit.app/](https://anomstack-demo.replit.app/). This demo instance showcases various features including: +- Real-time metric monitoring +- Anomaly detection visualization +- Alert management +- Different metric batch views +- Configuration options + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/llm-agent.md b/docs/docs/features/llm-agent.md new file mode 100644 index 00000000..b9d6da31 --- /dev/null +++ b/docs/docs/features/llm-agent.md @@ -0,0 +1,40 @@ +--- +sidebar_position: 5 +--- + +# LLM Agent + +The LLM (Large Language Model) agent in Anomstack provides AI-powered analysis of anomalies and automated reporting capabilities. 
+ +## Features + +The LLM agent offers: +- Natural language explanations of anomalies +- Automated report generation +- Intelligent alert prioritization +- Historical context analysis +- Custom analysis templates + +## Configuration + +Configure the LLM agent through: +- Model selection +- Prompt templates +- Analysis parameters +- Integration settings + +## Integration + +The agent integrates with: +- Alert system +- Dashboard +- Reporting tools +- Custom workflows + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/metrics.md b/docs/docs/features/metrics.md new file mode 100644 index 00000000..34e8c167 --- /dev/null +++ b/docs/docs/features/metrics.md @@ -0,0 +1,31 @@ +--- +sidebar_position: 1 +--- + +# Metrics + +Metrics are the fundamental building blocks of Anomstack. This section explains how to define, configure, and manage your metrics. + +## Defining Metrics + +Metrics in Anomstack are defined using a combination of: +- SQL queries or Python functions +- YAML configuration files +- Optional preprocessing functions + +## Configuration + +Each metric requires a configuration file (`config.yaml`) that specifies: +- Metric name and description +- Data source configuration +- Schedule settings +- Alert thresholds +- Custom parameters + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/intro.md b/docs/docs/intro.md index dfe8ee61..ac6f3c42 100644 --- a/docs/docs/intro.md +++ b/docs/docs/intro.md @@ -1,14 +1,72 @@ -# Anomstack Docs +--- +sidebar_position: 1 +--- -Check out the [repo readme](https://github.com/andrewm4894/anomstack/blob/main/README.md) for more high level information on the project. +# Introduction to Anomstack -This docs site is mainly for more detailed and specific stuff. 
+
+Anomstack is an open-source anomaly detection platform that makes it easy to monitor your metrics data and detect anomalies in it. Built on top of [Dagster](https://dagster.io/) for orchestration and [FastHTML](https://fastht.ml/) + [MonsterUI](https://github.com/AnswerDotAI/MonsterUI) for the dashboard, Anomstack provides a complete solution, from data ingestion through alerting.
 
-## Table of Contents
+## Key Features
 
-- [BigQuery](./bigquery.md)
-- [Snowflake](./snowflake.md)
-- [GCS](./gcs.md)
-- [S3](./s3.md)
-- [Deployment](./deployment/README.md)
-  - [GCP](./deployment/gcp.md)
+- 🔍 **Powerful Anomaly Detection**: Built on [PyOD](https://pyod.readthedocs.io/en/latest/) for robust anomaly detection
+- 📊 **Beautiful Dashboard**: Modern UI for visualizing metrics and anomalies
+- 🔌 **Multiple Data Sources**: Support for various databases and data platforms
+- 🔔 **Flexible Alerting**: Email and Slack notifications with customizable templates
+- 🤖 **LLM Agent Integration**: AI-powered anomaly analysis and reporting
+- 🛠️ **Easy Deployment**: Multiple deployment options including Docker, Dagster Cloud, and more
+
+## How It Works
+
+1. **Define Your Metrics**: Configure your metrics using SQL queries or Python functions
+2. **Automatic Processing**: Anomstack handles ingestion, training, scoring, and alerting
+3. **Monitor & Alert**: Get notified when anomalies are detected
+4. **Visualize**: Use the dashboard to explore metrics and anomalies
+
+## Supported Data Sources
+
+Anomstack supports a wide range of data sources:
+
+- Python (direct integration)
+- BigQuery
+- Snowflake
+- ClickHouse
+- DuckDB
+- SQLite
+- MotherDuck
+- Turso
+- Redshift (coming soon)
+
+## Storage Options
+
+Store your trained models and configurations in:
+
+- Local filesystem
+- Google Cloud Storage (GCS)
+- Amazon S3
+- Azure Blob Storage (coming soon)
+
+## Getting Started
+
+Choose your preferred way to get started:
+
+- [Quickstart Guide](quickstart)
+- [Docker Deployment](deployment/docker)
+- [GCP Deployment](deployment/gcp)
+
+## Architecture
+
+Anomstack is built with a modular architecture that separates concerns:
+
+- **Ingestion**: Pull data from various sources
+- **Processing**: Train models and detect anomalies
+- **Alerting**: Send notifications via multiple channels
+- **Dashboard**: Visualize metrics and anomalies
+- **Storage**: Store models and configurations
+
+## Contributing
+
+We welcome contributions! Check out our [Contributing Guide](https://github.com/andrewm4894/anomstack/blob/main/CONTRIBUTING.md) to get started.
+
+## License
+
+Anomstack is open source and available under the [MIT License](https://github.com/andrewm4894/anomstack/blob/main/LICENSE).
diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md
new file mode 100644
index 00000000..a2dd9f4c
--- /dev/null
+++ b/docs/docs/quickstart.md
@@ -0,0 +1,112 @@
+---
+sidebar_position: 2
+---
+
+# Quickstart Guide
+
+This guide will help you get started with Anomstack quickly. We'll cover the basic setup and show you how to monitor your first metric.
+
+## Prerequisites
+
+- Python 3.8 or higher
+- pip (Python package manager)
+- Git
+
+## Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/andrewm4894/anomstack.git
+cd anomstack
+```
+
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+## Basic Configuration
+
+1. Create a new metric batch in the `metrics` directory:
+```bash
+mkdir -p metrics/my_first_metric
+```
+
+2. Create a SQL file for your metric (`metrics/my_first_metric/query.sql`):
+```sql
+SELECT
+  timestamp,
+  value
+FROM your_table
+WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
+ORDER BY timestamp
+```
+
+Note: not every database supports the `INTERVAL '7 days'` syntax, so adjust the date filter to your SQL dialect (in SQLite, for example, use `WHERE timestamp >= datetime('now', '-7 days')`).
+
+3. Create a configuration file (`metrics/my_first_metric/config.yaml`):
+```yaml
+name: my_first_metric
+description: "My first metric in Anomstack"
+source:
+  type: sqlite  # or your preferred data source
+  query: query.sql
+schedule: "0 * * * *"  # Run every hour
+```
+
+## Running Anomstack
+
+1. Start the Dagster UI:
+```bash
+dagster dev -f anomstack/main.py
+```
+
+2. Start the dashboard:
+```bash
+python dashboard/app.py
+```
+
+3. Access the dashboard at `http://localhost:5000`
+
+## Monitoring Your Metric
+
+1. The metric will be automatically ingested based on your schedule
+2. Anomstack will train a model on your historical data
+3. New data points will be scored for anomalies
+4. You'll receive alerts if anomalies are detected
+
+## Next Steps
+
+- [Learn about concepts](concepts)
+- [Configure alerts](features/alerts)
+- [Customize the dashboard](features/dashboard)
+- [Explore deployment options](deployment/docker)
+
+## Ready-Made Example Metrics
+
+Want to see Anomstack in action with real data? Try these ready-made example metric batches:
+
+- [Currency](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/currency): Track currency exchange rates from public APIs.
+- [Yahoo Finance (yfinance)](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/yfinance): Monitor stock prices and financial data using the Yahoo Finance API.
+- [Weather](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/weather): Analyze weather data from Open Meteo.
+- [CoinDesk](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/coindesk): Get Bitcoin price data from the CoinDesk API.
+- [Hacker News](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/hackernews): Track top stories and scores from Hacker News.
+- [Netdata](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/netdata): Monitor system metrics using the Netdata API.
+
+See the [full list of examples](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples) for more, including BigQuery, Prometheus, and Google Trends.
+
+## Common Issues
+
+### Metric Not Showing Up
+- Check the Dagster UI for any job failures
+- Verify your SQL query returns the expected data
+- Ensure your configuration file is valid YAML
+
+### No Alerts
+- Check your alert configuration
+- Verify your email/Slack settings
+- Look for any alert throttling settings
+
+## Need Help?
+
+- Check the [GitHub Issues](https://github.com/andrewm4894/anomstack/issues)
+- Join our [Discord Community](https://discord.gg/anomstack)
+- Read the [detailed documentation](intro)
\ No newline at end of file
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 23bd4c02..ca910eb4 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -18,23 +18,72 @@ const sidebars = {
   // But you can create a sidebar manually
   docsSidebar: [
-    'intro',
-    'bigquery',
-    'snowflake',
-    'gcs',
-    's3',
+    {
+      type: 'category',
+      label: 'Getting Started',
+      items: [
+        'intro',
+        'quickstart',
+        'concepts',
+      ],
+    },
+    {
+      type: 'category',
+      label: 'Core Features',
+      items: [
+        'features/metrics',
+        'features/anomaly-detection',
+        'features/alerts',
+        'features/dashboard',
+        'features/llm-agent',
+      ],
+    },
+    {
+      type: 'category',
+      label: 'Data Sources',
+      items: [
+        'data-sources/python',
+        'data-sources/bigquery',
+        'data-sources/snowflake',
+        'data-sources/clickhouse',
+        'data-sources/duckdb',
+        'data-sources/sqlite',
+        'data-sources/motherduck',
+        'data-sources/turso',
+        'data-sources/redshift',
+      ],
+    },
+    {
+      type: 'category',
+      label: 'Storage',
+      items: [
+        'gcs',
+        's3',
+      ],
+    },
     {
       type: 'category',
       label: 'Deployment',
-      items: ['deployment/gcp'],
+      items: [
+        'deployment/docker',
+        'deployment/gcp',
+      ],
+    },
+    {
+      type: 'category',
+      label: 'Configuration',
+      items: [
+        'configuration/metrics',
+      ],
     },
     {
-      type: 'category',
-      label: 'GraphQL',
-      items: [
-        'graphql/examples/start_schedule'
-      ],
-    },
+      type: 'category',
+      label: 'API Reference',
+      items: [
+        'graphql/README',
+        'graphql/examples/start_schedule',
+      ],
+    },
   ],
 };
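The score job described in these docs assigns each new data point an anomaly score. Anomstack does this with trained PyOD models, but the core idea can be sketched with a simple trailing z-score baseline. This is an illustrative stand-in, not Anomstack's actual implementation; the function name and the example series are made up for the sketch:

```python
import statistics


def anomaly_scores(values, window=24):
    """Score each point by how far it sits from the trailing window's mean,
    in units of the window's standard deviation (a z-score baseline)."""
    scores = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 2:
            scores.append(0.0)  # not enough history to judge yet
            continue
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # flat history: any deviation at all is maximally surprising
            scores.append(0.0 if v == mean else float("inf"))
            continue
        scores.append(abs(v - mean) / stdev)
    return scores


series = [10, 11, 10, 12, 11, 10, 11, 50]  # the last point is an obvious spike
scores = anomaly_scores(series, window=5)
print(scores[-1] > 3)  # prints True: the spike scores far above a typical threshold
```

A real detector (like the PyOD models Anomstack trains) handles seasonality and multivariate inputs, but the alert step is the same shape: compare the score against a configured threshold.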
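The change detection job watches metrics for level shifts. A minimal sketch of that idea (again illustrative, not Anomstack's actual algorithm; the function name and thresholds are assumptions) compares the mean of the most recent points with the mean of the points just before them:

```python
import statistics


def level_shift(values, window=5, threshold=3.0):
    """Flag a level shift when the recent window's mean moves away from the
    preceding baseline by more than `threshold` baseline standard deviations."""
    if len(values) < 2 * window:
        return False  # not enough data to compare two windows
    baseline = values[-2 * window:-window]
    recent = values[-window:]
    base_mean = statistics.fmean(baseline)
    base_stdev = statistics.stdev(baseline)
    if base_stdev == 0:
        return statistics.fmean(recent) != base_mean
    return abs(statistics.fmean(recent) - base_mean) / base_stdev > threshold


print(level_shift([10, 11, 10, 11, 10, 20, 21, 20, 21, 20]))  # True: level jumped
print(level_shift([10, 11, 10, 11, 10, 11, 10, 11, 10, 11]))  # False: steady
```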