From 567e94ff2153ad8e8a022d6fcc75d1ed9adf5e3d Mon Sep 17 00:00:00 2001 From: andrewm4894 Date: Mon, 2 Jun 2025 22:47:02 +0100 Subject: [PATCH 1/4] Revamp documentation structure and enhance introduction content - Updated `sidebars.js` to organize documentation into categories: Getting Started, Core Features, Data Sources, Storage, Deployment, Configuration, and API Reference. - Expanded `intro.md` to provide a comprehensive overview of Anomstack, including key features, how it works, supported data sources, storage options, and getting started guides. --- docs/docs/concepts.md | 137 ++++++++++++++++++++++++++++++++++++++++ docs/docs/intro.md | 79 ++++++++++++++++++++--- docs/docs/quickstart.md | 99 +++++++++++++++++++++++++++++ docs/sidebars.js | 78 +++++++++++++++++++---- 4 files changed, 371 insertions(+), 22 deletions(-) create mode 100644 docs/docs/concepts.md create mode 100644 docs/docs/quickstart.md diff --git a/docs/docs/concepts.md b/docs/docs/concepts.md new file mode 100644 index 00000000..96097f7c --- /dev/null +++ b/docs/docs/concepts.md @@ -0,0 +1,137 @@ +--- +sidebar_position: 3 +--- + +# Core Concepts + +This page explains the key concepts and terminology used in Anomstack. + +## Metric Batch + +A metric batch is the fundamental unit of configuration in Anomstack. 
It consists of: + +- A configuration file (`config.yaml`) +- A SQL query file (`query.sql`) or Python ingest function +- Optional preprocessing function +- Optional custom configuration + +Example structure: +``` +metrics/ + my_metric_batch/ + config.yaml + query.sql + preprocess.py (optional) +``` + +## Jobs + +Anomstack runs several types of jobs for each metric batch: + +### Ingest Job +- Pulls data from your data source +- Executes your SQL query or Python function +- Stores raw data for processing + +### Train Job +- Processes historical data +- Trains anomaly detection models +- Saves trained models to storage + +### Score Job +- Applies trained models to new data +- Calculates anomaly scores +- Identifies potential anomalies + +### Alert Job +- Evaluates anomaly scores +- Sends notifications via configured channels +- Handles alert throttling and snoozing + +### Change Detection Job +- Monitors for significant changes in metrics +- Detects level shifts and trends +- Triggers alerts for important changes + +### Plot Job +- Generates visualizations of metrics +- Creates anomaly score plots +- Produces plots for alerts and dashboard + +## Alerts + +Alerts are notifications sent when anomalies are detected. 
They can be configured to: + +- Send via email or Slack +- Include visualizations +- Use custom templates +- Support different severity levels +- Include LLM-powered analysis + +## Dashboard + +The dashboard provides: + +- Real-time metric visualization +- Anomaly score monitoring +- Alert history and management +- Metric configuration interface +- Performance analytics + +## Storage + +Anomstack uses storage for: + +- Trained models +- Configuration files +- Alert history +- Performance metrics +- Dashboard data + +Supported storage backends: +- Local filesystem +- Google Cloud Storage (GCS) +- Amazon S3 +- Azure Blob Storage (coming soon) + +## Data Sources + +Anomstack supports various data sources: + +- Python (direct integration) +- BigQuery +- Snowflake +- ClickHouse +- DuckDB +- SQLite +- MotherDuck +- Turso +- Redshift (coming soon) + +## Configuration + +Configuration is handled through: + +- YAML files for metric batches +- Environment variables +- Command-line arguments +- Dashboard settings + +## Scheduling + +Jobs can be scheduled using: + +- Cron expressions +- Dagster schedules +- Manual triggers +- Event-based triggers + +## LLM Agent + +The LLM agent provides: + +- AI-powered anomaly analysis +- Natural language explanations +- Automated reporting +- Intelligent alert prioritization +- Historical context analysis \ No newline at end of file diff --git a/docs/docs/intro.md b/docs/docs/intro.md index dfe8ee61..3ecbab49 100644 --- a/docs/docs/intro.md +++ b/docs/docs/intro.md @@ -1,14 +1,73 @@ -# Anomstack Docs +--- +sidebar_position: 1 +--- -Check out the [repo readme](https://github.com/andrewm4894/anomstack/blob/main/README.md) for more high level information on the project. +# Introduction to Anomstack -This docs site is mainly for more detailed and specific stuff. +Anomstack is an open-source anomaly detection platform that makes it easy to monitor and detect anomalies in your metrics data. 
Built on top of [Dagster](https://dagster.io/) for orchestration and [FastHTML](https://fastht.ml/) + [MonsterUI](https://github.com/AnswerDotAI/MonsterUI) for the dashboard, Anomstack provides a complete solution for metric monitoring and anomaly detection. -## Table of Contents +## Key Features -- [BigQuery](./bigquery.md) -- [Snowflake](./snowflake.md) -- [GCS](./gcs.md) -- [S3](./s3.md) -- [Deployment](./deployment/README.md) - - [GCP](./deployment/gcp.md) +- 🔍 **Powerful Anomaly Detection**: Built on [PyOD](https://pyod.readthedocs.io/en/latest/) for robust anomaly detection +- 📊 **Beautiful Dashboard**: Modern UI for visualizing metrics and anomalies +- 🔌 **Multiple Data Sources**: Support for various databases and data platforms +- 🔔 **Flexible Alerting**: Email and Slack notifications with customizable templates +- 🤖 **LLM Agent Integration**: AI-powered anomaly analysis and reporting +- 🛠️ **Easy Deployment**: Multiple deployment options including Docker, Dagster Cloud, and more + +## How It Works + +1. **Define Your Metrics**: Configure your metrics using SQL queries or Python functions +2. **Automatic Processing**: Anomstack handles ingestion, training, scoring, and alerting +3. **Monitor & Alert**: Get notified when anomalies are detected +4. 
**Visualize**: Use the dashboard to explore metrics and anomalies + +## Supported Data Sources + +Anomstack supports a wide range of data sources: + +- Python (direct integration) +- BigQuery +- Snowflake +- ClickHouse +- DuckDB +- SQLite +- MotherDuck +- Turso +- Redshift (coming soon) + +## Storage Options + +Store your trained models and configurations in: + +- Local filesystem +- Google Cloud Storage (GCS) +- Amazon S3 +- Azure Blob Storage (coming soon) + +## Getting Started + +Choose your preferred way to get started: + +- [Quickstart Guide](quickstart) +- [Docker Deployment](deployment/docker) +- [Dagster Cloud Setup](deployment/dagster-cloud) +- [Local Development](deployment/local) + +## Architecture + +Anomstack is built with a modular architecture that separates concerns: + +- **Ingestion**: Pull data from various sources +- **Processing**: Train models and detect anomalies +- **Alerting**: Send notifications via multiple channels +- **Dashboard**: Visualize metrics and anomalies +- **Storage**: Store models and configurations + +## Contributing + +We welcome contributions! Check out our [Contributing Guide](https://github.com/andrewm4894/anomstack/blob/main/CONTRIBUTING.md) to get started. + +## License + +Anomstack is open source and available under the [MIT License](https://github.com/andrewm4894/anomstack/blob/main/LICENSE). diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md new file mode 100644 index 00000000..a0d496ba --- /dev/null +++ b/docs/docs/quickstart.md @@ -0,0 +1,99 @@ +--- +sidebar_position: 2 +--- + +# Quickstart Guide + +This guide will help you get started with Anomstack quickly. We'll cover the basic setup and show you how to monitor your first metric. + +## Prerequisites + +- Python 3.8 or higher +- pip (Python package manager) +- Git + +## Installation + +1. Clone the repository: +```bash +git clone https://github.com/andrewm4894/anomstack.git +cd anomstack +``` + +2. 
Install dependencies: +```bash +pip install -r requirements.txt +``` + +## Basic Configuration + +1. Create a new metric batch in the `metrics` directory: +```bash +mkdir -p metrics/my_first_metric +``` + +2. Create a SQL file for your metric (`metrics/my_first_metric/query.sql`): +```sql +SELECT + timestamp, + value +FROM your_table +WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days' +ORDER BY timestamp +``` + +3. Create a configuration file (`metrics/my_first_metric/config.yaml`): +```yaml +name: my_first_metric +description: "My first metric in Anomstack" +source: + type: sqlite # or your preferred data source + query: query.sql +schedule: "0 * * * *" # Run every hour +``` + +## Running Anomstack + +1. Start the Dagster UI: +```bash +dagster dev -f anomstack/main.py +``` + +2. Start the dashboard: +```bash +python dashboard/app.py +``` + +3. Access the dashboard at `http://localhost:5000` + +## Monitoring Your Metric + +1. The metric will be automatically ingested based on your schedule +2. Anomstack will train a model on your historical data +3. New data points will be scored for anomalies +4. You'll receive alerts if anomalies are detected + +## Next Steps + +- [Learn about concepts](concepts) +- [Configure alerts](features/alerts) +- [Customize the dashboard](features/dashboard) +- [Explore deployment options](deployment/docker) + +## Common Issues + +### Metric Not Showing Up +- Check the Dagster UI for any job failures +- Verify your SQL query returns the expected data +- Ensure your configuration file is valid YAML + +### No Alerts +- Check your alert configuration +- Verify your email/Slack settings +- Look for any alert throttling settings + +## Need Help? 
+ +- Check the [GitHub Issues](https://github.com/andrewm4894/anomstack/issues) +- Join our [Discord Community](https://discord.gg/anomstack) +- Read the [detailed documentation](intro) \ No newline at end of file diff --git a/docs/sidebars.js b/docs/sidebars.js index 23bd4c02..2a7059d8 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -18,23 +18,77 @@ const sidebars = { // But you can create a sidebar manually docsSidebar: [ - 'intro', - 'bigquery', - 'snowflake', - 'gcs', - 's3', + { + type: 'category', + label: 'Getting Started', + items: [ + 'intro', + 'quickstart', + 'concepts', + ], + }, + { + type: 'category', + label: 'Core Features', + items: [ + 'features/metrics', + 'features/anomaly-detection', + 'features/alerts', + 'features/dashboard', + 'features/llm-agent', + ], + }, + { + type: 'category', + label: 'Data Sources', + items: [ + 'data-sources/python', + 'data-sources/bigquery', + 'data-sources/snowflake', + 'data-sources/clickhouse', + 'data-sources/duckdb', + 'data-sources/sqlite', + 'data-sources/motherduck', + 'data-sources/turso', + ], + }, + { + type: 'category', + label: 'Storage', + items: [ + 'storage/local', + 'storage/gcs', + 'storage/s3', + ], + }, { type: 'category', label: 'Deployment', - items: ['deployment/gcp'], + items: [ + 'deployment/docker', + 'deployment/dagster-cloud', + 'deployment/github-codespaces', + 'deployment/replit', + 'deployment/local', + ], + }, + { + type: 'category', + label: 'Configuration', + items: [ + 'configuration/metrics', + 'configuration/alerts', + 'configuration/dashboard', + ], }, { - type: 'category', - label: 'GraphQL', - items: [ - 'graphql/examples/start_schedule' - ], - }, + type: 'category', + label: 'API Reference', + items: [ + 'api/graphql', + 'api/python', + ], + }, ], }; From c795e7d92a86e3776d4c2dc592127439ee3cf628 Mon Sep 17 00:00:00 2001 From: andrewm4894 Date: Mon, 2 Jun 2025 22:55:31 +0100 Subject: [PATCH 2/4] Refactor sidebar items in documentation for clarity - Simplified the 
structure of `sidebars.js` by removing redundant paths and consolidating items under relevant categories, including Data Sources, Storage, Deployment, and API Reference. --- docs/docs/configuration/metrics.md | 33 +++++++++++++++++++ docs/docs/data-sources/python.md | 31 ++++++++++++++++++ docs/docs/deployment/docker.md | 42 +++++++++++++++++++++++++ docs/docs/features/alerts.md | 38 ++++++++++++++++++++++ docs/docs/features/anomaly-detection.md | 40 +++++++++++++++++++++++ docs/docs/features/dashboard.md | 41 ++++++++++++++++++++++++ docs/docs/features/llm-agent.md | 40 +++++++++++++++++++++++ docs/docs/features/metrics.md | 31 ++++++++++++++++++ docs/sidebars.js | 25 +++++---------- 9 files changed, 303 insertions(+), 18 deletions(-) create mode 100644 docs/docs/configuration/metrics.md create mode 100644 docs/docs/data-sources/python.md create mode 100644 docs/docs/deployment/docker.md create mode 100644 docs/docs/features/alerts.md create mode 100644 docs/docs/features/anomaly-detection.md create mode 100644 docs/docs/features/dashboard.md create mode 100644 docs/docs/features/llm-agent.md create mode 100644 docs/docs/features/metrics.md diff --git a/docs/docs/configuration/metrics.md b/docs/docs/configuration/metrics.md new file mode 100644 index 00000000..6eb13347 --- /dev/null +++ b/docs/docs/configuration/metrics.md @@ -0,0 +1,33 @@ +--- +sidebar_position: 1 +--- + +# Metrics Configuration + +Learn how to configure metrics in Anomstack. + +## Configuration File + +The `config.yaml` file defines: +- Metric properties +- Data source settings +- Schedule configuration +- Alert thresholds +- Custom parameters + +## Properties + +Key configuration properties: +- `name`: Metric identifier +- `description`: Metric description +- `source`: Data source configuration +- `schedule`: Execution schedule +- `alerts`: Alert settings + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... 
\ No newline at end of file diff --git a/docs/docs/data-sources/python.md b/docs/docs/data-sources/python.md new file mode 100644 index 00000000..25d97f0c --- /dev/null +++ b/docs/docs/data-sources/python.md @@ -0,0 +1,31 @@ +--- +sidebar_position: 1 +--- + +# Python Data Source + +Anomstack allows you to define custom Python functions for ingesting metrics data. + +## Overview + +The Python data source enables you to: +- Write custom data ingestion logic +- Process data before anomaly detection +- Integrate with any Python-compatible data source +- Transform data as needed + +## Configuration + +Configure Python data sources through: +- Function definition +- Parameter configuration +- Error handling +- Data validation + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/deployment/docker.md b/docs/docs/deployment/docker.md new file mode 100644 index 00000000..055932df --- /dev/null +++ b/docs/docs/deployment/docker.md @@ -0,0 +1,42 @@ +--- +sidebar_position: 1 +--- + +# Docker Deployment + +Deploy Anomstack using Docker for easy setup and management. + +## Prerequisites + +- Docker +- Docker Compose +- Git + +## Quick Start + +1. Clone the repository: +```bash +git clone https://github.com/andrewm4894/anomstack.git +cd anomstack +``` + +2. Start the containers: +```bash +docker compose up -d +``` + +## Configuration + +Configure your Docker deployment through: +- Environment variables +- Volume mounts +- Network settings +- Resource limits + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/alerts.md b/docs/docs/features/alerts.md new file mode 100644 index 00000000..9c2c1732 --- /dev/null +++ b/docs/docs/features/alerts.md @@ -0,0 +1,38 @@ +--- +sidebar_position: 3 +--- + +# Alerts + +Anomstack provides flexible alerting capabilities to notify you when anomalies are detected. 
This section explains how to configure and manage alerts. + +## Alert Types + +Anomstack supports multiple alert channels: +- Email alerts +- Slack notifications +- Custom webhooks + +## Configuration + +Configure alerts through: +- Alert thresholds +- Notification templates +- Channel settings +- Throttling rules + +## Alert Management + +Features for managing alerts: +- Alert history +- Snoozing capabilities +- Alert grouping +- Severity levels + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/anomaly-detection.md b/docs/docs/features/anomaly-detection.md new file mode 100644 index 00000000..af24b4b6 --- /dev/null +++ b/docs/docs/features/anomaly-detection.md @@ -0,0 +1,40 @@ +--- +sidebar_position: 2 +--- + +# Anomaly Detection + +Anomstack uses PyOD (Python Outlier Detection) to detect anomalies in your metrics. This section explains how the anomaly detection works and how to configure it. + +## How It Works + +Anomstack's anomaly detection process: +1. Ingests metric data +2. Preprocesses the data +3. Trains detection models +4. Scores new data points +5. Identifies anomalies + +## Configuration + +You can configure anomaly detection through: +- Model selection +- Training parameters +- Scoring thresholds +- Custom preprocessing + +## Models + +Anomstack supports various anomaly detection models: +- Isolation Forest +- Local Outlier Factor +- One-Class SVM +- And more... + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/dashboard.md b/docs/docs/features/dashboard.md new file mode 100644 index 00000000..19eafaa5 --- /dev/null +++ b/docs/docs/features/dashboard.md @@ -0,0 +1,41 @@ +--- +sidebar_position: 4 +--- + +# Dashboard + +The Anomstack dashboard provides a modern interface for monitoring your metrics and anomalies. Built with FastHTML and MonsterUI, it offers a rich user experience. 
+ +## Features + +The dashboard includes: +- Real-time metric visualization +- Anomaly score monitoring +- Alert management +- Configuration interface +- Performance analytics + +## Navigation + +Key sections of the dashboard: +- Home view +- Metric batch view +- Anomaly list view +- Configuration pages +- Settings + +## Customization + +Customize your dashboard through: +- Layout options +- Theme settings +- Widget configuration +- Custom views + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/llm-agent.md b/docs/docs/features/llm-agent.md new file mode 100644 index 00000000..b9d6da31 --- /dev/null +++ b/docs/docs/features/llm-agent.md @@ -0,0 +1,40 @@ +--- +sidebar_position: 5 +--- + +# LLM Agent + +The LLM (Large Language Model) agent in Anomstack provides AI-powered analysis of anomalies and automated reporting capabilities. + +## Features + +The LLM agent offers: +- Natural language explanations of anomalies +- Automated report generation +- Intelligent alert prioritization +- Historical context analysis +- Custom analysis templates + +## Configuration + +Configure the LLM agent through: +- Model selection +- Prompt templates +- Analysis parameters +- Integration settings + +## Integration + +The agent integrates with: +- Alert system +- Dashboard +- Reporting tools +- Custom workflows + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/docs/features/metrics.md b/docs/docs/features/metrics.md new file mode 100644 index 00000000..34e8c167 --- /dev/null +++ b/docs/docs/features/metrics.md @@ -0,0 +1,31 @@ +--- +sidebar_position: 1 +--- + +# Metrics + +Metrics are the fundamental building blocks of Anomstack. This section explains how to define, configure, and manage your metrics. 
+ +## Defining Metrics + +Metrics in Anomstack are defined using a combination of: +- SQL queries or Python functions +- YAML configuration files +- Optional preprocessing functions + +## Configuration + +Each metric requires a configuration file (`config.yaml`) that specifies: +- Metric name and description +- Data source configuration +- Schedule settings +- Alert thresholds +- Custom parameters + +## Examples + +Coming soon... + +## Best Practices + +Coming soon... \ No newline at end of file diff --git a/docs/sidebars.js b/docs/sidebars.js index 2a7059d8..37ef4bf6 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -43,22 +43,16 @@ const sidebars = { label: 'Data Sources', items: [ 'data-sources/python', - 'data-sources/bigquery', - 'data-sources/snowflake', - 'data-sources/clickhouse', - 'data-sources/duckdb', - 'data-sources/sqlite', - 'data-sources/motherduck', - 'data-sources/turso', + 'bigquery', + 'snowflake', ], }, { type: 'category', label: 'Storage', items: [ - 'storage/local', - 'storage/gcs', - 'storage/s3', + 'gcs', + 's3', ], }, { @@ -66,10 +60,7 @@ const sidebars = { label: 'Deployment', items: [ 'deployment/docker', - 'deployment/dagster-cloud', - 'deployment/github-codespaces', - 'deployment/replit', - 'deployment/local', + 'deployment/gcp', ], }, { @@ -77,16 +68,14 @@ const sidebars = { label: 'Configuration', items: [ 'configuration/metrics', - 'configuration/alerts', - 'configuration/dashboard', ], }, { type: 'category', label: 'API Reference', items: [ - 'api/graphql', - 'api/python', + 'graphql/README', + 'graphql/examples/start_schedule', ], }, ], From d7a0a12e8d3dea2b940ebd1e75dea8344210d620 Mon Sep 17 00:00:00 2001 From: andrewm4894 Date: Mon, 2 Jun 2025 23:18:35 +0100 Subject: [PATCH 3/4] Enhance documentation by adding new data sources and example metrics - Updated `sidebars.js` to include additional data sources: ClickHouse, DuckDB, SQLite, MotherDuck, Turso, and Redshift. 
- Expanded `quickstart.md` with a section on ready-made example metrics, providing links to various metric batches for real data demonstration. - Revised `python.md` to improve configuration details and added a comprehensive example for ingesting metrics from HackerNews. - Updated `dashboard.md` to include a link to a live demo of the Anomstack dashboard showcasing its features. --- docs/docs/data-sources/bigquery.md | 53 +++++++++ docs/docs/data-sources/clickhouse.md | 53 +++++++++ docs/docs/data-sources/duckdb.md | 69 ++++++++++++ docs/docs/data-sources/motherduck.md | 70 ++++++++++++ docs/docs/data-sources/python.md | 156 ++++++++++++++++++++++++--- docs/docs/data-sources/redshift.md | 71 ++++++++++++ docs/docs/data-sources/snowflake.md | 54 ++++++++++ docs/docs/data-sources/sqlite.md | 70 ++++++++++++ docs/docs/data-sources/turso.md | 70 ++++++++++++ docs/docs/features/dashboard.md | 7 +- docs/docs/quickstart.md | 13 +++ docs/sidebars.js | 10 +- 12 files changed, 676 insertions(+), 20 deletions(-) create mode 100644 docs/docs/data-sources/bigquery.md create mode 100644 docs/docs/data-sources/clickhouse.md create mode 100644 docs/docs/data-sources/duckdb.md create mode 100644 docs/docs/data-sources/motherduck.md create mode 100644 docs/docs/data-sources/redshift.md create mode 100644 docs/docs/data-sources/snowflake.md create mode 100644 docs/docs/data-sources/sqlite.md create mode 100644 docs/docs/data-sources/turso.md diff --git a/docs/docs/data-sources/bigquery.md b/docs/docs/data-sources/bigquery.md new file mode 100644 index 00000000..cc7b8ab3 --- /dev/null +++ b/docs/docs/data-sources/bigquery.md @@ -0,0 +1,53 @@ +--- +sidebar_position: 2 +--- + +# BigQuery + +Anomstack supports Google BigQuery as a data source for your metrics. 
+ +## Configuration + +Configure BigQuery in your metric batch's `config.yaml`: + +```yaml +db: "bigquery" +table_key: "your-project.dataset.table" +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Authentication + +You can authenticate with BigQuery in several ways: +- Service account credentials file +- Application Default Credentials +- Environment variables + +## Examples + +Check out the [BigQuery example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) for a complete working example. + +## Best Practices + +- Use parameterized queries for better security +- Consider query costs and optimization +- Use appropriate table partitioning +- Set up proper IAM permissions + +## Limitations + +- Query execution time limits +- Cost considerations for large queries +- Rate limits and quotas + +## Related Links + +- [BigQuery Documentation](https://cloud.google.com/bigquery/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/bigquery) \ No newline at end of file diff --git a/docs/docs/data-sources/clickhouse.md b/docs/docs/data-sources/clickhouse.md new file mode 100644 index 00000000..eb0af3df --- /dev/null +++ b/docs/docs/data-sources/clickhouse.md @@ -0,0 +1,53 @@ +--- +sidebar_position: 4 +--- + +# ClickHouse + +Anomstack supports ClickHouse as a data source for your metrics. 
+ +## Configuration + +Configure ClickHouse in your metric batch's `config.yaml`: + +```yaml +db: "clickhouse" +table_key: "your_database.your_table" +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/10 * * * *" # When to run the ingestion +ingest_sql: > + select + now() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Authentication + +You can authenticate with ClickHouse using: +- Username and password +- Environment variables +- SSL/TLS certificates + +## Examples + +Check out the [ClickHouse example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) for a complete working example. + +## Best Practices + +- Use appropriate table engines +- Consider query optimization +- Implement proper access controls +- Use parameterized queries + +## Limitations + +- Memory usage considerations +- Query timeout limits +- Concurrent query limits + +## Related Links + +- [ClickHouse Documentation](https://clickhouse.com/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/clickhouse) \ No newline at end of file diff --git a/docs/docs/data-sources/duckdb.md b/docs/docs/data-sources/duckdb.md new file mode 100644 index 00000000..ef422e1e --- /dev/null +++ b/docs/docs/data-sources/duckdb.md @@ -0,0 +1,69 @@ +--- +sidebar_position: 5 +--- + +# DuckDB + +Anomstack supports DuckDB as a data source for your metrics. DuckDB is a fast analytical database that can read and write data from various file formats. 
+ +## Configuration + +Configure DuckDB in your metric batch's `config.yaml`: + +```yaml +db: "duckdb" +table_key: "metrics" # Default table to store metrics +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +DuckDB supports: +- Local file-based databases +- MotherDuck cloud integration +- Reading from various file formats (CSV, Parquet, JSON) +- SQL queries with Python integration + +## Examples + +Check out the [DuckDB example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) for a complete working example. 
+ +## Best Practices + +- Use appropriate file formats for your data +- Consider query optimization +- Implement proper file permissions +- Use parameterized queries + +## Limitations + +- Local storage considerations +- Memory usage for large datasets +- Concurrent access limitations + +## Related Links + +- [DuckDB Documentation](https://duckdb.org/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/duckdb) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/motherduck.md b/docs/docs/data-sources/motherduck.md new file mode 100644 index 00000000..eb3262ed --- /dev/null +++ b/docs/docs/data-sources/motherduck.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 7 +--- + +# MotherDuck + +Anomstack supports MotherDuck as a data source for your metrics. MotherDuck is a cloud-based version of DuckDB that provides serverless analytics capabilities. + +## Configuration + +Configure MotherDuck in your metric batch's `config.yaml`: + +```yaml +db: "motherduck" +table_key: "your_database.metrics" # Your MotherDuck database and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp() as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. 
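As a sketch, a batch-level `config.yaml` overriding a few of these defaults for MotherDuck might look like this (the batch, database, and table names are placeholders):

```yaml
# metrics/my_batch/config.yaml — overrides applied on top of defaults.yaml
metric_batch: "my_batch"
db: "motherduck"                      # override the "duckdb" default
table_key: "my_database.metrics"      # MotherDuck database.table
ingest_cron_schedule: "*/10 * * * *"  # run less often than the default
```

Anything not set here falls back to the value in `metrics/defaults/defaults.yaml`.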
+ +## Features + +MotherDuck provides: +- Serverless analytics +- Cloud storage integration +- Real-time collaboration +- DuckDB compatibility + +## Examples + +Check out the [MotherDuck example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/motherduck) for a complete working example. + +## Best Practices + +- Token security +- Query optimization +- Cost management +- Data partitioning + +## Limitations + +- Query timeout limits +- Concurrent query limits +- Storage limitations +- Cost considerations + +## Related Links + +- [MotherDuck Documentation](https://motherduck.com/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/motherduck) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/python.md b/docs/docs/data-sources/python.md index 25d97f0c..7e0ea3d1 100644 --- a/docs/docs/data-sources/python.md +++ b/docs/docs/data-sources/python.md @@ -2,30 +2,152 @@ sidebar_position: 1 --- -# Python Data Source +# Python -Anomstack allows you to define custom Python functions for ingesting metrics data. +Anomstack supports Python as a data source for your metrics. This allows you to create custom data ingestion logic using Python's rich ecosystem of libraries. 
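Before getting into configuration, here is a minimal sketch of what an ingest function looks like — the metric name and value are placeholders, and the HackerNews example further down this page is a complete, real version:

```python
import pandas as pd

def ingest() -> pd.DataFrame:
    # An ingest function returns one row per metric, with the three
    # columns Anomstack expects: metric_name, metric_value, metric_timestamp.
    df = pd.DataFrame(
        [["example_metric", 42.0]],  # placeholder metric
        columns=["metric_name", "metric_value"],
    )
    df["metric_timestamp"] = pd.Timestamp.utcnow()  # timestamps are UTC
    return df
```

Anomstack calls this function on the batch's `ingest_cron_schedule` and writes the returned rows to your configured table.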
-## Overview +## Configuration -The Python data source enables you to: -- Write custom data ingestion logic -- Process data before anomaly detection -- Integrate with any Python-compatible data source -- Transform data as needed +Configure Python in your metric batch's `config.yaml`: -## Configuration +```yaml +metric_batch: "your_metric_batch_name" +table_key: "your_table_key" +ingest_cron_schedule: "45 6 * * *" # When to run the ingestion +ingest_fn: > + {% include "./path/to/your/python/file.py" %} +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Customizing Default Templates + +Anomstack uses several default templates for preprocessing, SQL queries, and other operations. You can customize these by modifying the files in: + +1. **Python Templates** (`metrics/defaults/python/`): + - `preprocess.py`: Customize how metrics are preprocessed before anomaly detection + - Add your own Python functions for custom processing + +2. **SQL Templates** (`metrics/defaults/sql/`): + - `train.sql`: SQL for training data preparation + - `score.sql`: SQL for scoring data preparation + - `alerts.sql`: SQL for alert generation + - `change.sql`: SQL for change detection + - `plot.sql`: SQL for metric visualization + - `llmalert.sql`: SQL for LLM-based alerts + - `dashboard.sql`: SQL for dashboard data + - `delete.sql`: SQL for data cleanup + - `summary.sql`: SQL for summary reports + +To use custom templates, modify the corresponding files in these directories. 
The changes will apply to all metric batches unless overridden in specific batch configurations. + +## Example: HackerNews Top Stories -Configure Python data sources through: -- Function definition -- Parameter configuration -- Error handling -- Data validation +Here's a complete example that fetches metrics from HackerNews top stories: -## Examples +```python +import pandas as pd +import requests -Coming soon... +def ingest(top_n=10) -> pd.DataFrame: + # Hacker News API endpoint for top stories + url = "https://hacker-news.firebaseio.com/v0/topstories.json" + + # Get top story IDs + response = requests.get(url) + story_ids = response.json()[:top_n] + + # Calculate metrics + min_score = float("inf") + max_score = 0 + total_score = 0 + + for story_id in story_ids: + story_url = f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json" + story = requests.get(story_url).json() + score = story.get("score", 0) + + min_score = min(min_score, score) + max_score = max(max_score, score) + total_score += score + + avg_score = total_score / len(story_ids) + + # Create DataFrame with metrics + data = [ + [f"hn_top_{top_n}_min_score", min_score], + [f"hn_top_{top_n}_max_score", max_score], + [f"hn_top_{top_n}_avg_score", avg_score], + [f"hn_top_{top_n}_total_score", total_score], + ] + df = pd.DataFrame(data, columns=["metric_name", "metric_value"]) + df["metric_timestamp"] = pd.Timestamp.utcnow() + + return df +``` + +### How It Works + +1. **Configuration**: + ```yaml + metric_batch: "hn_top_stories_scores" + table_key: "metrics_hackernews" + ingest_cron_schedule: "45 6 * * *" + ingest_fn: > + {% include "./examples/hackernews/hn_top_stories_scores.py" %} + ``` + +2. **Function Definition**: The `ingest()` function takes a `top_n` parameter to specify how many top stories to analyze. + +3. **Data Collection**: + - Fetches top story IDs from HackerNews API + - Retrieves details for each story + - Calculates min, max, average, and total scores + +4. 
**DataFrame Creation**: + - Creates a DataFrame with required columns: `metric_name`, `metric_value`, and `metric_timestamp` + - Each metric is a separate row in the DataFrame + - Timestamps are in UTC ## Best Practices -Coming soon... \ No newline at end of file +- Return a pandas DataFrame with required columns +- Include proper error handling +- Use type hints for better code clarity +- Document your functions +- Handle API rate limits and timeouts +- Use environment variables for sensitive data + +## Required DataFrame Structure + +Your Python function must return a pandas DataFrame with these columns: +- `metric_name`: String identifier for the metric +- `metric_value`: Numeric value of the metric +- `metric_timestamp`: UTC timestamp of when the metric was collected + +## Limitations + +- Python environment must have required dependencies installed +- Function execution time limits +- Memory usage considerations +- API rate limits for external services + +## Related Links + +- [Example Implementation](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/hackernews) +- [Pandas Documentation](https://pandas.pydata.org/docs/) +- [Python Best Practices](https://docs.python-guide.org/) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) +- [Default Templates](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults) \ No newline at end of file diff --git a/docs/docs/data-sources/redshift.md b/docs/docs/data-sources/redshift.md new file mode 100644 index 00000000..3fd403e9 --- /dev/null +++ b/docs/docs/data-sources/redshift.md @@ -0,0 +1,71 @@ +--- +sidebar_position: 9 +--- + +# Redshift + +Anomstack supports Amazon Redshift as a data source for your metrics. Redshift is a fully managed, petabyte-scale data warehouse service. 
+ +## Configuration + +Configure Redshift in your metric batch's `config.yaml`: + +```yaml +db: "redshift" +table_key: "your_database.schema.metrics" # Your Redshift database, schema, and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + current_timestamp as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +Redshift provides: +- Petabyte-scale data warehouse +- Columnar storage +- Parallel query execution +- Advanced analytics + +## Examples + +Check out the [Redshift example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/redshift) for a complete working example. 
+
+## Best Practices
+
+- Query optimization
+- Distribution key selection
+- Sort key optimization
+- Workload management
+- Cost optimization
+
+## Limitations
+
+- Query timeout limits
+- Concurrent query limits
+- Storage limitations
+- Cost considerations
+
+## Related Links
+
+- [Redshift Documentation](https://docs.aws.amazon.com/redshift/)
+- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/redshift)
+- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml)
\ No newline at end of file
diff --git a/docs/docs/data-sources/snowflake.md b/docs/docs/data-sources/snowflake.md
new file mode 100644
index 00000000..7c821e85
--- /dev/null
+++ b/docs/docs/data-sources/snowflake.md
@@ -0,0 +1,54 @@
+---
+sidebar_position: 3
+---
+
+# Snowflake
+
+Anomstack supports Snowflake as a data source for your metrics.
+
+## Configuration
+
+Configure Snowflake in your metric batch's `config.yaml`:
+
+```yaml
+db: "snowflake"
+table_key: "YOUR_DATABASE.SCHEMA.TABLE"
+metric_batch: "your_metric_batch_name"
+ingest_cron_schedule: "0 * * * *" # When to run the ingestion (hourly)
+ingest_sql: >
+  select
+    current_timestamp() as metric_timestamp,
+    'metric_name' as metric_name,
+    your_value as metric_value
+  from your_table;
+```
+
+## Authentication
+
+You can authenticate with Snowflake using:
+- Username and password
+- Key pair authentication
+- OAuth
+- Environment variables
+
+## Examples
+
+Check out the [Snowflake example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/snowflake) for a complete working example.
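Whichever authentication method you use, avoid hard-coding credentials in config files; read them from the environment and fail fast when something is missing. A minimal sketch — the variable names here are assumptions for illustration, not an official Anomstack contract:

```python
import os

def snowflake_settings() -> dict:
    """Collect Snowflake connection settings from environment variables."""
    settings = {
        "account": os.environ.get("SNOWFLAKE_ACCOUNT"),
        "user": os.environ.get("SNOWFLAKE_USER"),
        "password": os.environ.get("SNOWFLAKE_PASSWORD"),
        "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE"),
    }
    # Fail fast with a clear message rather than a cryptic connection error later.
    missing = sorted(k for k, v in settings.items() if not v)
    if missing:
        raise RuntimeError(f"Missing Snowflake settings: {missing}")
    return settings
```

Failing at startup keeps a misconfigured deployment from silently skipping ingestion runs.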
+ +## Best Practices + +- Use appropriate warehouse sizing +- Consider query optimization +- Implement proper access controls +- Use parameterized queries + +## Limitations + +- Warehouse credits consumption +- Query timeout limits +- Concurrent query limits + +## Related Links + +- [Snowflake Documentation](https://docs.snowflake.com/) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/snowflake) \ No newline at end of file diff --git a/docs/docs/data-sources/sqlite.md b/docs/docs/data-sources/sqlite.md new file mode 100644 index 00000000..86948135 --- /dev/null +++ b/docs/docs/data-sources/sqlite.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 6 +--- + +# SQLite + +Anomstack supports SQLite as a data source for your metrics. SQLite is a lightweight, file-based database that's perfect for local development and small to medium-sized applications. + +## Configuration + +Configure SQLite in your metric batch's `config.yaml`: + +```yaml +db: "sqlite" +table_key: "metrics" # Default table to store metrics +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + datetime('now') as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. 
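Because SQLite ships with Python's standard library, you can sanity-check an ingest query locally before wiring it into a metric batch. A sketch that mirrors the query shape above — the table name and value are made up:

```python
import sqlite3

# Throwaway in-memory database standing in for your real one.
conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (your_value real)")
conn.execute("insert into your_table values (3.5)")

# Same three-column shape the ingest_sql above produces.
rows = conn.execute(
    """
    select
      datetime('now') as metric_timestamp,
      'metric_name' as metric_name,
      your_value as metric_value
    from your_table;
    """
).fetchall()
conn.close()

for metric_timestamp, metric_name, metric_value in rows:
    print(metric_name, metric_value)  # -> metric_name 3.5
```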
+ +## Features + +SQLite provides: +- File-based database storage +- Zero configuration +- ACID compliance +- Full SQL support + +## Examples + +Check out the [SQLite example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/sqlite) for a complete working example. + +## Best Practices + +- Regular database backups +- Proper file permissions +- Index optimization +- Query optimization + +## Limitations + +- Concurrent write operations +- File size limitations +- Memory constraints +- Network access limitations + +## Related Links + +- [SQLite Documentation](https://www.sqlite.org/docs.html) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/sqlite) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/data-sources/turso.md b/docs/docs/data-sources/turso.md new file mode 100644 index 00000000..d7d2bf33 --- /dev/null +++ b/docs/docs/data-sources/turso.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 8 +--- + +# Turso + +Anomstack supports Turso as a data source for your metrics. Turso is a distributed SQLite database that provides global replication and edge computing capabilities. + +## Configuration + +Configure Turso in your metric batch's `config.yaml`: + +```yaml +db: "turso" +table_key: "your_database.metrics" # Your Turso database and table +metric_batch: "your_metric_batch_name" +ingest_cron_schedule: "*/3 * * * *" # When to run the ingestion +ingest_sql: > + select + datetime('now') as metric_timestamp, + 'metric_name' as metric_name, + your_value as metric_value + from your_table; +``` + +## Default Configuration + +Many configuration parameters can be set in `metrics/defaults/defaults.yaml` to apply across all metric batches. 
Key defaults include: + +```yaml +db: "duckdb" # Default database type +table_key: "metrics" # Default table name +ingest_cron_schedule: "*/3 * * * *" # Default ingestion schedule +model_path: "local://./models" # Default model storage location +alert_methods: "email,slack" # Default alert methods +``` + +You can override any of these defaults in your metric batch's configuration file. + +## Features + +Turso provides: +- Global replication +- Edge computing +- SQLite compatibility +- Real-time sync + +## Examples + +Check out the [Turso example](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/turso) for a complete working example. + +## Best Practices + +- Token security +- Query optimization +- Replication strategy +- Data partitioning + +## Limitations + +- Query timeout limits +- Concurrent query limits +- Storage limitations +- Cost considerations + +## Related Links + +- [Turso Documentation](https://turso.tech/docs) +- [Example Queries](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/turso) +- [Default Configuration](https://github.com/andrewm4894/anomstack/tree/main/metrics/defaults/defaults.yaml) \ No newline at end of file diff --git a/docs/docs/features/dashboard.md b/docs/docs/features/dashboard.md index 19eafaa5..72ca5bab 100644 --- a/docs/docs/features/dashboard.md +++ b/docs/docs/features/dashboard.md @@ -34,7 +34,12 @@ Customize your dashboard through: ## Examples -Coming soon... +You can explore a live demo of the Anomstack dashboard at [https://anomstack-demo.replit.app/](https://anomstack-demo.replit.app/). 
This demo instance showcases various features including: +- Real-time metric monitoring +- Anomaly detection visualization +- Alert management +- Different metric batch views +- Configuration options ## Best Practices diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md index a0d496ba..a2dd9f4c 100644 --- a/docs/docs/quickstart.md +++ b/docs/docs/quickstart.md @@ -80,6 +80,19 @@ python dashboard/app.py - [Customize the dashboard](features/dashboard) - [Explore deployment options](deployment/docker) +## Ready-Made Example Metrics + +Want to see Anomstack in action with real data? Try these ready-made example metric batches: + +- [Currency](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/currency): Track currency exchange rates from public APIs. +- [Yahoo Finance (yfinance)](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/yfinance): Monitor stock prices and financial data using the Yahoo Finance API. +- [Weather](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/weather): Analyze weather data from Open Meteo. +- [CoinDesk](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/coindesk): Get Bitcoin price data from the CoinDesk API. +- [Hacker News](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/hackernews): Track top stories and scores from Hacker News. +- [Netdata](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples/netdata): Monitor system metrics using the Netdata API. + +See the [full list of examples](https://github.com/andrewm4894/anomstack/tree/main/metrics/examples) for more, including BigQuery, Prometheus, Google Trends, and more. 
+ ## Common Issues ### Metric Not Showing Up diff --git a/docs/sidebars.js b/docs/sidebars.js index 37ef4bf6..ca910eb4 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -43,8 +43,14 @@ const sidebars = { label: 'Data Sources', items: [ 'data-sources/python', - 'bigquery', - 'snowflake', + 'data-sources/bigquery', + 'data-sources/snowflake', + 'data-sources/clickhouse', + 'data-sources/duckdb', + 'data-sources/sqlite', + 'data-sources/motherduck', + 'data-sources/turso', + 'data-sources/redshift', ], }, { From 2226a6858837fb1b76e3fa9bde0fafc055a42527 Mon Sep 17 00:00:00 2001 From: andrewm4894 Date: Mon, 2 Jun 2025 23:22:33 +0100 Subject: [PATCH 4/4] Update documentation to reflect new deployment options - Replaced the links for Dagster Cloud and Local Development with a new link for GCP Deployment in `intro.md` to provide updated guidance for users. --- docs/docs/intro.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/docs/intro.md b/docs/docs/intro.md index 3ecbab49..ac6f3c42 100644 --- a/docs/docs/intro.md +++ b/docs/docs/intro.md @@ -51,8 +51,7 @@ Choose your preferred way to get started: - [Quickstart Guide](quickstart) - [Docker Deployment](deployment/docker) -- [Dagster Cloud Setup](deployment/dagster-cloud) -- [Local Development](deployment/local) +- [GCP Deployment](deployment/gcp) ## Architecture