Commit d6bd9bf

feat: Update documentation and configuration for Anomstack
- Upgraded Docusaurus dependencies to version 3.8.1 for improved performance and features.
- Enhanced the Quickstart guide with Docker setup instructions for easier deployment.
- Added new sections for supported data sources and deployment options on the homepage.
- Improved the feature list with detailed descriptions of Anomstack's capabilities.
- Updated sidebar to include new configuration and utility scripts documentation.
- Cleaned up package-lock and yarn.lock files to reflect the latest dependency versions.
1 parent d5d546c commit d6bd9bf

9 files changed: +8352 -6407 lines changed
Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
1+
---
2+
sidebar_position: 1
3+
---
4+
5+
# Environment Variables
6+
7+
This page documents all environment variables available in Anomstack. Copy the [`.example.env`](https://github.com/andrewm4894/anomstack/blob/main/.example.env) file to `.env` and configure the variables you need.
8+
9+
```bash
10+
cp .example.env .env
11+
```
12+
13+
## 🗄️ Database & Data Sources
14+
15+
### Google Cloud Platform
16+
Configure access to BigQuery and Google Cloud Storage.
17+
18+
| Variable | Required | Description | Example |
19+
|----------|----------|-------------|---------|
20+
| `ANOMSTACK_GOOGLE_APPLICATION_CREDENTIALS` | No | Path to GCP service account JSON file | `/path/to/credentials.json` |
21+
| `ANOMSTACK_GOOGLE_APPLICATION_CREDENTIALS_JSON` | No | GCP credentials as JSON string (alternative to file path) | `{"type": "service_account", ...}` |
22+
| `ANOMSTACK_GCP_PROJECT_ID` | No | Google Cloud Project ID for BigQuery | `my-project-123` |
23+
24+
### Snowflake
25+
Connect to Snowflake data warehouse.
26+
27+
| Variable | Required | Description | Example |
28+
|----------|----------|-------------|---------|
29+
| `ANOMSTACK_SNOWFLAKE_ACCOUNT` | No | Snowflake account identifier | `xy12345.us-east-1` |
30+
| `ANOMSTACK_SNOWFLAKE_USER` | No | Snowflake username | `anomstack_user` |
31+
| `ANOMSTACK_SNOWFLAKE_PASSWORD` | No | Snowflake password | `your-password` |
32+
| `ANOMSTACK_SNOWFLAKE_WAREHOUSE` | No | Snowflake warehouse name | `ANOMSTACK_WH` |
33+
34+
### AWS
35+
Connect to S3 and other AWS services.
36+
37+
| Variable | Required | Description | Example |
38+
|----------|----------|-------------|---------|
39+
| `ANOMSTACK_AWS_ACCESS_KEY_ID` | No | AWS access key ID | `AKIAIOSFODNN7EXAMPLE` |
40+
| `ANOMSTACK_AWS_SECRET_ACCESS_KEY` | No | AWS secret access key | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
41+
42+
### ClickHouse
43+
Connect to ClickHouse database.
44+
45+
| Variable | Required | Description | Default | Example |
46+
|----------|----------|-------------|---------|---------|
47+
| `ANOMSTACK_CLICKHOUSE_HOST` | No | ClickHouse host | `localhost` | `clickhouse.example.com` |
48+
| `ANOMSTACK_CLICKHOUSE_PORT` | No | ClickHouse port | `8123` | `8123` |
49+
| `ANOMSTACK_CLICKHOUSE_USER` | No | ClickHouse username | `anomstack` | `admin` |
50+
| `ANOMSTACK_CLICKHOUSE_PASSWORD` | No | ClickHouse password | `anomstack` | `your-password` |
51+
| `ANOMSTACK_CLICKHOUSE_DATABASE` | No | ClickHouse database | `default` | `metrics` |
52+
53+
### MotherDuck & Turso
54+
Hosted DuckDB (MotherDuck) and SQLite-compatible (Turso) database services.
55+
56+
| Variable | Required | Description | Example |
57+
|----------|----------|-------------|---------|
58+
| `ANOMSTACK_MOTHERDUCK_TOKEN` | No | MotherDuck authentication token | `your-motherduck-token` |
59+
| `ANOMSTACK_TURSO_DATABASE_URL` | No | Turso database URL | `libsql://your-db.turso.io` |
60+
| `ANOMSTACK_TURSO_AUTH_TOKEN` | No | Turso authentication token | `your-turso-token` |
61+
62+
## 💾 Storage Configuration
63+
64+
### Database Paths
65+
Configure where metrics and metadata are stored.
66+
67+
| Variable | Required | Description | Docker Default | Local Default |
68+
|----------|----------|-------------|---------------|---------------|
69+
| `ANOMSTACK_DUCKDB_PATH` | No | DuckDB database path | `/metrics_db/duckdb/anomstack.db` | `tmpdata/anomstack-duckdb.db` |
70+
| `ANOMSTACK_SQLITE_PATH` | No | SQLite database path | `tmpdata/anomstack-sqlite.db` | `tmpdata/anomstack-sqlite.db` |
71+
| `ANOMSTACK_TABLE_KEY` | No | Table identifier for metrics | `tmp.metrics` | `production.metrics` |
72+
73+
### Model Storage
74+
Configure where trained ML models are stored.
75+
76+
| Variable | Required | Description | Examples |
77+
|----------|----------|-------------|----------|
78+
| `ANOMSTACK_MODEL_PATH` | No | Model storage location | `local://./tmp/models`<br/>`gs://your-bucket/models`<br/>`s3://your-bucket/models` |
79+
80+
**Storage Options:**
81+
- **Local**: `local://./tmp/models` (default)
82+
- **Google Cloud Storage**: `gs://your-bucket/models`
83+
- **AWS S3**: `s3://your-bucket/models`
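
The three storage options above are distinguished by the URL-style prefix of `ANOMSTACK_MODEL_PATH`. As a minimal sketch of that idea (the helper name `split_model_path` is hypothetical, and this is not necessarily how Anomstack itself parses the value), the prefix can be separated from the location like this:

```python
from typing import Tuple

# Prefixes taken from the "Storage Options" list above.
SUPPORTED_SCHEMES = {"local", "gs", "s3"}


def split_model_path(model_path: str) -> Tuple[str, str]:
    """Split a model path such as 'gs://bucket/models' into (scheme, location).

    Hypothetical helper for illustration only.
    """
    scheme, separator, location = model_path.partition("://")
    if not separator or scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported model path: {model_path!r}")
    return scheme, location


print(split_model_path("local://./tmp/models"))     # ('local', './tmp/models')
print(split_model_path("gs://your-bucket/models"))  # ('gs', 'your-bucket/models')
```

A check like this can catch a mistyped prefix before any job tries to read or write a model.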
84+
85+
### Application Paths
86+
Internal directory configuration.
87+
88+
| Variable | Required | Description | Default |
89+
|----------|----------|-------------|---------|
90+
| `ANOMSTACK_HOME` | No | Home directory for Anomstack | `.` (current directory) |
91+
92+
## 📧 Alert Configuration
93+
94+
### Email Alerts
95+
Configure email notifications for anomalies.
96+
97+
| Variable | Required | Description | Default | Example |
98+
|----------|----------|-------------|---------|---------|
99+
| `ANOMSTACK_ALERT_EMAIL_FROM` | No | Sender email address | | `[email protected]` |
100+
| `ANOMSTACK_ALERT_EMAIL_TO` | No | Recipient email address | | `[email protected]` |
101+
| `ANOMSTACK_ALERT_EMAIL_SMTP_HOST` | No | SMTP server host | `smtp.gmail.com` | `smtp.office365.com` |
102+
| `ANOMSTACK_ALERT_EMAIL_SMTP_PORT` | No | SMTP server port | `587` | `25` |
103+
| `ANOMSTACK_ALERT_EMAIL_PASSWORD` | No | Email password/app token | | `your-app-password` |
104+
105+
### Failure Email Alerts
106+
Separate email configuration for job failures.
107+
108+
| Variable | Required | Description | Example |
109+
|----------|----------|-------------|---------|
110+
| `ANOMSTACK_FAILURE_EMAIL_FROM` | No | Sender for failure alerts | `[email protected]` |
111+
| `ANOMSTACK_FAILURE_EMAIL_TO` | No | Recipient for failure alerts | `[email protected]` |
112+
| `ANOMSTACK_FAILURE_EMAIL_SMTP_HOST` | No | SMTP host for failures | `smtp.gmail.com` |
113+
| `ANOMSTACK_FAILURE_EMAIL_SMTP_PORT` | No | SMTP port for failures | `587` |
114+
| `ANOMSTACK_FAILURE_EMAIL_PASSWORD` | No | Email password for failures | `your-app-password` |
115+
116+
### Slack Alerts
117+
Configure Slack notifications.
118+
119+
| Variable | Required | Description | Example |
120+
|----------|----------|-------------|---------|
121+
| `ANOMSTACK_SLACK_BOT_TOKEN` | No | Slack bot token | `xoxb-your-bot-token` |
122+
| `ANOMSTACK_SLACK_CHANNEL` | No | Slack channel for alerts | `#anomaly-alerts` |
123+
124+
## 🤖 LLM Integration
125+
126+
### OpenAI
127+
Configure AI-powered anomaly detection and alerts.
128+
129+
| Variable | Required | Description | Default | Example |
130+
|----------|----------|-------------|---------|---------|
131+
| `ANOMSTACK_OPENAI_KEY` | No | OpenAI API key | | `sk-...` |
132+
| `OPENAI_API_KEY` | No | Alternative OpenAI API key | | `sk-...` |
133+
| `ANOMSTACK_OPENAI_MODEL` | No | OpenAI model to use | `gpt-4o-mini` | `gpt-4o` |
134+
135+
### Anthropic
136+
Alternative LLM provider.
137+
138+
| Variable | Required | Description | Default | Example |
139+
|----------|----------|-------------|---------|---------|
140+
| `ANOMSTACK_ANTHROPIC_KEY` | No | Anthropic API key | | `sk-ant-...` |
141+
| `ANOMSTACK_ANTHROPIC_MODEL` | No | Anthropic model | `claude-3-haiku-20240307` | `claude-3-sonnet-20240229` |
142+
143+
### LLM Platform Selection
144+
145+
| Variable | Required | Description | Default | Options |
146+
|----------|----------|-------------|---------|---------|
147+
| `ANOMSTACK_LLM_PLATFORM` | No | Which LLM provider to use | `openai` | `openai`, `anthropic` |
148+
149+
### LangSmith Tracing
150+
Optional LLM call tracing and monitoring.
151+
152+
| Variable | Required | Description | Default | Example |
153+
|----------|----------|-------------|---------|---------|
154+
| `LANGSMITH_TRACING` | No | Enable LangSmith tracing | `true` | `false` |
155+
| `LANGSMITH_ENDPOINT` | No | LangSmith API endpoint | `https://api.smith.langchain.com` | |
156+
| `LANGSMITH_API_KEY` | No | LangSmith API key | | `your-api-key` |
157+
| `LANGSMITH_PROJECT` | No | LangSmith project name | `anomaly-agent` | `your-project` |
158+
159+
## ⚙️ Dagster Configuration
160+
161+
### Core Dagster Settings
162+
163+
| Variable | Required | Description | Default | Example |
164+
|----------|----------|-------------|---------|---------|
165+
| `DAGSTER_LOG_LEVEL` | No | Dagster logging level | `DEBUG` | `INFO`, `WARNING`, `ERROR` |
166+
| `DAGSTER_CONCURRENCY` | No | Number of concurrent jobs | `4` | `8` |
167+
168+
### Dagster Directories
169+
Storage directories and run-queue settings for Dagster. The defaults are deliberately lightweight to prevent disk space issues.
170+
171+
| Variable | Required | Description | Default |
172+
|----------|----------|-------------|---------|
173+
| `ANOMSTACK_DAGSTER_LOCAL_ARTIFACT_STORAGE_DIR` | No | Artifacts storage directory | `tmp_light/artifacts` |
174+
| `ANOMSTACK_DAGSTER_OVERALL_CONCURRENCY_LIMIT` | No | Overall concurrency limit | `5` |
175+
| `ANOMSTACK_DAGSTER_DEQUEUE_USE_THREADS` | No | Use threads for dequeuing | `false` |
176+
| `ANOMSTACK_DAGSTER_DEQUEUE_NUM_WORKERS` | No | Number of dequeue workers | `2` |
177+
| `ANOMSTACK_DAGSTER_LOCAL_COMPUTE_LOG_MANAGER_DIRECTORY` | No | Compute logs directory | `tmp_light/compute_logs` |
178+
| `ANOMSTACK_DAGSTER_SQLITE_STORAGE_BASE_DIR` | No | SQLite storage base directory | `tmp_light/storage` |
179+
180+
### Job Timeout Configuration
181+
182+
| Variable | Required | Description | Default | Example |
183+
|----------|----------|-------------|---------|---------|
184+
| `ANOMSTACK_MAX_RUNTIME_SECONDS_TAG` | No | Max job runtime in seconds | `3600` | `7200` |
185+
| `ANOMSTACK_KILL_RUN_AFTER_MINUTES` | No | Kill long-running jobs after N minutes | `60` | `120` |
186+
187+
## 🐳 Docker & Deployment
188+
189+
### PostgreSQL (Docker)
190+
Database for Dagster metadata when using Docker.
191+
192+
| Variable | Required | Description | Default |
193+
|----------|----------|-------------|---------|
194+
| `ANOMSTACK_POSTGRES_USER` | No | PostgreSQL username | `postgres_user` |
195+
| `ANOMSTACK_POSTGRES_PASSWORD` | No | PostgreSQL password | `postgres_password` |
196+
| `ANOMSTACK_POSTGRES_DB` | No | PostgreSQL database name | `postgres_db` |
197+
| `ANOMSTACK_POSTGRES_FORWARD_PORT` | No | Local port forwarding | `5432` (leave blank to disable) |
198+
199+
### Dashboard Configuration
200+
201+
| Variable | Required | Description | Default |
202+
|----------|----------|-------------|---------|
203+
| `ANOMSTACK_DASHBOARD_PORT` | No | Dashboard port | `5001` |
204+
205+
## 🔧 Advanced Configuration
206+
207+
### Example Metrics
208+
209+
| Variable | Required | Description | Default | Options |
210+
|----------|----------|-------------|---------|---------|
211+
| `ANOMSTACK_IGNORE_EXAMPLES` | No | Ignore example metrics | `no` | `yes`, `no` |
212+
213+
### Auto-Reload Configuration
214+
Automatically reload configuration when files change.
215+
216+
| Variable | Required | Description | Default | Example |
217+
|----------|----------|-------------|---------|---------|
218+
| `ANOMSTACK_AUTO_CONFIG_RELOAD` | No | Enable scheduled config reload | `false` | `true` |
219+
| `ANOMSTACK_CONFIG_RELOAD_CRON` | No | Config reload schedule | `*/5 * * * *` | `*/10 * * * *` |
220+
| `ANOMSTACK_CONFIG_RELOAD_STATUS` | No | Config reload job status | `STOPPED` | `RUNNING` |
221+
| `ANOMSTACK_CONFIG_WATCHER` | No | Enable smart file watcher | `true` | `false` |
222+
| `ANOMSTACK_CONFIG_WATCHER_INTERVAL` | No | File watcher check interval (seconds) | `30` | `60` |
223+
224+
### Analytics
225+
226+
| Variable | Required | Description | Example |
227+
|----------|----------|-------------|---------|
228+
| `POSTHOG_API_KEY` | No | PostHog analytics API key | `phc_...` |
229+
230+
## 🎛️ Per-Metric Batch Overrides
231+
232+
You can override any configuration parameter for specific metric batches using environment variables:
233+
234+
```bash
235+
ANOMSTACK__<METRIC_BATCH>__<PARAMETER>=<VALUE>
236+
```
237+
238+
**Format Rules:**
239+
- `<METRIC_BATCH>`: Uppercase metric batch name with dashes replaced by underscores
240+
- `<PARAMETER>`: Uppercase parameter name with underscores
241+
242+
**Examples:**
243+
```bash
244+
# Override database for python_ingest_simple metric batch
245+
ANOMSTACK__PYTHON_INGEST_SIMPLE__DB=bigquery
246+
247+
# Override alert methods
248+
ANOMSTACK__PYTHON_INGEST_SIMPLE__ALERT_METHODS=email
249+
250+
# Override schedule
251+
ANOMSTACK__PYTHON_INGEST_SIMPLE__INGEST_CRON_SCHEDULE="*/1 * * * *"
252+
253+
# Enable specific job schedules
254+
ANOMSTACK__PYTHON_INGEST_SIMPLE__INGEST_DEFAULT_SCHEDULE_STATUS=RUNNING
255+
ANOMSTACK__PYTHON_INGEST_SIMPLE__TRAIN_DEFAULT_SCHEDULE_STATUS=RUNNING
256+
ANOMSTACK__PYTHON_INGEST_SIMPLE__SCORE_DEFAULT_SCHEDULE_STATUS=RUNNING
257+
ANOMSTACK__PYTHON_INGEST_SIMPLE__ALERT_DEFAULT_SCHEDULE_STATUS=RUNNING
258+
```
259+
260+
This allows you to configure different metric batches differently without modifying YAML files.
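
The format rules above can be sketched as a small function. This is an illustration of the naming convention only: the function name `override_env_name` and the batch name `my-batch` are hypothetical, and the code does not reproduce how Anomstack reads these variables internally.

```python
def override_env_name(metric_batch: str, parameter: str) -> str:
    """Build an ANOMSTACK__<METRIC_BATCH>__<PARAMETER> variable name.

    Hypothetical helper: uppercases both parts and replaces dashes in the
    metric batch name with underscores, as the format rules describe.
    """
    batch = metric_batch.upper().replace("-", "_")
    param = parameter.upper()
    return f"ANOMSTACK__{batch}__{param}"


print(override_env_name("python_ingest_simple", "db"))
# ANOMSTACK__PYTHON_INGEST_SIMPLE__DB
print(override_env_name("my-batch", "alert_methods"))
# ANOMSTACK__MY_BATCH__ALERT_METHODS
```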
261+
262+
## 📝 Common Configuration Patterns
263+
264+
### Development Setup
265+
```bash
266+
# Use local storage
267+
ANOMSTACK_DUCKDB_PATH=tmpdata/anomstack-duckdb.db
268+
ANOMSTACK_MODEL_PATH=local://./tmp/models
269+
ANOMSTACK_IGNORE_EXAMPLES=no
270+
271+
# Basic email alerts
272+
273+
274+
```
275+
276+
### Production Setup
277+
```bash
278+
# Use cloud storage
279+
ANOMSTACK_MODEL_PATH=gs://company-anomstack/models
280+
ANOMSTACK_DUCKDB_PATH=/metrics_db/duckdb/anomstack.db
281+
282+
# Production alerts
283+
284+
285+
ANOMSTACK_SLACK_CHANNEL=#production-alerts
286+
287+
# Disable examples
288+
ANOMSTACK_IGNORE_EXAMPLES=yes
289+
```
290+
291+
### BigQuery + GCS Setup
292+
```bash
293+
# BigQuery connection
294+
ANOMSTACK_GCP_PROJECT_ID=your-project-id
295+
ANOMSTACK_GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
296+
297+
# Use GCS for model storage
298+
ANOMSTACK_MODEL_PATH=gs://your-bucket/models
299+
300+
# BigQuery table for metrics
301+
ANOMSTACK_TABLE_KEY=your_dataset.metrics
302+
```
303+
304+
## 🔐 Security Best Practices
305+
306+
1. **Use environment files**: Never commit `.env` files with secrets to version control
307+
2. **Rotate credentials**: Regularly rotate API keys and passwords
308+
3. **Least privilege**: Use service accounts with minimal required permissions
309+
4. **Secrets management**: Consider using proper secrets management in production (AWS Secrets Manager, Google Secret Manager, etc.)
310+
5. **File permissions**: Restrict access to your `.env` file (`chmod 600 .env`)
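
To verify the file-permissions recommendation on a POSIX system, one option is to check that no group or other permission bits are set. The sketch below assumes a Unix-like filesystem, and the helper name `is_owner_only` is hypothetical. It demonstrates the check on a temporary file rather than a real `.env`.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


# Demonstration on a throwaway file (POSIX only).
with tempfile.TemporaryDirectory() as tmp_dir:
    env_file = os.path.join(tmp_dir, ".env")
    with open(env_file, "w") as handle:
        handle.write("ANOMSTACK_DASHBOARD_PORT=5001\n")

    os.chmod(env_file, 0o644)
    print(is_owner_only(env_file))  # False: group and others can read

    os.chmod(env_file, 0o600)       # equivalent to `chmod 600 .env`
    print(is_owner_only(env_file))  # True
```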
311+
312+
## 🆘 Troubleshooting
313+
314+
**Environment not loading?**
315+
- Ensure `.env` file exists in the project root
316+
- Check file permissions and syntax
317+
- Verify no extra spaces around `=` signs
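
The last check in the list above can be automated. The following sketch flags lines where whitespace touches the `=` sign. The function name `find_spaced_assignments` is hypothetical, and the pattern is a simplification that does not cover every dotenv dialect.

```python
import re
from typing import List

# A variable name followed by whitespace before "=", or "=" followed by
# whitespace and then a value.
_SPACED_ASSIGNMENT = re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*(\s+=|=\s+\S)")


def find_spaced_assignments(env_text: str) -> List[int]:
    """Return 1-based line numbers whose assignment has spaces around '='."""
    flagged = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if _SPACED_ASSIGNMENT.match(line):
            flagged.append(number)
    return flagged


sample = "\n".join([
    "ANOMSTACK_DASHBOARD_PORT=5001",
    "# a comment",
    "ANOMSTACK_SLACK_CHANNEL = #anomaly-alerts",
    "ANOMSTACK_IGNORE_EXAMPLES= yes",
])
print(find_spaced_assignments(sample))  # [3, 4]
```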
318+
319+
**Docker not picking up changes?**
320+
- Restart containers: `make docker-stop && make docker`
321+
- Check that the `.env` file is actually mounted or passed into the containers
322+
323+
**Database connection issues?**
324+
- Verify credentials and network access
325+
- Test connections independently
326+
- Check firewall and VPN settings
