Skip to content

Conversation

@andrewm4894
Copy link
Owner

@andrewm4894 andrewm4894 commented Jul 19, 2025

Summary by CodeRabbit

  • New Features

    • Introduced hot configuration reload for Anomstack, enabling dynamic updates without restarting containers via manual command, scheduled job, or smart file watcher.
    • Added Makefile commands and scripts for hot reload, Docker resets, and Dagster storage cleanup with multiple cleanup levels and interactive menus.
    • Implemented new sensors and jobs for automated config reloads and file change detection.
    • Enhanced Docker Compose and configuration files to support live updates and improved storage management.
  • Bug Fixes

    • Adjusted default concurrency and retention settings to reduce resource usage and prevent excessive storage buildup.
  • Documentation

    • Added comprehensive guides for hot reload, storage optimization, Makefile usage, Docker reset, and script utilities.
    • Updated existing docs and README files with best practices, storage management tips, and new workflow examples.
  • Chores

    • Updated environment variable defaults and configuration files for lighter resource usage and better cleanup policies.
    • Improved Makefile with new utility targets for Docker, reset, cleanup, and documentation management.

- Reduce retention periods in dagster.yaml (3 days schedule, 1-3 days sensor runs)
- Lower concurrent run limits in dagster_docker.yaml (10 max concurrent, reduced tag limits)
- Add run monitoring with timeouts and telemetry disabled
- Update .example.env with lightweight storage defaults (tmp_light directory structure)
- Reduce worker counts and concurrency limits to prevent excessive storage

This addresses issues with 200K+ run directories and 63GB+ storage accumulation during development.
…nd documentation

Makefile improvements:
- Add reset operations (gentle, medium, nuclear, full-nuclear) with interactive mode
- Add Dagster cleanup targets (status, minimal, standard, aggressive)
- Expand documentation targets (install, start, build, serve, clear)
- All targets properly integrated with utility scripts

Documentation additions:
- Complete Makefile.md with detailed target documentation and usage examples
- Updated scripts/README.md with utilities documentation
- Clear categorization and quick reference for all available commands

This provides developers with comprehensive tooling for Docker management,
Dagster maintenance, and documentation workflow.
… references

Documentation additions:
- New storage-optimization.md with comprehensive guide on preventing Dagster storage buildup
- Covers root causes, solutions, monitoring, and maintenance best practices
- Added to sidebar navigation for easy access

Documentation improvements:
- Updated README.md with Makefile.md reference and quick command examples
- Enhanced docker deployment docs with storage management best practices
- Cross-referenced storage optimization guide from deployment documentation

These changes provide users with essential guidance for managing Dagster storage
issues and highlight the available developer tooling.
Update yarn.lock with latest documentation package dependencies.
Added essential utility scripts referenced by Makefile targets:

scripts/utils/reset_docker.sh:
- Interactive Docker reset utility with multiple cleanup levels
- Supports gentle, medium, nuclear, and full-nuclear reset options
- Comprehensive cleanup of containers, networks, volumes, and images
- Interactive mode with guided user prompts

scripts/utils/cleanup_dagster_storage.sh:
- Dagster storage cleanup utility with configurable retention policies
- Multiple cleanup levels: minimal, standard, aggressive
- Storage usage monitoring and reporting
- Batch cleanup operations for runs, logs, and artifacts

scripts/utils/README.md:
- Documentation for all utility scripts and usage patterns
- Examples and best practices for each cleanup level

These scripts provide the foundation for the Makefile's reset and cleanup targets.
Updated the README.md to include detailed descriptions and usage instructions for new utility scripts:
- `kill_long_running_tasks.py`: A Dagster utility for managing long-running jobs.
- SQLite utilities: `create_index.py`, `list_tables.py`, `list_indexes.py`, and `qry.py` for database performance optimization and inspection.

Added quick examples for each script to enhance user guidance and improve overall documentation clarity.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 19, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce a comprehensive hot reload and storage management system for Anomstack, including new scripts, Makefile targets, and documentation. Hot reload of configuration is now possible via manual commands, scheduled jobs, or a smart file watcher, all integrated with Dagster's API. Storage optimization is addressed through new retention policies, cleanup scripts, and detailed best practices documentation. Numerous utility scripts, Makefile enhancements, and Docker configuration updates support these features.

Changes

File(s) / Path(s) Change Summary
.example.env, dagster.yaml, dagster_docker.yaml Reduced concurrency and retention settings; added new config reload and watcher variables; updated storage paths; refined retention policies for schedules and sensors.
docker-compose.yaml, docker-compose.dev.yaml, docker/Dockerfile.anomstack_code Updated Docker Compose to mount metrics volume for hot reload; changed image build strategy; added Dagster host/port env vars; Dockerfile now uses dagster code-server start.
Makefile, Makefile.md Added targets for Docker management, config reload, auto-reload enabling, reset operations, Dagster cleanup, and expanded documentation commands; introduced detailed Makefile documentation file.
README.md, docs/sidebars.js Added Makefile usage tips and referenced new documentation; updated sidebar for new guides.
docs/docs/configuration/hot-reload.md, docs/docs/storage-optimization.md, docs/docs/deployment/docker.md Added comprehensive guides for hot reload and storage optimization; expanded Docker best practices.
anomstack/config.py, anomstack/jobs/reload.py, anomstack/sensors/config_reload.py, anomstack/main.py Implemented programmatic config reload via Dagster GraphQL API; added scheduled reload job and file watcher sensor; integrated new jobs and sensors into main definitions.
metrics/examples/python/python_ingest_simple/python_ingest_simple.yaml Adjusted cron schedules for ingestion and training jobs.
scripts/reload_config.py New script to trigger Dagster config reload via API, with health checks and validation.
scripts/utils/cleanup_dagster_storage.sh, scripts/utils/reset_docker.sh New scripts for interactive and automated Dagster storage cleanup and Docker environment reset at multiple levels.
scripts/utils/README.md, scripts/README.md Added/expanded documentation for reset and cleanup scripts, usage examples, and common scenarios.

Sequence Diagram(s)

Hot Reload Workflow via File Watcher

sequenceDiagram
    participant User
    participant Config File
    participant Sensor (config_file_watcher)
    participant Dagster API
    participant Code Server

    User->>Config File: Edit YAML config
    Sensor-->>Config File: Monitor for changes (hash)
    Sensor->>Sensor: Detects change (hash differs)
    Sensor->>Dagster API: POST reload_code_location mutation
    Dagster API->>Code Server: Reload code location
    Code Server-->>Dagster API: Reload result (success/failure)
    Dagster API-->>Sensor: Response
    Sensor->>Sensor: Update stored hash if success
Loading

Manual Config Reload via Script

sequenceDiagram
    participant User
    participant reload_config.py
    participant Dagster API
    participant Code Server

    User->>reload_config.py: Run script
    reload_config.py->>Dagster API: Check /server_info
    reload_config.py->>reload_config.py: Validate config files
    reload_config.py->>Dagster API: POST reload_code_location mutation
    Dagster API->>Code Server: Reload code location
    Code Server-->>Dagster API: Reload result
    Dagster API-->>reload_config.py: Response
    reload_config.py-->>User: Success/failure message
Loading

Scheduled Config Reload Job

sequenceDiagram
    participant Dagster Scheduler
    participant Config Reload Job
    participant Dagster API
    participant Code Server

    Dagster Scheduler->>Config Reload Job: Trigger per cron schedule
    Config Reload Job->>Dagster API: POST reload_code_location mutation
    Dagster API->>Code Server: Reload code location
    Code Server-->>Dagster API: Reload result
    Dagster API-->>Config Reload Job: Response
    Config Reload Job->>Dagster Scheduler: Log result
Loading

Poem

A bunny hops with nimble grace,
Hot reloads now in every place!
Storage trimmed, the logs are neat,
Makefile magic can't be beat.
With scripts and docs, the path is clear—
Anomstack's future: bright this year!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 518d1fd and b38f0bf.

📒 Files selected for processing (1)
  • anomstack/config.py (2 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

- Add reload job for dynamic config updates
- Add config reload sensor to monitor changes
- Add reload script for manual config refresh
- Add documentation for hot reload feature
- Update config and main modules to support reloading
- Update documentation sidebar to include hot reload docs
- Update docker-compose files for dev and production
- Update Dagster Docker configuration
- Update anomstack code Dockerfile
- Update Makefile with new commands and improvements
- Update example environment configuration
- Improve Dagster storage cleanup script
- Update python_ingest_simple.yaml with latest configuration format
@andrewm4894 andrewm4894 marked this pull request as ready for review July 19, 2025 19:39
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (1)
.example.env (1)

128-128: Extraneous spaces break shell export parsing.

ANOMSTACK_DASHBOARD_PORT = 5001 contains a space before =.
Most .env loaders treat this as part of the variable name and fail to parse.

-ANOMSTACK_DASHBOARD_PORT = 5001
+ANOMSTACK_DASHBOARD_PORT=5001
♻️ Duplicate comments (1)
scripts/reload_config.py (1)

21-113: Well-implemented reload function with excellent user feedback.

The function provides comprehensive error handling and clear console output for different scenarios.

This function duplicates the logic in anomstack/config.py. Consider the refactoring suggestion made for that file to reduce code duplication.

🧹 Nitpick comments (21)
.example.env (2)

72-75: Comment values (“Reduced from …”) risk drifting – make them code, not docs.

Hard-coding previous values in comments becomes stale quickly. Prefer deriving from a single source (defaults file, make print-env, etc.) or remove the inline deltas.


145-147: Default combo (AUTO_CONFIG_RELOAD=false, CONFIG_WATCHER=true) is counter-intuitive.

If the goal is to reload only when files change, disable the watcher by default
and let users opt-in, otherwise the daemon polls forever on first run.

docker-compose.dev.yaml (1)

10-12: Expose DAGSTER_HOST/PORT via .env to avoid hard-coding service names.

When users override the webserver service name or port, these in-file literals
break hot-reload. Make them configurable:

environment:
  DAGSTER_HOST: ${DAGSTER_HOST:-anomstack_webserver}
  DAGSTER_PORT: ${DAGSTER_PORT:-3000}
README.md (1)

481-488: Minor markdown nit – align code block fence with list bullet.

Indenting the back-ticks by two spaces renders the block as plain text on some
parsers. Remove the leading spaces:

-```bash
+```bash
dagster.yaml (1)

33-35: Trailing whitespace violates yamllint and pollutes diffs

A minor style nit – lines 33-35 carry stray spaces after the values.

-      skipped: 1   # Keep only 1 day of skipped runs··
-      failure: 3   # Keep 3 days of failures for debugging··
-      success: 1   # Keep only 1 day of successful runs··
+      skipped: 1   # Keep only 1 day of skipped runs
+      failure: 3   # Keep 3 days of failures for debugging
+      success: 1   # Keep only 1 day of successful runs
dagster_docker.yaml (1)

59-67: Aggressive retention policies look appropriate for storage optimization.

The retention configuration effectively manages storage growth with reasonable timeframes for different run types.

Fix the trailing whitespace on line 66:

-      failure: 3   # Keep 3 days of failed runs for debugging  
+      failure: 3   # Keep 3 days of failed runs for debugging
scripts/README.md (1)

69-95: Well-documented Docker reset utility section.

The documentation clearly explains the different reset levels and safety features, which is important for such a potentially destructive tool.

Consider adding a subject to improve the sentence on line 78:

-Can be run interactively or with specific reset levels via Makefile targets.
+It can be run interactively or with specific reset levels via Makefile targets.
anomstack/config.py (2)

92-183: Well-implemented hot reload functionality with comprehensive error handling.

The GraphQL mutation approach and detailed response parsing provide robust configuration reload capabilities.

Consider extracting the common reload logic to avoid duplication with scripts/reload_config.py. The functions are nearly identical except for logging vs print statements:

# Common module approach
from .reload_common import reload_code_location_impl

def reload_code_location(location_name="anomstack_code"):
    return reload_code_location_impl(location_name, get_dagster_logger())

94-94: Move imports to top level for better performance.

The import requests inside the function causes repeated import overhead on each call.

Move the import to the top of the file:

 import os
 from pathlib import Path
 
 import yaml
 from dagster import get_dagster_logger
+import requests
scripts/reload_config.py (1)

10-11: Remove unused imports to clean up the code.

The json and time imports are not used in this script.

Remove the unused imports:

 import os
 import sys
-import json
-import time
 import requests
 from pathlib import Path
scripts/utils/README.md (1)

90-102: Good interactive mode documentation with clear example.

The example session helps users understand what to expect from the interactive interface.

Add language specification to the code block for better formatting:

-**Example Interactive Session:**
-```
+**Example Interactive Session:**
+```text
 🐳 Anomstack Docker Reset Tool
anomstack/sensors/config_reload.py (4)

9-9: Remove unused imports

Static analysis correctly identifies unused imports that should be removed for cleaner code:

-from typing import Dict, Optional
+from typing import Optional
-from dagster import (
-    RunRequest,
-    SensorEvaluationContext,
+from dagster import (
+    SensorEvaluationContext,

Also applies to: 12-12


26-63: LGTM: Robust hash calculation with good error handling

The hash calculation correctly combines file path, modification time, and content for comprehensive change detection. The error handling ensures the sensor continues operating even with file access issues.

Consider adding debug logging for file access errors to aid troubleshooting:

            except (OSError, IOError) as e:
                # File might have been deleted or is unreadable, skip it
+               logger = get_dagster_logger()
+               logger.debug(f"Skipping file {yaml_file} due to access error: {e}")
                hash_md5.update(f"ERROR:{e}".encode())

83-101: Consider simplifying the wrapper function

While the function works correctly, it's essentially a thin wrapper around execute_config_reload(). Consider whether the sensor-specific logging justifies the extra layer, or if you could call execute_config_reload() directly and enhance its logging instead.

The current implementation is functional and provides good traceability for sensor executions.


176-184: Consider consistent sensor naming

The disabled sensor uses a different name (config_file_watcher_disabled vs config_file_watcher), which could cause confusion. Consider using the same name with different behavior:

    @sensor(
-        name="config_file_watcher_disabled",
+        name="config_file_watcher",
        minimum_interval_seconds=3600,  # Check once per hour
    )
    def config_file_watcher(context: SensorEvaluationContext):

This ensures consistent sensor names regardless of enabled state.

Makefile (1)

35-35: Consider preventing duplicate environment variables

The new Docker operations are well-implemented, but the enable-* targets could create duplicate entries if run multiple times:

 enable-auto-reload:
 	@echo "🤖 Enabling automatic configuration reloading..."
-	@echo "ANOMSTACK_AUTO_CONFIG_RELOAD=true" >> .env
-	@echo "ANOMSTACK_CONFIG_RELOAD_STATUS=RUNNING" >> .env
+	@grep -q "ANOMSTACK_AUTO_CONFIG_RELOAD=" .env || echo "ANOMSTACK_AUTO_CONFIG_RELOAD=true" >> .env
+	@grep -q "ANOMSTACK_CONFIG_RELOAD_STATUS=" .env || echo "ANOMSTACK_CONFIG_RELOAD_STATUS=RUNNING" >> .env
 	@echo "✅ Auto reload enabled! Restart containers: make docker-restart"

This prevents duplicate entries while maintaining the same user experience.

Also applies to: 127-147

docs/docs/storage-optimization.md (1)

97-97: Remove the trailing colon from the heading.

Markdown-lint complains (MD026) and most renderers display the extra colon awkwardly.

-#### Cleanup Levels:
+#### Cleanup Levels
Makefile.md (2)

136-142: Specify a language for fenced code blocks.

Two blocks beginning at these lines use plain back-ticks without a language tag, triggering MD040 and losing syntax highlighting.

-```
+# bash
+```bash

Apply the same change to the following block a few lines below.


263-269: Convert bare URLs to Markdown links.

Bare links violate MD034 and reduce readability. Example fix:

-https://docs.docker.com/engine/reference/commandline/system_prune/
+[Docker system-prune docs](https://docs.docker.com/engine/reference/commandline/system_prune/)

Repeat for the other instance on line 268.

scripts/utils/cleanup_dagster_storage.sh (2)

15-24: Deduplicate common helpers across utility scripts.

print_* and confirm are byte-for-byte identical to the versions in reset_docker.sh. Factor them into a single scripts/utils/common.sh (and source it) to avoid future drift.


362-366: clear breaks non-interactive usage.

Running the script from cron or CI with no TTY will fail.
Guard the call:

-if tty -s; then clear; fi
+if tty -s; then clear; fi
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8b7fb3 and ab43ef7.

⛔ Files ignored due to path filters (1)
  • docs/yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (23)
  • .example.env (2 hunks)
  • Makefile (4 hunks)
  • Makefile.md (1 hunks)
  • README.md (1 hunks)
  • anomstack/config.py (1 hunks)
  • anomstack/jobs/reload.py (1 hunks)
  • anomstack/main.py (3 hunks)
  • anomstack/sensors/config_reload.py (1 hunks)
  • dagster.yaml (1 hunks)
  • dagster_docker.yaml (3 hunks)
  • docker-compose.dev.yaml (1 hunks)
  • docker-compose.yaml (4 hunks)
  • docker/Dockerfile.anomstack_code (2 hunks)
  • docs/docs/configuration/hot-reload.md (1 hunks)
  • docs/docs/deployment/docker.md (1 hunks)
  • docs/docs/storage-optimization.md (1 hunks)
  • docs/sidebars.js (1 hunks)
  • metrics/examples/python/python_ingest_simple/python_ingest_simple.yaml (1 hunks)
  • scripts/README.md (2 hunks)
  • scripts/reload_config.py (1 hunks)
  • scripts/utils/README.md (1 hunks)
  • scripts/utils/cleanup_dagster_storage.sh (1 hunks)
  • scripts/utils/reset_docker.sh (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
anomstack/config.py (1)
scripts/reload_config.py (1)
  • reload_code_location (21-113)
anomstack/sensors/config_reload.py (1)
anomstack/config.py (1)
  • execute_config_reload (186-236)
anomstack/jobs/reload.py (1)
anomstack/config.py (1)
  • execute_config_reload (186-236)
scripts/utils/reset_docker.sh (1)
scripts/utils/cleanup_dagster_storage.sh (5)
  • print_info (16-16)
  • print_success (17-17)
  • confirm (22-40)
  • print_error (19-19)
  • usage (430-442)
scripts/utils/cleanup_dagster_storage.sh (1)
scripts/utils/reset_docker.sh (6)
  • print_info (16-16)
  • print_warning (18-18)
  • print_success (17-17)
  • confirm (22-40)
  • print_error (19-19)
  • usage (220-230)
🪛 YAMLlint (1.37.1)
dagster.yaml

[error] 33-33: trailing spaces

(trailing-spaces)

dagster_docker.yaml

[error] 66-66: trailing spaces

(trailing-spaces)

🪛 LanguageTool
scripts/README.md

[style] ~78-~78: To form a complete sentence, be sure to include a subject.
Context: ...reset + complete Docker system cleanup Can be run interactively or with specific r...

(MISSING_IT_THERE)

🪛 markdownlint-cli2 (0.17.2)
docs/docs/storage-optimization.md

97-97: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)

scripts/utils/README.md

91-91: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Makefile.md

136-136: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


263-263: Bare URL used

(MD034, no-bare-urls)


268-268: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.12.2)
anomstack/sensors/config_reload.py

9-9: typing.Dict imported but unused

Remove unused import: typing.Dict

(F401)


12-12: dagster.RunRequest imported but unused

Remove unused import: dagster.RunRequest

(F401)

scripts/reload_config.py

10-10: json imported but unused

Remove unused import: json

(F401)


11-11: time imported but unused

Remove unused import: time

(F401)

🔇 Additional comments (34)
.example.env (1)

69-77: Ensure tmp_light/* directories exist in containers or switch to absolute paths

The tmp_light/* paths in .example.env (lines 69–77) won’t be present in a fresh container image, and we didn’t find any Dockerfiles or entrypoint scripts that create them. To prevent Dagster runtime failures:

Either update to an absolute, pre-existing path:

 # Docker-friendly defaults
-ANOMSTACK_DAGSTER_LOCAL_ARTIFACT_STORAGE_DIR=tmp_light/artifacts
+ANOMSTACK_DAGSTER_LOCAL_ARTIFACT_STORAGE_DIR=/opt/dagster/artifacts

Or ensure directory creation in your image build/entrypoint, for example:

mkdir -p tmp_light/{artifacts,compute_logs,storage}
chown dagster:dagster tmp_light/*

Please verify that these directories are created (and owned correctly) in your container environment.

metrics/examples/python/python_ingest_simple/python_ingest_simple.yaml (1)

4-5: Staggered cron values are good – check divisibility with downstream jobs.

ingest */9 / train */11 desynchronise nicely, but verify that
score_cron_schedule */6 still fires after an ingest has produced data and a
train has completed; otherwise score may run on stale data and generate empty
alerts. Consider a short buffer sensor instead of pure cron.

docs/sidebars.js (1)

70-79: Sidebar additions look correct.

No logical issues – new docs will render in the expected categories.

docs/docs/deployment/docker.md (1)

42-50: Check relative link path to the Storage Optimization guide

The new cross-reference points to ../storage-optimization.md, but the file was added under docs/docs/.... Please confirm the relative path renders correctly once the docs are built; broken links slip through CI easily.

anomstack/main.py (1)

13-20: Ensure the reload artefacts have unique Dagster names

reload_jobs, reload_schedules, and config_file_watcher will be merged into the global registry alongside many existing definitions. Dagster will raise a DagsterInvalidDefinitionError if any names collide. Double-check the underlying modules export unique name= arguments.

docker/Dockerfile.anomstack_code (1)

25-27: Verify gRPC health-check still succeeds with code-server

The container now starts dagster code-server, but the health-check still pings dagster api grpc-health-check. code-server should keep the gRPC endpoint alive, yet if it’s disabled in future versions the container will be marked unhealthy.

Please confirm the probe passes; if not, switch to the GraphQL /server_info curl used elsewhere.

dagster_docker.yaml (3)

9-16: LGTM! Concurrency reduction will help manage storage growth.

The reduced limits from 25→10 concurrent runs and tag-specific reductions are appropriate for preventing storage issues while maintaining reasonable throughput.


69-75: Run monitoring configuration provides good lifecycle management.

The timeout settings (5min start, 3min cancel, 1hr max runtime) with 1-minute polling provide effective stuck run detection and cleanup.


77-79: Good optimization to disable telemetry for reduced disk writes.

This aligns well with the storage optimization goals of this PR.

scripts/README.md (3)

11-24: Excellent documentation for the kill_long_running_tasks utility.

Clear description of features and usage instructions will help users understand this important maintenance tool.


29-67: Comprehensive SQLite utilities documentation.

The detailed sections for each utility script with usage examples provide excellent guidance for database management tasks.


109-135: Great addition of quick examples section.

The practical command-line examples will help users quickly accomplish common tasks without reading through all the detailed documentation.

anomstack/config.py (1)

186-236: Excellent validation wrapper for the reload functionality.

The pre-flight checks for Dagster accessibility and configuration files provide good user experience and prevent common failure scenarios.

scripts/reload_config.py (1)

146-180: Excellent main function with comprehensive validation and user guidance.

The script provides clear feedback, troubleshooting tips, and proper exit codes for automation use.

scripts/utils/README.md (3)

7-18: Excellent documentation structure for the Docker reset utility.

The clear feature list with emojis and descriptions provides users with a quick understanding of the utility's capabilities and safety features.


19-76: Comprehensive documentation of reset levels with clear use cases.

The progressive reset levels (gentle → medium → nuclear → full-nuclear) are well-explained with specific use cases and command examples.


104-158: Excellent safety documentation and usage guidance.

The sections on preserved files, safety features, and usage tips provide crucial information for users of this potentially destructive utility.

anomstack/jobs/reload.py (4)

1-19: LGTM: Clean imports and configuration

The imports are well-organized and the environment variable pattern with sensible defaults is consistent with the codebase.


82-95: LGTM: Clean configuration logic

The function correctly implements the enabling logic with proper boolean conversion and reasonable defaults for development mode.


98-128: LGTM: Well-structured conditional feature activation

The module-level conditional execution properly handles the optional nature of this feature. The environment variable handling with sensible defaults and clear logging provides good operational visibility.

The pattern of appending to reload_jobs and reload_schedules lists suggests these are consumed elsewhere in the application, which is a clean approach for modular feature management.


21-79: Defaults match Docker Compose configuration

Verified that the hardcoded defaults in reload_configuration (DAGSTER_HOST="anomstack_webserver" and DAGSTER_PORT="3000") align with the values defined in both docker-compose.yaml and docker-compose.dev.yaml. No changes required.

anomstack/sensors/config_reload.py (4)

21-24: LGTM: Clean utility function

The get_metrics_directory() function correctly uses Path.resolve() to get an absolute path, which is good practice for file operations.


66-80: LGTM: Clean cursor management utilities

The cursor management functions are well-designed with appropriate error handling for JSON operations and follow Dagster sensor patterns correctly.


103-117: LGTM: Consistent enabling logic

The should_enable_config_watcher() function correctly mirrors the approach used in the reload job, maintaining consistency across the configuration reload system.


123-175: LGTM: Well-structured sensor logic with smart cursor management

The sensor implementation correctly handles the baseline establishment, change detection, and reload workflow. The decision to only update the cursor on successful reload ensures failed reloads will be retried, which is excellent for reliability.

The comprehensive error handling and detailed logging provide good operational visibility.

docs/docs/configuration/hot-reload.md (1)

1-337: LGTM: Comprehensive and accurate documentation

This documentation excellently covers the hot reload system with:

  • ✅ Clear explanation of all three reload methods (manual, scheduled, smart file watcher)
  • ✅ Accurate environment variable configurations that match the code implementation
  • ✅ Proper distinction between what supports hot reload vs. requires container restart
  • ✅ Comprehensive troubleshooting section
  • ✅ Best practices for safe configuration management
  • ✅ Good examples and command references

The documentation structure is logical and user-friendly, making it easy for developers to understand and implement the hot reload capabilities effectively.

scripts/utils/reset_docker.sh (4)

1-40: LGTM: Solid script foundation with good practices

The script header follows bash best practices with set -e for error handling, robust directory detection, and well-structured utility functions. The confirm() function properly handles default values and user input validation.


42-76: LGTM: Excellent user safety and communication features

The data size reporting and preservation messaging provide crucial user awareness before destructive operations. The defensive programming with existence checks and error suppression prevents script failures while still providing useful information.


78-217: LGTM: Well-designed reset levels with excellent safety measures

The progressive reset levels with increasing destructiveness are well-implemented:

  • ✅ Appropriate warnings and confirmations for each level
  • ✅ Double confirmation for nuclear options
  • ✅ Comprehensive cleanup of relevant directories
  • ✅ Proper error handling with fallback commands
  • ✅ Good integration with Makefile targets
  • ✅ Defensive programming with existence checks before rm -rf

The safety measures effectively prevent accidental data loss while providing the necessary cleanup capabilities.


219-283: LGTM: Clean main function with comprehensive option handling

The main function provides both command-line and interactive interfaces effectively. The usage information is clear, error handling is appropriate, and the interactive menu makes the script accessible to users who prefer guided operation.

Makefile (3)

162-186: LGTM: Well-organized reset operations section

The reset operations are cleanly organized with clear descriptions and proper integration with the reset script. The interactive default option provides a user-friendly entry point while still supporting direct command-line usage.


188-212: LGTM: Comprehensive Dagster cleanup integration

The Dagster cleanup section provides excellent granular control over storage cleanup with clear descriptions and proper script integration. The status option is particularly valuable for understanding storage usage before cleanup.


262-286: LGTM: Comprehensive documentation workflow support

The enhanced documentation targets provide full lifecycle management for the Docusaurus site with clear, standard naming conventions. The backward-compatible alias for the docs target is thoughtful.

scripts/utils/cleanup_dagster_storage.sh (1)

133-141: Unsafe deletion when directory names contain spaces.

find … -exec rm -rf {} \; splits on whitespace. Use a null-delimited pipeline, mirroring the approach in the aggressive cleanup:

-find tmp/ -maxdepth 1 -type d -name "*-*-*-*-*" -mtime +30 -exec rm -rf {} \; 2>/dev/null || true
+find tmp/ -maxdepth 1 -type d -name "*-*-*-*-*" -mtime +30 -print0 2>/dev/null | xargs -0 rm -rf -- 2>/dev/null || true

Replicate the change in the nuclear cleanup block below.

Likely an incorrect or invalid review comment.

Comment on lines +143 to +151
# automatic configuration reloading
# Enable scheduled config reload job (runs every N minutes)
ANOMSTACK_AUTO_CONFIG_RELOAD=false
ANOMSTACK_CONFIG_RELOAD_CRON="*/5 * * * *"
ANOMSTACK_CONFIG_RELOAD_STATUS=STOPPED

# Enable smart config file watcher sensor (reloads only when files change)
ANOMSTACK_CONFIG_WATCHER=true
ANOMSTACK_CONFIG_WATCHER_INTERVAL=30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Watcher variable name mismatch will disable the feature completely.

Code & docs previously referenced ANOMSTACK_CONFIG_WATCHER_POLL_INTERVAL,
but the file introduces ANOMSTACK_CONFIG_WATCHER_INTERVAL.

-ANOMSTACK_CONFIG_WATCHER_INTERVAL=30
+ANOMSTACK_CONFIG_WATCHER_POLL_INTERVAL=30   # keep in sync with anomstack/config.py

Search the repo for the constant to confirm.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# automatic configuration reloading
# Enable scheduled config reload job (runs every N minutes)
ANOMSTACK_AUTO_CONFIG_RELOAD=false
ANOMSTACK_CONFIG_RELOAD_CRON="*/5 * * * *"
ANOMSTACK_CONFIG_RELOAD_STATUS=STOPPED
# Enable smart config file watcher sensor (reloads only when files change)
ANOMSTACK_CONFIG_WATCHER=true
ANOMSTACK_CONFIG_WATCHER_INTERVAL=30
# automatic configuration reloading
# Enable scheduled config reload job (runs every N minutes)
ANOMSTACK_AUTO_CONFIG_RELOAD=false
ANOMSTACK_CONFIG_RELOAD_CRON="*/5 * * * *"
ANOMSTACK_CONFIG_RELOAD_STATUS=STOPPED
# Enable smart config file watcher sensor (reloads only when files change)
ANOMSTACK_CONFIG_WATCHER=true
ANOMSTACK_CONFIG_WATCHER_POLL_INTERVAL=30 # keep in sync with anomstack/config.py
🤖 Prompt for AI Agents
In the .example.env file around lines 143 to 151, the environment variable for
the watcher interval is named ANOMSTACK_CONFIG_WATCHER_INTERVAL, but the code
and documentation expect ANOMSTACK_CONFIG_WATCHER_POLL_INTERVAL. Rename
ANOMSTACK_CONFIG_WATCHER_INTERVAL to ANOMSTACK_CONFIG_WATCHER_POLL_INTERVAL to
match the expected variable name and ensure the watcher feature works correctly.
Verify by searching the repo for all references to this constant to confirm
consistency.

Comment on lines +38 to 41
build:
context: .
dockerfile: docker/Dockerfile.anomstack_code
container_name: anomstack_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

DAGSTER_CURRENT_IMAGE now points at a different image – runs may launch stale code

The service is locally built (build:) but DAGSTER_CURRENT_IMAGE is still hard-coded to the remote andrewm4894/anomstack_code:latest. Dagster will spin up new run containers from that tag, not the image you just built, defeating hot-reload consistency.

-      DAGSTER_CURRENT_IMAGE: "andrewm4894/anomstack_code:latest"
+#     Use the freshly-built image so launched runs match this container
+#     (Docker Compose will tag it as "<project>_anomstack_code").
+      DAGSTER_CURRENT_IMAGE: "${DAGSTER_CURRENT_IMAGE:-anomstack_code}"

Also applies to: 54-59

🤖 Prompt for AI Agents
In docker-compose.yaml around lines 38 to 41 and 54 to 59, the
DAGSTER_CURRENT_IMAGE environment variable is hard-coded to a remote image tag,
causing new run containers to use stale code instead of the locally built image.
Update DAGSTER_CURRENT_IMAGE to reference the locally built image by using the
container_name or build context image tag, ensuring that new runs launch with
the latest local build for hot-reload consistency.

Comment on lines +274 to +280
# Remove run directories older than 1 day
print_info "Removing run directories older than 24 hours..."
find tmp/ -maxdepth 1 -type d -name "*-*-*-*-*" -mtime +1 -exec rm -rf {} \; 2>/dev/null || true

# Remove all logs older than 1 day
find tmp/ -name "*.log" -mtime +1 -delete 2>/dev/null || true
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Same whitespace-safe deletion issue as above.

Apply the -print0 | xargs -0 pattern here as well to avoid accidental partial path deletion.

🤖 Prompt for AI Agents
In scripts/utils/cleanup_dagster_storage.sh around lines 274 to 280, the find
commands deleting run directories and log files use -exec and -delete directly,
which can cause issues with paths containing whitespace. Replace these with the
safer pattern using -print0 piped to xargs -0 rm -rf (for directories) and xargs
-0 rm (for files) to ensure paths with spaces are handled correctly and prevent
accidental partial deletions.

- Adjusted ingest, train, score, and alert cron schedules for improved timing:
  - Ingest schedule changed from "*/9 * * * *" to "*/10 * * * *"
  - Train schedule changed from "*/11 * * * *" to "*/15 * * * *"
  - Score schedule updated from "*/6 * * * *" to "*/11 * * * *"
  - Alert schedule modified from "*/5 * * * *" to "*/12 * * * *"
- Added ANOMSTACK_HOME variable to .example.env for home directory configuration.
- Updated dagster_docker.yaml to use ANOMSTACK_HOME for volume mounts, improving flexibility and portability.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
dagster_docker.yaml (1)

67-67: Strip trailing spaces (YAMLLint error).

Line 67 has two trailing spaces that violate the project’s lint rules.

-      failure: 3   # Keep 3 days of failed runs for debugging··
+      failure: 3   # Keep 3 days of failed runs for debugging
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f10ed33 and 518d1fd.

📒 Files selected for processing (2)
  • .example.env (3 hunks)
  • dagster_docker.yaml (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .example.env
🧰 Additional context used
🪛 YAMLlint (1.37.1)
dagster_docker.yaml

[error] 67-67: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (3)
dagster_docker.yaml (3)

31-39: Verify ANOMSTACK_HOME availability & permissions.

ANOMSTACK_HOME is now propagated into launched containers and used for three host-path mounts.
If this variable is not set (or points to a non-absolute / non-existent path) Dagster will fail to start runs because Docker will refuse the bind mounts.

Please confirm that:

  1. ANOMSTACK_HOME is exported in every environment where dagster-daemon and dagit run (dev, CI, prod).
  2. The directory structure (tmp, dagster_home, metrics) exists and is readable / writable by the container UID.

A quick smoke test with docker run --rm -v "${ANOMSTACK_HOME}/tmp:/data busybox ls /data" is enough to catch mis-configuration early.


60-68: Retention section may not match Dagster’s expected schema.

The YAML keys look correct at first glance, but Dagster’s retention config is still marked “experimental” and the accepted field names changed twice in recent releases.
Make sure the current Dagster version recognises:

retention:
  schedule:
    purge_after_days: …
  sensor:
    purge_after_days:
      skipped: …
      failure: …
      success: …

Otherwise the daemon will silently ignore the block and disk usage will grow again.
Double-check with dagster debug print-config or start the daemon in verbose mode to see whether the policy is loaded.


70-76: Confirm cancel_timeout_seconds is supported.

run_monitoring definitely supports start_timeout_seconds, max_runtime_seconds, and poll_interval_seconds.
For the cancel timeout Dagster currently uses cancel_grace_period_seconds (at least up to 1.7.x). Using an unknown key would again be silently ignored.

Please validate against the Dagster version you pin in requirements*.txt or runtime image.
If the correct key is cancel_grace_period_seconds, patch as follows:

-  cancel_timeout_seconds: 180  # 3 minutes to cancel
+  cancel_grace_period_seconds: 180  # 3 minutes to cancel

- Updated the environment variable handling in get_specs to ensure overrides only occur if parameters are not specified in defaults or YAML files.
- Improved logging to reflect the changes made by environment variable overrides.
@andrewm4894 andrewm4894 merged commit 074f6fe into main Jul 20, 2025
3 of 4 checks passed
@andrewm4894 andrewm4894 deleted the general-dev-work branch July 20, 2025 11:27
@github-actions
Copy link

📊 Test Coverage Report

Coverage: 59% (yellow)

✅ Coverage maintained or improved!

💡 See detailed coverage report in the tests README

@coderabbitai coderabbitai bot mentioned this pull request Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants