-
Notifications
You must be signed in to change notification settings - Fork 12
Disk cleanup #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disk cleanup #191
Conversation
- Reduce schedule run retention to 1 day (was 2 days) - Set successful runs to 6 hours retention (was 1 day) - Set skipped runs to 12 hours retention (was 1 day) - Keep failed runs at 1 day for debugging This addresses the root cause of disk space issues by reducing the time artifacts are kept on the limited Fly.io volume.
- New cleanup.py with comprehensive disk management operations - Removes artifacts older than 6 hours automatically - Cleans log files older than 24 hours - Removes database metrics older than 90 days with VACUUM - Reports disk usage before/after cleanup operations - Scheduled to run every 2 hours automatically - Integrated into main Dagster pipeline This prevents disk space issues by automatically maintaining the Fly.io volume with regular cleanup operations.
- New cleanup_disk_space.py script for manual/emergency cleanup - Supports --dry-run mode for safe preview of cleanup actions - Supports --aggressive mode for emergency situations - Detailed reporting of disk usage and cleanup results - Makefile commands for easy Fly.io cleanup: - make fly-cleanup-preview: safe preview mode - make fly-cleanup: normal cleanup - make fly-cleanup-aggressive: emergency cleanup Provides manual tools for disk space management alongside the automated cleanup job for maximum flexibility.
- Add comprehensive cleanup_disk_space.py documentation to maintenance README - Document new Fly.io cleanup Makefile commands in Makefile.md - Update test expectations to account for new cleanup job/schedule - Include usage examples and safety features for all cleanup tools Provides complete documentation for the disk space management solution including automated and manual cleanup options.
|
""" WalkthroughThis update introduces a new disk space cleanup system for Fly.io instances, including a standalone Python script, new Makefile targets, Dagster job and schedule definitions, and corresponding documentation. Retention policies are tightened, and the main application temporarily disables integration of the cleanup job and schedule. Tests are updated with comments reflecting this temporary disablement. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Makefile
participant Fly.io Instance
participant CleanupScript
User->>Makefile: make fly-cleanup[-preview|-aggressive]
Makefile->>Fly.io Instance: SSH & run cleanup_disk_space.py [flags]
Fly.io Instance->>CleanupScript: Execute with appropriate mode
CleanupScript->>Fly.io Instance: Clean artifacts, logs, DB, report disk usage
Fly.io Instance-->>User: Output cleanup summary
sequenceDiagram
participant DagsterScheduler
participant CleanupJob
participant Artifacts
participant Logs
participant MetricsDB
DagsterScheduler->>CleanupJob: Trigger on schedule
CleanupJob->>Artifacts: Remove old artifacts
CleanupJob->>Logs: Remove old logs
CleanupJob->>MetricsDB: Delete old metrics, VACUUM
CleanupJob->>DagsterScheduler: Report disk usage before/after
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
📊 Test Coverage ReportCoverage: 58% (yellow) ✅ Coverage maintained or improved!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (13)
Makefile.md (3)
568-568: Fix markdown heading format.Use proper heading syntax instead of emphasis for better document structure.
-#### `make fly-cleanup-preview` +### `make fly-cleanup-preview`
579-579: Fix markdown heading format.Use proper heading syntax instead of emphasis for better document structure.
-#### `make fly-cleanup` +### `make fly-cleanup`
591-591: Fix markdown heading format.Use proper heading syntax instead of emphasis for better document structure.
-#### `make fly-cleanup-aggressive` +### `make fly-cleanup-aggressive`anomstack/jobs/cleanup.py (6)
9-11: Remove unused imports.Clean up the unused imports flagged by static analysis.
-from pathlib import Path - -from dagster import DefaultScheduleStatus, JobDefinition, ScheduleDefinition, job, op, get_dagster_logger +from dagster import DefaultScheduleStatus, ScheduleDefinition, job, op, get_dagster_logger
71-71: Fix unused loop variable.Rename the unused
dirsvariable to follow Python conventions.- for root, dirs, files in os.walk(log_dir): + for root, _dirs, files in os.walk(log_dir):
19-19: Consider making the artifacts path configurable.The hardcoded path
/data/artifacts/storagemight not be suitable for all environments. Consider making this configurable through environment variables or Dagster configuration.artifacts_path = os.getenv("DAGSTER_ARTIFACTS_PATH", "/data/artifacts/storage")
38-48: Potential performance issue with directory size calculation.The nested loop for calculating directory sizes could be slow for large directories. Consider using
shutil.disk_usage()or implementing a timeout mechanism for very large directories.try: - size = sum( - os.path.getsize(os.path.join(dirpath, filename)) - for dirpath, dirnames, filenames in os.walk(item_path) - for filename in filenames - ) + # Use a more efficient approach for large directories + size = 0 + for dirpath, dirnames, filenames in os.walk(item_path): + size += sum(os.path.getsize(os.path.join(dirpath, filename)) + for filename in filenames if os.path.exists(os.path.join(dirpath, filename))) shutil.rmtree(item_path)
76-76: Improve timestamp comparison for cross-platform compatibility.Using
datetime.fromtimestamp()directly may have timezone issues. Consider using UTC timestamps for consistency.- if os.path.getmtime(file_path) < (datetime.now() - timedelta(hours=24)).timestamp(): + file_time = datetime.fromtimestamp(os.path.getmtime(file_path)) + cutoff_time = datetime.now() - timedelta(hours=24) + if file_time < cutoff_time:
179-183: Schedule frequency might be too aggressive.Running cleanup every 2 hours could be resource-intensive, especially the database VACUUM operation. Consider if this frequency is necessary or if it should be configurable.
Consider making the schedule configurable or reducing frequency:
- cron_schedule="0 */2 * * *", # Every 2 hours + cron_schedule="0 */6 * * *", # Every 6 hoursscripts/maintenance/cleanup_disk_space.py (4)
19-21: Remove unused imports.Clean up the unused imports flagged by static analysis.
-import sys -from datetime import datetime, timedelta -from pathlib import Path +from datetime import datetime, timedelta
119-119: Fix unused loop variable.Rename the unused
dirsvariable to follow Python conventions.- for root, dirs, files in os.walk(log_dir): + for root, _dirs, files in os.walk(log_dir):
70-74: Potential performance issue with large directories.Similar to the Dagster job, the directory size calculation could be slow for very large directories. Consider adding progress feedback or implementing a timeout.
Consider adding progress feedback for large directories:
# Add progress tracking for large directories file_count = sum(len(filenames) for _, _, filenames in os.walk(item_path)) if file_count > 1000: # Large directory print(f" Calculating size for large directory ({file_count} files)...")
43-191: Consider consolidating cleanup logic with Dagster job.There's significant code duplication between this script and the Dagster job implementation. Consider extracting common cleanup logic into a shared module to maintain consistency and reduce duplication.
Create a shared cleanup utilities module that both implementations can use:
# cleanup_utils.py def cleanup_artifacts_common(path, hours_back, dry_run=False): # Common cleanup logic pass def cleanup_logs_common(log_dirs, hours_back, aggressive=False, dry_run=False): # Common cleanup logic pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
Makefile(1 hunks)Makefile.md(1 hunks)anomstack/jobs/cleanup.py(1 hunks)anomstack/main.py(3 hunks)dagster_fly.yaml(1 hunks)scripts/maintenance/README.md(1 hunks)scripts/maintenance/cleanup_disk_space.py(1 hunks)tests/test_main.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
scripts/maintenance/README.md (1)
scripts/utils/cleanup_dagster_storage.sh (7)
usage(430-442)show_cleanup_options(78-105)cleanup_aggressive(151-266)main_menu(361-427)cleanup_standard(128-149)analyze_storage(43-75)cleanup_cli_based(299-324)
scripts/maintenance/cleanup_disk_space.py (1)
scripts/utils/cleanup_dagster_storage.sh (6)
usage(430-442)cleanup_aggressive(151-266)show_cleanup_options(78-105)cleanup_cli_based(299-324)cleanup_standard(128-149)analyze_storage(43-75)
🪛 Ruff (0.12.2)
scripts/maintenance/cleanup_disk_space.py
19-19: sys imported but unused
Remove unused import: sys
(F401)
21-21: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
119-119: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
anomstack/jobs/cleanup.py
9-9: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
11-11: dagster.JobDefinition imported but unused
Remove unused import: dagster.JobDefinition
(F401)
71-71: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
🪛 markdownlint-cli2 (0.17.2)
Makefile.md
568-568: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
579-579: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
591-591: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (11)
anomstack/main.py (2)
9-9: LGTM! Clean integration of cleanup module.The import follows the established pattern used by other job modules in the codebase.
33-33: LGTM! Consistent integration pattern.The cleanup jobs and schedules are properly appended to the existing tuples, maintaining consistency with how other modules are integrated.
Also applies to: 47-47
dagster_fly.yaml (1)
35-40: Verify impact of aggressive retention policies on operational visibility.The retention periods have been significantly reduced, particularly for successful runs (6 hours) and skipped runs (12 hours). While this supports the disk cleanup objectives, consider whether these short retention windows might impact:
- Debugging capabilities for recent issues
- Historical analysis and trend monitoring
- Compliance or audit requirements
Consider testing these settings in a development environment first to ensure they don't hinder operational visibility.
tests/test_main.py (1)
12-12: LGTM! Test updates correctly reflect cleanup integration.The test assertions have been properly updated to account for the new cleanup job and schedule. The comments clearly explain the reasoning behind the calculation changes (2 non-metric jobs total including cleanup).
Also applies to: 16-16, 20-20, 24-24
Makefile (1)
419-437: LGTM! Well-implemented Fly.io cleanup targets.The new Makefile targets are properly structured with:
- Appropriate error handling for missing
FLY_APPenvironment variable- Clear informational messages for each operation
- Correct SSH command structure for remote execution
The naming convention and functionality align well with existing Fly.io targets.
scripts/maintenance/README.md (1)
42-85: LGTM! Comprehensive and well-structured documentation.The new documentation section for
cleanup_disk_space.pyis excellent:
- Clear feature descriptions and use cases
- Practical usage examples with different modes
- Detailed cleanup targets and safety considerations
- Consistent formatting with existing documentation
This provides users with all necessary information to effectively use the cleanup functionality.
Makefile.md (1)
565-601: Well-documented disk cleanup functionality.The documentation provides clear usage examples and explains the different cleanup modes effectively. The retention periods and behavior are well-explained, helping users choose the appropriate cleanup level.
anomstack/jobs/cleanup.py (1)
96-124: Database cleanup implementation looks solid.The SQLite database cleanup with VACUUM operation is well-implemented. Good error handling and the approach of counting records before deletion provides useful logging information.
scripts/maintenance/cleanup_disk_space.py (3)
43-94: Good implementation with comprehensive error handling.The artifact cleanup function is well-structured with proper error handling, clear output, and support for both dry-run and aggressive modes. The size calculation and progress reporting are helpful for users.
147-191: Database cleanup with good dry-run support.The database cleanup function properly implements dry-run mode and provides clear feedback about what would be deleted. The VACUUM operation is correctly placed after the deletion.
255-258: Good warning for high disk usage.The warning when disk usage remains above 90% after cleanup is helpful for users to understand they may need additional action.
| try: | ||
| for root, dirs, files in os.walk(log_dir): | ||
| for file in files: | ||
| if file.endswith(('.log', '.out', '.err')) or 'dagster' in file.lower(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Broad file matching pattern could be risky.
The condition 'dagster' in file.lower() is quite broad and might match unintended files. Consider being more specific about which files to target.
- if file.endswith(('.log', '.out', '.err')) or 'dagster' in file.lower():
+ if (file.endswith(('.log', '.out', '.err')) or
+ file.startswith('dagster') or
+ 'dagster.log' in file.lower()):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if file.endswith(('.log', '.out', '.err')) or 'dagster' in file.lower(): | |
| if (file.endswith(('.log', '.out', '.err')) or | |
| file.startswith('dagster') or | |
| 'dagster.log' in file.lower()): |
🤖 Prompt for AI Agents
In scripts/maintenance/cleanup_disk_space.py at line 121, the condition checking
if 'dagster' is in the filename is too broad and may match unintended files.
Refine this condition to target specific filenames or patterns related to
dagster logs or outputs, such as checking for exact filenames or more precise
suffixes/prefixes, to avoid accidentally matching unrelated files.
The cleanup job is causing startup crashes, temporarily disabling it to get the system stable. Will debug and re-enable later.
📊 Test Coverage ReportCoverage: 59% (yellow) ✅ Coverage maintained or improved!
|
- Remove exit on code server health check timeout - Make nginx startup non-blocking if config test fails - Remove 'set -e' to allow script to continue despite individual service failures - This should fix the exit code 1 crashes we've been seeing The script will now continue startup even if some services have issues, rather than failing completely.
- Adjusted retention for skipped runs from 12 hours to 1 day (minimum allowed by Dagster) - Updated retention for successful runs from 6 hours to 1 day (minimum allowed by Dagster) These changes ensure compliance with Dagster's minimum retention requirements and improve artifact management.
📊 Test Coverage ReportCoverage: 59% (yellow) ✅ Coverage maintained or improved!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (2)
scripts/deployment/start.sh (2)
30-42: Health-check now always succeeds → downstream services may start against a dead code-server.The function now returns
0unconditionally (Line 41), so the rest of the stack starts even whendagster api grpc-health-checknever became healthy.
Since both the web-server and daemon depend on a live code-server, this can mask boot failures and lead to an endless restart loop later.Two lightweight mitigations:
- echo "⚠️ Code server health check timed out, but continuing startup..." - return 0 + echo "❌ Code server health check timed out – giving up after $max_retries retries" + return 1 # bubble failureand keep the early
exit 1already present after the code-server start stanza to abort cleanly.
45-73:start_process_with_retryleaks log lines into the variable that is supposed to hold only the PID.Because every
echogoes to STDOUT, the caller captures all lines:CODE_SERVER_PID=$(start_process_with_retry …)
CODE_SERVER_PIDtherefore becomes something like:🔧 Starting Code Server (attempt 1/3)... Code Server PID: 1234 ✅ Code Server started successfully 1234Later checks (
kill -0 $CODE_SERVER_PID) will fail with “illegal pid” errors.Minimal, backwards-compatible fix: send diagnostics to STDERR and leave only the raw PID on STDOUT.
@@ - echo "$name PID: $pid" + echo "$name PID: $pid" >&2 @@ - echo "✅ $name started successfully" - echo $pid + echo "✅ $name started successfully" >&2 + printf '%s\n' "$pid" @@ - echo "⚠️ $name failed to start, retrying..." + echo "⚠️ $name failed to start, retrying..." >&2 @@ - echo "❌ Failed to start $name after $max_retries attempts" + echo "❌ Failed to start $name after $max_retries attempts" >&2No caller changes needed; they will now receive a clean PID string.
🧹 Nitpick comments (1)
scripts/deployment/start.sh (1)
108-115: Good defensive guard around nginx, but propagate the config error to monitoring.Nice addition of
nginx -tbefore launch. Consider also writing a sentinel file or emitting a structured log line when the config test fails so external health-checks (Fly checks, Prometheus, etc.) can alert operators that the reverse proxy is missing.Example:
echo "nginx_disabled" > /tmp/startup_warnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dagster_fly.yaml(1 hunks)scripts/deployment/start.sh(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dagster_fly.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (1)
scripts/deployment/start.sh (1)
4-5: Re-evaluate the blanket removal ofset -e.
set -eaborts the script on the first non-zero exit status, which is often desirable during container boot to surface fatal mis-configurations quickly.
Without it, latent errors (e.g., failedhtpasswd, badexport, missing binaries) will be silently ignored and the container may sit in a half-broken state that is harder to detect and debug.If the intention is merely to tolerate failure of specific subsystems, consider a surgical approach instead:
# keep global safety net set -euo pipefail # …later, for commands you really can ignore: some_maybe_failing_cmd || echo "non-fatal warning"This keeps startup robust while still letting non-critical steps fail gracefully.
- Add stronger cache busting in deploy_fly.sh with timestamp+random values - Make Dockerfile actually use CACHEBUST argument to invalidate Docker layers - Improve Makefile fresh deployment targets to clear builder cache - Add explicit dockerfile paths to avoid build confusion These changes ensure --force-rebuild actually forces a complete rebuild, preventing cached layers from causing deployment issues like we experienced with the Dagster config fix.
📊 Test Coverage ReportCoverage: 59% (yellow) ✅ Coverage maintained or improved!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docker/Dockerfile.fly (1)
6-8: Persisting the cache-bust marker adds an unnecessary file to the final imageWriting
/tmp/cachebustachieves cache invalidation, but the file remains in every image layer and marginally bloats image size.
Prefer a zero-footprint alternative, e.g. annotate the image with a label:-# Use CACHEBUST to invalidate cache when needed (this layer changes when CACHEBUST changes) -RUN echo "Cache bust: $CACHEBUST" > /tmp/cachebust +# Use CACHEBUST to invalidate cache when needed (layer changes via label) +LABEL cachebust="$CACHEBUST"Same cache-breaking effect, no leftover artifact.
scripts/deployment/deploy_fly.sh (2)
238-242: Strip the newline fromopenssl randto avoid malformedCACHEBUSTvalue
openssl rand -hex 4appends\n; the resultingCACHEBUST_VALUEtherefore contains an embedded newline, which is passed verbatim to--build-arg.
While Docker tolerates it, some shells or tooling may mis-parse the argument.-CACHEBUST_VALUE="$(date +%s)-$(openssl rand -hex 4 2>/dev/null || echo $RANDOM)" +CACHEBUST_VALUE="$(date +%s)-$(openssl rand -hex 4 2>/dev/null | tr -d '\n' || echo $RANDOM)"(Alternatively, wrap the whole substitution in
$( ... | tr -d '\n' ).)
253-255: Consider always passing aCACHEBUSTvalue, even on standard deploysIf the local Docker cache is stale (e.g. after a base-image security update) a normal cached build may silently skip the refresh. Passing a low-entropy, time-based
CACHEBUST(e.g. the date truncated to hours) during standard deploys mitigates this while still benefiting from layer caching.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
Makefile(2 hunks)docker/Dockerfile.fly(1 hunks)scripts/deployment/deploy_fly.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (1)
scripts/deployment/deploy_fly.sh (1)
244-251: Good defensive mix of cache-busting flagsCombining
--no-cache, an explicit Dockerfile path, and a build-arg gives deterministic force rebuilds. Looks solid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a comprehensive disk cleanup system for Fly.io instances to address disk space management concerns. The changes include both standalone cleanup capabilities and automated Dagster-based cleanup jobs (currently disabled) to optimize storage usage and prevent disk-full scenarios.
Key changes:
- New standalone disk cleanup script with normal, aggressive, and dry-run modes
- Makefile targets for convenient Fly.io disk cleanup operations
- Dagster cleanup jobs and schedules (temporarily disabled for testing)
- Enhanced deployment scripts with improved cache busting and error handling
- More aggressive data retention policies in Fly.io configuration
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
scripts/maintenance/cleanup_disk_space.py |
New standalone script for comprehensive disk cleanup with multiple modes |
Makefile |
Added three new targets for Fly.io disk cleanup operations |
anomstack/jobs/cleanup.py |
New Dagster cleanup job with ops for artifacts, logs, and database cleanup |
scripts/deployment/start.sh |
Improved startup script resilience by removing strict error handling |
scripts/deployment/deploy_fly.sh |
Enhanced cache busting for Docker builds |
dagster_fly.yaml |
More aggressive retention policies for schedules and sensors |
tests/test_main.py |
Updated test expectations to reflect disabled cleanup functionality |
scripts/maintenance/README.md |
Documentation for the new cleanup script |
Makefile.md |
Documentation for new Makefile cleanup targets |
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() No newline at end of file |
Copilot
AI
Jul 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing whitespace at the end of the file.
| main() | |
| main() |
|
|
||
| # Export for main.py | ||
| cleanup_jobs = [cleanup_job] | ||
| cleanup_schedules = [cleanup_schedule] No newline at end of file |
Copilot
AI
Jul 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing whitespace at the end of the file.
| cleanup_schedules = [cleanup_schedule] | |
| cleanup_schedules = [cleanup_schedule] |
| if file.endswith(('.log', '.out', '.err')) or 'dagster' in file.lower(): | ||
| file_path = os.path.join(root, file) | ||
| try: | ||
| file_time = datetime.fromtimestamp(os.path.getmtime(file_path)) |
Copilot
AI
Jul 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The os.path.getmtime() call is performed twice for the same file - once here and potentially again at line 126. Consider storing the result in a variable to avoid duplicate system calls.
| file_time = datetime.fromtimestamp(os.path.getmtime(file_path)) | |
| file_mtime = os.path.getmtime(file_path) | |
| file_time = datetime.fromtimestamp(file_mtime) |
| # Remove log files older than 24 hours | ||
| if os.path.getmtime(file_path) < (datetime.now() - timedelta(hours=24)).timestamp(): |
Copilot
AI
Jul 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datetime.now() - timedelta(hours=24) calculation is performed for every file in the loop. Consider calculating this cutoff timestamp once before the loop for better performance.
| # Remove log files older than 24 hours | |
| if os.path.getmtime(file_path) < (datetime.now() - timedelta(hours=24)).timestamp(): | |
| if os.path.getmtime(file_path) < cutoff_time: |
|
|
||
| # Anomstack Startup Script for Fly.io with improved gRPC connectivity | ||
| set -e | ||
| # Removed 'set -e' to allow script to continue even if some services fail |
Copilot
AI
Jul 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing set -e changes the script's error handling behavior significantly. Consider using more targeted error handling (like || true for specific commands) rather than disabling strict error checking globally.
This pull request introduces a comprehensive disk cleanup system for Fly.io instances, including new Makefile targets, a standalone Python script, and a Dagster job for automated cleanup. These changes aim to optimize disk space usage and improve maintenance workflows. The most important changes are grouped below:
New Disk Cleanup Functionality
fly-cleanup,fly-cleanup-preview, andfly-cleanup-aggressive, each providing varying levels of disk cleanup for Fly.io instances. (Makefile, MakefileR419-R438)cleanup_disk_space.py, for managing disk space, supporting dry-run and aggressive cleanup modes. (scripts/maintenance/cleanup_disk_space.py, scripts/maintenance/cleanup_disk_space.pyR1-R262)Makefile.md, [1];scripts/maintenance/README.md, [2]Automated Disk Cleanup in Dagster
cleanup_job, with operations for cleaning old artifacts, logs, and metrics, as well as reporting disk usage before and after cleanup. The job is scheduled to run every 2 hours. (anomstack/jobs/cleanup.py, anomstack/jobs/cleanup.pyR1-R187)anomstack/main.py. (anomstack/main.py, [1] [2] [3]Fly.io Optimization
dagster_fly.yamlfor more aggressive purging of schedules and sensor data, reducing retention periods to as short as 6 hours. (dagster_fly.yaml, dagster_fly.yamlL35-R40)Testing Updates
tests/test_main.pyto account for the addition of the cleanup job and schedule. (tests/test_main.py, tests/test_main.pyL12-R24)Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Chores