Stable Diffusion WebUI Critical CUDA Out of Memory Detection Rule #134
rules:
  - metadata:
      kind: prequel
      id: StableDiffusionCUDAOOMDetector
Please use ./bin/ruler id to generate a valid id, e.g.:
❯ ./bin/ruler id
D3ZNiWma64wUnDYq6NSqYj
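A quick pre-submission check along these lines could catch hand-written ids. This is only a sketch: the assumption that generated ids are 22-character base58 strings is inferred from the sample output above, and the helper name `looks_like_generated_id` is hypothetical. The ruler tool remains the authority on what counts as valid.

```python
import re

# Assumption: generated ids are 22 characters from the base58 alphabet
# (no 0, O, I or l). This is inferred from the sample above, not from
# the ruler tool's documentation.
BASE58_ID = re.compile(r"[1-9A-HJ-NP-Za-km-z]{22}")


def looks_like_generated_id(rule_id: str) -> bool:
    """Return True if rule_id has the shape of the sample generated id."""
    return BASE58_ID.fullmatch(rule_id) is not None


if __name__ == "__main__":
    for candidate in ("D3ZNiWma64wUnDYq6NSqYj", "StableDiffusionCUDAOOMDetector"):
        print(candidate, looks_like_generated_id(candidate))
```

Under that assumption the generated sample passes, while the descriptive name used in the diff does not.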
  version: "*"
- name: pytorch
  version: "*"
  impactScore: 9
The only supported list items inside applications are name and version; we can remove the rest.
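As an illustration of the cleanup being requested, the sketch below drops every key other than name and version from an application entry. It works on plain dictionaries and assumes the entries have already been loaded from the rule file; `prune_application` and `SUPPORTED_KEYS` are hypothetical names, not part of any project tooling.

```python
# Keys the reviewer says are supported inside an applications list item.
SUPPORTED_KEYS = {"name", "version"}


def prune_application(entry: dict) -> dict:
    """Return a copy of entry containing only the supported keys."""
    return {key: value for key, value in entry.items() if key in SUPPORTED_KEYS}


# Entry shaped like the one in the diff above.
applications = [{"name": "pytorch", "version": "*", "impactScore": 9}]
cleaned = [prune_application(app) for app in applications]

if __name__ == "__main__":
    print(cleaned)
```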
- name: recursive-analysis
  displayName: Recursive Analysis
  description: Problems where systems enter recursive self-analysis loops leading to resource exhaustion
  description: Problems where systems enter recursive self-analysis loops leading to resource exhaustio
There's a typo here: "exhaustio" should be "exhaustion".
🎯 Overview
This PR introduces a detection rule for Stable Diffusion WebUI CUDA out-of-memory failures, one of the most critical and widespread issues affecting AUTOMATIC1111 Stable Diffusion deployments. The rule identifies CUDA memory exhaustion that leads to complete WebUI service failure requiring manual intervention.
CRE Playground Links
CRE-2025-0130 Playground: Test Rule
🚨 Problem Statement
High-Severity Issue: Stable Diffusion WebUI CUDA failures cause:
Why This Matters: Stable Diffusion CUDA failures are particularly dangerous because:
Rule Performance
📊 Stable Diffusion Issues Covered
torch.cuda.OutOfMemoryError: CUDA out of memory
🧪 Testing & Validation
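Independently of the rule engine, the signature torch.cuda.OutOfMemoryError: CUDA out of memory can be sanity-checked against a log with a few lines of Python. This is a minimal sketch rather than the rule's actual matching logic: `find_oom_lines` is a hypothetical helper, and the sample lines are placeholders, not real WebUI output.

```python
import re

# The error signature named in the PR description.
OOM_SIGNATURE = re.compile(r"torch\.cuda\.OutOfMemoryError: CUDA out of memory")


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines containing the signature."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOM_SIGNATURE.search(line)
    ]


# Placeholder input; only the middle line carries the signature.
sample_log = [
    "placeholder line before the failure\n",
    "torch.cuda.OutOfMemoryError: CUDA out of memory\n",
    "placeholder line after the failure\n",
]

if __name__ == "__main__":
    for number, text in find_oom_lines(sample_log):
        print(f"line {number}: {text}")
```

The same function accepts an open file object, so it can be pointed at a captured log file instead of the in-memory sample.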
CRE Rule Testing
Test Results:

🎬 Demo Environment
Repo link (private invitation already sent): https://github.com/MAVRICK-1/cuda-oom
Screencast.from.2025-08-27.13-19-35.mp4
./start-demo.sh
cat logs/roop-cuda-oom.log | preq -r stable-diffusion-cuda-oom.yaml -d
Fixes #130
/claim #130