@dhvll
Contributor

dhvll commented Aug 30, 2025

This PR adds a new Common Reliability Enumeration (CRE) rule to detect critical meta tensor corruption failures in Stable Diffusion Web UI. The rule identifies the specific error NotImplementedError: Cannot copy out of meta tensor; no data! Once this error appears, the service fails completely and cannot generate any images.

Root Cause

Meta tensor corruption occurs when PyTorch tensors end up holding only metadata (shape, dtype) and no actual data. PyTorch raises the error when such a tensor is copied to a real device. This typically happens due to:

  • Corrupted or incomplete model checkpoint files (safetensors/ckpt)
  • PyTorch tensor corruption during model loading
  • Device mismatch between CPU and GPU tensors
  • Memory corruption during tensor operations

Error Pattern

NotImplementedError: Cannot copy out of meta tensor; no data!
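
To illustrate what matching this pattern looks like, here is a minimal Python sketch that scans log lines for the exact message. It is not the CRE rule syntax and not the code in this PR; the function name `find_meta_tensor_errors` is hypothetical and only the error string comes from the description above.

```python
import re

# The error message is copied verbatim from the "Error Pattern" section.
# re.escape keeps every character literal.
META_TENSOR_ERROR = re.compile(
    re.escape("NotImplementedError: Cannot copy out of meta tensor; no data!")
)


def find_meta_tensor_errors(log_lines):
    """Return (line_number, line) pairs for log lines containing the error.

    `log_lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if META_TENSOR_ERROR.search(line)
    ]
```

Passing an open file handle for the Web UI log would return every line where the failure was recorded, which is enough to decide whether the mitigation steps apply.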

Mitigation Strategies

Immediate Actions

  1. Restart Stable Diffusion Web UI service to clear corrupted tensor states
  2. Re-download and verify model checkpoint files
  3. Check GPU memory and clear any corrupted tensor allocations
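
The "verify model checkpoint files" step could be done by comparing a SHA-256 digest against a known-good value. The sketch below assumes the publisher of the checkpoint provides such a digest; the helper names `sha256_of_file` and `checkpoint_matches` are hypothetical and not part of Stable Diffusion Web UI.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reads keep memory use flat for multi-gigabyte checkpoints.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected_sha256):
    """Return True if the file's digest equals the expected hex digest.

    `expected_sha256` is assumed to come from a trusted source, such as
    the page the checkpoint was downloaded from.
    """
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A mismatch suggests a truncated or corrupted download, in which case re-downloading the file is the next step.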

Preventive Measures

  1. Implement model file integrity checks
  2. Add tensor state validation before CUDA operations
  3. Monitor GPU memory usage and tensor allocations
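
As a rough sketch of the tensor state validation idea, the check below looks for parameters that are still on the meta device before a model is moved to the GPU. It relies only on `named_parameters()` from `torch.nn.Module` and the `is_meta` attribute of `torch.Tensor`; the function names are hypothetical, and where such a check would be called inside the Web UI is an assumption, not something this PR implements.

```python
def find_meta_parameters(model):
    """Return the names of parameters that hold no data (meta device).

    `model` is expected to behave like a torch.nn.Module, i.e. expose
    named_parameters() yielding (name, tensor) pairs. Objects without an
    `is_meta` attribute are treated as normal tensors.
    """
    return [
        name
        for name, param in model.named_parameters()
        if getattr(param, "is_meta", False)
    ]


def assert_no_meta_parameters(model):
    """Raise RuntimeError early instead of failing later during a copy."""
    meta_names = find_meta_parameters(model)
    if meta_names:
        raise RuntimeError(
            "Parameters still on the meta device: " + ", ".join(meta_names)
        )
```

Calling `assert_no_meta_parameters(model)` right after loading weights would surface the problem with the affected parameter names. Buffers can be meta tensors too; the same pattern applies to `named_buffers()`.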

X Post Link

/fix #130
/claim #130

stable-diffusion.webm

GitHub repo: Repo

preq playground

@dhvll
Contributor Author

dhvll commented Oct 29, 2025

@Lyndon-prequel can I get a review on this?

@Lyndon-prequel
Contributor

> @Lyndon-prequel can I get a review on this?

@amanycodes

@amanycodes
Contributor

Hi @dhvll, thanks for submitting the CRE! Please add the relevant tags to rules/tags/tags.yaml so that the CI test passes. Also, could you explain why you deleted multiple tags from the tags.yaml file?

@dhvll
Contributor Author

dhvll commented Oct 30, 2025

Because I was getting a merge error.
Is it okay if I just take a copy of the latest tags.yaml file?

@amanycodes
Contributor

Mostly LGTM. Please also add the applications field to the CRE YAML file, e.g.:

applications:
        - name: "rabbitmq"
          version: "3.9.x"

Development

Successfully merging this pull request may close these issues.

Stable Diffusion Web UI: Reproduce A High-Severity Failure & Write a CRE Rule [Multiple Winners] [Submit by August 31 11:59 pm ET]