Skip to content

Backup restore issue #1136

@ludook

Description

@ludook

Describe the bug
The operator seems to cordon the emqx cluster after a backup restore

To Reproduce
Steps to reproduce the behavior:

  1. create a 2 nodes emqx cluster with pvc
  2. check that emqx dashboard is up and running
  3. launch a backup on the first pod with a pod exec : emqx ctl data export
  4. copy backup file on a laptop with kubectl cp
  5. drop the whole cluster
  6. create a new 2 nodes cluster
  7. check dashboard is up and running
  8. copy the previous backup on the first cluster pod
  9. run the import with into the pod, with a pod exec : emqx ctl data import /tmp/emqx-export.tar.gz
  10. dashboard shows previous imported values for a few seconds
  11. then here is the issue after a few seconds :
  • dashboard does not respond anymore
  • emqx cluster statefulset is marked as not ready 0/2
  • all mqtt clients are disconnected
  • logs in the 2 pods show no error
  • logs in emqx operator shows some error about wrong BAD_API_KEY_OR_SECRET --> seems to cordon the emqx cluster

If I repeat all this process but with operator deployment scaled to 0, the issue is not present.
Same issue when I restore all the pvc on a fresh installed cluster (with longhorn backups restore)

Expected behavior
can restore data in a emqx kubernetes fresh cluster

Environment details::

  • Kubernetes version: 1.33
  • Cloud-provider/provisioner: on premise cluster
  • emqx-operator version: 2.2.29
  • Install method: all is installed with argocd, with no issue

if you need some other details, please ask me.
thank you for your great job on emqx :-)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions