12 changes: 12 additions & 0 deletions apis/apps/v1alpha1/daemonset_types.go
@@ -163,6 +163,18 @@ type DaemonSetSpec struct {
// Currently, we only support pre-delete hook for Advanced DaemonSet.
// +optional
Lifecycle *appspub.Lifecycle `json:"lifecycle,omitempty"`

// volumeClaimTemplates is a list of claims that pods are allowed to reference.
// The DaemonSet controller is responsible for mapping network identities to
// claims in a way that maintains the identity of a pod. Every claim in
// this list must have at least one matching (by name) volumeMount in one
// container in the template. A claim in this list takes precedence over
// any volumes in the template, with the same name.
// TODO: Define the behavior if a claim already exists with the same name.
// +optional
// +kubebuilder:pruning:PreserveUnknownFields
// +kubebuilder:validation:Schemaless
VolumeClaimTemplates []corev1.PersistentVolumeClaim `json:"volumeClaimTemplates,omitempty"`
}

// DaemonSetStatus defines the observed state of DaemonSet
7 changes: 7 additions & 0 deletions apis/apps/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

9 changes: 9 additions & 0 deletions config/crd/bases/apps.kruise.io_daemonsets.yaml
@@ -352,6 +352,15 @@ spec:
or "OnDelete". Default is RollingUpdate.
type: string
type: object
volumeClaimTemplates:
description: |-
volumeClaimTemplates is a list of claims that pods are allowed to reference.
The DaemonSet controller is responsible for mapping network identities to
claims in a way that maintains the identity of a pod. Every claim in
this list must have at least one matching (by name) volumeMount in one
container in the template. A claim in this list takes precedence over
any volumes in the template, with the same name.
x-kubernetes-preserve-unknown-fields: true
required:
- selector
- template
118 changes: 118 additions & 0 deletions docs/proposals/20250722-ads-volumeclaimtemplate.md
@@ -0,0 +1,118 @@
---
title: volumeClaimTemplate for Advanced DaemonSet
authors:
  - "@chengjoey"
reviewers:
  - "@ChristianCiach"
  - "@furykerry"
  - "@ABNER-1"
creation-date: 2025-07-22
last-updated: 2025-07-22
status: implementable
---

# volumeClaimTemplate for Advanced DaemonSet
Add volumeClaimTemplate to Advanced DaemonSet

## Table of Contents

- [volumeClaimTemplate for Advanced DaemonSet](#volumeclaimtemplate-for-advanced-daemonset)
  - [Table of Contents](#table-of-contents)
  - [Motivation](#motivation)
  - [Proposal](#proposal)
    - [API Definition](#api-definition)
    - [User Stories](#user-stories)
      - [Story 1](#story-1)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
  - [Implementation History](#implementation-history)

## Motivation

Most DaemonSet specs seen in practice use hostPath volumes for storage. Users would like to use local persistent volumes (LPVs)
instead of hostPath volumes, but an LPV can only be obtained through a PVC, and DaemonSets do not allow dynamically creating
a PVC that claims an LPV via a dedicated storage class the way StatefulSets do (using volumeClaimTemplates).

We propose adding volumeClaimTemplates to Advanced DaemonSet, mirroring StatefulSets.

## Proposal

Add a volumeClaimTemplates field to Advanced DaemonSet, mirroring the StatefulSet API.

### API Definition

```go
// DaemonSetSpec defines the desired state of DaemonSet
type DaemonSetSpec struct {
// volumeClaimTemplates is a list of claims that pods are allowed to reference.
// The DaemonSet controller is responsible for mapping network identities to
// claims in a way that maintains the identity of a pod. Every claim in
// this list must have at least one matching (by name) volumeMount in one
// container in the template. A claim in this list takes precedence over
// any volumes in the template, with the same name.
// TODO: Define the behavior if a claim already exists with the same name.
// +optional
// +kubebuilder:pruning:PreserveUnknownFields
// +kubebuilder:validation:Schemaless
VolumeClaimTemplates []corev1.PersistentVolumeClaim `json:"volumeClaimTemplates,omitempty"`
}
```

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
name: app-ds
spec:
selector:
matchLabels:
app: app-ds
template:
metadata:
labels:
app: app-ds
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /var/lib/nginx
name: nginx-data
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nginx-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
```
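
With this template, the controller creates one PVC per node. Following the `getPersistentVolumeClaimName` helper in this change (`<claimTemplateName>-<dsName>-<nodeName>`), the claim for a node named `node-1` would be `nginx-data-app-ds-node-1`, and the `nginx-data` volume in each Pod is rewritten to reference that node's claim.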

### User Stories

#### Story 1

Many users are looking for a DaemonSet-like workload type that supports defining a PVC per Pod:

[Feature Request : volumeClaimTemplates available for Daemon Sets](https://github.com/kubernetes/kubernetes/issues/78902)

[feature request: Add volumeClaimTemplate to advanced DaemonSet](https://github.com/openkruise/kruise/issues/2112)

### Implementation Details/Notes/Constraints

We should consider whether to delete the PVCs when a node is deleted, and whether to delete them when the Advanced DaemonSet itself is deleted.
We can add a `PersistentVolumeClaimRetentionPolicy` with the allowed values `Retain` and `Delete`, defaulting to `Retain`; a sketch of what such a field might look like follows.
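
A minimal sketch of such a policy, modeled on the StatefulSet `persistentVolumeClaimRetentionPolicy` API; the type and field names below (e.g. `WhenNodeDeleted`) are illustrative only and not part of the current change:

```go
// PersistentVolumeClaimRetentionPolicyType enumerates the policies for PVCs created from
// volumeClaimTemplates. (Illustrative sketch, not part of the current API.)
type PersistentVolumeClaimRetentionPolicyType string

const (
	// RetainPersistentVolumeClaimRetentionPolicyType keeps the PVCs (proposed default).
	RetainPersistentVolumeClaimRetentionPolicyType PersistentVolumeClaimRetentionPolicyType = "Retain"
	// DeletePersistentVolumeClaimRetentionPolicyType deletes the PVCs.
	DeletePersistentVolumeClaimRetentionPolicyType PersistentVolumeClaimRetentionPolicyType = "Delete"
)

// DaemonSetPersistentVolumeClaimRetentionPolicy describes what happens to PVCs created from the
// Advanced DaemonSet's volumeClaimTemplates. (Illustrative sketch.)
type DaemonSetPersistentVolumeClaimRetentionPolicy struct {
	// WhenDeleted specifies what happens to PVCs when the Advanced DaemonSet is deleted.
	WhenDeleted PersistentVolumeClaimRetentionPolicyType `json:"whenDeleted,omitempty"`
	// WhenNodeDeleted specifies what happens to PVCs when the node they were created for is removed.
	WhenNodeDeleted PersistentVolumeClaimRetentionPolicyType `json:"whenNodeDeleted,omitempty"`
}
```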
> **Reviewer comment:** I think the default policy should be `Delete`, since there is no easy way to reuse the PVC if the node is deleted.
>
> **Reviewer comment:** In addition, we have to introduce a label to match a Pod with its related PVC. In CloneSet, a Pod and a PVC carrying the same `apps.kruise.io/cloneset-instance-id` are mounted together.
>
> **Reviewer comment:** Can we have a discussion in the community call this week?


## Implementation History

- [ ] 07/22/2025: Proposal submitted; implement creation of PVCs from VolumeClaimTemplates
17 changes: 14 additions & 3 deletions pkg/controller/daemonset/daemonset_controller.go
@@ -41,6 +41,7 @@ import (
appslisters "k8s.io/client-go/listers/apps/v1"
corelisters "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/cache"
toolscache "k8s.io/client-go/tools/cache"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/flowcontrol"
"k8s.io/client-go/util/retry"
@@ -171,21 +172,30 @@ func newReconciler(mgr manager.Manager) (reconcile.Reconciler, error) {
if err != nil {
return nil, err
}
pvcInformer, err := cacher.GetInformerForKind(context.TODO(), corev1.SchemeGroupVersion.WithKind("PersistentVolumeClaim"))
if err != nil {
return nil, err
}

dsLister := kruiseappslisters.NewDaemonSetLister(dsInformer.(cache.SharedIndexInformer).GetIndexer())
historyLister := appslisters.NewControllerRevisionLister(revInformer.(cache.SharedIndexInformer).GetIndexer())
podLister := corelisters.NewPodLister(podInformer.(cache.SharedIndexInformer).GetIndexer())
nodeLister := corelisters.NewNodeLister(nodeInformer.(cache.SharedIndexInformer).GetIndexer())
failedPodsBackoff := flowcontrol.NewBackOff(1*time.Second, 15*time.Minute)
revisionAdapter := revisionadapter.NewDefaultImpl()
pvcLister := corelisters.NewPersistentVolumeClaimLister(pvcInformer.(toolscache.SharedIndexInformer).GetIndexer())

cli := utilclient.NewClientFromManager(mgr, "daemonset-controller")
dsc := &ReconcileDaemonSet{
Client: cli,
kubeClient: genericClient.KubeClient,
kruiseClient: genericClient.KruiseClient,
eventRecorder: recorder,
podControl: kubecontroller.RealPodControl{KubeClient: genericClient.KubeClient, Recorder: recorder},
podControl: &dsPodControl{
recorder: recorder,
objectMgr: kruiseutil.NewRealObjectManager(genericClient.KubeClient, podLister, pvcLister, nil),
PodControlInterface: kubecontroller.RealPodControl{KubeClient: genericClient.KubeClient, Recorder: recorder},
},
crControl: kubecontroller.RealControllerRevisionControl{
KubeClient: genericClient.KubeClient,
},
@@ -270,7 +280,7 @@ type ReconcileDaemonSet struct {
kubeClient clientset.Interface
kruiseClient kruiseclientset.Interface
eventRecorder record.EventRecorder
podControl kubecontroller.PodControlInterface
podControl *dsPodControl
crControl kubecontroller.ControllerRevisionControlInterface
lifecycleControl lifecycle.Interface

@@ -755,7 +765,7 @@ func (dsc *ReconcileDaemonSet) syncNodes(ctx context.Context, ds *appsv1alpha1.DaemonSet
podTemplate.Spec.NodeName = nodesNeedingDaemonPods[ix]
}

err = dsc.podControl.CreatePods(ctx, ds.Namespace, podTemplate, ds, metav1.NewControllerRef(ds, controllerKind))
err = dsc.podControl.CreatePod(ctx, ds.Namespace, podTemplate, ds, metav1.NewControllerRef(ds, controllerKind), nodesNeedingDaemonPods[ix])

if err != nil {
if errors.HasStatusCause(err, corev1.NamespaceTerminatingCause) {
@@ -790,6 +800,7 @@ func (dsc *ReconcileDaemonSet) syncNodes(ctx context.Context, ds *appsv1alpha1.DaemonSet
for i := 0; i < deleteDiff; i++ {
go func(ix int) {
defer deleteWait.Done()
// TODO: delete pvc when persistentVolumeClaimRetentionPolicy is set to Delete
if err := dsc.podControl.DeletePod(ctx, ds.Namespace, podsToDelete[ix], ds); err != nil {
dsc.expectations.DeletionObserved(logger, dsKey)
if !errors.IsNotFound(err) {
7 changes: 5 additions & 2 deletions pkg/controller/daemonset/daemonset_controller_test.go
@@ -219,7 +219,10 @@ func newTestController(initialObjects ...runtime.Object) (*daemonSetsController,
fakeRecorder := record.NewFakeRecorder(100)
dsc.eventRecorder = fakeRecorder
podControl := newFakePodControl()
dsc.podControl = podControl
dsControl := &dsPodControl{
PodControlInterface: podControl,
}
dsc.podControl = dsControl
podControl.podStore = informerFactory.Core().V1().Pods().Informer().GetStore()

newDsc := &daemonSetsController{
@@ -253,7 +256,7 @@ func NewDaemonSetController(
kubeClient: kubeClient,
kruiseClient: kruiseClient,
eventRecorder: recorder,
podControl: controller.RealPodControl{KubeClient: kubeClient, Recorder: recorder},
podControl: &dsPodControl{PodControlInterface: controller.RealPodControl{KubeClient: kubeClient, Recorder: recorder}},
crControl: controller.RealControllerRevisionControl{
KubeClient: kubeClient,
},
112 changes: 112 additions & 0 deletions pkg/controller/daemonset/daemonset_util.go
@@ -17,8 +17,10 @@ limitations under the License.
package daemonset

import (
"context"
"fmt"
"sort"
"strings"
"sync"
"time"

@@ -31,11 +33,15 @@ import (
apps "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
apiequality "k8s.io/apimachinery/pkg/api/equality"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
errorutils "k8s.io/apimachinery/pkg/util/errors"
intstrutil "k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/client-go/tools/record"
v1helper "k8s.io/component-helpers/scheduling/corev1"
podutil "k8s.io/kubernetes/pkg/api/v1/pod"
kubecontroller "k8s.io/kubernetes/pkg/controller"
"k8s.io/kubernetes/pkg/controller/daemon/util"
"k8s.io/utils/integer"
)
@@ -46,6 +52,12 @@ var (
newPodForDSLock sync.Mutex
)

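// dsPodControl wraps kubecontroller.PodControlInterface and additionally manages the per-node
// PersistentVolumeClaims declared in the DaemonSet's volumeClaimTemplates.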
type dsPodControl struct {
recorder record.EventRecorder
objectMgr kruiseutil.ObjectManager
kubecontroller.PodControlInterface
}

type newPodForDS struct {
generation int64
pod *corev1.Pod
@@ -371,3 +383,103 @@ func podAvailableWaitingTime(pod *corev1.Pod, minReadySeconds int32, now time.Time)
}
return minReadySecondsDuration - now.Sub(c.LastTransitionTime.Time)
}

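// getPersistentVolumeClaimName derives the per-node PVC name as <claimTemplateName>-<dsName>-<nodeName>.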
func getPersistentVolumeClaimName(ds *appsv1alpha1.DaemonSet, claim *corev1.PersistentVolumeClaim, nodeName string) string {
return fmt.Sprintf("%s-%s-%s", claim.Name, ds.Name, nodeName)
}

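// getPersistentVolumeClaims renders the DaemonSet's volumeClaimTemplates into concrete per-node claims,
// keyed by template name, applying the DaemonSet selector's matchLabels to each claim.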
func getPersistentVolumeClaims(ds *appsv1alpha1.DaemonSet, nodeName string) map[string]corev1.PersistentVolumeClaim {
templates := ds.Spec.VolumeClaimTemplates
claims := make(map[string]corev1.PersistentVolumeClaim, len(templates))
for i := range templates {
claim := templates[i].DeepCopy()
claim.Name = getPersistentVolumeClaimName(ds, claim, nodeName)
claim.Namespace = ds.Namespace
if claim.Labels != nil {
for key, value := range ds.Spec.Selector.MatchLabels {
claim.Labels[key] = value
}
} else {
claim.Labels = ds.Spec.Selector.MatchLabels
}
claims[templates[i].Name] = *claim
}
return claims
}

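// updateStorage rewrites the pod template's volumes so that every volume whose name matches a claim
// template points at that node's PVC; unrelated volumes are kept as-is.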
func updateStorage(ds *appsv1alpha1.DaemonSet, template *corev1.PodTemplateSpec, nodeName string) {
currentVolumes := template.Spec.Volumes
claims := getPersistentVolumeClaims(ds, nodeName)
newVolumes := make([]corev1.Volume, 0, len(claims))
for name, claim := range claims {
newVolumes = append(newVolumes, corev1.Volume{
Name: name,
VolumeSource: corev1.VolumeSource{
PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
ClaimName: claim.Name,
ReadOnly: false,
},
},
})
}
for i := range currentVolumes {
if _, ok := claims[currentVolumes[i].Name]; !ok {
newVolumes = append(newVolumes, currentVolumes[i])
}
}
template.Spec.Volumes = newVolumes
}

// recordClaimEvent records an event for the given verb applied to a PersistentVolumeClaim belonging to a
// DaemonSet's Pod on a node. If err is nil, the generated event has type v1.EventTypeNormal; otherwise it
// has type v1.EventTypeWarning.
func (dsc *dsPodControl) recordClaimEvent(verb string, ds *appsv1alpha1.DaemonSet, nodeName string, claim *corev1.PersistentVolumeClaim, err error) {
if err == nil {
reason := fmt.Sprintf("Successful%s", strings.Title(verb))
message := fmt.Sprintf("%s Daemonset %s Claim %s in Node %s success",
strings.ToLower(verb), ds.Name, claim.Name, nodeName)
dsc.recorder.Event(ds, corev1.EventTypeNormal, reason, message)
} else {
reason := fmt.Sprintf("Failed%s", strings.Title(verb))
message := fmt.Sprintf("%s Claim %s for Daemonset %s in Node %s failed error: %s",
strings.ToLower(verb), claim.Name, ds.Name, nodeName, err)
dsc.recorder.Event(ds, corev1.EventTypeWarning, reason, message)
}
}

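// createPersistentVolumeClaims ensures the per-node PVCs for the DaemonSet exist on the given node,
// creating any that are missing and recording an event for each create attempt.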
func (dsc *dsPodControl) createPersistentVolumeClaims(ds *appsv1alpha1.DaemonSet, nodeName string) error {
var errs []error
for _, claim := range getPersistentVolumeClaims(ds, nodeName) {
pvc, err := dsc.objectMgr.GetClaim(claim.Namespace, claim.Name)
switch {
case apierrors.IsNotFound(err):
err := dsc.objectMgr.CreateClaim(&claim)
if err != nil {
errs = append(errs, fmt.Errorf("failed to create PVC %s: %s", claim.Name, err))
}
if err == nil || !apierrors.IsAlreadyExists(err) {
dsc.recordClaimEvent("Create", ds, nodeName, &claim, err)
}
case err != nil:
errs = append(errs, fmt.Errorf("failed to retrieve PVC %s: %s", claim.Name, err))
dsc.recordClaimEvent("Create", ds, nodeName, &claim, err)
default:
if pvc.DeletionTimestamp != nil {
errs = append(errs, fmt.Errorf("pvc %s is to be deleted", claim.Name))
}
}
}
return errorutils.NewAggregate(errs)
}

// CreatePod creates a Pod from the given template for the given node, first creating any PVCs declared
// in the DaemonSet's volumeClaimTemplates.
func (dsc *dsPodControl) CreatePod(ctx context.Context, namespace string, template *corev1.PodTemplateSpec,
ds *appsv1alpha1.DaemonSet, controllerRef *metav1.OwnerReference, nodeName string) error {
tmpl := template.DeepCopy()
if err := dsc.createPersistentVolumeClaims(ds, nodeName); err != nil {
return err
}

updateStorage(ds, tmpl, nodeName)
return dsc.CreatePods(ctx, namespace, tmpl, ds, controllerRef)
}