@abuccts (Member) commented Oct 26, 2025

Fix service build and deployment on arm64 architecture.


Copilot AI left a comment


Pull Request Overview

This PR enables deployment on arm64 architecture by updating Docker configurations and Kubernetes deployment templates to support multi-architecture builds and private container registries.

Key changes:

  • Adds imagePullSecrets configuration to deployment templates for private registry authentication
  • Creates new Dockerfiles for device plugins and kube-scheduler to enable custom image builds
  • Updates deployment scripts to use configurable registry prefixes and image tags
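As a rough illustration of how a configurable registry prefix, image tag, and pull secret might come together in a rendered manifest, the sketch below builds a minimal Pod spec from shell variables. The variable names, registry host, image name, and secret name are all hypothetical placeholders and are not taken from this PR's templates.

```shell
# Hypothetical values for illustration; the PR's real variable names,
# registry, and secret name are not shown in this conversation.
registry_prefix="registry.example.com/team"
image_tag="v1.0.0"
pull_secret="example-regcred"

# Render a minimal Pod manifest. imagePullSecrets sits at the pod spec
# level, next to containers, and names a pre-created registry secret.
manifest="$(cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: example-device-plugin
spec:
  imagePullSecrets:
    - name: ${pull_secret}
  containers:
    - name: plugin
      image: ${registry_prefix}/example-device-plugin:${image_tag}
EOF
)"

printf '%s\n' "$manifest"
```

Keeping the prefix and tag in variables means the same template can point at a public registry or a private mirror without editing the manifest body.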

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

File | Description
--- | ---
src/hivedscheduler/deploy/hivedscheduler.yaml.template | Adds imagePullSecrets for private registry access
src/hivedscheduler/build/kube-scheduler.k8s.dockerfile | New Dockerfile for kube-scheduler image
src/hivedscheduler/build/hivedscheduler.k8s.dockerfile | New Dockerfile replacing the common dockerfile with updated license header
src/hivedscheduler/build/hivedscheduler.common.dockerfile | Removed in favor of k8s-specific dockerfile
src/frameworkcontroller/deploy/frameworkcontroller.yaml.template | Adds imagePullSecrets for private registry access
src/device-plugin/deploy/start.sh.template | Updates to use configurable registry and adds imagePullSecrets injection
src/device-plugin/deploy/service.yaml | Adds device-plugin.yaml to template list
src/device-plugin/deploy/device-plugin.yaml.template | Replaces hardcoded image with configurable registry and adds imagePullSecrets
src/device-plugin/build/k8s-rocm-device-plugin.k8s.dockerfile | New Dockerfile for AMD ROCm device plugin
src/device-plugin/build/k8s-rdma-shared-dev-plugin.k8s.dockerfile | New Dockerfile for RDMA device plugin
src/device-plugin/build/k8s-nvidia-device-plugin.k8s.dockerfile | New Dockerfile for NVIDIA device plugin
src/device-plugin/build/k8s-host-device-plugin.k8s.dockerfile | New Dockerfile for host device plugin
contrib/kubespray/script/environment.sh | Adds force flag to symlink commands to prevent errors on re-runs
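The force-flag change to the symlink commands can be illustrated with a small sketch. The paths below are hypothetical temporary files, not the ones used in environment.sh; the point is only that `ln -s` fails when the link name already exists, while `ln -sf` replaces it, which makes a script safe to re-run.

```shell
# Hypothetical paths for illustration only; not taken from environment.sh.
workdir="$(mktemp -d)"
touch "$workdir/target-a" "$workdir/target-b"

# First run: the link does not exist yet, so a plain symlink works.
ln -s "$workdir/target-a" "$workdir/link"

# Re-run: a second plain `ln -s` to the same link name would fail with
# "File exists". With -f the existing link is removed and recreated.
ln -sf "$workdir/target-b" "$workdir/link"

readlink "$workdir/link"
```

When the existing link points at a directory, GNU `ln` also accepts `-n` so that the link itself is replaced instead of a new link being created inside the directory; whether that applies depends on the targets used in the actual script.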


# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

FROM ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.3

Copilot AI Oct 26, 2025


The deployment script references v1.4.0 in start.sh.template (line 86), but this Dockerfile uses v1.5.3. This version mismatch could cause deployment issues. Consider aligning both to use the same version.

Suggested change
- FROM ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.3
+ FROM ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.4.0
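A mismatch like the one described in this comment could be caught with a small check that compares the tag on the Dockerfile's FROM line with the version a deployment script expects. The sketch below is only an assumption about how such a check might look: the file contents are stand-ins written to a temporary directory, and the `RDMA_PLUGIN_VERSION` variable is hypothetical, since the way start.sh.template actually references the version is not shown here.

```shell
# Stand-in files for illustration; the real start.sh.template may
# reference the version differently (RDMA_PLUGIN_VERSION is hypothetical).
workdir="$(mktemp -d)"
printf 'FROM ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.3\n' > "$workdir/Dockerfile"
printf 'RDMA_PLUGIN_VERSION=v1.4.0\n' > "$workdir/start.sh"

# Extract the tag after the last ':' on the FROM line.
docker_tag="$(sed -n 's/^FROM .*:\(.*\)$/\1/p' "$workdir/Dockerfile")"
# Extract the version the script expects.
script_tag="$(sed -n 's/^RDMA_PLUGIN_VERSION=\(.*\)$/\1/p' "$workdir/start.sh")"

if [ "$docker_tag" = "$script_tag" ]; then
  result="match"
else
  result="mismatch: $docker_tag vs $script_tag"
fi
echo "$result"
```

A check of this kind only compares strings; it cannot tell which of the two versions is the intended one, so a mismatch still needs a maintainer's decision.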

@abuccts (Member, Author) commented Oct 27, 2025


The v1.4.0 DaemonSet YAML uses the latest image tag. @hippogr, could you confirm the current version?

Update.
Use buildkit for Docker build.
Upgrade NVIDIA k8s device plugin version.