Conversation


@lbagic lbagic commented Oct 27, 2025

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for Kubeflow Notebook objects
    • Implemented greater-than and less-than comparison operators for filtering
    • Introduced time-range query capabilities for historical data
    • Added event deletion timestamp tracking
    • Expanded resource search and workload discovery functionality
    • Added node policy management capabilities (create, list, update)
    • Enhanced utilization metrics reporting by instance type, reservation type, and availability zone
    • Introduced network throughput configuration for scaling optimization


coderabbitai bot commented Oct 27, 2025

Walkthrough

This PR extends three Protocol Buffer definition files with substantial new API capabilities: adding K8s object kinds and label selector operators to common definitions; introducing event datapoint structures; adding node group filtering, time-range queries, and utilization metrics segmentation; introducing K8s resource search and graph traversal APIs; and expanding the recommendation service with node policy management and instance configuration features.

Changes

Cohort / File(s) Summary
Common enum and message extensions
internal/proto/api/v1/common.proto
Added K8S_OBJECT_KIND_KUBEFLOW_NOTEBOOK enum value; expanded LabelSelectorOperator with GT and LT operators; extended Event message with deleted_at timestamp field; introduced EventDatapointInfo and EventDatapoint messages for structured event datapoint representation (sketched below, after this table).
K8s node group and utilization enhancements
internal/proto/api/v1/k8s.proto (node group & utilization sections)
Added show_deleted, start_time, and end_time fields to node group requests; expanded GetNodeGroupsUtilizationResponse with instance_type_to_util_metrics, reservation_type_to_util_metrics, and logical_az_to_util_metrics maps.
K8s resource search and graph APIs
internal/proto/api/v1/k8s.proto (new search & graph sections)
Introduced SearchK8sResources, SearchK8sWorkloads, GetClusterType, and GetRelationsForKind API surfaces; added corresponding request/response messages and graph-related data structures (K8sSearchResult, K8sWorkloadSearchResult, ClusterType enum, K8sRelatedResource, K8sResourceNode, K8sResourceEdge); added four new RPCs to K8SService.
Recommendation service node policy and search features
internal/proto/api/v1/recommendation.proto
Added 11 new RPCs for node policy CRUD, targeting filters, instance discovery, and recommendation preview; introduced NodePolicy, NodePolicyTarget, and related request/response messages; added node_recommendation field to BalanceGetRecommendationsResponse; extended VerticalScalingOptimizationTarget and HorizontalScalingOptimizationTarget with new fields; introduced LiveMigDeletionPropagation enum; refactored HPAMetricType network metrics; added numerous provider-specific node class specs.
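To make the common.proto row concrete, here is a hedged sketch of the described additions. EventDatapoint's shape is confirmed by the review thread below; the enum name K8sObjectKind, all other field names, and every numeric value here are assumptions for illustration:

enum K8sObjectKind {
  // ... existing kinds elided ...
  K8S_OBJECT_KIND_KUBEFLOW_NOTEBOOK = 42; // enum number assumed
}

enum LabelSelectorOperator {
  // ... existing operators elided ...
  LABEL_SELECTOR_OPERATOR_GT = 5; // greater-than (number assumed)
  LABEL_SELECTOR_OPERATOR_LT = 6; // less-than (number assumed)
}

message Event {
  // ... existing fields elided ...
  google.protobuf.Timestamp deleted_at = 20; // deletion timestamp (field number assumed)
}

message EventDatapoint {
  string utc_time = 1;                    // the review below suggests google.protobuf.Timestamp instead
  int32 count = 2;                        // number of events in this time bucket
  repeated EventDatapointInfo events = 3; // details of events in this bucket
}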

Sequence Diagram

sequenceDiagram
    participant Client
    participant K8sService
    participant RecommendationService
    participant ResourceDB
    participant RelationGraph

    rect rgb(200, 230, 255)
    note right of Client: K8s Resource Search
    Client->>K8sService: SearchK8sResources(filters)
    K8sService->>ResourceDB: query resources by type/label
    ResourceDB-->>K8sService: K8sSearchResult[]
    K8sService-->>Client: SearchK8sResourcesResponse
    end

    rect rgb(200, 230, 255)
    note right of Client: K8s Relationship Traversal
    Client->>K8sService: GetRelationsForKind(resource_kind)
    K8sService->>RelationGraph: retrieve related resources
    RelationGraph-->>K8sService: K8sRelatedResource[]
    K8sService-->>Client: GetRelationsForKindResponse
    end

    rect rgb(230, 220, 255)
    note right of Client: Node Policy Management
    Client->>RecommendationService: CreateNodePolicies(policies)
    RecommendationService->>ResourceDB: persist policies with targets
    ResourceDB-->>RecommendationService: policy_ids[]
    RecommendationService-->>Client: CreateNodePoliciesResponse
    end

    rect rgb(230, 220, 255)
    note right of Client: Recommendation Preview
    Client->>RecommendationService: PreviewNodeRecommendationConfig(config)
    RecommendationService->>ResourceDB: simulate recommendation
    ResourceDB-->>RecommendationService: NodeRecommendation
    RecommendationService-->>Client: PreviewNodeRecommendationConfigResponse
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Critical areas requiring attention:
    • New K8s resource search and graph APIs require careful validation of query semantics and relationship graph structure consistency
    • Node policy management RPCs and associated data model (NodePolicy, NodePolicyTarget, TargetFilters) involve new state management patterns that need review for correctness and API contract clarity
    • HPAMetricType enum change (replacing HPA_METRIC_TYPE_NETWORK with NETWORK_INGRESS and NETWORK_EGRESS) may impact downstream clients; backward compatibility implications should be verified
    • Multiple provider-specific node class specs (AWSNodeClassSpec, GCPNodeClassSpec, etc.) need validation for completeness and consistent field mapping across cloud providers
    • Field number assignments across multiple new messages and RPCs should be reviewed for proto best practices and collision avoidance

Possibly related PRs

  • devzero-inc/terraform-provider-devzero#1: Directly adds incremental extensions to the same three proto files (common.proto, k8s.proto, recommendation.proto) with related enum values and new fields (K8S_OBJECT_KIND_KUBEFLOW_NOTEBOOK, LabelSelectorOperator GT/LT, Event.deleted_at, EventDatapoint messages, and node group filtering/targeting enhancements).

Poem

🐰 Hops through protocol gardens three,
New enums bloom and messages flow free,
Node policies dance, resources align,
Search graphs and recommendations intertwine,
Our API expands with joy and delight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check — ❓ Inconclusive: The pull request title "feat: update proto files" is vague and generic. While technically related to the changeset (proto files are indeed being updated), it conveys nothing about what actually changed: the PR adds substantial new material across three proto files, including K8s resource search and discovery APIs, node policy management capabilities, extended utilization metrics, and multiple new message types and services, yet the title could apply to virtually any proto modification. Resolution: revise the title to something more specific, such as "feat: add K8s resource search and node policy management APIs" or "feat: expand recommendation and K8s service APIs with search and node policies", so the purpose is clear at a glance.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch update-proto

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/proto/api/v1/common.proto (1)

1451-1459: Clarify EventDatapointInfo field documentation.

Line 1454 states "Event owner name" for the owner_uid field, but the field is named owner_uid and should reference a UID, not a name. The comment appears inconsistent with the field semantics.

Additionally, line 1456 references "Event owner name" for the owner_name field, which is correct, but line 1454 should be updated to reflect the UID semantic.

Consider updating the comment on line 1454:

- string owner_uid = 3; // Event owner name
+ string owner_uid = 3; // Event owner UID
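For context, a minimal sketch of the message with the corrected comment (only owner_uid = 3 and the existence of owner_name are given by the review; the owner_name field number and any other fields are assumptions):

message EventDatapointInfo {
  // ... earlier fields elided ...
  string owner_uid = 3;  // Event owner UID
  string owner_name = 4; // Event owner name (field number assumed)
}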
internal/proto/api/v1/recommendation.proto (1)

1687-1745: Document NodePolicy provider-specific fields and override strategy.

The NodePolicy message (lines 1687-1745) introduces comprehensive Karpenter support with provider-specific overrides (AWS, GCP, Azure, OCI, lines 1734-1737). However, the interaction and override precedence between:

  • Cross-cloud core fields (instance_categories, instance_families, instance_cpus, etc.)
  • Provider-specific specs (aws, gcp, azure, oci)
  • Raw escape hatch (raw RawKarpenterSpec)

...is not documented. This could lead to confusion about which settings take precedence when multiple are specified.

Add inline documentation to clarify the precedence and combination strategy:

message NodePolicy {
  // ... core fields ...
  
  // Provider-specific overrides (applied if set; override core fields)
  AWSNodeClassSpec aws = 81;    // AWS-specific overrides (if set, takes precedence over core fields)
  GCPNodeClassSpec gcp = 82;
  // ...
  
  // Raw escape hatch for advanced configurations (lowest precedence; applied after provider specs)
  repeated RawKarpenterSpec raw = 100;
}

Additionally, document the intended workflow: do users set core fields and then override per-provider, or start with provider-specific and fall back to core?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddb61ef and 8d1824e.

⛔ Files ignored due to path filters (5)
  • internal/gen/api/v1/apiv1connect/k8s.connect.go is excluded by !**/gen/**
  • internal/gen/api/v1/apiv1connect/recommendation.connect.go is excluded by !**/gen/**
  • internal/gen/api/v1/common.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • internal/gen/api/v1/k8s.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • internal/gen/api/v1/recommendation.pb.go is excluded by !**/*.pb.go, !**/gen/**
📒 Files selected for processing (3)
  • internal/proto/api/v1/common.proto (3 hunks)
  • internal/proto/api/v1/k8s.proto (7 hunks)
  • internal/proto/api/v1/recommendation.proto (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: generate
🔇 Additional comments (6)
internal/proto/api/v1/k8s.proto (2)

753-782: Clarify format and semantics of generic string fields in metadata messages.

Lines 753-782 add generic string fields (label, annotation, spec, status, metadata) to WorkloadMetadata, PodMetadata, and NodeMetadata. These fields lack documentation about their expected format (JSON, YAML, plain text) and content structure, which creates ambiguity for clients and servers consuming this API.

This is especially problematic since fields like spec and status are core Kubernetes concepts that should have clear serialization semantics.

Please clarify the intended format and document the expected content for these fields. Consider:

  1. Whether they should be JSON-serialized structures (if so, document the schema)
  2. Whether they should remain as opaque strings or use structured types (oneof, Any, bytes)
  3. Adding inline documentation explaining the semantics

For example:

string label = 6;           // JSON-serialized label key-value pairs (format: {"key": "value", ...})
string annotation = 7;      // JSON-serialized annotations
string spec = 8;            // Raw Kubernetes spec object (JSON format)
string status = 9;          // Raw Kubernetes status object (JSON format)
string metadata = 10;       // Raw Kubernetes metadata object (JSON format)

924-933: Verify SearchK8sResources and SearchK8sWorkloads RPC completeness.

The three new RPC methods (SearchK8sResources, SearchK8sWorkloads, GetClusterType) reference request/response messages defined elsewhere in the file. Ensure all handler implementations properly support these new RPCs, especially the search methods, which may have latency or pagination concerns not visible in the proto definitions.

Please verify:

  1. That downstream implementations (Go codegen) properly generate these RPC stubs
  2. That SearchK8sResources and SearchK8sWorkloads support timeout/deadline propagation (critical for search operations)
  3. Whether pagination is needed for these search endpoints (currently not present in SearchK8sResourcesRequest or SearchK8sWorkloadsRequest)
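As a point of reference, a minimal sketch of what list-style pagination could look like if added (field names follow common proto conventions; the field numbers and the existing request contents are assumptions):

message SearchK8sResourcesRequest {
  // ... existing filter fields elided ...
  int32 page_size = 20;   // maximum results per page (field number assumed)
  string page_token = 21; // opaque cursor returned by the previous response
}

message SearchK8sResourcesResponse {
  repeated K8sSearchResult results = 1; // field number assumed
  string next_page_token = 2;           // empty when no further pages exist
}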
internal/proto/api/v1/recommendation.proto (4)

2146-2147: Verify HPAMetricType enum change does not break existing clients.

Lines 2146-2147 rename HPA_METRIC_TYPE_NETWORK (originally assigned value 4) to HPA_METRIC_TYPE_NETWORK_INGRESS (keeping value 4) and add HPA_METRIC_TYPE_NETWORK_EGRESS (value 5). This is a breaking change for API consumers that reference the enum by name (e.g., language bindings, client code using string-based enum names).

While the wire format remains compatible (numeric values are preserved), existing code like if metric_type == HPA_METRIC_TYPE_NETWORK will no longer compile/work.

Please confirm:

  1. Is this enum rename intentional and expected by all clients?
  2. Have you coordinated with teams consuming this enum to handle the migration?
  3. Should a deprecation notice or migration guide be provided to downstream consumers?

Consider keeping the old name as a deprecated alias during a phased rollout:

HPA_METRIC_TYPE_NETWORK = 4 [deprecated = true]; // Use HPA_METRIC_TYPE_NETWORK_INGRESS or HPA_METRIC_TYPE_NETWORK_EGRESS instead
HPA_METRIC_TYPE_NETWORK_INGRESS = 4;  // Alias of the deprecated name
HPA_METRIC_TYPE_NETWORK_EGRESS = 5;

(Note: assigning two names to the same numeric value requires option allow_alias = true; on the enum.)
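A minimal sketch of the aliased enum under that approach (values other than 4 and 5, and the UNSPECIFIED entry, are assumptions for illustration):

enum HPAMetricType {
  option allow_alias = true; // required to give two names the same numeric value
  HPA_METRIC_TYPE_UNSPECIFIED = 0;
  // ... other metric types elided ...
  HPA_METRIC_TYPE_NETWORK = 4 [deprecated = true];
  HPA_METRIC_TYPE_NETWORK_INGRESS = 4; // alias of the deprecated name
  HPA_METRIC_TYPE_NETWORK_EGRESS = 5;
}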


1775-1838: Clarify relationship between SearchResourcesByTargetingFilters RPC and k8s.proto search functionality.

Lines 1775-1838 introduce SearchResourcesByTargetingFilters in recommendation.proto, which appears to duplicate search functionality already available in k8s.proto via SearchK8sResources and SearchK8sWorkloads (added to K8SService). Both offer resource searching with filtering capabilities.

This overlap raises questions:

  1. Is the duplication intentional (e.g., different semantics or use cases)?
  2. Should one be preferred over the other?
  3. Is there risk of inconsistent implementations or confusion for API consumers?

Please clarify:

  • The intended distinction between SearchResourcesByTargetingFilters (recommendation service) and SearchK8sResources/SearchK8sWorkloads (k8s service)
  • Whether both should coexist or if one is transitional/deprecated
  • Any difference in filtering semantics (e.g., does TargetFilters differ from workload filters in k8s.proto?)

Consider adding documentation comment:

// SearchResourcesByTargetingFilters searches for resources matching targeting criteria.
// Note: Similar functionality also available via K8SService.SearchK8sResources/SearchK8sWorkloads.
// Use this when applying recommendation policies; use K8SService methods for general resource discovery.

117-128: Approve node policy and instance configuration RPCs.

The addition of comprehensive node policy management RPCs (CreateNodePolicies, ListNodePolicies, UpdateNodePolicy, SuggestedNodePolicy) and instance configuration RPCs (GetInstanceFamilies, GetInstanceSizes) provides good coverage for the new Karpenter-based node policy feature. The request/response message definitions are complete and well-structured.
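For reference, a sketch of what this RPC surface plausibly looks like (the request/response message names are assumed to follow the file's Request/Response naming convention):

rpc CreateNodePolicies(CreateNodePoliciesRequest) returns (CreateNodePoliciesResponse);
rpc ListNodePolicies(ListNodePoliciesRequest) returns (ListNodePoliciesResponse);
rpc UpdateNodePolicy(UpdateNodePolicyRequest) returns (UpdateNodePolicyResponse);
rpc SuggestedNodePolicy(SuggestedNodePolicyRequest) returns (SuggestedNodePolicyResponse);
rpc GetInstanceFamilies(GetInstanceFamiliesRequest) returns (GetInstanceFamiliesResponse);
rpc GetInstanceSizes(GetInstanceSizesRequest) returns (GetInstanceSizesResponse);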


1501-1506: LGTM: LiveMigDeletionPropagation enum.

The new LiveMigDeletionPropagation enum provides clear semantics for live migration deletion behavior (ORPHAN, BACKGROUND, FOREGROUND), aligning with Kubernetes pod deletion propagation policies. No concerns.
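Based on those semantics, the enum plausibly reads as follows (names and numeric values are assumptions; only the ORPHAN/BACKGROUND/FOREGROUND options are confirmed):

enum LiveMigDeletionPropagation {
  LIVE_MIG_DELETION_PROPAGATION_UNSPECIFIED = 0;
  LIVE_MIG_DELETION_PROPAGATION_ORPHAN = 1;     // leave dependents in place
  LIVE_MIG_DELETION_PROPAGATION_BACKGROUND = 2; // delete dependents asynchronously
  LIVE_MIG_DELETION_PROPAGATION_FOREGROUND = 3; // block deletion until dependents are gone
}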

Comment on lines +1461 to +1465
message EventDatapoint {
string utc_time = 1;
int32 count = 2; // Number of events in this time bucket
repeated EventDatapointInfo events = 3; // Details of events in this bucket
}

🛠️ Refactor suggestion | 🟠 Major

Use google.protobuf.Timestamp for utc_time in EventDatapoint.

Line 1462 defines utc_time as a string type, but the rest of the proto file (e.g., Event, CostDataPoint) uses google.protobuf.Timestamp for timestamp fields. Using a string is error-prone and reduces type safety; timestamps should use the protobuf Timestamp type for consistency and better interoperability.

Consider updating to use the standard Timestamp type:

- string utc_time = 1;
+ google.protobuf.Timestamp utc_time = 1;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
message EventDatapoint {
string utc_time = 1;
int32 count = 2; // Number of events in this time bucket
repeated EventDatapointInfo events = 3; // Details of events in this bucket
}
message EventDatapoint {
google.protobuf.Timestamp utc_time = 1;
int32 count = 2; // Number of events in this time bucket
repeated EventDatapointInfo events = 3; // Details of events in this bucket
}
🤖 Prompt for AI Agents
In internal/proto/api/v1/common.proto around lines 1461 to 1465, the
EventDatapoint message declares utc_time as a string which is inconsistent with
other messages and error-prone; change the utc_time field to use
google.protobuf.Timestamp (google.protobuf.Timestamp utc_time = 1) and ensure
the file imports "google/protobuf/timestamp.proto" at the top if not already
present, update any comments to reflect it is a Timestamp, and regenerate any
language-specific protobuf artifacts that rely on this message.
