
@inesmcm26 inesmcm26 commented Jun 30, 2025

Description

This PR introduces two new optional fields to the AgentSkill model - inputFields and outputFields - allowing agents to define structured, named input and output fields with detailed type and schema information. These fields complement the existing inputModes and outputModes, which specify supported mime types.

Problem

Currently, agent skills declare only inputModes and outputModes as lists of mime types. While this indicates supported data formats (e.g., "text/plain", "image/png"), it does not convey the structure or semantics of the input/output data, such as named fields or their types. In some use cases, this makes it difficult for clients to:

  • Understand what specific inputs an agent expects or outputs it produces
  • Properly validate, log, or debug message contents
  • Create user interfaces with meaningful form fields to invoke an agent
  • Handle multiple fields of the same type in a structured way

Solution

This PR adds a new optional layer by defining a FieldDefinition model, which lets agents declare:

  • Named input and output fields that are expected to be received and returned by a skill
  • Field kinds (text, file, data)
  • Supported mime types for each field
  • Optional JSON schemas for structured data fields
  • Field descriptions and optionality flags

The AgentSkill model is extended with two new optional lists: inputFields and outputFields. These provide a semantic, machine-readable description of the skill interface, while retaining backward compatibility by keeping inputModes and outputModes.
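As a rough illustration of the shape described above, the model could be sketched in Python with standard-library dataclasses. The attribute names (name, kind, mimeTypes, schema, description, optional) follow the PR summary; the class layout, the defaults, and the example "translate" skill are assumptions made for illustration and are not the PR's actual definition.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional

# Field kinds named in the PR description.
FieldKind = Literal["text", "file", "data"]


@dataclass
class FieldDefinition:
    """Illustrative stand-in for the proposed FieldDefinition model."""
    name: str
    kind: FieldKind
    mimeTypes: list[str] = field(default_factory=list)
    schema: Optional[dict[str, Any]] = None  # JSON Schema, for "data" fields
    description: Optional[str] = None
    optional: bool = False  # assumed default: a field is required


@dataclass
class AgentSkill:
    """Only the attributes relevant to this PR; the real model has more."""
    id: str
    name: str
    inputModes: list[str] = field(default_factory=list)
    outputModes: list[str] = field(default_factory=list)
    inputFields: Optional[list[FieldDefinition]] = None   # new, optional
    outputFields: Optional[list[FieldDefinition]] = None  # new, optional


# Hypothetical skill with two text inputs, which MIME types alone
# cannot tell apart.
translate = AgentSkill(
    id="translate",
    name="Translate text",
    inputModes=["text/plain"],
    outputModes=["text/plain"],
    inputFields=[
        FieldDefinition(name="source_text", kind="text", mimeTypes=["text/plain"]),
        FieldDefinition(name="glossary", kind="text", mimeTypes=["text/plain"],
                        optional=True),
    ],
    outputFields=[
        FieldDefinition(name="translation", kind="text", mimeTypes=["text/plain"]),
    ],
)
```

Because both new lists default to None, a skill that declares only inputModes and outputModes still constructs unchanged, which is the backward-compatibility property the description relies on.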

Benefits

  • Enables clients to build more precise and user-friendly input forms
  • Improves clarity and reduces ambiguity in message composition
  • Facilitates validation, logging, and debugging with schema support
  • Supports complex skills with multiple inputs/outputs of the same type
  • Provides a semantic layer that builds upon existing mime type declarations
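The validation benefit listed above can be sketched as a small client-side check. This is only an illustration: the field declarations are hypothetical, and keying the supplied values by field name is an assumption, since the description does not say how message parts map to named fields.

```python
from typing import Any

# Hypothetical inputFields declaration, written as plain dicts.
INPUT_FIELDS = [
    {"name": "invoice", "kind": "file",
     "mimeTypes": ["application/pdf"], "optional": False},
    {"name": "receipt", "kind": "file",
     "mimeTypes": ["application/pdf", "image/png"], "optional": True},
    {"name": "notes", "kind": "text",
     "mimeTypes": ["text/plain"], "optional": True},
]


def check_inputs(fields: list[dict[str, Any]],
                 supplied: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means the inputs match."""
    problems = []
    declared = {f["name"]: f for f in fields}
    for name, f in declared.items():
        if name not in supplied:
            if not f.get("optional", False):
                problems.append(f"missing required field: {name}")
            continue
        value = supplied[name]
        if value["kind"] != f["kind"]:
            problems.append(f"{name}: expected kind {f['kind']}, got {value['kind']}")
        elif f["mimeTypes"] and value["mimeType"] not in f["mimeTypes"]:
            problems.append(f"{name}: unsupported mime type {value['mimeType']}")
    # Anything the skill did not declare is reported rather than silently sent.
    for name in supplied:
        if name not in declared:
            problems.append(f"unknown field: {name}")
    return problems


ok = check_inputs(INPUT_FIELDS,
                  {"invoice": {"kind": "file", "mimeType": "application/pdf"}})
bad = check_inputs(INPUT_FIELDS,
                   {"receipt": {"kind": "file", "mimeType": "image/gif"}})
```

Here the two file inputs share a kind and overlap in MIME types, so only their names let the client tell which one is missing or malformed.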

Closes #813

Idea emerged from discussion in: #462

@inesmcm26 inesmcm26 requested a review from a team as a code owner June 30, 2025 22:12
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @inesmcm26, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Agent Skill model by adding optional input and output fields, allowing agents to define structured data formats beyond simple mime types. This addition enables clients to build more precise interfaces, improve validation, and support complex skills with multiple inputs and outputs.

Highlights

  • Feature: AgentSkill Input/Output Fields: Introduces optional inputFields and outputFields to the AgentSkill model, enabling structured, named input and output field definitions with type and schema information.
  • New: FieldDefinition Model: Defines a FieldDefinition model with attributes like name, kind (text, file, data), mimeTypes, schema, description, and optional to provide detailed descriptions of input and output fields.
  • Enhancement: Documentation: Updates documentation to reflect the new inputFields and outputFields in AgentSkill and introduces the FieldDefinition object.
  • Update: Examples: Updates the example AgentSkill definitions to include inputFields and outputFields to demonstrate the new feature.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@inesmcm26 inesmcm26 force-pushed the feat/structured-inout-fields branch from 2959a0a to f2b46bd Compare June 30, 2025 22:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces inputFields and outputFields to the AgentSkill model, providing a way to define structured inputs and outputs. My review identified a critical issue in the protobuf definition which is missing the new FieldDefinition message, and a high/medium severity issue in the documentation, mainly concerning an incorrect example. Addressing these will improve the clarity and correctness of the specification.

@inesmcm26 inesmcm26 force-pushed the feat/structured-inout-fields branch 2 times, most recently from 0c16f8e to 58f736d Compare July 1, 2025 09:25
@inesmcm26 inesmcm26 force-pushed the feat/structured-inout-fields branch from 58f736d to ab8597a Compare July 1, 2025 09:26
@holtskinner
Member

Makes sense to me, @kthota-g can you review?

@kthota-g
Collaborator

kthota-g commented Jul 1, 2025

Currently, examples are the mechanism for hinting to the client about the input types a skill expects. An example is here - https://a2aproject.github.io/A2A/latest/specification/#56-sample-agent-card. If any attributes the agent needs to complete the task are missing, the agent can respond with input-required to negotiate the required inputs.

Also, different clients can request the output in varying formats and the clients can specify the schema for the output - https://a2aproject.github.io/A2A/latest/specification/#97-structured-data-exchange-requesting-and-providing-json
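The input-required negotiation mentioned above can be pictured with a toy exchange. This sketch does not use any A2A SDK: fake_agent, run_client, the reply dictionaries, and the "origin"/"destination" attributes are all hypothetical; only the input-required and completed state names come from the protocol.

```python
REQUIRED = ("origin", "destination")  # hypothetical attributes the agent needs


def fake_agent(collected: dict[str, str]) -> dict[str, str]:
    """Toy agent: asks for the first missing attribute, else completes."""
    for key in REQUIRED:
        if key not in collected:
            return {"state": "input-required", "ask": key}
    return {"state": "completed",
            "result": f"route {collected['origin']} -> {collected['destination']}"}


def run_client(answers: dict[str, str]) -> tuple[str, int]:
    """Keep replying until the task completes; return result and turn count."""
    collected: dict[str, str] = {}
    turns = 0
    while True:
        turns += 1
        reply = fake_agent(collected)
        if reply["state"] == "completed":
            return reply["result"], turns
        # The agent paused the task: supply the attribute it asked for.
        collected[reply["ask"]] = answers[reply["ask"]]


result, turns = run_client({"origin": "LIS", "destination": "OPO"})
```

In this toy, the client learns what is required only by being asked, so the exchange takes one turn per missing attribute plus a final turn to complete.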

@jankrynauw

jankrynauw commented Aug 4, 2025

Could the google.protobuf.Any (definition) type not play an elegant role here?

message Any {
  // A URL/resource name that uniquely identifies the type of the serialized
  // protocol buffer message. This string must contain at least
  // one "/" character. The last segment of the URL's path must represent
  // the fully qualified name of the type (as in
  // `path/google.protobuf.Duration`). The name should be in a canonical form
  // (e.g., leading "." is not accepted).
  //
  // In practice, teams usually precompile into the binary all types that they
  // expect it to use in the context of Any. However, for URLs which use the
  // scheme `http`, `https`, or no scheme, one can optionally set up a type
  // server that maps type URLs to message definitions as follows:
  //
  // * If no scheme is provided, `https` is assumed.
  // * An HTTP GET on the URL must yield a [google.protobuf.Type][]
  //   value in binary format, or produce an error.
  // * Applications are allowed to cache lookup results based on the
  //   URL, or have them precompiled into a binary to avoid any
  //   lookup. Therefore, binary compatibility needs to be preserved
  //   on changes to types. (Use versioned type names to manage
  //   breaking changes.)
  //
  // Note: this functionality is not currently available in the official
  // protobuf release, and it is not used for type URLs beginning with
  // type.googleapis.com. As of May 2023, there are no widely used type server
  // implementations and no plans to implement one.
  //
  // Schemes other than `http`, `https` (or the empty scheme) might be
  // used with implementation specific semantics.
  //
  string type_url = 1;

  // Must be a valid serialized protocol buffer of the above specified type.
  bytes value = 2;
}

In line with how Google uses common and well-known protos to communicate well-understood data structures (google.type.Date, google.type.PostalAddress, google.protobuf.Timestamp, etc.), is there not a world where agent-specific definitions (via the AgentSkill) are made available at a well-known endpoint?

And then extending the Part definition to simplify the handling of these:

// Part represents a container for a section of communication content.
// Parts can be purely textual, some sort of file (image, video, etc) or
// a structured data blob (i.e. JSON).
message Part {
  oneof part {
    string text = 1;
    FilePart file = 2;
    DataPart data = 3;
  }
}

// FilePart represents the different ways files can be provided. If files are
// small, directly feeding the bytes is supported via file_with_bytes. If the
// file is large, the agent should read the content as appropriate directly
// from the file_with_uri source.
message FilePart {
  oneof file {
    string file_with_uri = 1;
    bytes file_with_bytes = 2;
  }
  string mime_type = 3;
}

// DataPart represents a structured blob. This is most commonly a JSON payload.
message DataPart {
  google.protobuf.Struct data = 1;
  google.protobuf.Any input = 2; 
  google.protobuf.Any output = 3;
}

Or even this:

// DataPart represents a structured object. This is most commonly a JSON payload.
message DataPart {
  google.protobuf.Any data = 1;
}
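To make the Any idea concrete, here is a dependency-free Python sketch of how a receiver might dispatch on type_url. AnyEnvelope is a stand-in for google.protobuf.Any and is not the protobuf library's API; the registry, the example.agent.v1.TranslateInput type name, the agents.example.com host, and the use of JSON bytes instead of serialized protobuf are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AnyEnvelope:
    """Stand-in for google.protobuf.Any: a type URL plus serialized bytes."""
    type_url: str
    value: bytes


def type_name(type_url: str) -> str:
    """Extract the fully qualified type name (last path segment of the URL)."""
    if "/" not in type_url:
        raise ValueError("type_url must contain at least one '/'")
    return type_url.rsplit("/", 1)[1]


# Hypothetical registry of decoders, keyed by fully qualified type name.
# JSON is used here only to keep the sketch dependency-free.
DECODERS: dict[str, Callable[[bytes], Any]] = {
    "example.agent.v1.TranslateInput": lambda raw: json.loads(raw.decode("utf-8")),
}


def unpack(envelope: AnyEnvelope) -> Any:
    """Decode the payload if the receiver knows the type, else fail loudly."""
    name = type_name(envelope.type_url)
    try:
        decoder = DECODERS[name]
    except KeyError:
        raise LookupError(f"unknown type: {name}") from None
    return decoder(envelope.value)


part = AnyEnvelope(
    type_url="agents.example.com/example.agent.v1.TranslateInput",
    value=json.dumps({"source_text": "hello", "target": "pt"}).encode("utf-8"),
)
decoded = unpack(part)
```

The registry plays the role of the precompiled types mentioned in the Any comments: a receiver can only decode payloads whose type it already knows, which is why the question of a well-known endpoint for agent-specific definitions matters.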

@holtskinner holtskinner requested a review from a team as a code owner August 27, 2025 17:04
@darrelmiller
Contributor

While I agree that there is some opportunity for improvement in how inputs and outputs of skills can be described, I think the current approach of providing media type identifiers and examples is a good compromise. Trying to define a new schema language for input parts is a non-trivial task and one that no LLM is going to understand implicitly. As crude as it appears, the use of examples is quite effective with LLMs.

I sympathize with the desire here, but I don't think the proposed solution will make the situation better. It does, however, raise the question of whether there should be a metadata property in skills so that extensions could be defined to enable more strictly controlled interactions.
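One way to picture the metadata idea is a skill that carries extension data under a namespaced key. Everything here is hypothetical: the metadata property on a skill is only raised as a question in the comment, and the extension URI, the payload shape, and the extension_data helper are invented for illustration.

```python
from typing import Any, Optional

# Hypothetical extension identifier; not defined by the A2A specification.
FIELDS_EXT = "https://example.com/a2a-extensions/structured-fields/v1"

skill: dict[str, Any] = {
    "id": "translate",
    "name": "Translate text",
    "inputModes": ["text/plain"],
    "outputModes": ["text/plain"],
    "examples": ["Translate 'good morning' into Portuguese"],
    # Assumed property: the comment only raises it as a question.
    "metadata": {
        FIELDS_EXT: {"inputFields": [{"name": "source_text", "kind": "text"}]},
    },
}


def extension_data(skill: dict[str, Any], uri: str) -> Optional[dict[str, Any]]:
    """Return the extension payload, or None if the skill does not carry it."""
    return skill.get("metadata", {}).get(uri)


known = extension_data(skill, FIELDS_EXT)
unknown = extension_data(skill, "https://example.com/other")
```

A client that does not recognize the extension gets None and can fall back to the media types and examples, while a client that does recognize it can apply stricter handling.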

Development

Successfully merging this pull request may close these issues.

[Feat]: Optional Structured Field Definitions for Agent Inputs and Outputs

6 participants