feat(spec): add optional inputFields and outputFields to AgentSkill
#814
base: main
Conversation
Summary of Changes
Hello @inesmcm26, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the Agent Skill model by adding optional input and output fields, allowing agents to define structured data formats beyond simple mime types. This addition enables clients to build more precise interfaces, improve validation, and support complex skills with multiple inputs and outputs.
Highlights
- Feature: AgentSkill Input/Output Fields: Introduces optional inputFields and outputFields to the AgentSkill model, enabling structured, named input and output field definitions with type and schema information (see the sketch after this list).
- New: FieldDefinition Model: Defines a FieldDefinition model with attributes like name, kind (text, file, data), mimeTypes, schema, description, and optional to provide detailed descriptions of input and output fields.
- Enhancement: Documentation: Updates documentation to reflect the new inputFields and outputFields in AgentSkill and introduces the FieldDefinition object.
- Update: Examples: Updates the example AgentSkill definitions to include inputFields and outputFields to demonstrate the new feature.
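For orientation, here is a minimal TypeScript sketch of what the proposed objects could look like. The attribute names and optionality are inferred from the highlights above; the exact definitions in the PR may differ.

// Hypothetical sketch based on the highlights above; not the authoritative spec types.

// Kinds of content a field can carry.
type FieldKind = "text" | "file" | "data";

// Assumed shape of the proposed FieldDefinition object.
interface FieldDefinition {
  name: string;                     // e.g. "document"
  kind: FieldKind;                  // text, file, or data
  mimeTypes?: string[];             // media types accepted or produced for this field
  schema?: Record<string, unknown>; // JSON Schema for structured "data" fields
  description?: string;             // human-readable explanation of the field
  optional?: boolean;               // whether the field may be omitted
}

// AgentSkill extended with the proposed optional lists (other existing fields kept minimal).
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  tags: string[];
  examples?: string[];
  inputModes?: string[];            // existing: supported input media types
  outputModes?: string[];           // existing: supported output media types
  inputFields?: FieldDefinition[];  // new: named, typed inputs
  outputFields?: FieldDefinition[]; // new: named, typed outputs
}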
Code Review
This pull request introduces inputFields and outputFields to the AgentSkill model, providing a way to define structured inputs and outputs. My review identified a critical issue in the protobuf definition, which is missing the new FieldDefinition message, and high/medium severity issues in the documentation, mainly concerning an incorrect example. Addressing these will improve the clarity and correctness of the specification.
Makes sense to me, @kthota-g can you review?
Currently, the examples field is the mechanism to hint to the client about the expected input types. An example is here: https://a2aproject.github.io/A2A/latest/specification/#56-sample-agent-card. If any attributes needed for the agent to complete the task are missing, the agent can respond with input-required to negotiate the required inputs. Also, different clients can request the output in varying formats and can specify the schema for the output: https://a2aproject.github.io/A2A/latest/specification/#97-structured-data-exchange-requesting-and-providing-json
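To illustrate the current mechanism, here is a rough TypeScript sketch (reusing the AgentSkill shape sketched earlier; all identifiers and values are invented) of a skill that hints at its inputs only through examples and media types:

const exchangeRateSkill: AgentSkill = {
  id: "convert-currency",
  name: "Currency Exchange Rates",
  description: "Looks up exchange rates between two currencies.",
  tags: ["currency"],
  // Examples are the current hint about what inputs the skill expects.
  examples: ["What is the exchange rate between USD and GBP?"],
  inputModes: ["text/plain"],
  // A client that wants structured output can request JSON and describe the
  // desired schema in its message, per the structured data exchange section
  // linked above.
  outputModes: ["application/json"],
};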
Could the Any message be used here?

message Any {
// A URL/resource name that uniquely identifies the type of the serialized
// protocol buffer message. This string must contain at least
// one "/" character. The last segment of the URL's path must represent
// the fully qualified name of the type (as in
// `path/google.protobuf.Duration`). The name should be in a canonical form
// (e.g., leading "." is not accepted).
//
// In practice, teams usually precompile into the binary all types that they
// expect it to use in the context of Any. However, for URLs which use the
// scheme `http`, `https`, or no scheme, one can optionally set up a type
// server that maps type URLs to message definitions as follows:
//
// * If no scheme is provided, `https` is assumed.
// * An HTTP GET on the URL must yield a [google.protobuf.Type][]
// value in binary format, or produce an error.
// * Applications are allowed to cache lookup results based on the
// URL, or have them precompiled into a binary to avoid any
// lookup. Therefore, binary compatibility needs to be preserved
// on changes to types. (Use versioned type names to manage
// breaking changes.)
//
// Note: this functionality is not currently available in the official
// protobuf release, and it is not used for type URLs beginning with
// type.googleapis.com. As of May 2023, there are no widely used type server
// implementations and no plans to implement one.
//
// Schemes other than `http`, `https` (or the empty scheme) might be
// used with implementation specific semantics.
//
string type_url = 1;
// Must be a valid serialized protocol buffer of the above specified type.
bytes value = 2;
}

In line with how Google uses common (well-known and common) protos to communicate well-understood data structures, the Part definition could then be extended to simplify the handling of these:

// Part represents a container for a section of communication content.
// Parts can be purely textual, some sort of file (image, video, etc) or
// a structured data blob (i.e. JSON).
message Part {
oneof part {
string text = 1;
FilePart file = 2;
DataPart data = 3;
}
}
// FilePart represents the different ways files can be provided. If files are
// small, directly feeding the bytes is supported via file_with_bytes. If the
// file is large, the agent should read the content as appropriate directly
// from the file_with_uri source.
message FilePart {
oneof file {
string file_with_uri = 1;
bytes file_with_bytes = 2;
}
string mime_type = 3;
}
// DataPart represents a structured blob. This is most commonly a JSON payload.
message DataPart {
google.protobuf.Struct data = 1;
google.protobuf.Any input = 2;
google.protobuf.Any output = 3;
}

Or even this:

// DataPart represents a structured object. This is most commonly a JSON payload.
message DataPart {
google.protobuf.Any data = 1;
}
While I agree that there is some opportunity for improvement in how the inputs and outputs of skills can be described, I think the current approach of providing media type identifiers and examples is a good compromise. Trying to define a new schema language for input parts is a non-trivial task, and one that no LLM is going to understand implicitly. As crude as it appears, the use of examples is quite effective with LLMs. I sympathize with the desire here, but I don't think the proposed solution will make the situation better. It does, however, raise the question of whether there should be a metadata property in skills so that extensions could be defined to enable more strictly controlled interactions.
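To make that last suggestion concrete, here is a purely hypothetical TypeScript sketch of a skill whose stricter input/output contracts are carried by an extension through a generic metadata property; none of these property names or URIs exist in the current spec:

// Hypothetical: an extension (identified by an invented URI) attaches JSON Schemas
// for inputs and outputs via a generic metadata bag, leaving the core AgentSkill
// definition unchanged.
const invoiceSkill = {
  id: "invoice-extractor",
  name: "Invoice Extractor",
  description: "Extracts structured fields from uploaded invoices.",
  tags: ["documents"],
  inputModes: ["application/pdf"],
  outputModes: ["application/json"],
  metadata: {
    "urn:example:io-contract/v1": {
      input: {
        type: "object",
        properties: { invoice: { type: "string", format: "uri" } },
        required: ["invoice"],
      },
      output: {
        type: "object",
        properties: { total: { type: "number" }, currency: { type: "string" } },
      },
    },
  },
};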
Description
This PR introduces two new optional fields to the AgentSkill model - inputFields and outputFields - allowing agents to define structured, named input and output fields with detailed type and schema information. These fields complement the existing inputModes and outputModes, which specify supported mime types.

Problem
Currently, agent skills declare only inputModes and outputModes as lists of mime types. While this indicates supported data formats (e.g., "text/plain", "image/png"), it does not convey the structure or semantics of the input/output data, such as named fields or their types. In some use cases, this makes it difficult for clients to build precise interfaces, validate inputs ahead of time, or drive skills with multiple inputs and outputs.

Solution
This PR adds a new optional layer by defining a FieldDefinition model, which lets agents declare each field's name, its kind (text, file, data), accepted mime types, an optional schema, a description, and whether the field is optional. The AgentSkill model is extended with two new optional lists: inputFields and outputFields. These provide a semantic, machine-readable description of the skill interface, while retaining backward compatibility by keeping inputModes and outputModes. An illustrative example is sketched below.

Benefits
Clients get a machine-readable description of a skill's interface, making it easier to build precise interfaces, improve validation, and support complex skills with multiple inputs and outputs.
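For illustration, a skill using the proposed fields might be declared roughly as follows; the shape mirrors the FieldDefinition sketch earlier in this thread, and every identifier and value is invented rather than copied from the PR's updated examples:

const summarizerSkill: AgentSkill = {
  id: "document-summarizer",
  name: "Document Summarizer",
  description: "Summarizes an uploaded document into a short abstract.",
  tags: ["documents", "summarization"],
  inputModes: ["application/pdf", "text/plain"],
  outputModes: ["application/json"],
  inputFields: [
    {
      name: "document",
      kind: "file",
      mimeTypes: ["application/pdf", "text/plain"],
      description: "The document to summarize.",
    },
    {
      name: "maxSentences",
      kind: "data",
      optional: true,
      schema: { type: "integer", minimum: 1 },
      description: "Upper bound on the number of sentences in the summary.",
    },
  ],
  outputFields: [
    {
      name: "summary",
      kind: "data",
      schema: { type: "object", properties: { text: { type: "string" } } },
      description: "The generated summary.",
    },
  ],
};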
Closes #813
Idea emerged from discussion in: #462