Plan / Spec Mode #7355
Replies: 11 comments 3 replies

Let me also share some of my own experimentation. Here's a snippet from my

Your critique of "hard modes" is valid. As much as a hard mode is what I've been asking for, the truth is I'm only asking for it because I don't trust Codex to just chill out when I'm asking questions. If you can keep Codex from running off and coding when I'm just asking for clarification or to think something through with me, then I'd love to never have to manage an explicit, hard-enforced "plan mode".

Since every specification or plan can contain areas of high complexity or ambiguity, it would be helpful to support interactive questioning around those high-perplexity points, both up front and in real time, inspired by the approach used in the recent shopping model.

I believe the planning phase should include stronger web-access querying abilities, as plans often need to research best practices, compare alternative approaches, or explore unfamiliar or newly emerging topics that require deeper investigation not confined to the model's knowledge cutoff date.

I've been a big fan of spec-driven programming ever since it was introduced into vibe-coding tools through AWS Kiro and later GitHub SpecKit, among others. When it comes to enterprises, we need a lot more flexibility. One way to add flexibility is to allow the creation of a markdown file that captures the constraints of each stage, the expected output, and so on. To answer one of the questions raised above directly: Prescribed or flexible workflow? Prescribed.
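
For concreteness, here is a minimal sketch of what such a stage-constraints file could look like; the file name, stage names, and fields are hypothetical illustrations, not an existing Codex or Kiro convention.

```markdown
<!-- spec-workflow.md: hypothetical stage-constraints file; names and fields invented for illustration -->
# Workflow constraints

## Stage: Research
- Constraints: read-only, no file edits, web search allowed
- Expected output: short summary of prior art and open questions

## Stage: Spec
- Constraints: may create or edit files under docs/specs/ only
- Expected output: a spec with acceptance criteria, reviewed by a human

## Stage: Implement
- Constraints: changes limited to the modules named in the spec
- Expected output: passing tests and a changelog entry
```

A harness could read a file like this at the start of each stage and enforce the constraints mechanically, while teams stay free to edit the stages themselves.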

Thanks for opening this discussion. I've been using Claude Code extensively, and its current implementation of Plan Mode effectively resolves the design trade-offs you mentioned (modal vs. non-modal, interview-style). Here is what works specifically well in that implementation and that Codex should consider:
1. Modal vs. non-modal: the argument for a "low-friction hard mode"
2. Interview-style vs. freeform: it should be adaptive
In short: a distinct mode (for safety/no-write guarantees) combined with a low-friction toggle (Shift+Tab) and adaptive questioning seems to be the sweet spot for developer experience.

Modal or non-modal?
When collaborating with a human colleague, there are many verbal and non-verbal cues that convey intent, far beyond what a CLI can express. It's very frustrating when a question is interpreted as a passive-aggressive request and the model rushes to make changes instead of providing a clear answer. I'm not sure an implicit mode can reliably guarantee that questions lead to answers rather than actions, and having to add "investigate, don't make changes" every time is quite annoying.
Ephemeral or persisted?
It would be nice to have persistence and progress tracking as an option. It usually takes a few iterations on a plan before committing to it, and I only save it to a file if it will require more than one session to implement.
Interview-style or freeform?
It is very helpful when a model asks questions before starting on a plan or implementation. There may be edge cases I haven't considered, or incomplete or ambiguous instructions. It's much better to clarify things upfront than to fix results afterward. I used to add "ask me questions one by one if you need additional context" at the end of my prompts. I believe both interview-style and freeform have their place: interview-style is a much more compact way to gather context, but it should be flexible enough to switch to freeform, since I might not know the answer, or might want to explore more options before committing to one.
Cursory or thorough?
I'd love to be able to explicitly trigger a thorough mode when I feel the result is important enough to warrant it.

Main use cases right now for me are:

I place importance on a flexible planning-file workflow that allows users to take an active role in editing, rather than relying too heavily on AI. Specifically, I envision an approach similar to the PLANS.md file style seen in the OpenAI cookbook, though with a slightly different purpose.
I prefer a style where a Markdown file is created for each task and iteratively co-edited with the AI. It is essential to be able to directly revise and restructure AI-generated plans by hand, rather than simply accepting them as-is.
In comparing other tools, I find approaches like Cline's Deep Planning and Claude Code more aligned with my preferences. Claude Code offers a Plan mode and, notably, allows users to directly edit and save the generated planning file, a feature I highly value. While I don't consider a dedicated Plan mode strictly necessary, the ability to externalize plans into editable, savable files is a significant advantage. On the other hand, tools such as Cursor's Planning, Kiro, and Antigravity feel too AI-driven. With these, the inability to make direct edits (having to route every revision through the AI) becomes cumbersome. Constraints that prevent users from freely editing files hinder flexible workflows.
In summary, I prioritize three key capabilities: (1) creating Markdown files per task, (2) enabling collaborative editing with the AI, and (3) allowing users to directly edit files themselves, eliminating the inconvenience of having to correct every detail via the AI.
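
As an illustration of the per-task, hand-editable style described here, a plan file might look roughly like the following; the path and headings are invented for the example rather than taken from any of the tools mentioned.

```markdown
<!-- docs/plans/unify-auth-middleware.md: hypothetical per-task plan file -->
# Plan: unify auth middleware

## Context
Two login paths (session and token) exist; consolidate them behind one middleware.

## Steps
- [ ] Inventory call sites of both login paths
- [ ] Draft the shared middleware interface (AI draft, revised by hand)
- [ ] Migrate the token path, then the session path
- [ ] Remove dead code and update docs

## Open questions
- Keep the legacy cookie name for one more release?
```

Because it is a plain Markdown file in the repository, either the human or the agent can edit it directly, which is exactly the collaboration pattern this comment argues for.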

Right now, if I simulate this behavior with the Sequential Thinking MCP server, I notice that the quality of Codex's work goes down dramatically compared to a version where a custom prompt has Codex ask the user to respond with 'continue' between steps of a plan. Ideally this plan mode would have Codex apply the manual-interaction level of focus to each plan stage, rather than the terse treatment seen when integrating with Sequential Thinking.
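
For reference, the step-gated custom prompt this commenter describes could read something like the following; this is a paraphrase of the pattern, not the commenter's actual prompt.

```markdown
<!-- step-gated-plan.md: hypothetical custom prompt gating each stage on user approval -->
You are executing an approved plan one stage at a time.

For the current stage:
1. Restate the stage and what "done" means for it.
2. Do the work for this stage only; do not start the next one.
3. Summarize what you changed and anything surprising you found.
4. Stop and wait. Proceed to the next stage only after the user replies "continue".
```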

It completely comes down to quality. When I use the manual interaction steps, I feel like it puts a ton more effort into each step, gated by manual interaction. When I either have it do everything in one go or use the Sequential Thinking MCP to avoid manual interaction steps, the quality (attention and time given) per step drops radically.

> On Dec 2, 2025, at 3:51 PM, Eric Traut wrote:
> The latest models are generally able to perform large, complex, multi-step tasks without supervision. I'm curious why you want to stop after each stage of the plan. Is this more for your own edification/curiosity or do you find that it's necessary to get the agent to stick to the plan? What happens if you just let it go without stopping for intermediate approvals?

The Codex team has received many requests for some form of a planning or spec’ing mode. We’ve been experimenting with several concepts internally, and before we finalize a design, we’d love to get additional input from the Codex community.
There are now many tools in the agentic coding space that explore different approaches to planning and spec’ing. It’s great to see this breadth of experimentation. We’re actively learning from these approaches as we work toward a uniquely Codex solution — one that takes full advantage of the newest models and integrates naturally into real-world workflows.
Below are some of the key design questions we’re currently exploring.
Prescribed or flexible workflow?
Some tools take a strong stance on spec-driven development (SDD) and enforce fairly rigid workflows. Others emphasize flexibility and customization. Prescribed workflows can work well when they align with your preferences, but they can also be limiting. Different teams, projects, and individual developers prefer different workflows, and even within a single project, different tasks call for different levels of planning or formal spec’ing. We’re inclined to favor flexibility — but we’re curious how others see this tradeoff.
Modal or non-modal?
Some tools provide a distinct “planning mode” that you explicitly enter via a button, menu item, or command. This has real advantages: it clearly signals intent to the model, and it allows the harness to enforce safeguards (for example, preventing code changes during planning).
However, hard modes also introduce friction. You have to remember which mode you’re in, when to switch, and why. We want Codex to feel like an intelligent collaborator. When you’re working with a human colleague, you don’t explicitly put them into a “planning mode” to brainstorm — the intent is usually implicit. For this reason, we’re currently leaning toward a solution that avoids a hard modal switch, but we’d love to hear your perspective.
Ephemeral or persisted?
Many SDD-oriented tools encourage plans or specs to be persisted as files and checked into the repository alongside code. We think persisted specs will become increasingly common over time, but we also expect strong variation in personal and team preferences. Our goal is likely to support both persisted, file-based specs and lighter-weight, ephemeral plans that exist only for the duration of a task.
Interview-style or freeform?
Some tools guide planning through an interview-style, question-and-answer process. Others allow for more freeform discussion. Interviews can work well for smaller, well-scoped tasks. For larger features, architectural changes, or major refactors, a freeform approach often enables deeper exploration of design tradeoffs and alternatives. We’re exploring how (or whether) both can coexist.
Cursory or thorough?
In some cases, it’s ideal for the agent to quickly sketch an initial plan with minimal up-front analysis. In other cases — especially for large or complex changes — it’s valuable for the agent to think more deeply: exploring the codebase, understanding dependencies, and possibly even doing web-based research before proposing a plan or spec. How much “thinking” should be the default, and how much should be user-directed?
We’d love your input
If Codex were to add a "plan" or "spec" capability, we'd like to hear how you would want it to work and where you land on the questions above.
Your feedback will directly influence our design. We’re excited to hear how you think planning in Codex should work.