Plan / Spec Mode #7355
Replies: 11 comments 3 replies

Let me also share some of my own experimentation. Here's a snippet from my

Your critique of "hard modes" is valid. As much as a hard mode is what I've been asking for, the truth is I'm only asking for it because I don't trust Codex to just chill out when I'm asking questions. If you can keep Codex from running off and coding when I'm just asking for clarification or to think something through with me, then I'd love to never have to manage an explicit, hard-enforced "plan mode".

Since every specification or plan can contain areas of high complexity or ambiguity, it would be helpful to support interactive questioning around those high-perplexity points, both up front and in real time, inspired by the approach used in the recent shopping model.

I believe the planning phase should include stronger web-access querying abilities, as plans often need to research best practices, compare alternative approaches, or explore unfamiliar or newly emerging topics that require deeper investigation not confined to the model's knowledge cutoff date.

I've been a big fan of spec-driven programming ever since it was introduced into vibe-coding tools through AWS Kiro and later GitHub SpecKit, among others. When it comes to enterprises, we need a lot more flexibility. One way to add flexibility is to allow the creation of a markdown file that captures the constraints of each stage, the expected output, and so on. To answer one of the questions raised above directly: Prescribed or flexible workflow? Prescribed.
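
For concreteness, here is a minimal sketch of what such a stage-constraints file could look like; the file name, stage names, and fields are hypothetical illustrations, not an existing Codex or Kiro convention.

```markdown
<!-- spec-workflow.md: hypothetical stage-constraints file; names and fields invented for illustration -->
# Workflow constraints

## Stage: Research
- Constraints: read-only, no file edits, web search allowed
- Expected output: short summary of prior art and open questions

## Stage: Spec
- Constraints: may create or edit files under docs/specs/ only
- Expected output: a spec with acceptance criteria, reviewed by a human

## Stage: Implement
- Constraints: changes limited to the modules named in the spec
- Expected output: passing tests and a changelog entry
```

A harness could read a file like this at the start of each stage and enforce the constraints mechanically, while teams stay free to edit the stages themselves.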

Thanks for opening this discussion. I've been using Claude Code extensively, and its current implementation of Plan Mode effectively resolves the design trade-offs you mentioned (modal vs. non-modal, interview-style). Here is what works specifically well in that implementation and that Codex should consider:
1. Modal vs. non-modal: the argument for a "low-friction hard mode"
2. Interview-style vs. freeform: it should be adaptive
In short: a distinct mode (for safety/no-write guarantees) combined with a low-friction toggle (Shift+Tab) and adaptive questioning seems to be the sweet spot for developer experience.

Modal or non-modal?
When collaborating with a human colleague, there are many verbal and non-verbal cues that convey intent, far beyond what a CLI can express. It's very frustrating when a question is interpreted as a passive-aggressive request and the model rushes to make changes instead of providing a clear answer. I'm not sure an implicit mode can reliably guarantee that questions lead to answers rather than actions, and having to add "investigate, don't make changes" every time is quite annoying.
Ephemeral or persisted?
It would be nice to have persistence and progress tracking as an option. It usually takes a few iterations on a plan before committing to it, and I only save it to a file if it will require more than one session to implement.
Interview-style or freeform?
It is very helpful when a model asks questions before starting on a plan or implementation. There may be edge cases I haven't considered, or incomplete or ambiguous instructions. It's much better to clarify things upfront than to fix results afterward. I used to add "ask me questions one by one if you need additional context" at the end of my prompts. I believe both interview-style and freeform have their place: interview-style is a much more compact way to gather context, but it should be flexible enough to switch to freeform, since I might not know the answer, or might want to explore more options before committing to one.
Cursory or thorough?
I'd love to be able to explicitly trigger a thorough mode when I feel the result is important enough to warrant it.

Main use cases right now for me are:

I place importance on a flexible planning-file workflow that allows users to take an active role in editing, rather than relying too heavily on AI. Specifically, I envision an approach similar to the PLANS.md file style seen in the OpenAI cookbook, though with a slightly different purpose.
I prefer a style where a Markdown file is created for each task and iteratively co-edited with the AI. It is essential to be able to directly revise and restructure AI-generated plans by hand, rather than simply accepting them as-is.
In comparing other tools, I find approaches like Cline's Deep Planning and Claude Code more aligned with my preferences. Claude Code offers a Plan mode and, notably, allows users to directly edit and save the generated planning file, a feature I highly value. While I don't consider a dedicated Plan mode strictly necessary, the ability to externalize plans into editable, savable files is a significant advantage. On the other hand, tools such as Cursor's Planning, Kiro, and Antigravity feel too AI-driven. With these, the inability to make direct edits (having to route every revision through the AI) becomes cumbersome. Constraints that prevent users from freely editing files hinder flexible workflows.
In summary, I prioritize three key capabilities: (1) creating Markdown files per task, (2) enabling collaborative editing with the AI, and (3) allowing users to directly edit files themselves, eliminating the inconvenience of having to correct every detail via the AI.
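
As an illustration of the per-task, hand-editable style described here, a plan file might look roughly like the following; the path and headings are invented for the example rather than taken from any of the tools mentioned.

```markdown
<!-- docs/plans/unify-auth-middleware.md: hypothetical per-task plan file -->
# Plan: unify auth middleware

## Context
Two login paths (session and token) exist; consolidate them behind one middleware.

## Steps
- [ ] Inventory call sites of both login paths
- [ ] Draft the shared middleware interface (AI draft, revised by hand)
- [ ] Migrate the token path, then the session path
- [ ] Remove dead code and update docs

## Open questions
- Keep the legacy cookie name for one more release?
```

Because it is a plain Markdown file in the repository, either the human or the agent can edit it directly, which is exactly the collaboration pattern this comment argues for.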

Right now, if I simulate this behavior with the Sequential Thinking MCP server, I notice that the quality of Codex's work goes down dramatically compared to a version where a custom prompt has Codex ask the user to respond with 'continue' between steps of a plan. Ideally this plan mode would have Codex apply the manual-interaction level of focus to each plan stage, rather than the terse treatment seen when integrating with Sequential Thinking.
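
For reference, the step-gated custom prompt this commenter describes could read something like the following; this is a paraphrase of the pattern, not the commenter's actual prompt.

```markdown
<!-- step-gated-plan.md: hypothetical custom prompt gating each stage on user approval -->
You are executing an approved plan one stage at a time.

For the current stage:
1. Restate the stage and what "done" means for it.
2. Do the work for this stage only; do not start the next one.
3. Summarize what you changed and anything surprising you found.
4. Stop and wait. Proceed to the next stage only after the user replies "continue".
```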

It completely comes down to quality. When I use the manual interaction steps, I feel like it puts a ton more effort into each step, gated by manual interaction. When I either have it do everything in one go or use the Sequential Thinking MCP to avoid manual interaction steps, the quality (attention and time given) per step drops radically.

> On Dec 2, 2025, at 3:51 PM, Eric Traut wrote:
> The latest models are generally able to perform large, complex, multi-step tasks without supervision. I'm curious why you want to stop after each stage of the plan. Is this more for your own edification/curiosity or do you find that it's necessary to get the agent to stick to the plan? What happens if you just let it go without stopping for intermediate approvals?

The Codex team has received many requests for some form of a planning or spec’ing mode. We’ve been experimenting with several concepts internally, and before we finalize a design, we’d love to get additional input from the Codex community.
There are now many tools in the agentic coding space that explore different approaches to planning and spec’ing. It’s great to see this breadth of experimentation. We’re actively learning from these approaches as we work toward a uniquely Codex solution — one that takes full advantage of the newest models and integrates naturally into real-world workflows.
Below are some of the key design questions we’re currently exploring.
Prescribed or flexible workflow?
Some tools take a strong stance on spec-driven development (SDD) and enforce fairly rigid workflows. Others emphasize flexibility and customization. Prescribed workflows can work well when they align with your preferences, but they can also be limiting. Different teams, projects, and individual developers prefer different workflows, and even within a single project, different tasks call for different levels of planning or formal spec’ing. We’re inclined to favor flexibility — but we’re curious how others see this tradeoff.
Modal or non-modal?
Some tools provide a distinct “planning mode” that you explicitly enter via a button, menu item, or command. This has real advantages: it clearly signals intent to the model, and it allows the harness to enforce safeguards (for example, preventing code changes during planning).
However, hard modes also introduce friction. You have to remember which mode you’re in, when to switch, and why. We want Codex to feel like an intelligent collaborator. When you’re working with a human colleague, you don’t explicitly put them into a “planning mode” to brainstorm — the intent is usually implicit. For this reason, we’re currently leaning toward a solution that avoids a hard modal switch, but we’d love to hear your perspective.
Ephemeral or persisted?
Many SDD-oriented tools encourage plans or specs to be persisted as files and checked into the repository alongside code. We think persisted specs will become increasingly common over time, but we also expect strong variation in personal and team preferences. Our goal is likely to support both persisted, file-based specs and lighter-weight, ephemeral plans that exist only for the duration of a task.
Interview-style or freeform?
Some tools guide planning through an interview-style, question-and-answer process. Others allow for more freeform discussion. Interviews can work well for smaller, well-scoped tasks. For larger features, architectural changes, or major refactors, a freeform approach often enables deeper exploration of design tradeoffs and alternatives. We’re exploring how (or whether) both can coexist.
Cursory or thorough?
In some cases, it’s ideal for the agent to quickly sketch an initial plan with minimal up-front analysis. In other cases — especially for large or complex changes — it’s valuable for the agent to think more deeply: exploring the codebase, understanding dependencies, and possibly even doing web-based research before proposing a plan or spec. How much “thinking” should be the default, and how much should be user-directed?
We’d love your input
If Codex were to add a "plan" or "spec" capability, we'd like to hear how you would want it to work and where you land on the questions above.
Your feedback will directly influence our design. We’re excited to hear how you think planning in Codex should work.