
Commit 05390f3

Merge pull request #48 from janhq/feat/improve-tools
feat: improve tools
2 parents 1c00aed + 2c5f0e8 commit 05390f3

25 files changed (+6257, -1158 lines)

docs/browsermcp-tool-comparison.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# browsermcp/mcp tool parity audit

Jan Browser MCP now ships the same `browser_*` tool catalog that upstream [`browsermcp/mcp`](https://github.com/browsermcp/mcp) exposes. Every automation or navigation call performs the action in the extension and then requests a fresh ARIA snapshot, so the response envelopes match upstream byte-for-byte: automation tools return a short action line followed by the YAML snapshot under a `- Page Snapshot` heading, and `browser_navigate` returns the snapshot alone.

## What matches

* **Tool names** – The MCP server advertises the canonical tools: `browser_navigate`, `browser_click`, `browser_type`, `browser_hover`, `browser_select_option`, `browser_press_key`, `browser_drag`, `browser_snapshot`, `browser_screenshot`, `browser_go_back`, `browser_go_forward`, and `browser_wait`. Custom helpers (`scroll`, `browser_fill_form`, `web_search`, `bridge_status`) remain available as add-ons.
* **Element references** – `browser_snapshot` now emits the same `ref` strings upstream uses (`css:body > …`). Automation tools accept `{ element, ref }` payloads, so prompts can copy refs directly from the snapshot, as with browsermcp. A CSS `selector` is still accepted as a legacy fallback.
* **Response shape** – Automation tools return two text blocks, as upstream does: an action summary and a YAML snapshot built on the server via `captureAriaSnapshot`. Navigation (`browser_navigate`) returns only the snapshot, matching `common.navigate(true)` upstream.
* **Snapshot formatting** – The server rebuilds every snapshot response into the upstream format (`- Page URL`, `- Page Title`, `- Page Snapshot`), so automation tools, navigation tools, and the explicit `browser_snapshot` tool all render the same context block.
* **Extension behavior** – Automation and navigation handlers in the extension no longer capture their own snapshots; like the Browser MCP extension, they perform the action and return lightweight status text. The server captures the ARIA snapshot once per tool call, avoiding duplicate work.

## Intentional differences

* **Console logs** – The upstream `browser_get_console_logs` tool is still omitted because Jan workflows rarely need it. Everything else in the core catalog is present.
* **Extra utilities** – Jan Browser keeps `scroll`, `browser_fill_form`, `web_search`, and `bridge_status` for Jan-specific workflows. Upstream does not ship these helpers; they are optional additions to the canonical catalog.

With these adjustments, MCP clients (Claude, Cursor, Jan Desktop, etc.) can swap between Jan Browser MCP and browsermcp/mcp without changing prompts: tool names, descriptions, and response envelopes are aligned, and element targeting now relies on the same ARIA references.
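
The response shape and snapshot formatting described in the audit can be sketched as follows. This is a minimal illustration, assuming the snapshot text is laid out as `- Page URL: …` and `- Page Title: …` lines followed by a `- Page Snapshot` line and a fenced YAML block. The names `TextBlock`, `formatSnapshotText`, and `buildEnvelope` are hypothetical; they are not the server's `captureAriaSnapshot` API.

```typescript
// Hypothetical shape of one MCP text content block.
interface TextBlock {
  type: "text";
  text: string;
}

// Build the three-backtick fence at runtime so this example can sit
// inside a Markdown code block without closing it.
const FENCE = "`".repeat(3);

// Assumed upstream-style layout: URL and title lines, then the YAML snapshot.
function formatSnapshotText(url: string, title: string, yaml: string): string {
  return [
    `- Page URL: ${url}`,
    `- Page Title: ${title}`,
    "- Page Snapshot",
    `${FENCE}yaml`,
    yaml.trimEnd(),
    FENCE,
  ].join("\n");
}

// Automation tools: action summary first, snapshot second.
// browser_navigate: snapshot only, so `action` is left undefined.
function buildEnvelope(snapshotText: string, action?: string): TextBlock[] {
  const blocks: TextBlock[] = [];
  if (action !== undefined) {
    blocks.push({ type: "text", text: action });
  }
  blocks.push({ type: "text", text: snapshotText });
  return blocks;
}

const snap = formatSnapshotText(
  "https://example.com/",
  "Example Domain",
  '- heading "Example Domain" [level=1]\n',
);
console.log(buildEnvelope(snap, 'Clicked "More information link"').length); // 2
console.log(buildEnvelope(snap).length); // 1
```

Keeping the action line in its own block (rather than concatenating it into the snapshot text) is what lets navigation drop it without changing the snapshot block itself.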

manifest.firefox.json

Lines changed: 2 additions & 1 deletion
```diff
@@ -28,7 +28,8 @@
     "activeTab",
     "tabs",
     "windows",
-    "scripting"
+    "scripting",
+    "debugger"
   ],
   "host_permissions": [
     "<all_urls>"
```

manifest.json

Lines changed: 2 additions & 1 deletion
```diff
@@ -22,7 +22,8 @@
     "activeTab",
     "tabs",
     "windows",
-    "scripting"
+    "scripting",
+    "debugger"
   ],
   "host_permissions": [
     "<all_urls>"
```

mcp-server/src/index.ts

Lines changed: 13 additions & 12 deletions
```diff
@@ -136,23 +136,24 @@ const server = new Server(
 // Collect all tools
 const allTools: Tool[] = [
   // Automation tools
-  automation.click,
-  automation.type,
-  automation.hover,
-  automation.selectOption,
-  automation.fillForm,
-  automation.executeScript,
+  automation.browserClick,
+  automation.browserType,
+  automation.browserHover,
+  automation.browserSelectOption,
+  automation.browserPressKey,
+  automation.browserDrag,
+  automation.browserFillForm,
 
   // Navigation tools
-  navigation.navigate,
-  navigation.goBack,
-  navigation.goForward,
+  navigation.browserNavigate,
+  navigation.browserGoBack,
+  navigation.browserGoForward,
   navigation.scroll,
-  navigation.wait,
+  navigation.browserWait,
 
   // Observation tools
-  observation.snapshot,
-  observation.screenshot,
+  observation.browserSnapshot,
+  observation.browserScreenshot,
   observation.webSearch,
   observation.bridgeStatus,
 ];
```
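
The parity audit's claim that clients can swap servers without changing prompts depends on `allTools` advertising every canonical upstream name. A minimal sketch of a parity check that a test could run, assuming each catalog entry exposes its advertised name as `schema.name` (as the tool definitions in this commit do). `ToolLike`, `CANONICAL_TOOLS`, and `missingCanonicalTools` are hypothetical names; the list itself is copied from the audit's "Tool names" bullet.

```typescript
// Minimal stand-in for the server's Tool type: only the advertised name matters here.
interface ToolLike {
  schema: { name: string };
}

// Canonical names listed in docs/browsermcp-tool-comparison.md.
const CANONICAL_TOOLS: readonly string[] = [
  "browser_navigate",
  "browser_click",
  "browser_type",
  "browser_hover",
  "browser_select_option",
  "browser_press_key",
  "browser_drag",
  "browser_snapshot",
  "browser_screenshot",
  "browser_go_back",
  "browser_go_forward",
  "browser_wait",
];

// Returns canonical names the catalog does not advertise (empty array = parity).
function missingCanonicalTools(tools: readonly ToolLike[]): string[] {
  const advertised = new Set(tools.map((t) => t.schema.name));
  return CANONICAL_TOOLS.filter((name) => !advertised.has(name));
}

// Example: a catalog that still exposes the pre-rename `click` tool.
const catalog: ToolLike[] = CANONICAL_TOOLS
  .filter((name) => name !== "browser_click")
  .map((name) => ({ schema: { name } }));
catalog.push({ schema: { name: "click" } });

console.log(JSON.stringify(missingCanonicalTools(catalog))); // ["browser_click"]
```

Extra Jan-specific tools such as `scroll` or `browser_fill_form` do not affect the result, since the check only asks whether each canonical name is present.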

mcp-server/src/tools/automation.ts

Lines changed: 104 additions & 87 deletions
```diff
@@ -1,174 +1,191 @@
 /**
- * Tools for interacting with web pages: click, type, hover, drag, fill forms, etc.
+ * Tools for interacting with web pages: click, type, hover, drag, etc.
  */
 import { z } from "zod";
 import { zodToJsonSchema } from "zod-to-json-schema";
 import { callExtension, waitForBridgeConnection, hasExtensionConnection } from "../utils/bridge.js";
 import { captureAriaSnapshot } from "../utils/aria-snapshot.js";
-import type { Tool } from "./tool.js";
-
-/**
- * Click an element on the page by CSS selector
- */
-const ClickSchema = z.object({
-  selector: z.string().describe("CSS selector for the element to click (e.g., '#submit-btn', '.nav-link', 'button[type=\"submit\"]')"),
-  waitForNavigation: z.boolean().optional().describe("Whether to wait for navigation after clicking (default: true)"),
+import type { Tool, ToolResult } from "./tool.js";
+
+const ElementSchema = z.object({
+  element: z.string().describe("Human-readable element description from the browser snapshot"),
+  ref: z.string().describe("Exact target element reference from the browser snapshot"),
+  selector: z
+    .string()
+    .optional()
+    .describe("Optional CSS selector fallback (legacy). Use ref from browser_snapshot whenever possible."),
 });
 
```
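
`ElementSchema` makes `ref` the primary target and keeps `selector` as a legacy fallback. The parity audit describes snapshot refs of the form `css:body > …`, so one plausible resolution strategy is to strip a `css:` prefix to recover a CSS selector and otherwise fall back to `selector`. This is a hypothetical sketch: `ElementTarget` and `resolveTargetSelector` are illustrative names, and the diff does not show how the extension actually resolves refs.

```typescript
// Mirrors the fields of ElementSchema in the diff (element, ref, optional selector).
interface ElementTarget {
  element: string;
  ref: string;
  selector?: string;
}

// Assumed ref prefix, based on the `css:body > …` example in the parity audit.
const CSS_REF_PREFIX = "css:";

// Hypothetical resolver: prefer a CSS selector encoded in `ref`,
// otherwise fall back to the legacy `selector` field.
function resolveTargetSelector(target: ElementTarget): string {
  if (target.ref.startsWith(CSS_REF_PREFIX)) {
    const fromRef = target.ref.slice(CSS_REF_PREFIX.length).trim();
    if (fromRef.length > 0) {
      return fromRef;
    }
  }
  if (target.selector !== undefined && target.selector.trim().length > 0) {
    return target.selector.trim();
  }
  throw new Error(`Cannot resolve "${target.element}": unsupported ref "${target.ref}" and no selector`);
}

console.log(resolveTargetSelector({ element: "Submit button", ref: "css:form > button" })); // form > button
console.log(resolveTargetSelector({ element: "Search box", ref: "e12", selector: "#q" })); // #q
```

Throwing when neither field resolves keeps a bad ref from silently targeting the wrong element; the human-readable `element` string goes into the error so the model can see which target failed.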

```diff
-export const click: Tool = {
+const ClickSchema = ElementSchema;
+
+export const browserClick: Tool = {
   schema: {
-    name: "click",
-    description: "Click an element on the currently active tab using a CSS selector. First use navigate_browser to load a page, then use this tool to interact with elements. Returns snapshot of the page after clicking.",
+    name: "browser_click",
+    description: "Perform click on a web page",
     inputSchema: zodToJsonSchema(ClickSchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("click_element", params);
 
-    // Return snapshot after clicking
-    return captureAriaSnapshot(data.data.finalUrl, `Clicked "${params.selector}"`);
+    await callExtension("click_element", params);
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Clicked "${params.element}"`, snapshot);
   },
 };
 
-/**
- * Type text into an element
- */
-const TypeSchema = z.object({
-  selector: z.string().describe("CSS selector for the input element"),
+const TypeSchema = ElementSchema.extend({
   text: z.string().describe("Text to type into the element"),
-  clear: z.boolean().optional().describe("Whether to clear existing text before typing (default: true)"),
-  pressEnter: z.boolean().optional().describe("Whether to press Enter after typing (useful for submitting forms or sending messages, default: false)"),
+  submit: z.boolean().optional().describe("Whether to submit entered text (press Enter after)"),
 });
 
-export const type: Tool = {
+export const browserType: Tool = {
   schema: {
-    name: "type",
-    description: "Type text into a form field or input element on the currently active tab. Supports regular inputs, textareas, and contenteditable elements (like Slack, Discord). First use navigate_browser to load a page, then use this tool to interact with elements. Set pressEnter=true to submit forms or send messages after typing.",
+    name: "browser_type",
+    description: "Type text into editable element",
     inputSchema: zodToJsonSchema(TypeSchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("type_text", params);
 
-    const action = params.pressEnter ? `Typed "${params.text}" and pressed Enter` : `Typed "${params.text}"`;
-    return captureAriaSnapshot(data.data.url, `${action} into "${params.selector}"`);
+    await callExtension("type_text", { ...params, pressEnter: params.submit === true });
+    const action = params.submit ? `Typed "${params.text}" and pressed Enter` : `Typed "${params.text}"`;
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`${action} into "${params.element}"`, snapshot);
   },
 };
 
```
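
`browser_type` accepts upstream's optional `submit` flag, while the payload forwarded to the extension's `type_text` command carries `pressEnter`. Writing `params.submit === true` turns an omitted flag into an explicit `false` instead of forwarding `undefined`. A standalone sketch of that mapping, where `TypeParams`, `TypeTextPayload`, `toTypeTextPayload`, and `typeActionText` are hypothetical names and the payload shape is inferred from the handler in the diff rather than from the extension source:

```typescript
// Fields accepted by browser_type (ElementSchema plus text and submit), per the diff.
interface TypeParams {
  element: string;
  ref: string;
  selector?: string;
  text: string;
  submit?: boolean;
}

// Payload forwarded to the extension's "type_text" command.
type TypeTextPayload = TypeParams & { pressEnter: boolean };

// Same mapping as the handler: spread the params, then derive a strict boolean.
function toTypeTextPayload(params: TypeParams): TypeTextPayload {
  return { ...params, pressEnter: params.submit === true };
}

// Action line the handler places before the snapshot.
function typeActionText(params: TypeParams): string {
  const action = params.submit ? `Typed "${params.text}" and pressed Enter` : `Typed "${params.text}"`;
  return `${action} into "${params.element}"`;
}

const params: TypeParams = { element: "Search box", ref: "css:#q", text: "jan" };
console.log(toTypeTextPayload(params).pressEnter); // false
console.log(typeActionText({ ...params, submit: true })); // Typed "jan" and pressed Enter into "Search box"
```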

```diff
-/**
- * Hover over an element
- */
-const HoverSchema = z.object({
-  selector: z.string().describe("CSS selector for the element to hover over"),
-});
+const HoverSchema = ElementSchema;
 
-export const hover: Tool = {
+export const browserHover: Tool = {
   schema: {
-    name: "hover",
-    description: "Hover the mouse over an element on the currently active tab to trigger hover effects, tooltips, or dropdowns. First use navigate_browser to load a page, then use this tool to interact with elements.",
+    name: "browser_hover",
+    description: "Hover over element on page",
     inputSchema: zodToJsonSchema(HoverSchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("hover_element", params);
 
-    return captureAriaSnapshot(data.data.url, `Hovered over "${params.selector}"`);
+    await callExtension("hover_element", params);
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Hovered over "${params.element}"`, snapshot);
   },
 };
 
-/**
- * Select an option from a dropdown
- */
-const SelectOptionSchema = z.object({
-  selector: z.string().describe("CSS selector for the select element"),
-  value: z.string().describe("The option value or visible text to select"),
+const SelectOptionSchema = ElementSchema.extend({
+  values: z.array(z.string()).min(1).describe("Array of values to select in the dropdown"),
 });
 
-export const selectOption: Tool = {
+export const browserSelectOption: Tool = {
   schema: {
-    name: "select_option",
-    description: "Select an option from a dropdown/select element on the currently active tab by value or visible text. First use navigate_browser to load a page, then use this tool to interact with elements.",
+    name: "browser_select_option",
+    description: "Select an option in a dropdown",
     inputSchema: zodToJsonSchema(SelectOptionSchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("select_option", params);
 
-    return captureAriaSnapshot(data.data.url, `Selected option "${params.value}" in "${params.selector}"`);
+    await callExtension("select_option", params);
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Selected option in "${params.element}"`, snapshot);
   },
 };
 
-/**
- * Fill multiple form fields at once
- */
 const FillFormFieldSchema = z.object({
-  selector: z.string().describe("CSS selector for the form field"),
+  selector: z
+    .string()
+    .optional()
+    .describe("CSS selector for the form field (legacy fallback, prefer ref)"),
+  ref: z.string().optional().describe("Element reference from browser_snapshot"),
   value: z.string().describe("Value to set (use 'true'/'false' for checkboxes)"),
 });
 
 const FillFormSchema = z.object({
   fields: z.array(FillFormFieldSchema).min(1).describe("Array of fields to fill"),
 });
 
-export const fillForm: Tool = {
+export const browserFillForm: Tool = {
   schema: {
-    name: "fill_form",
-    description: "Fill multiple form fields at once on the currently active tab. First use navigate_browser to load a page, then use this tool to interact with elements. Supports text inputs, selects, checkboxes, and radio buttons.",
+    name: "browser_fill_form",
+    description: "Fill multiple form fields (inputs, selects, checkboxes, radios) by selector/value.",
     inputSchema: zodToJsonSchema(FillFormSchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("fill_form", params);
 
-    const fieldCount = data.data.successfulFields || 0;
-    return captureAriaSnapshot(data.data.url, `Filled ${fieldCount} form fields`);
+    const data = await callExtension("browser_fill_form", params);
+    const fieldCount = data?.data?.successfulFields || params.fields.length;
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Filled ${fieldCount} form fields`, snapshot);
   },
 };
 
```
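
In `browserFillForm`, `data?.data?.successfulFields || params.fields.length` falls back to the number of requested fields whenever the extension reports `0`, because `||` treats `0` as falsy. If the action line should report what the extension actually filled, the nullish coalescing operator `??` falls back only when the count is missing. A minimal comparison, assuming the response shape `{ data?: { successfulFields?: number } }` implied by the optional chaining in the diff; `FillFormResponse`, `countWithOr`, and `countWithNullish` are hypothetical names.

```typescript
// Response shape implied by `data?.data?.successfulFields` in the diff.
interface FillFormResponse {
  data?: { successfulFields?: number };
}

// Behavior in the diff: `||` also replaces a reported 0 with the request size.
function countWithOr(res: FillFormResponse | undefined, requested: number): number {
  return res?.data?.successfulFields || requested;
}

// Alternative: `??` only falls back when the count is null or undefined.
function countWithNullish(res: FillFormResponse | undefined, requested: number): number {
  return res?.data?.successfulFields ?? requested;
}

const noneFilled: FillFormResponse = { data: { successfulFields: 0 } };
console.log(countWithOr(noneFilled, 3)); // 3
console.log(countWithNullish(noneFilled, 3)); // 0
console.log(countWithNullish(undefined, 3)); // 3
```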

```diff
-/**
- * Execute custom JavaScript on the page
- */
-const ExecuteScriptSchema = z.object({
-  script: z.string().describe("The JavaScript code to execute. Should be a function body that returns a value."),
-  args: z.array(z.union([
-    z.string(),
-    z.number(),
-    z.boolean(),
-    z.null(),
-    z.record(z.unknown()),
-  ])).optional().describe("Optional array of arguments to pass to the script (supports strings, numbers, booleans, null, and objects)"),
+const PressKeySchema = z.object({
+  key: z.string().describe("Name of the key to press or character to generate (e.g., 'Enter', 'ArrowLeft', 'a')"),
 });
 
-export const executeScript: Tool = {
+export const browserPressKey: Tool = {
   schema: {
-    name: "execute_script",
-    description: "Execute custom JavaScript code on the currently active tab and return the result. First use navigate_browser to load a page, then use this tool to execute scripts. Use with caution.",
-    inputSchema: zodToJsonSchema(ExecuteScriptSchema) as any,
+    name: "browser_press_key",
+    description: "Press a key on the keyboard",
+    inputSchema: zodToJsonSchema(PressKeySchema) as any,
   },
   handle: async (params) => {
     if (!hasExtensionConnection()) {
       await waitForBridgeConnection(4000);
     }
-    const data = await callExtension("execute_script", params);
-
-    return {
-      content: [
-        {
-          type: "text",
-          text: `Script executed successfully on ${data.data.url}. Result:\n\`\`\`json\n${JSON.stringify(data.data.result, null, 2)}\n\`\`\``,
-        },
-      ],
-      _meta: { urls: [data.data.url] },
-    };
+
+    await callExtension("press_key", params);
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Pressed key ${params.key}`, snapshot);
+  },
+};
+
+const DragSchema = z.object({
+  startElement: z.string().describe("Human-readable source element description"),
+  startRef: z.string().describe("Source element reference from browser_snapshot"),
+  startSelector: z.string().optional().describe("Optional CSS selector fallback for the source element"),
+  endElement: z.string().describe("Human-readable target element description"),
+  endRef: z.string().describe("Target element reference from browser_snapshot"),
+  endSelector: z.string().optional().describe("Optional CSS selector fallback for the target element"),
+});
+
+export const browserDrag: Tool = {
+  schema: {
+    name: "browser_drag",
+    description: "Perform drag and drop between two elements",
+    inputSchema: zodToJsonSchema(DragSchema) as any,
+  },
+  handle: async (params) => {
+    if (!hasExtensionConnection()) {
+      await waitForBridgeConnection(4000);
+    }
+
+    await callExtension("drag_element", params);
+    const snapshot = await captureAriaSnapshot();
+    return withActionText(`Dragged "${params.startElement}" to "${params.endElement}"`, snapshot);
   },
 };
+
+function withActionText(action: string, snapshot: ToolResult): ToolResult {
+  const existing = Array.isArray(snapshot.content) ? snapshot.content : [];
+  return {
+    ...snapshot,
+    content: [
+      {
+        type: "text",
+        text: action,
+      },
+      ...existing,
+    ],
+  };
+}
```
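
`withActionText` prepends the action summary to whatever content `captureAriaSnapshot()` returned, and the object spread carries over any other top-level fields (such as `_meta`). Below is a self-contained version for experimenting with that ordering. The real `ToolResult` type lives in `./tool.js` and is not part of this diff, so `ContentBlock` and `ToolResultLike` are assumed stand-ins.

```typescript
// Minimal stand-ins for the types imported from ./tool.js (assumed shapes).
interface ContentBlock {
  type: string;
  text?: string;
}

interface ToolResultLike {
  content?: ContentBlock[];
  [key: string]: unknown;
}

// Same logic as withActionText in the diff: action block first, snapshot blocks after,
// all other top-level fields copied over; the input object is not mutated.
function withActionText(action: string, snapshot: ToolResultLike): ToolResultLike {
  const existing = Array.isArray(snapshot.content) ? snapshot.content : [];
  return {
    ...snapshot,
    content: [{ type: "text", text: action }, ...existing],
  };
}

const snapshot: ToolResultLike = {
  content: [{ type: "text", text: "- Page URL: https://example.com/" }],
  _meta: { urls: ["https://example.com/"] },
};
const result = withActionText('Clicked "Submit button"', snapshot);
console.log(result.content?.length); // 2
console.log(result.content?.[0]?.text); // Clicked "Submit button"
console.log(result._meta !== undefined); // true
```

The `Array.isArray` guard means a snapshot result without `content` still yields a valid single-block response instead of throwing on the spread.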

0 commit comments
