A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
A fast, lightweight Model Context Protocol (MCP) server that empowers LLMs with browser automation via Puppeteer’s structured accessibility data, featuring optional vision mode for complex visual understanding and flexible, cross-platform configuration.
First, install the Browser MCP server with your client. A typical configuration looks like this:
{
"mcpServers": {
"browser": {
"command": "npx",
"args": [
"@agent-infra/mcp-server-browser@latest"
]
}
}
}
You can also install the Browser MCP server using the VS Code CLI:
## For VS Code
code --add-mcp '{"name":"browser","command":"npx","args":["@agent-infra/mcp-server-browser@latest"]}'
After installation, the Browser MCP server will be available for use with your GitHub Copilot agent in VS Code.
Go to Cursor Settings -> MCP -> Add new MCP Server. Name it to your liking and use the command type with the command npx @agent-infra/mcp-server-browser. You can also verify the config or add command-line arguments by clicking Edit.
{
"mcpServers": {
"browser": {
"command": "npx",
"args": [
"@agent-infra/mcp-server-browser@latest"
]
}
}
}
Follow the Windsurf MCP documentation. Use the following configuration:
{
"mcpServers": {
"browser": {
"command": "npx",
"args": [
"@agent-infra/mcp-server-browser@latest"
]
}
}
}
Follow the MCP install guide and use the following configuration:
{
"mcpServers": {
"browser": {
"command": "npx",
"args": [
"@agent-infra/mcp-server-browser@latest"
]
}
}
}
You can also pass the --port $your_port argument when starting the Browser MCP server to run it as a remote SSE and Streamable HTTP server.
## normal run remote mcp server
npx @agent-infra/mcp-server-browser --port 8089
## run with DISPLAY environment for VNC or other virtual display
DISPLAY=:0 npx @agent-infra/mcp-server-browser --port 8089
You can use either of the two remote MCP server endpoints:
http://127.0.0.1:8089/mcp
http://127.0.0.1:8089/sse
Then, in the MCP client config, set the url to the SSE endpoint:
{
"mcpServers": {
"browser": {
"url": "http://127.0.0.1:8089/sse"
}
}
}
Or set the url to the Streamable HTTP endpoint:
{
"mcpServers": {
"browser": {
"type": "streamable-http", // If there is MCP Client support
"url": "http://127.0.0.1:8089/mcp"
}
}
}
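Both endpoints share the host and port the server was started with, so a client config can be derived from those two values. The following is a minimal TypeScript sketch under that assumption; the helper names `remoteEndpoints` and `clientConfig` are hypothetical and not part of the package, and only the `/mcp` and `/sse` paths come from the text above.

```typescript
// Hypothetical helpers for illustration; not part of @agent-infra/mcp-server-browser.
type RemoteTransport = 'sse' | 'streamable-http';

interface RemoteEndpoints {
  mcp: string; // Streamable HTTP endpoint
  sse: string; // SSE endpoint
}

// Build both endpoint URLs from the host and the value passed to --port.
function remoteEndpoints(host: string, port: number): RemoteEndpoints {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  const base = `http://${host}:${port}`;
  return { mcp: `${base}/mcp`, sse: `${base}/sse` };
}

// Produce an "mcpServers" config object for the chosen transport.
function clientConfig(host: string, port: number, transport: RemoteTransport) {
  const { mcp, sse } = remoteEndpoints(host, port);
  const browser =
    transport === 'streamable-http'
      ? { type: 'streamable-http', url: mcp }
      : { url: sse };
  return { mcpServers: { browser } };
}

console.log(JSON.stringify(clientConfig('127.0.0.1', 8089, 'sse'), null, 2));
```

Generating the config from one host and port pair keeps the URL in the client config in step with the port the server actually listens on.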
If your MCP client is written in JavaScript / TypeScript, you can call the server in-process, so your users do not need to install the command-line interface to use Browser MCP.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
// type: module project usage
import { createServer } from '@agent-infra/mcp-server-browser';
// commonjs project usage
// const { createServer } = await import('@agent-infra/mcp-server-browser')
const client = new Client(
{
name: 'test browser client',
version: '1.0',
},
{
capabilities: {},
},
);
const server = createServer();
const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
await Promise.all([
client.connect(clientTransport),
server.connect(serverTransport),
]);
// list tools
const result = await client.listTools();
console.log(result);
// call tool
const toolResult = await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'https://www.google.com',
},
});
console.log(toolResult);
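A tool result in MCP carries a `content` array of typed items plus an optional `isError` flag, so callers usually want to pull out just the text. The sketch below shows one way to do that; the helper name `textFromToolResult` is hypothetical, and the sample value is invented for illustration rather than being actual output of `browser_navigate`.

```typescript
// Minimal shapes modelled on the MCP tool-result format; the helper is hypothetical.
interface ContentItemLike {
  type: string;
  text?: string;
}

interface ToolResultLike {
  content?: ContentItemLike[];
  isError?: boolean;
}

// Join all text items of a tool result; throw if the tool reported an error.
function textFromToolResult(result: ToolResultLike): string {
  const text = (result.content ?? [])
    .filter((item) => item.type === 'text' && typeof item.text === 'string')
    .map((item) => item.text as string)
    .join('\n');
  if (result.isError) {
    throw new Error(text || 'tool call failed');
  }
  return text;
}

// Stand-in value; in practice pass the object returned by client.callTool().
const sampleResult: ToolResultLike = {
  content: [{ type: 'text', text: 'example tool output' }],
};
console.log(textFromToolResult(sampleResult));
```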
The Browser MCP server supports the following arguments. They can be provided in the JSON configuration above, as part of the "args" list:
> npx @agent-infra/mcp-server-browser@latest -h
-V, --version output the version number
--browser <browser> browser or chrome channel to use, possible values: chrome, edge, firefox.
--cdp-endpoint <endpoint> CDP endpoint to connect to, for example "http://127.0.0.1:9222/json/version"
--ws-endpoint <endpoint> WebSocket endpoint to connect to, for example "ws://127.0.0.1:9222/devtools/browser/{id}"
--executable-path <path> path to the browser executable.
--headless run browser in headless mode, headed by default
--host <host> host to bind server to. Default is localhost. Use 0.0.0.0 to bind to all interfaces.
--port <port> port to listen on for SSE and HTTP transport.
--proxy-bypass <bypass> comma-separated domains to bypass proxy, for example ".com,chromium.org,.domain.com"
--proxy-server <proxy> specify proxy server, for example "http://myproxy:3128" or "socks5://myproxy:8080"
--user-agent <ua string> specify user agent string
--user-data-dir <path> path to the user data directory.
--viewport-size <size> specify browser viewport size in pixels, for example "1280, 720"
--vision Run server that uses screenshots (Aria snapshots are used by default)
-h, --help display help for command
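Since these flags go into the "args" list of the JSON configuration, it can help to generate that list from a typed options object. The following is a rough sketch: the `BrowserMcpOptions` type and `buildArgs` helper are hypothetical, only a subset of the flags is covered, and the viewport value follows the "1280, 720" form shown in the help output.

```typescript
// Hypothetical helper; the flag names are copied from the help output above.
interface BrowserMcpOptions {
  browser?: 'chrome' | 'edge' | 'firefox';
  headless?: boolean;
  vision?: boolean;
  port?: number;
  viewportSize?: { width: number; height: number };
  userDataDir?: string;
}

const BROWSER_MCP_PACKAGE = '@agent-infra/mcp-server-browser@latest';

// Turn an options object into the "args" array for the npx command.
function buildArgs(options: BrowserMcpOptions = {}): string[] {
  const args: string[] = [BROWSER_MCP_PACKAGE];
  if (options.browser) args.push('--browser', options.browser);
  if (options.headless) args.push('--headless');
  if (options.vision) args.push('--vision');
  if (options.port !== undefined) args.push('--port', String(options.port));
  if (options.viewportSize) {
    const { width, height } = options.viewportSize;
    // Same "width, height" form as the example in the help text.
    args.push('--viewport-size', `${width}, ${height}`);
  }
  if (options.userDataDir) args.push('--user-data-dir', options.userDataDir);
  return args;
}

const generatedConfig = {
  mcpServers: {
    browser: {
      command: 'npx',
      args: buildArgs({ headless: true, viewportSize: { width: 1280, height: 720 } }),
    },
  },
};
console.log(JSON.stringify(generatedConfig, null, 2));
```

Each flag and its value are pushed as separate array entries, which matches how the client passes "args" to the command without shell quoting.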
The browser runtime requires configuration for Viewport Size, Vision Model Coordinate Factors, and User Agent. These can be passed through corresponding HTTP headers:
| Header | Description |
|---|---|
| x-viewport-size | Browser viewport size, format: width,height separated by comma |
| x-vision-factors | Vision model coordinate system factors, format: x_factor,y_factor separated by comma |
| x-user-agent | User Agent string, defaults to system User Agent if not specified |
Note: Header names are case-insensitive.
Example:
x-viewport-size: 1920,1080
x-vision-factors: 1.0,1.0
x-user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
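The comma-separated formats above are simple to produce and to parse. Here is a small TypeScript sketch of both directions; the `RuntimeOptions` type and the `buildRuntimeHeaders` and `parsePair` helpers are hypothetical, and how the server itself parses these headers is not specified here.

```typescript
// Hypothetical helpers; only the header names and value formats come from the table above.
interface RuntimeOptions {
  viewport?: { width: number; height: number };
  visionFactors?: { x: number; y: number };
  userAgent?: string;
}

// Build the runtime headers to attach to requests sent to the remote server.
function buildRuntimeHeaders(options: RuntimeOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (options.viewport) {
    headers['x-viewport-size'] = `${options.viewport.width},${options.viewport.height}`;
  }
  if (options.visionFactors) {
    headers['x-vision-factors'] = `${options.visionFactors.x},${options.visionFactors.y}`;
  }
  if (options.userAgent) {
    headers['x-user-agent'] = options.userAgent;
  }
  return headers;
}

// Parse an "a,b" header value into two numbers, rejecting malformed input.
function parsePair(value: string): [number, number] {
  const parts = value.split(',').map((part) => {
    const trimmed = part.trim();
    return trimmed === '' ? NaN : Number(trimmed);
  });
  if (parts.length !== 2 || parts.some((n) => !Number.isFinite(n))) {
    throw new Error(`expected two comma-separated numbers, got "${value}"`);
  }
  return parts as [number, number];
}

const runtimeHeaders = buildRuntimeHeaders({
  viewport: { width: 1920, height: 1080 },
  visionFactors: { x: 1.0, y: 1.0 },
});
console.log(runtimeHeaders, parsePair(runtimeHeaders['x-viewport-size']));
```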
We have unified the deployment of VNC and MCP under a single URL endpoint. The Dockerfile and DockerHub image will be published together!
Start the development server, then access http://127.0.0.1:6274/:
npm run dev