Open Source App Automation for Claude

Test Any App Like Magic

The ghost hand that automates your apps on macOS, Windows, and Linux. Click buttons, type text, navigate UI elements, and capture screenshots using plain-language instructions in Claude Code or Claude Desktop.

brew install --cask geisterhand-io/tap/geisterhand
$ geisterhand run Calculator
{"port":49152,"pid":12345,"app":"Calculator","host":"127.0.0.1"}

You: Click 7, then +, then 3, then =. What's the result?

Claude: The result is 10. I used the accessibility API to find and press buttons by their labels, then captured a screenshot to verify.

How It Works

Get started in three simple steps. No complex setup, no learning curve.

01

Install Geisterhand

One command on macOS or Linux, one download on Windows. Sets up everything you need.

brew install --cask geisterhand-io/tap/geisterhand

02

Grant Permissions

Launch the app once and grant Accessibility and Screen Recording permissions when prompted.

03

Run

Start automating any app. The server runs scoped to your target app and prints its connection details as JSON.

geisterhand run YourApp

Built for Developers

Everything you need to automate app testing, nothing you don't.

Zero Configuration

No test scripts to write. No selectors to maintain. Just describe what you want to test in plain English and Geisterhand figures out the rest.

Semantic UI Control

Find and interact with UI elements by role, title, or label using native accessibility APIs. More reliable than coordinate-based clicking.

Lightning-Fast Screenshots

Capture full-resolution screenshots of any display using native screen capture APIs. Perfect for visual regression testing and documentation.

Claude Code & Desktop

Works with both Claude Code (CLI) and Claude Desktop via MCP. One-click integration from the menu bar app.

HTTP API & CLI

Full REST API on localhost (port 7676, or an auto-selected port when started with geisterhand run) plus a command-line tool. Integrate with any language, framework, or automation pipeline.

Native Performance

Built with Swift (macOS), .NET (Windows), and Rust (Linux) for maximum performance and minimal resource usage. No Electron and no Node.js runtime in the core app; Node.js is only needed for the optional MCP integration.

Background Automation

Send input to apps running in the background using PID-targeted events. There is no need to bring apps to the foreground, which keeps testing non-intrusive.

Menu Control

Discover and trigger application menu items programmatically. Navigate menus without keyboard shortcuts, even in background apps.

See It In Action

Demo coming soon: a raw, unedited recording of Geisterhand running a complete testing workflow in TextEdit on a real desktop, in under 2 minutes.

Quick Start

Get up and running in under a minute.

1

Install Geisterhand

brew install --cask geisterhand-io/tap/geisterhand

This installs the menu bar app and the geisterhand CLI. Alternatively, download the DMG directly.

2

Grant Permissions

Launch the app once to trigger the permission prompts, then grant both permissions in System Settings:

Accessibility: for keyboard and mouse control
Screen Recording: for screenshot capture

3

Start automating with geisterhand run

Launch a scoped automation server for any app:

$ geisterhand run Calculator
{"port":49152,"pid":12345,"app":"Calculator","host":"127.0.0.1"}

The server auto-selects a free port, scopes all API requests to the target app, and exits when the app quits. Pass an app name, path, or bundle identifier:

geisterhand run Safari
geisterhand run /Applications/Xcode.app
geisterhand run com.apple.TextEdit
geisterhand run Calculator --port 7676 # pin a specific port

Then send API requests to the host and port from the JSON output.
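A script can read the JSON line that geisterhand run prints and derive the base URL for later requests from it. Below is a minimal Python sketch. It assumes the connection details are the first line on stdout, as in the example above, and that geisterhand is on your PATH. The helper names launch and base_url are illustrative and not part of Geisterhand.

```python
import json
import subprocess
from typing import Optional, Tuple


def base_url(line: str) -> str:
    """Turn the JSON line printed by `geisterhand run` into a base URL."""
    info = json.loads(line)
    return f"http://{info['host']}:{info['port']}"


def launch(app: str, port: Optional[int] = None) -> Tuple[subprocess.Popen, str]:
    """Start a scoped server for `app`; return the process and its base URL.

    Assumption: the connection JSON is the first line written to stdout.
    """
    cmd = ["geisterhand", "run", app]
    if port is not None:
        cmd += ["--port", str(port)]  # pin a specific port
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    first_line = proc.stdout.readline()
    return proc, base_url(first_line)


if __name__ == "__main__":
    sample = '{"port":49152,"pid":12345,"app":"Calculator","host":"127.0.0.1"}'
    print(base_url(sample))  # http://127.0.0.1:49152
```

The server exits when the target app quits, so the returned process is a convenient handle for cleanup at the end of a test.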

4

Try it out

With the server running, send requests directly:

# See what's on screen
curl "http://127.0.0.1:49152/accessibility/tree?format=compact"

# Click a button (use "push button" on Linux instead of "AXButton")
curl -X POST http://127.0.0.1:49152/click/element \
  -H "Content-Type: application/json" \
  -d '{"title": "7", "role": "AXButton"}'

# Take a screenshot
curl http://127.0.0.1:49152/screenshot --output screen.png
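The same three calls can be made from any language. Below is a minimal sketch using only Python's standard library. It assumes the example port 49152 from the JSON output, and request is a hypothetical helper name. The sketch builds urllib.request.Request objects without sending them; pass one to urlopen while the server is running.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE = "http://127.0.0.1:49152"  # host and port from the `geisterhand run` output


def request(method: str, path: str, body: Optional[dict] = None,
            query: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for the Geisterhand HTTP API."""
    url = BASE + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# See what's on screen
tree = request("GET", "/accessibility/tree", query={"format": "compact"})
# Click a button (use "push button" on Linux instead of "AXButton")
click = request("POST", "/click/element", body={"title": "7", "role": "AXButton"})
# Take a screenshot
shot = request("GET", "/screenshot")

# With the server running:
# with urllib.request.urlopen(click) as resp:
#     print(resp.read().decode())
```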

Or let an LLM drive it. Add the testing guide to your project's CLAUDE.md and ask Claude to test your app:

"Open Calculator, click 7, then +, then 3, then =. What's the result?"

Optional: MCP Integration for Claude

For a deeper integration, add Geisterhand as an MCP server so Claude can call the API natively:

claude mcp add-json geisterhand \
  '{"type":"stdio","command":"npx","args":["geisterhand-mcp"]}' \
  --scope user

Requires Node.js 18+. Restart Claude after adding.

API Reference

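Start with geisterhand run YourApp and send requests to the host and port from the JSON output. POST request bodies are JSON with snake_case field names (for example click_count, timeout_ms). GET endpoints take query parameters, some of which are camelCase (for example maxDepth, labelContains).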

Endpoints

| Method | Endpoint | Description | Parameters |
|---|---|---|---|
| GET | /status | System info, permissions, frontmost app, screen size | none |
| GET | /health | Health check | none |
| GET | /screenshot | Capture screen as base64 PNG | app?, windowId?, format?, display? |
| POST | /click | Click at screen coordinates | x, y, button?, click_count?, modifiers? |
| POST | /click/element | Click element by semantic properties | title?, role?, label?, pid?, use_accessibility_action? |
| POST | /type | Type text at cursor position | text, delay_ms?, pid?, path?, role?, title? |
| POST | /key | Press key with modifiers | key, modifiers?, pid?, path? |
| POST | /scroll | Scroll at position | x?, y?, delta_x?, delta_y?, pid?, path? |
| POST | /wait | Wait for UI condition | title?, role?, condition, timeout_ms?, poll_interval_ms? |
| GET | /menu | Get app menu structure | app |
| POST | /menu | Trigger menu item | app, path, background? |
| GET | /accessibility/tree | Get UI element hierarchy | pid?, maxDepth?, format?, includeActions? |
| GET | /accessibility/elements | Find elements by criteria | role?, title?, titleContains?, labelContains?, valueContains?, pid?, maxResults? |
| GET | /accessibility/focused | Get focused element | pid? |
| POST | /accessibility/action | Perform action on element | path, action, value? |

Parameters marked ? are optional.

Click element by title

POST /click/element
{
  "title": "Submit",
  "role": "AXButton",
  "pid": 12345
}

Keyboard shortcut (Cmd+S)

POST /key
{
  "key": "s",
  "modifiers": ["cmd"]
}

Find buttons by label

GET /accessibility/elements
  ?role=AXButton
  &labelContains=Submit

Returns elements with path, frame, and actions
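Search criteria for /accessibility/elements go on the URL as camelCase query parameters. Below is a minimal Python sketch that builds that URL. It assumes the example port 49152. The helper elements_url is hypothetical, and the Python argument names are mapped to the documented query names.

```python
import urllib.parse
from typing import Optional

BASE = "http://127.0.0.1:49152"  # from the `geisterhand run` JSON output


def elements_url(role: Optional[str] = None,
                 label_contains: Optional[str] = None,
                 max_results: Optional[int] = None) -> str:
    """Build a GET /accessibility/elements URL; query names are camelCase."""
    params = {
        "role": role,
        "labelContains": label_contains,
        "maxResults": max_results,
    }
    # Drop unset filters so the server only sees the criteria you chose.
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/accessibility/elements?{query}"


print(elements_url(role="AXButton", label_contains="Submit"))
# http://127.0.0.1:49152/accessibility/elements?role=AXButton&labelContains=Submit
```

urlencode escapes spaces as "+", so a Linux role such as "push button" stays a single query value.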

Press a button by path

POST /accessibility/action
{
  "path": {
    "pid": 12345,
    "path": [0, 0, 2, 5]
  },
  "action": "press"
}

Special Keys

return, tab, space, escape, delete, up, down, left, right, home, end, pageup, pagedown, f1-f12

Modifiers

cmd, ctrl, alt, shift, fn, super
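A test plan may describe shortcuts as strings such as "cmd+s". A small helper can turn those strings into /key request bodies and reject modifiers outside the set listed above. Below is a minimal Python sketch. Both key_body and the "+"-separated notation are assumptions for illustration, not a Geisterhand format.

```python
MODIFIERS = {"cmd", "ctrl", "alt", "shift", "fn", "super"}


def key_body(shortcut: str) -> dict:
    """Convert a hypothetical "mod+mod+key" string into a POST /key body."""
    *mods, key = [part.strip().lower() for part in shortcut.split("+")]
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown or not key:
        raise ValueError(f"unsupported shortcut: {shortcut!r}")
    body = {"key": key}
    if mods:
        body["modifiers"] = mods
    return body


print(key_body("cmd+s"))   # {'key': 's', 'modifiers': ['cmd']}
print(key_body("return"))  # {'key': 'return'}
```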

Common Roles

AXButton, AXTextField, AXTextArea, AXCheckBox, AXPopUpButton, AXMenuItem, AXStaticText, AXLink

Actions

press, setValue, focus, confirm, cancel, increment, decrement, showMenu, pick
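Element paths returned by /accessibility/elements can be passed to /accessibility/action. Below is a minimal Python sketch that builds the request body in the shape of the "Press a button by path" example and checks the action against the list above. The helper action_body is hypothetical, and the value type that setValue accepts is an assumption.

```python
from typing import Any, Optional, Sequence

ACTIONS = {"press", "setValue", "focus", "confirm", "cancel",
           "increment", "decrement", "showMenu", "pick"}


def action_body(pid: int, indices: Sequence[int], action: str,
                value: Optional[Any] = None) -> dict:
    """Build a POST /accessibility/action body for the element at `indices`."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    body = {"path": {"pid": pid, "path": list(indices)}, "action": action}
    if value is not None:
        body["value"] = value  # e.g. new text for setValue (assumed type)
    return body


print(action_body(12345, [0, 0, 2, 5], "press"))
# {'path': {'pid': 12345, 'path': [0, 0, 2, 5]}, 'action': 'press'}
```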

Recommended Workflow

1

Launch

geisterhand run YourApp to start the scoped server

2

Inspect

GET /accessibility/tree?format=compact to see the UI

3

Interact

Click elements, type text, press keys, trigger menus

4

Verify

GET /screenshot or read element values to confirm

5

Wait

POST /wait for UI changes instead of fixed sleeps
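The Inspect, Interact, and Verify steps can be scripted once the server is launched. Below is a minimal Python sketch that takes an injectable send function, so the plan can be dry-run without a server. The step list, the names run_steps and http_sender, and the Calculator button title are illustrative assumptions. The Wait step is left out because the accepted condition values are not listed on this page.

```python
import json
import urllib.request
from typing import Callable, List, Optional, Tuple

# A step is (HTTP method, path, optional JSON body).
Step = Tuple[str, str, Optional[dict]]
Sender = Callable[[str, str, Optional[dict]], bytes]

# Illustrative plan for the Calculator example; the title is an assumption.
STEPS: List[Step] = [
    ("GET", "/accessibility/tree?format=compact", None),             # Inspect
    ("POST", "/click/element", {"title": "7", "role": "AXButton"}),  # Interact
    ("GET", "/screenshot", None),                                    # Verify
]


def http_sender(base: str) -> Sender:
    """Return a sender that issues real requests against a running server."""
    def send(method: str, path: str, body: Optional[dict]) -> bytes:
        headers = {}
        data = None
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        req = urllib.request.Request(base + path, data=data,
                                     headers=headers, method=method)
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return send


def run_steps(steps: List[Step], send: Sender) -> List[bytes]:
    """Run each step in order and collect the raw response bodies."""
    return [send(method, path, body) for method, path, body in steps]


if __name__ == "__main__":
    # Dry run: record the calls instead of sending them.
    calls = []
    run_steps(STEPS, lambda m, p, b: calls.append((m, p)) or b"")
    print(calls)
    # Against a live server:
    # run_steps(STEPS, http_sender("http://127.0.0.1:49152"))
```

Keeping the transport behind send makes the same plan usable in a dry run, against a live server, or against a recorded fake in CI.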

Prefer /click/element over coordinate-based /click. Semantic selectors are more reliable than screen positions.

Linux note: AT-SPI2 uses different role names, such as push button, text, and label in place of macOS's AXButton, AXTextField, and AXStaticText.
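A test that runs on both macOS and Linux can keep its selectors in macOS terms and translate them at request time. Below is a minimal Python sketch that covers only the three pairs named in the note. The helper platform_role is hypothetical. Passing AX names through unchanged on non-Linux platforms is an assumption for Windows.

```python
import sys

# macOS AX role -> AT-SPI2 role name, per the Linux note above.
AX_TO_ATSPI = {
    "AXButton": "push button",
    "AXTextField": "text",
    "AXStaticText": "label",
}


def platform_role(ax_role: str, platform: str = sys.platform) -> str:
    """Return the role name to send on `platform`."""
    if platform.startswith("linux"):
        try:
            return AX_TO_ATSPI[ax_role]
        except KeyError:
            raise ValueError(f"no AT-SPI2 mapping for {ax_role}") from None
    return ax_role  # assumption: other platforms accept the AX name as-is
```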

Use Cases

From quick smoke tests to comprehensive automation, Geisterhand adapts to your workflow.

QA Automation

Automate repetitive QA tasks without writing test scripts. Claude uses the Accessibility API to find and interact with UI elements by their semantic properties.

Test form validation · Verify button states · Fill and submit forms

Regression Testing

Catch UI regressions before your users do. Geisterhand captures high-resolution screenshots for visual comparison across builds.

Compare screenshots · Detect layout shifts · Verify theming

Workflow Automation

Automate multi-step workflows across applications. Open apps via Spotlight, navigate menus, fill forms, and save files to specific locations.

Cross-app workflows · Data entry tasks · Batch operations

Accessibility Audits

Explore your app's UI tree and verify accessibility labels. Find elements by role, check focus order, and test keyboard navigation.

Verify AX labels · Test keyboard nav · Check focus order

Ready to automate your app testing?