Orchard API Documentation
Orchard provides high-performance, ephemeral macOS environments for AI agents. Unlike traditional VDI or VMs, Orchard uses native macOS user session virtualization to achieve sub-second input latency and 3-second environment resets.
Architecture
Standard cloud providers run macOS on bare metal with a 24-hour minimum lease due to Apple's macOS Software License Agreement. Orchard bypasses this by owning the hardware and virtualizing the user session layer rather than the OS layer.
- Kernel: Shared Darwin Kernel (M4 Pro Silicon)
- Isolation: Standard UNIX user permissions (chmod/chown)
- Reset Strategy: sysadminctl user deletion (3s avg)
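For illustration, the reset strategy amounts to deleting the ephemeral session user on the host. The Orchard control plane performs this for you; the sketch below only shows roughly what happens under the hood (the username and exact invocation are illustrative, not Orchard's internal tooling).

import subprocess

# Roughly what a reset does on the host (requires root):
# deleting the session user removes its account and home directory.
subprocess.run(["sudo", "sysadminctl", "-deleteUser", "agent_001"], check=True)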
CLI Installation
Use the Orchard CLI to manage your fleet of agents.
# Install via Homebrew (macOS/Linux)
brew install orchard-cli
# Authenticate
orchard login --key sk_live_8392...
# Spawn 5 agents in US East
orchard spawn --count 5 --region us-east-1
API Reference
Once an agent is spawned, you communicate with it directly via HTTP using the IP and Port returned by the CLI.
/stream
Returns a raw MJPEG (Motion JPEG) stream of the agent's desktop. The stream runs at 60fps and is optimized for computer vision models.
# Consume via Python OpenCV
import cv2

# Connect to the specific agent port (e.g., 8001)
cap = cv2.VideoCapture("http://localhost:8001/stream")

while True:
    ret, frame = cap.read()
    if ret:
        # frame is a standard numpy array (BGR)
        cv2.imshow('Agent View', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
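If you only need a single frame rather than a live window (for example, one observation per agent step), you can pull one directly from the stream by scanning for the JPEG start/end markers. A minimal sketch, assuming the stream is served as standard multipart MJPEG as described above:

import requests
import numpy as np
import cv2

def grab_frame(stream_url="http://localhost:8001/stream"):
    """Pull a single frame by scanning the MJPEG stream for JPEG markers."""
    buf = b""
    with requests.get(stream_url, stream=True, timeout=5) as resp:
        for chunk in resp.iter_content(chunk_size=4096):
            buf += chunk
            start = buf.find(b"\xff\xd8")            # JPEG start-of-image marker
            end = buf.find(b"\xff\xd9", start + 2)   # JPEG end-of-image marker
            if start != -1 and end != -1:
                jpg = np.frombuffer(buf[start:end + 2], dtype=np.uint8)
                return cv2.imdecode(jpg, cv2.IMREAD_COLOR)  # BGR numpy array
    return None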
/mouse/click
Moves the mouse to specific coordinates and performs a click event. Coordinates are absolute (0,0 is top-left).
Parameters
- x (int): Horizontal pixel position
- y (int): Vertical pixel position
- double (bool, optional): Perform double-click. Default: false
curl -X POST http://localhost:8001/mouse/click \
-H "Content-Type: application/json" \
-d '{
"x": 500,
"y": 200,
"double": false
}'
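The same request from Python, with the optional double flag set to illustrate a double-click (a small sketch using requests, not an official SDK helper):

import requests

AGENT_URL = "http://localhost:8001"

# Double-click at (500, 200); omit "double" (or send false) for a single click.
resp = requests.post(
    f"{AGENT_URL}/mouse/click",
    json={"x": 500, "y": 200, "double": True},
)
resp.raise_for_status()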
/keyboard/type
Types text into the currently focused element. Use /keyboard/press for special keys (Enter, Tab, Cmd).
curl -X POST http://localhost:8001/keyboard/type \
-H "Content-Type: application/json" \
-d '{ "text": "Hello World" }'
Integration Guides
OpenAI Gym (Gymnasium)
Orchard provides a drop-in gym.Env implementation suitable for Reinforcement Learning.
This environment handles the action translation and observation capture automatically.
import gymnasium as gym
import requests
import numpy as np
import cv2
from gymnasium import spaces


class OrchardEnv(gym.Env):
    """
    Custom Environment that follows the Gymnasium interface.
    Connects to a remote Orchard Agent.
    """
    metadata = {'render_modes': ['rgb_array']}

    def __init__(self, agent_url="http://localhost:8001"):
        super().__init__()
        self.agent_url = agent_url

        # Action Space:
        #   action_type 0: Move Mouse to (x, y)
        #   action_type 1: Left Click at (x, y)
        # (Typing is omitted here to keep the action space simple.)
        self.action_space = spaces.Dict({
            "action_type": spaces.Discrete(2),
            "x": spaces.Box(0, 1920, shape=(1,), dtype=int),
            "y": spaces.Box(0, 1080, shape=(1,), dtype=int)
        })

        # Observation Space: the screen pixels (1080p RGB)
        self.observation_space = spaces.Box(low=0, high=255, shape=(1080, 1920, 3), dtype=np.uint8)

    def step(self, action):
        # Execute the action via the HTTP API
        if action['action_type'] == 1:
            requests.post(f"{self.agent_url}/mouse/click", json={"x": int(action['x']), "y": int(action['y'])})
        else:
            requests.post(f"{self.agent_url}/mouse/move", json={"x": int(action['x']), "y": int(action['y'])})

        # Capture the observation (grab the latest frame from the stream)
        cap = cv2.VideoCapture(f"{self.agent_url}/stream")
        ret, frame = cap.read()
        cap.release()

        if ret:
            # OpenCV decodes to BGR; convert to RGB to match the observation space
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        else:
            frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

        # Calculate reward (placeholder - usually provided by your vision model)
        reward = 0
        terminated = False
        truncated = False
        return frame, reward, terminated, truncated, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # In a real implementation, you might call the CLI here to wipe the user:
        # subprocess.run(["orchard", "reset", "--agent", "agent_001"])
        return np.zeros((1080, 1920, 3), dtype=np.uint8), {}
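A quick way to sanity-check the environment is a short random rollout against a spawned agent (the agent URL and step count are illustrative):

# Smoke test: drive the environment with random actions.
env = OrchardEnv(agent_url="http://localhost:8001")
obs, info = env.reset()

for _ in range(10):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(obs.shape, reward, terminated)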
LangChain Integration
Use Orchard agents as tools within your LangChain graph.
The following example uses the @tool decorator to expose the Mac's capabilities to an LLM.
from langchain.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
import requests

# Configuration
AGENT_URL = "http://localhost:8001"
LLM_MODEL = "gpt-4-turbo-preview"


@tool
def click_screen(x: int, y: int):
    """
    Clicks the mouse at the specific x, y pixel coordinates on the screen.
    Use this to click buttons, icons, or menu items.
    """
    resp = requests.post(f"{AGENT_URL}/mouse/click", json={"x": x, "y": y})
    return f"Clicked at {x}, {y}. Status: {resp.status_code}"


@tool
def type_keyboard(text: str):
    """
    Types the specified string of text into the currently focused input field.
    """
    resp = requests.post(f"{AGENT_URL}/keyboard/type", json={"text": text})
    return f"Typed: '{text}'"


@tool
def press_key(key: str):
    """
    Presses a special key. Supported keys: enter, tab, space, backspace, cmd.
    """
    requests.post(f"{AGENT_URL}/keyboard/press", json={"key": key})
    return f"Pressed {key}"


# Initialize the tools and model
tools = [click_screen, type_keyboard, press_key]
llm = ChatOpenAI(model=LLM_MODEL, temperature=0)

# Construct the agent
# The agent will now "reason" about when to click vs. type based on your prompt
# agent = create_openai_tools_agent(llm, tools, prompt)
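To finish the wiring, give create_openai_tools_agent a prompt that includes an agent_scratchpad placeholder and wrap everything in an AgentExecutor. A minimal sketch (the system message and the task passed to invoke are illustrative):

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You control a remote macOS desktop. Use the tools to click and type."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# The LLM decides which tool calls (click, type, press) accomplish the task.
executor.invoke({"input": "Open Spotlight and search for 'Terminal'."})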