Orchard API Documentation
Orchard provides high-performance, ephemeral macOS environments for AI agents. Unlike traditional VDI or VMs, Orchard uses native macOS user session virtualization to achieve sub-second input latency and 3-second environment resets.
Architecture
Standard cloud providers run macOS on bare metal with a 24-hour minimum lease due to Apple's macOS Software License Agreement. Orchard bypasses this by owning the hardware and virtualizing the user session layer rather than the OS layer.
- Kernel: Shared Darwin Kernel (M4 Pro Silicon)
- Isolation: Standard UNIX user permissions (chmod/chown)
- Reset Strategy: sysadminctl user deletion (3s avg)
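For illustration, the reset strategy amounts to deleting the ephemeral session user on the host. The Orchard control plane performs this for you; the sketch below only shows roughly what happens under the hood (the username and exact invocation are illustrative, not Orchard's internal tooling).

import subprocess

# Roughly what a reset does on the host (requires root):
# deleting the session user removes its account and home directory.
subprocess.run(["sudo", "sysadminctl", "-deleteUser", "agent_001"], check=True)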
CLI Installation
Use the Orchard CLI to manage your fleet of agents.
# Install via Homebrew (macOS/Linux)
brew install orchard-cli
# Authenticate
orchard login --key sk_live_8392...
# Spawn 5 agents in US East
orchard spawn --count 5 --region us-east-1
API Reference
Once an agent is spawned, you communicate with it directly via HTTP using the IP and Port returned by the CLI.
/stream
Returns a raw MJPEG (Motion JPEG) stream of the agent's desktop. The stream runs at 60fps and is optimized for computer vision models.
# Consume via Python OpenCV
import cv2

# Connect to the specific agent port (e.g., 8001)
cap = cv2.VideoCapture("http://localhost:8001/stream")

while True:
    ret, frame = cap.read()
    if ret:
        # frame is a standard numpy array (BGR)
        cv2.imshow('Agent View', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
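If you only need a single frame rather than a live window (for example, one observation per agent step), you can pull one directly from the stream by scanning for the JPEG start/end markers. A minimal sketch, assuming the stream is served as standard multipart MJPEG as described above:

import requests
import numpy as np
import cv2

def grab_frame(stream_url="http://localhost:8001/stream"):
    """Pull a single frame by scanning the MJPEG stream for JPEG markers."""
    buf = b""
    with requests.get(stream_url, stream=True, timeout=5) as resp:
        for chunk in resp.iter_content(chunk_size=4096):
            buf += chunk
            start = buf.find(b"\xff\xd8")            # JPEG start-of-image marker
            end = buf.find(b"\xff\xd9", start + 2)   # JPEG end-of-image marker
            if start != -1 and end != -1:
                jpg = np.frombuffer(buf[start:end + 2], dtype=np.uint8)
                return cv2.imdecode(jpg, cv2.IMREAD_COLOR)  # BGR numpy array
    return None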
/mouse/click
Moves the mouse to specific coordinates and performs a click event. Coordinates are absolute (0,0 is top-left).
Parameters
- x (int): Horizontal pixel position
- y (int): Vertical pixel position
- double (bool, optional): Perform double-click. Default: false
curl -X POST http://localhost:8001/mouse/click \
-H "Content-Type: application/json" \
-d '{
"x": 500,
"y": 200,
"double": false
}'
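The same request from Python, with the optional double flag set to illustrate a double-click (a small sketch using requests, not an official SDK helper):

import requests

AGENT_URL = "http://localhost:8001"

# Double-click at (500, 200); omit "double" (or send false) for a single click.
resp = requests.post(
    f"{AGENT_URL}/mouse/click",
    json={"x": 500, "y": 200, "double": True},
)
resp.raise_for_status()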
/keyboard/type
Types text into the currently focused element. Use /keyboard/press for special keys (Enter, Tab, Cmd).
curl -X POST http://localhost:8001/keyboard/type \
-H "Content-Type: application/json" \
-d '{ "text": "Hello World" }'
Integration Guides
OpenAI Gym (Gymnasium)
Orchard provides a drop-in gym.Env implementation suitable for Reinforcement Learning.
This environment handles the action translation and observation capture automatically.
import gymnasium as gym
import requests
import numpy as np
import cv2
from gymnasium import spaces


class OrchardEnv(gym.Env):
    """
    Custom Environment that follows the Gymnasium interface.
    Connects to a remote Orchard Agent.
    """
    metadata = {'render_modes': ['rgb_array']}

    def __init__(self, agent_url="http://localhost:8001"):
        super().__init__()
        self.agent_url = agent_url

        # Action Space:
        #   action_type 0: Move Mouse to (x, y)
        #   action_type 1: Left Click at (x, y)
        # (Typing is omitted here to keep the action space simple.)
        self.action_space = spaces.Dict({
            "action_type": spaces.Discrete(2),
            "x": spaces.Box(0, 1920, shape=(1,), dtype=int),
            "y": spaces.Box(0, 1080, shape=(1,), dtype=int)
        })

        # Observation Space: the screen pixels (1080p RGB)
        self.observation_space = spaces.Box(low=0, high=255, shape=(1080, 1920, 3), dtype=np.uint8)

    def step(self, action):
        # Execute the action via the HTTP API
        if action['action_type'] == 1:
            requests.post(f"{self.agent_url}/mouse/click", json={"x": int(action['x']), "y": int(action['y'])})
        else:
            requests.post(f"{self.agent_url}/mouse/move", json={"x": int(action['x']), "y": int(action['y'])})

        # Capture the observation (grab the latest frame from the stream)
        cap = cv2.VideoCapture(f"{self.agent_url}/stream")
        ret, frame = cap.read()
        cap.release()

        if ret:
            # OpenCV decodes to BGR; convert to RGB to match the observation space
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        else:
            frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

        # Calculate reward (placeholder - usually provided by your vision model)
        reward = 0
        terminated = False
        truncated = False
        return frame, reward, terminated, truncated, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # In a real implementation, you might call the CLI here to wipe the user:
        # subprocess.run(["orchard", "reset", "--agent", "agent_001"])
        return np.zeros((1080, 1920, 3), dtype=np.uint8), {}
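A quick way to sanity-check the environment is a short random rollout against a spawned agent (the agent URL and step count are illustrative):

# Smoke test: drive the environment with random actions.
env = OrchardEnv(agent_url="http://localhost:8001")
obs, info = env.reset()

for _ in range(10):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(obs.shape, reward, terminated)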
LangChain Integration
Use Orchard agents as tools within your LangChain graph.
The following example uses the @tool decorator to expose the Mac's capabilities to an LLM.
from langchain.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
import requests

# Configuration
AGENT_URL = "http://localhost:8001"
LLM_MODEL = "gpt-4-turbo-preview"


@tool
def click_screen(x: int, y: int):
    """
    Clicks the mouse at the specific x, y pixel coordinates on the screen.
    Use this to click buttons, icons, or menu items.
    """
    resp = requests.post(f"{AGENT_URL}/mouse/click", json={"x": x, "y": y})
    return f"Clicked at {x}, {y}. Status: {resp.status_code}"


@tool
def type_keyboard(text: str):
    """
    Types the specified string of text into the currently focused input field.
    """
    resp = requests.post(f"{AGENT_URL}/keyboard/type", json={"text": text})
    return f"Typed: '{text}'"


@tool
def press_key(key: str):
    """
    Presses a special key. Supported keys: enter, tab, space, backspace, cmd.
    """
    requests.post(f"{AGENT_URL}/keyboard/press", json={"key": key})
    return f"Pressed {key}"


# Initialize the tools and model
tools = [click_screen, type_keyboard, press_key]
llm = ChatOpenAI(model=LLM_MODEL, temperature=0)

# Construct the agent
# The agent will now "reason" about when to click vs. type based on your prompt
# agent = create_openai_tools_agent(llm, tools, prompt)
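To finish the wiring, give create_openai_tools_agent a prompt that includes an agent_scratchpad placeholder and wrap everything in an AgentExecutor. A minimal sketch (the system message and the task passed to invoke are illustrative):

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You control a remote macOS desktop. Use the tools to click and type."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# The LLM decides which tool calls (click, type, press) accomplish the task.
executor.invoke({"input": "Open Spotlight and search for 'Terminal'."})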