Skip to content

Scenario harness

ripple run is a headless batch harness for running agents against a fixed set of inputs and checking their outputs against expected signatures. It is designed for regression testing, capability benchmarking, and automated evaluation pipelines - situations where you need reproducible, deterministic runs without an interactive session.


Basic usage

ripple run <scenarios> [--out <dir>]

<scenarios> is either:

  • A single .json scenario file, or
  • A directory of .json files (all are run in sequence).

--out sets the output directory (default: deepagent-runs/latest/).

# Run a single scenario
ripple run scenarios/summarize-code.json

# Run all scenarios in a directory
ripple run scenarios/ --out runs/2026-06-23/

# Run with a specific model
ripple run scenarios/ --model LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16

Model download

If the scenario's model is not in the local Hugging Face cache, ripple run prints a hint and bails. Pass --yes (--download) to auto-download without prompting.


Scenario file format

Scenario files are JSON. In practice, they are authored as TOML and converted to JSON by a wrapper script - TOML is easier to write for multiline strings and nested structures. The JSON schema is the source of truth.

Complete schema

{
  "id": "scenario-name",
  "agent": {
    "model": "LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16",
    "system_prompt": "registry-key-or-inline-text",
    "middleware": ["screenshot", "clipboard"],
    "tools": ["calculator"],
    "include_filesystem": false,
    "include_general_purpose": false,
    "max_iterations": 24,
    "backend": "memory",
    "approvals": "auto-approve",
    "subagents": [
      {
        "name": "researcher",
        "description": "Searches for and summarizes information",
        "model": "LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16",
        "system_prompt": "You are a research assistant...",
        "tools": ["web_search"],
        "middleware": [],
        "max_iterations": 12
      }
    ]
  },
  "prompts": {
    "turns": [
      "First user message",
      "Second user message"
    ]
  },
  "fixtures": {
    "clipboard": "seed text for the clipboard",
    "windows": [
      { "name": "App - Document Title", "png": "/abs/path/to/window.png" }
    ],
    "screen": "/abs/path/to/fullscreen.png"
  },
  "expect": {
    "signature_name": true
  }
}

Field reference

Top level

Field Type Description
id string Unique identifier for the scenario. Used in trace filenames and the manifest.
agent object Agent configuration (see below).
prompts object The conversation to replay.
fixtures object Deterministic input seeds (optional).
expect object Expected output signatures to check (optional).

agent

Field Type Description
model string Hugging Face model id or registered remote model name.
system_prompt string A registry key (looked up from the prompt registry) or inline system prompt text.
middleware string[] Capability middleware to enable (e.g. "screenshot", "clipboard").
tools string[] Named tools to add to the agent.
include_filesystem bool Whether to include filesystem tools (read_file, write_file, etc.).
include_general_purpose bool Whether to include general-purpose tools.
max_iterations int Maximum agent loop iterations before the run is terminated.
backend "memory" | "local" Message store: memory (in-process only) or local (persisted to disk).
approvals "auto-approve" | "auto-reject" Approval policy for all tool calls. auto-approve lets the agent run tools without prompting; auto-reject blocks every tool call.
subagents object[] Subagent definitions the planner can delegate to (see below).

agent.subagents[]

Field Type Description
name string Identifier used when the planner delegates to this subagent.
description string Natural-language description of what this subagent does.
model string Optional model override. Inherits the planner model if omitted.
system_prompt string System prompt for this subagent.
tools string[] Tools available to this subagent.
middleware string[] Middleware enabled for this subagent.
max_iterations int Maximum iterations for this subagent's loop.

prompts

Field Type Description
turns string[] Ordered list of user messages. Each string is delivered as a separate turn, in sequence.

fixtures

Fixtures seed inputs that would otherwise require live system state. They make runs deterministic without needing Screen Recording permission or a live application.

Field Type Description
clipboard string Text pre-loaded into the clipboard before the run.
windows object[] Fake window screenshots: { "name": "App - Title", "png": "/abs/path/window.png" }.
screen string Absolute path to a full-screen PNG used instead of a live screenshot.

expect

A flat object of signature name to boolean. After the run, Ripple checks whether each named signature was observed in the trace and records pass/fail in manifest.json.

"expect": {
  "tool_called_write_file": true,
  "response_contains_summary": true
}

Output

For each run, Ripple writes to the output directory (default deepagent-runs/latest/):

File Contents
<id>.jsonl Full agent trace for the scenario: every message, tool call, and tool result.
manifest.json Summary: one entry per scenario with observed (what actually happened) vs expected (from the expect field), plus pass/fail per signature.
deepagent-runs/latest/
  summarize-code.jsonl
  extract-tables.jsonl
  manifest.json

Inspect a trace to debug unexpected behavior:

# Pretty-print the first scenario's trace
cat deepagent-runs/latest/summarize-code.jsonl | jq .

Authoring in TOML

Because JSON is verbose for multiline strings and nested configs, scenario files are commonly authored as TOML and converted to JSON:

id = "summarize-code"

[agent]
model = "LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16"
system_prompt = """
You are a code reviewer. Summarize the key changes in the provided diff.
"""
include_filesystem = false
include_general_purpose = false
max_iterations = 8
backend = "memory"
approvals = "auto-approve"

[prompts]
turns = [
  "Summarize this diff: ...",
]

[expect]
response_contains_summary = true

Convert with any TOML-to-JSON tool before passing to ripple run:

python3 -c "import tomllib, json, sys; print(json.dumps(tomllib.loads(sys.stdin.read()), indent=2))" \
  < scenarios/summarize-code.toml > scenarios/summarize-code.json

Memory management

Ripple calls MLX.Memory.clearCache() between scenario runs to release GPU memory. This keeps multi-scenario batches from accumulating metal allocations and improves stability on long runs.


Complete example

A scenario that tests whether the agent correctly writes a summary file given a code snippet:

{
  "id": "write-summary",
  "agent": {
    "model": "LiquidAI/LFM2.5-1.2B-Instruct-MLX-bf16",
    "system_prompt": "You are a concise technical writer. When given code, write a one-paragraph summary to summary.md.",
    "include_filesystem": true,
    "include_general_purpose": false,
    "max_iterations": 6,
    "backend": "memory",
    "approvals": "auto-approve"
  },
  "prompts": {
    "turns": [
      "Write a summary of this function:\n\nfunc add(_ a: Int, _ b: Int) -> Int { a + b }"
    ]
  },
  "expect": {
    "tool_called_write_file": true
  }
}

Run it:

ripple run scenarios/write-summary.json --out runs/test-1/
cat runs/test-1/manifest.json

Expected manifest output:

[
  {
    "id": "write-summary",
    "observed": ["tool_called_write_file"],
    "expected": { "tool_called_write_file": true },
    "passed": true
  }
]

See also