
# Studio

The studio command launches a web-based dashboard for browsing evaluation runs, inspecting individual test results, and reviewing scores. It shows both local runs and runs synced from a remote results repository.

AgentV Studio showing evaluation runs with pass rates, targets, and experiment names
```sh
agentv studio
```

Studio auto-discovers run workspaces from .agentv/results/runs/ in the current directory and opens at http://localhost:3117.
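Port selection follows the precedence documented in the options table (`--port` flag, then the `PORT` environment variable, then the default 3117). A minimal Python sketch of that rule; `resolve_port` is illustrative only, not part of AgentV:

```python
import os

def resolve_port(flag_port=None, env=None):
    """Illustrative only: resolve the Studio port using the documented
    precedence of --port flag > PORT environment variable > default 3117."""
    env = os.environ if env is None else env
    if flag_port is not None:
        return int(flag_port)
    if "PORT" in env:
        return int(env["PORT"])
    return 3117
```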

You can also point it at a specific run workspace or index.jsonl manifest:

```sh
agentv studio .agentv/results/runs/2026-03-30T11-45-56-989Z/index.jsonl
# or
agentv studio .agentv/results/runs/2026-03-30T11-45-56-989Z
```
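A `.jsonl` manifest conventionally holds one JSON object per line. A generic reader sketch; the specific fields AgentV writes to `index.jsonl` are tool-specific and not shown here:

```python
import json

def read_manifest(path):
    """Read a JSON Lines file: one JSON object per non-empty line.
    (The actual field names in AgentV's index.jsonl may differ.)"""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```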
| Option | Description |
| --- | --- |
| `--port`, `-p` | Port to listen on (precedence: flag, then `PORT` env var, then default 3117) |
| `--dir`, `-d` | Working directory (default: current directory) |
| `--multi` | Launch in multi-project dashboard mode (deprecated; use auto-detect or `--single`) |
| `--single` | Force single-project dashboard mode |
| `--add <path>` | Register a project by path |
| `--remove <id>` | Unregister a project by ID |
| `--discover <path>` | Scan a directory tree for repos with `.agentv/` |
  • Recent Runs — table of all evaluation runs with source badge (local / remote), target, experiment, timestamp, test count, pass rate, and mean score
  • Experiments — group and compare runs by experiment name
  • Targets — group runs by target (model/agent)
  • Run Detail — drill into a run to see per-test results, scores, and evaluator output
  • Human Review — add feedback annotations to individual test results
  • Comparison Matrix — experiment × target matrix showing pass rates across dimensions
  • Remote Results — sync and browse runs pushed from other machines or CI (see Remote Results)

Click any run to see a breakdown by suite, per-test scores, target, duration, and cost. The source label (local or remote) tells you where the run came from.

AgentV Studio run detail showing 100% pass rate across 5 tests with scores and duration

The Experiments tab groups runs by experiment name so you can compare the impact of changes — for example, with_skills vs without_skills.

AgentV Studio experiments tab comparing with_skills (100%) vs without_skills (60%) pass rates

The Compare tab shows a cross-model, cross-experiment performance matrix. Cells are color-coded by pass rate: green (80%+), yellow (50–80%), red (below 50%). The best performer per row has an emerald ring; the worst has a red ring. Click any cell to expand per-test-case results.

AgentV Studio comparison matrix showing experiment vs target pass rates with color coding
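The color thresholds can be expressed as a tiny function. This is a sketch of the documented rule only; `cell_color` is illustrative, not Studio's actual code:

```python
def cell_color(pass_rate):
    """Map a pass rate in [0, 1] to the documented matrix colors:
    green at 80%+, yellow from 50% up to 80%, red below 50%."""
    if pass_rate >= 0.80:
        return "green"
    if pass_rate >= 0.50:
        return "yellow"
    return "red"
```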

Run the same eval against multiple providers or experiment variants, then open the Compare tab:

```sh
agentv eval my.EVAL.yaml --target azure --experiment baseline
agentv eval my.EVAL.yaml --target azure --experiment with-caching
agentv eval my.EVAL.yaml --target gemini --experiment baseline
agentv eval my.EVAL.yaml --target gemini --experiment with-caching
agentv studio  # Compare tab shows a 2x2 matrix
```

The matrix is available per-project under the Compare tab.

By default, Studio shows results for the current directory. Register multiple benchmark repos to view them from a single dashboard.

Register benchmark repos one at a time:

```sh
agentv studio --add /path/to/my-evals
agentv studio --add /path/to/other-evals
```

Each path must contain a .agentv/ directory. Registered benchmarks are stored in ~/.agentv/projects.yaml.
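For orientation, a hypothetical sketch of what `~/.agentv/projects.yaml` might contain; the actual schema is not documented here and may differ:

```yaml
# Hypothetical example only; the real projects.yaml schema may differ.
projects:
  - id: my-evals
    path: /path/to/my-evals
  - id: other-evals
    path: /path/to/other-evals
```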

Scan a parent directory to find and register all benchmark repos:

```sh
agentv studio --discover /path/to/repos
```

This recursively searches (up to 2 levels deep) for directories containing .agentv/ and registers them.
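The depth-limited scan can be illustrated with a minimal Python sketch. `discover_benchmarks` is a hypothetical helper; AgentV's actual traversal logic may differ:

```python
from pathlib import Path

def discover_benchmarks(root, max_depth=2):
    """Hypothetical sketch of --discover: collect directories that contain
    a .agentv/ folder, descending at most max_depth levels below root."""
    found = []

    def walk(dir_path, depth):
        if (dir_path / ".agentv").is_dir():
            found.append(dir_path)  # a benchmark repo; no need to descend further
            return
        if depth >= max_depth:
            return
        for child in sorted(p for p in dir_path.iterdir() if p.is_dir()):
            walk(child, depth + 1)

    walk(Path(root), 0)
    return found
```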

Studio auto-detects the mode based on how many benchmarks are registered:

  • 0 or 1 registered: single-project view
  • 2+ registered: Benchmarks dashboard
```sh
agentv studio           # auto-detects
agentv studio --single  # force single-project view
```

The landing page shows a card for each benchmark with run count, pass rate, and last run time.

AgentV Studio benchmarks dashboard showing benchmark cards with pass rates

Unregister a benchmark by its ID:

```sh
agentv studio --remove my-evals
```

IDs are derived from the directory name (e.g., /home/user/repos/my-evals becomes my-evals).

Studio can display runs pushed to a remote git repository by other machines or CI — alongside your local runs. Each run in the list carries a source badge: local (green) or remote (amber).

Add a results.export block to .agentv/config.yaml:

```yaml
results:
  export:
    repo: EntityProcess/agentv-evals  # GitHub repo (owner/repo or full URL)
    path: runs                        # Directory within the repo
    auto_push: true                   # Push automatically after every eval run
    branch_prefix: eval-results       # Branch naming prefix (default: eval-results)
```

With `auto_push: true`, every `agentv eval` or `agentv pipeline bench` run automatically creates a draft PR in the configured repo with a structured results table.

Export uses the `gh` CLI and git credentials already configured on the machine. If authentication is missing, AgentV warns and skips the export; the eval run itself is never blocked.

Once configured, Studio fetches remote runs on load. Use the Sync Remote Results button in the source toolbar to pull the latest. The toolbar also shows when results were last synced and the configured repo.

Use the All Sources / Local Only / Remote Only filter to narrow the run list by origin.
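The origin filter boils down to a simple predicate on each run's source badge. A minimal sketch; `filter_runs` and the run dictionaries are illustrative, not AgentV's API:

```python
def filter_runs(runs, mode="all"):
    """Illustrative sketch of the All Sources / Local Only / Remote Only filter.
    Each run is assumed to carry a 'source' field of 'local' or 'remote'."""
    if mode == "local":
        return [r for r in runs if r["source"] == "local"]
    if mode == "remote":
        return [r for r in runs if r["source"] == "remote"]
    return list(runs)
```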