Writing Tests

Test files are TOML files with a .test.toml extension. Each file has a name and a list of steps.

name = "Login flow"
[[steps]]
instruction = "Navigate to /login and verify the page loads"
[[steps]]
instruction = "Enter valid credentials and submit the form"
[[steps]]
instruction = "Verify you are redirected to the dashboard"

The name field identifies the test in console output and reports. Each [[steps]] entry is one step sent to the AI agent.

Each step should describe what to verify, not how. The agent figures out the implementation.

Good:

[[steps]]
instruction = "Make a GET request to /api/items and verify it returns a JSON array with 3 items"

Too vague:

[[steps]]
instruction = "Check the API"

Too prescriptive:

[[steps]]
instruction = "Run curl -s http://localhost:3000/api/items | jq length and check it equals 3"

Tips:

  • Be specific about expected values (“3 items”, “status 200”, “contains ‘Welcome’”)
  • One assertion per step keeps results clear
  • Steps run in order within a single agent session, so state carries forward from one step to the next

Every step must produce exactly one result. The agent emits a marker at the end of its work:

RESULT OK
RESULT WARN: Items returned but order was unexpected
RESULT ERROR: Expected 3 items but got 0

  • OK: the step passed
  • WARN: something was off but not a hard failure
  • ERROR: the step failed

Bugatti parses the last RESULT marker in the agent’s output. If no marker is found (or the step times out), it’s treated as a protocol error and the step fails.
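The "last marker wins" rule can be sketched as follows. This is an illustrative approximation, not Bugatti's actual parser: the regular expression and the function name `parse_last_result` are assumptions based only on the marker examples shown above.

```python
import re
from typing import Optional, Tuple

# Assumed marker shape: "RESULT OK", "RESULT WARN: message", "RESULT ERROR: message".
RESULT_RE = re.compile(r"^RESULT (OK|WARN|ERROR)(?:: (.*))?$")

def parse_last_result(output: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (status, message) for the last RESULT line, or None if there is none."""
    last = None
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            # Later markers overwrite earlier ones, so the final one wins.
            last = (match.group(1), match.group(2))
    return last

if __name__ == "__main__":
    transcript = "checking order\nRESULT WARN: odd order\nRESULT OK"
    print(parse_last_result(transcript))
```

A return value of `None` corresponds to the case the text describes as a protocol error: the agent finished without emitting any marker.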

The agent can emit structured log lines during execution:

BUGATTI_LOG Created test user with id=42
BUGATTI_LOG Screenshot saved to /tmp/login-page.png

These appear in the console output and are captured in the run artifacts, separate from the full transcript.
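As a rough sketch of how such lines could be separated from the rest of a transcript, the snippet below collects everything after the `BUGATTI_LOG ` prefix. The helper name `extract_log_lines` is hypothetical, and the assumption that the prefix must start the line comes only from the examples above.

```python
from typing import List

LOG_PREFIX = "BUGATTI_LOG "

def extract_log_lines(output: str) -> List[str]:
    """Return the message part of every line that starts with the log prefix."""
    return [
        line[len(LOG_PREFIX):]
        for line in output.splitlines()
        if line.startswith(LOG_PREFIX)
    ]

if __name__ == "__main__":
    transcript = "thinking\nBUGATTI_LOG Created test user with id=42\nRESULT OK"
    print(extract_log_lines(transcript))
```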

Override the default timeout for a specific step:

[[steps]]
instruction = "Run the full migration suite"
step_timeout_secs = 600

The default timeout is set in bugatti.config.toml. If it is not configured there, the default is 300 seconds.
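The precedence described here (step override, then configured default, then 300 seconds) can be sketched like this. The function `effective_timeout` and its `config_default` parameter are hypothetical; the name of the config key in bugatti.config.toml is not given in this section, so the configured value is passed in as a plain argument.

```python
from typing import Optional

BUILTIN_DEFAULT_SECS = 300  # fallback stated in the text above

def effective_timeout(step: dict, config_default: Optional[int] = None) -> int:
    """Resolve the timeout for one step, most specific setting first."""
    if "step_timeout_secs" in step:
        return step["step_timeout_secs"]  # per-step override
    if config_default is not None:
        return config_default  # value taken from bugatti.config.toml
    return BUILTIN_DEFAULT_SECS

if __name__ == "__main__":
    step = {"instruction": "Run the full migration suite", "step_timeout_secs": 600}
    print(effective_timeout(step, config_default=120))
```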

Each step must have exactly one of:

Field          Purpose
instruction    Plain-English instruction sent to the agent
include_path   Path to another test file to inline (see Includes)
include_glob   Glob pattern to inline multiple test files (see Includes)
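The "exactly one of" rule can be checked mechanically. The sketch below assumes a step has already been parsed into a dictionary; `step_kind` is a hypothetical helper for illustration, not part of Bugatti.

```python
STEP_KINDS = ("instruction", "include_path", "include_glob")

def step_kind(step: dict) -> str:
    """Return which of the three mutually exclusive fields the step uses."""
    present = [key for key in STEP_KINDS if key in step]
    if len(present) != 1:
        raise ValueError(
            f"step must have exactly one of {', '.join(STEP_KINDS)}; found {present}"
        )
    return present[0]

if __name__ == "__main__":
    print(step_kind({"instruction": "Check that /health returns status 200"}))
```

A step with none of the three fields, or with more than one, raises `ValueError` in this sketch.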

Optional fields on any step:

Field              Purpose
step_timeout_secs  Per-step timeout override (seconds)
skip               If true, step is skipped (see Skipping)
checkpoint         Checkpoint name (see Checkpoints)