Grid AI Agent Runtime

Operator-facing reference for the AI agent that generates Grid models from natural-language prompts. The agent's behavior contract — the rules it follows when emitting source — is in ai-agent-guide.md. This document covers the runtime: how to configure providers, what surfaces expose the agent, and what to expect operationally.


Surfaces

The same agent is exposed through four surfaces. All of them share the same system prompt, tools, and structured output schema.

| Surface | Entry point | Use case |
| --- | --- | --- |
| Library | import { createGridAgent } from "../../src/agent/index.js" | Server-side codegen, custom workflows |
| CLI | npm run agent -- "<prompt>" | Local development, scripting, CI |
| HTTP (non-streaming) | POST /api/agent/generate | Server-to-server |
| HTTP (streaming) | POST /api/agent/stream | Real-time UI (used by the AgentPanel) |
| Frontend | AgentPanel component in the editor | Interactive authoring |

Provider Configuration

The agent supports three providers. Selection precedence (highest first):

  1. Explicit option (e.g. createGridAgent({ provider: "anthropic" }))
  2. GRID_AGENT_PROVIDER env var
  3. Default: gateway
| Provider | Required env | Default model |
| --- | --- | --- |
| gateway | AI_GATEWAY_API_KEY | anthropic/claude-sonnet-4.5 |
| anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| openai | OPENAI_API_KEY | gpt-5.1 |

Override the model id either via the model: "..." option or the GRID_AGENT_MODEL env var.

Recommendation. Default to gateway. The Vercel AI Gateway gives you provider failover, cost tracking, and a single string-based model namespace (anthropic/..., openai/..., google/...).
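
The precedence rules above can be sketched as a small resolver. This is an illustrative helper under assumed names (resolveProvider is not part of the actual src/agent/ API); the caller would pass process.env as the second argument:

```typescript
// Hypothetical sketch of provider selection: explicit option first,
// then the GRID_AGENT_PROVIDER env var, then the "gateway" default.
type Provider = "gateway" | "anthropic" | "openai";

function resolveProvider(
  explicit?: string,
  env: Record<string, string | undefined> = {}
): Provider {
  const pick = explicit ?? env.GRID_AGENT_PROVIDER ?? "gateway";
  if (pick !== "gateway" && pick !== "anthropic" && pick !== "openai") {
    throw new Error(`Unknown provider: ${pick}`);
  }
  return pick;
}
```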


Library Usage

import { createGridAgent, generateGrid } from "./src/agent/index.js";
 
// One-shot helper
const { output, steps } = await generateGrid(
  "build a portfolio model with EUR/USD FX conversion"
);
console.log(output.source);
console.log(`took ${steps} steps`);
 
// Or get the underlying ToolLoopAgent for full control
const agent = createGridAgent({ provider: "anthropic", maxSteps: 12 });
const result = await agent.generate({
  prompt: "alert when load > threshold",
  onStepFinish: async ({ stepNumber, toolCalls }) => {
    console.log(`step ${stepNumber}: ${toolCalls?.length ?? 0} tool calls`);
  }
});

The agent returns a structured object:

interface GridAgentOutput {
  source: string;            // the .grid model source
  explanation: string;       // 1-3 sentence summary
  runtime: "ts" | "lua_generated" | "lua_interpreter";
  validatedClean: boolean;   // true iff validate_grid passed on the final source
  warnings: string[];        // non-blocking notes
}
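
A caller will typically gate further use of the source on the validation flag. A minimal sketch (the acceptIfClean helper is illustrative, not part of the library API):

```typescript
// Illustrative consumer of GridAgentOutput: refuse unvalidated source,
// surface non-blocking warnings, and return the .grid text.
interface GridAgentOutput {
  source: string;
  explanation: string;
  runtime: "ts" | "lua_generated" | "lua_interpreter";
  validatedClean: boolean;
  warnings: string[];
}

function acceptIfClean(out: GridAgentOutput): string {
  if (!out.validatedClean) {
    throw new Error(`Refusing unvalidated source: ${out.warnings.join("; ")}`);
  }
  for (const w of out.warnings) console.warn(`[grid-agent] ${w}`);
  return out.source;
}
```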

CLI Usage

# Simplest case — print the model to stdout
npm run agent -- "build a portfolio model with EUR/USD FX conversion"
 
# Write to a file with explicit provider
npm run agent -- --output portfolio.grid --provider gateway "build a portfolio..."
 
# Read prompt from a file
npm run agent -- --prompt-file ./brief.md
 
# Show step-by-step tool-call progress on stderr
npm run agent -- --show-steps "..."
 
# Skip post-generation validation
npm run agent -- --no-validate "..."
 
# Override the loop budget (default 8)
npm run agent -- --max-steps 12 "..."

Flags:

| Flag | Default | Effect |
| --- | --- | --- |
| --prompt-file <path> | (none) | Read prompt from a file instead of argv |
| --output <path> | (stdout) | Write source to a file |
| --provider <name> | $GRID_AGENT_PROVIDER or gateway | gateway / anthropic / openai |
| --model <id> | provider default | Override model id |
| --max-steps <n> | 8 | Tool-loop step cap |
| --no-validate | (validate on) | Skip post-generation validation |
| --show-steps | off | Stream step events to stderr |

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | Generated successfully (and validated, if enabled) |
| 1 | Argument or transport error |
| 2 | Generation succeeded but post-validation failed |
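
In CI it can be useful to treat exit code 2 as a soft failure. A shell sketch of that mapping (describe_agent_exit is an illustrative helper, not part of the CLI):

```shell
# Illustrative mapping of the CLI exit codes above to CI outcomes.
describe_agent_exit() {
  case "$1" in
    0) echo "ok: generated (and validated, if enabled)" ;;
    1) echo "fail: argument or transport error" ;;
    2) echo "soft-fail: generated but post-validation failed" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}
```

Usage: run the agent, then inspect the code, e.g. npm run agent -- "..." > out.grid; describe_agent_exit $?.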

HTTP API

POST /api/agent/generate

One-shot. Returns the structured output once the agent finishes.

POST /api/agent/generate
Content-Type: application/json
 
{
  "prompt": "build a portfolio model",
  "provider": "gateway",        // optional
  "model": "anthropic/claude-sonnet-4.5",  // optional
  "maxSteps": 8                   // optional
}

Response:

{
  "output": {
    "source": "MODEL ...\n...",
    "explanation": "...",
    "runtime": "lua_generated",
    "validatedClean": true,
    "warnings": []
  },
  "provider": "gateway",
  "model": "anthropic/claude-sonnet-4.5"
}

POST /api/agent/stream

Same input shape. Returns a UI Message Stream Response (the AI SDK v6 streaming format). Used by the frontend AgentPanel. Compatible with @ai-sdk/react useChat.

Both endpoints honor the existing bearer-token auth (GRID_API_TOKEN).
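
Any HTTP client works; a TypeScript sketch that assembles the fetch options, including the bearer token (buildGenerateRequest is an illustrative helper, not part of the API):

```typescript
// Illustrative builder for the POST /api/agent/generate request body and
// headers, including the GRID_API_TOKEN bearer auth the endpoints require.
function buildGenerateRequest(
  prompt: string,
  token: string,
  opts: { provider?: string; model?: string; maxSteps?: number } = {}
) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ prompt, ...opts }),
  };
}
```

Usage: await fetch("https://your-host/api/agent/generate", buildGenerateRequest("build a portfolio model", token, { maxSteps: 12 })).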


Frontend Panel

The AgentPanel React component (web/frontend/src/components/AgentPanel.tsx) provides:

  • A textarea for the prompt
  • Provider/model dropdowns
  • A "Generate" button (POSTs to /api/agent/generate)
  • A result preview with explanation and warnings
  • "Apply to editor" — pushes the generated source into the Monaco editor
  • "Copy" — copies the source to the clipboard

The panel sits above the Monaco editor in the left pane.


Tool Loop

Every agent invocation has access to seven tools:

| Tool | Purpose |
| --- | --- |
| lookup_function({ name }) | Full signature + description for one function |
| search_functions({ query, category?, limit }) | Match by name OR description, optional category filter |
| get_doc_section({ topic }) | Fetch a language doc section by name |
| get_canonical_example({ name }) | Fetch one of the seven canonical examples |
| explain_error({ code, excerptLength? }) | Cause + recovery + errors.md excerpt for an error code |
| validate_grid({ source }) | Parse + build + Lua-generate; returns ok/diagnostics |
| evaluate_grid({ source, inputs?, requestSymbols? }) | Deploy + run in-process + return every cell's resolved value |

The agent typically follows this sequence:

  1. Read the prompt.
  2. (Optionally) get_canonical_example for a matching pattern.
  3. (Optionally) lookup_function / search_functions for unfamiliar functions.
  4. Draft the model.
  5. validate_grid to check syntax and structure.
  6. evaluate_grid to verify behavioral correctness (with sample inputs).
  7. If diagnostics or unexpected error cells appear, fix and re-validate / re-evaluate.
  8. Return structured output.

The loop is capped at 8 steps by default (stopWhen: stepCountIs(8)). Override with the maxSteps option or --max-steps flag.
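
The cap behaves like a bounded loop. A minimal, synchronous sketch of the idea (this is not the AI SDK's actual stepCountIs implementation; step stands in for one LLM round-trip plus tool calls):

```typescript
// Minimal sketch of a step-capped tool loop in the spirit of
// stopWhen: stepCountIs(maxSteps). step(n) returns true once the agent
// has produced its final structured output.
function runCappedLoop(
  step: (n: number) => boolean,
  maxSteps = 8
): { steps: number; finished: boolean } {
  for (let n = 1; n <= maxSteps; n++) {
    if (step(n)) return { steps: n, finished: true };
  }
  return { steps: maxSteps, finished: false };
}
```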

evaluate_grid: behavioral verification

The biggest difference from a "generate-and-pray" agent: every model the Grid agent emits is run through the actual InMemoryGridRuntime before being returned. The agent sees:

  • Every cell's evaluated value (numbers, strings, dates, errors).
  • The status of every external call (queued/stale is expected with no worker attached; ready means it resolved with a fallback).
  • Per-cell error codes (#DIV/0!, #VALUE!, etc.).
  • A list of cells in error state for quick triage.

This means a successful generation has both validatedClean: true (syntactically + structurally valid) and evaluatedClean: true (all cells either resolved or pending external).
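
The "evaluated clean" criterion can be stated as a predicate over per-cell results. The cell shape below is hypothetical (the runtime's real result type may differ); it only illustrates the rule that every cell must either resolve or be a pending external call, with no error codes:

```typescript
// Hypothetical per-cell result shape for illustration only.
interface CellResult {
  value?: unknown;
  errorCode?: string;                        // e.g. "#DIV/0!", "#VALUE!"
  external?: "queued" | "stale" | "ready";   // status of an external call
}

function isEvaluatedClean(cells: CellResult[]): boolean {
  return cells.every(
    (c) =>
      c.errorCode === undefined &&
      (c.value !== undefined || c.external === "queued" || c.external === "stale")
  );
}
```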


Operational Notes

  • Cost: each invocation makes 1-8 LLM calls (system prompt is ~16KB, plus tool results). Use the AI Gateway's cost dashboard to track spend per call.
  • Latency: typically 5-20 seconds end-to-end for small models; longer if the agent iterates on validation diagnostics.
  • Determinism: outputs are not deterministic. Run multiple times for difficult prompts and pick the cleanest validation.
  • Disk I/O: language docs are loaded once at module init. The agent's per-call overhead is just the LLM round-trips and the in-process validation.
  • Auth: the HTTP endpoints sit behind the same bearer-token auth as the rest of the API (GRID_API_TOKEN). The agent itself does not consume the bearer token; it uses the configured provider's credentials independently.

Failure Modes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Could not load credentials for provider gateway | Missing AI_GATEWAY_API_KEY | Set the env var |
| validatedClean: false | Agent emitted source the parser rejected | Re-prompt with more constraints; check output.warnings for hints |
| Agent stopped after 8 steps without finishing | Prompt too complex for the loop budget | Increase maxSteps or break the request into pieces |
| Empty response or 400 | Provider rejected the request (rate limit, model unavailable) | Retry; switch provider via --provider |
| Unknown function FOO in source | Agent hallucinated a function | Re-prompt naming the closest real function explicitly |

Example Sessions

Quick: convert a notional amount

$ npm run agent -- "convert 100,000 USD to EUR using FX_RATE; show the result with 2 decimals"

Typical output:

MODEL "USD to EUR Conversion"
DESCRIPTION "Convert a notional amount from USD to EUR using the live FX rate."
RUNTIME "lua_generated"
VERSION "1.0.0"
AUTHOR "AI Agent"
 
amount_usd as currency = 100000
fx_usd_eur as fx_rate = FX_RATE("USD", "EUR") DEFAULT 0.93
 
amount_eur as currency = ROUND(amount_usd * fx_usd_eur, 2)
 
END MODEL

Reactive alerting

$ npm run agent -- --provider anthropic "alert when CPU > 80% for more than 15 minutes"

The agent will call get_canonical_example("05-rulebook-operations") to anchor the WHEN/EVERY pattern, then draft a TS-runtime model.

From a brief

$ cat > brief.md << 'EOF'
Build a treasury control plane that:
- Tracks five funding sources with haircuts
- Computes a coverage ratio against operating outflows
- Pages ops if coverage drops below 1.15
EOF
 
$ npm run agent -- --prompt-file brief.md --output treasury.grid --show-steps

Extending The Agent

The agent is intentionally narrow in v1. Common extensions:

  • Add a tool: Register a new tool in src/agent/tools.ts alongside the seven existing ones, then add it to gridAgentTools.
  • Change the system prompt: Edit docs/language/ai-agent-guide.md; prompts.ts reads it at module init.
  • Multi-turn chat: The current API is one-shot. To add chat, build on top of agent.generate({ messages: [...] }) (the AI SDK v6 message format).
  • Different output schema: Replace gridAgentOutputSchema in agent.ts with a different Zod schema; the agent will conform.
  • Vector retrieval over a larger doc set: For docs beyond the v1 ten-file set, swap get_doc_section for a tool that does embedding search via embed/embedMany from the AI SDK.

Testing

Offline tests (always run)

These run as part of the regular npm test suite and cover:

  • Tool execute paths (each of the seven tools)
  • System prompt assembly (compactness, required sections)
  • Output schema validation (Zod)
  • Provider resolution (env-var precedence, defaults)
  • CLI argument parsing
  • evaluate_grid against fabricated good and broken sources

Run: npm test -- --run tests/agent

Live tests (opt-in, costs real money)

Live integration tests live in tests/agent/live/ and use a separate vitest config (vitest.live.config.ts). They are NOT run by npm test to avoid burning API credits in CI.

# Run all live tests against whichever provider has an env key
npm run test:live
 
# Run a single file
npx vitest run --config vitest.live.config.ts tests/agent/live/foundations.test.ts
 
# Force a specific provider via env (overrides preferred-provider)
GRID_AGENT_PROVIDER=anthropic npm run test:live

| File | Tests | Coverage |
| --- | --- | --- |
| foundations.test.ts | 6 | Trivial sums, type tags, DEFAULT, input override, headers, output shape |
| arrays.test.ts | 5 | SUM/MAP/REDUCE/SCAN, AVERAGE/MIN/MAX |
| text-and-logic.test.ts | 6 | Concat, interpolation, regex, MATCH, CASE WHEN, THEN/ELSE |
| external-and-rules.test.ts | 8 | FX_RATE+DEFAULT, HTTP_JSON+WITH, ML_SCORE lazy, no-external-in-rule, WHEN/EVERY/AT |
| recovery.test.ts | 6 | Multi-line MATCH avoidance, runtime selection, CASE WHEN, error literal restrictions, full-column avoidance, hallucination guard |
| providers.test.ts | 3 | Per-provider smoke (gateway / anthropic / openai) |

Every live test:

  1. Generates a model via runLive(prompt)
  2. Independently re-validates the source via validateGridSource
  3. Independently re-evaluates via evaluateGridSource
  4. Asserts on actual cell values from the in-process runtime

So tests don't trust the agent's self-reported validatedClean — they verify against the real parser + runtime.

Setup for live tests

Set one or more of:

| Env var | Provider |
| --- | --- |
| AI_GATEWAY_API_KEY | Vercel AI Gateway (recommended) |
| ANTHROPIC_API_KEY | Direct Anthropic |
| OPENAI_API_KEY | Direct OpenAI |

Tests skip themselves when no key is configured for the chosen provider, so a partial keyset still runs the available subset.
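
The skip behavior amounts to a key-presence check per provider. A sketch under assumed names (KEY_FOR and shouldSkip are illustrative; the tests would pass process.env as env):

```typescript
// Illustrative skip logic: a live test for a provider is skipped when
// that provider's API key is absent from the environment.
const KEY_FOR = {
  gateway: "AI_GATEWAY_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
} as const;

function shouldSkip(
  provider: keyof typeof KEY_FOR,
  env: Record<string, string | undefined>
): boolean {
  return !env[KEY_FOR[provider]];
}
```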

Optional model overrides (use the latest model id for the provider):

export GRID_AGENT_LIVE_GATEWAY_MODEL="anthropic/claude-sonnet-4.5"
export GRID_AGENT_LIVE_ANTHROPIC_MODEL="claude-sonnet-4-5"
export GRID_AGENT_LIVE_OPENAI_MODEL="gpt-5.1"

Expected cost: ~$1.50 for the full suite (~34 tests). Each test runs sequentially (fileParallelism: false) to avoid rate limits.


See Also

  • ai-agent-guide.md — the contract the agent follows when emitting source
  • functions.md — the catalog lookup_function consults
  • grammar.json — machine-readable grammar profile
  • docs/api/http_api.md — full HTTP API
  • src/agent/ — implementation
  • tests/agent/ — offline tests
  • tests/agent/live/ — live integration tests (gated behind npm run test:live)