# Grid AI Agent Runtime

Operator-facing reference for the AI agent that generates Grid models from natural-language prompts. The agent's behavior contract — the rules it follows when emitting source — is in `ai-agent-guide.md`. This document covers the runtime: how to configure providers, what surfaces expose the agent, and what to expect operationally.
## Surfaces
The same agent is exposed through four surfaces. They share the system prompt, tools, and structured output schema.

| Surface | Entry point | Use case |
|---|---|---|
| Library | `import { createGridAgent } from "../../src/agent/index.js"` | Server-side codegen, custom workflows |
| CLI | `npm run agent -- "<prompt>"` | Local development, scripting, CI |
| HTTP — non-streaming | `POST /api/agent/generate` | Server-to-server |
| HTTP — streaming | `POST /api/agent/stream` | Real-time UI (used by the AgentPanel) |
| Frontend | `AgentPanel` component in the editor | Interactive authoring |
## Provider Configuration
The agent supports three providers. Selection precedence (highest first):

1. Explicit option (e.g. `createGridAgent({ provider: "anthropic" })`)
2. The `GRID_AGENT_PROVIDER` env var
3. Default: `gateway`
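The precedence above can be sketched as a small resolver. This is an illustration only — `resolveProvider` is a hypothetical helper, not the actual code in `src/agent/`:

```typescript
// Sketch of the selection precedence: explicit option > GRID_AGENT_PROVIDER > "gateway".
type Provider = "gateway" | "anthropic" | "openai";

function resolveProvider(
  explicit: Provider | undefined,
  env: Record<string, string | undefined>, // pass process.env in real code
): Provider {
  if (explicit) return explicit; // 1. explicit option wins
  if (env.GRID_AGENT_PROVIDER) return env.GRID_AGENT_PROVIDER as Provider; // 2. env var
  return "gateway"; // 3. default
}
```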
| Provider | Required env | Default model |
|---|---|---|
| `gateway` | `AI_GATEWAY_API_KEY` | `anthropic/claude-sonnet-4.5` |
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-5` |
| `openai` | `OPENAI_API_KEY` | `gpt-5.1` |
Override the model id via the `model: "..."` option or the `GRID_AGENT_MODEL` env var.
**Recommendation.** Default to `gateway`. The Vercel AI Gateway gives you provider failover, cost tracking, and a single string-based model namespace (`anthropic/...`, `openai/...`, `google/...`).
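Gateway model ids embed the vendor in the string itself. A tiny parser makes the namespace concrete — `splitGatewayModel` is illustrative, not part of the Grid codebase:

```typescript
// Gateway model ids are "vendor/model" strings, e.g. "anthropic/claude-sonnet-4.5".
// splitGatewayModel is a hypothetical illustration of the namespace, not a Grid API.
function splitGatewayModel(id: string): { vendor: string; model: string } {
  const slash = id.indexOf("/");
  if (slash === -1) throw new Error(`not a gateway model id: ${id}`);
  return { vendor: id.slice(0, slash), model: id.slice(slash + 1) };
}
```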
## Library Usage
```typescript
import { createGridAgent, generateGrid } from "./src/agent/index.js";

// One-shot helper
const { output, steps } = await generateGrid(
  "build a portfolio model with EUR/USD FX conversion"
);
console.log(output.source);
console.log(`took ${steps} steps`);

// Or get the underlying ToolLoopAgent for full control
const agent = createGridAgent({ provider: "anthropic", maxSteps: 12 });
const result = await agent.generate({
  prompt: "alert when load > threshold",
  onStepFinish: async ({ stepNumber, toolCalls }) => {
    console.log(`step ${stepNumber}: ${toolCalls?.length ?? 0} tool calls`);
  }
});
```

The agent returns a structured object:
```typescript
interface GridAgentOutput {
  source: string;          // the .grid model source
  explanation: string;     // 1-3 sentence summary
  runtime: "ts" | "lua_generated" | "lua_interpreter";
  validatedClean: boolean; // true iff validate_grid passed on the final source
  warnings: string[];      // non-blocking notes
}
```

## CLI Usage
```sh
# Simplest case — print the model to stdout
npm run agent -- "build a portfolio model with EUR/USD FX conversion"

# Write to a file with explicit provider
npm run agent -- --output portfolio.grid --provider gateway "build a portfolio..."

# Read prompt from a file
npm run agent -- --prompt-file ./brief.md

# Show step-by-step tool-call progress on stderr
npm run agent -- --show-steps "..."

# Skip post-generation validation
npm run agent -- --no-validate "..."

# Override the loop budget (default 8)
npm run agent -- --max-steps 12 "..."
```

Flags:
| Flag | Default | Effect |
|---|---|---|
| `--prompt-file <path>` | (none) | Read prompt from a file instead of argv |
| `--output <path>` | (stdout) | Write source to a file |
| `--provider <name>` | `$GRID_AGENT_PROVIDER` or `gateway` | `gateway` / `anthropic` / `openai` |
| `--model <id>` | provider default | Override model id |
| `--max-steps <n>` | 8 | Tool-loop step cap |
| `--no-validate` | (validate on) | Skip post-generation validation |
| `--show-steps` | off | Stream step events to stderr |
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Generated successfully (and validated, if enabled) |
| 1 | Argument or transport error |
| 2 | Generation succeeded but post-validation failed |
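For CI gating, the three exit codes map onto a small dispatch. A sketch of that mapping, assuming a hypothetical run-summary object (the CLI's actual internals may differ):

```typescript
// Map a finished CLI run to the documented exit codes:
// 0 = generated (and validated, if enabled), 1 = argument/transport error,
// 2 = generation succeeded but post-validation failed.
interface RunSummary {
  argumentError?: boolean;
  transportError?: boolean;
  validatedClean?: boolean; // undefined when --no-validate skipped validation
}

function exitCodeFor(run: RunSummary): 0 | 1 | 2 {
  if (run.argumentError || run.transportError) return 1;
  if (run.validatedClean === false) return 2;
  return 0;
}
```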
## HTTP API
### POST /api/agent/generate

One-shot. Returns the structured output once the agent finishes.
```http
POST /api/agent/generate
Content-Type: application/json

{
  "prompt": "build a portfolio model",
  "provider": "gateway",                   // optional
  "model": "anthropic/claude-sonnet-4.5",  // optional
  "maxSteps": 8                            // optional
}
```

Response:
```json
{
  "output": {
    "source": "MODEL ...\n...",
    "explanation": "...",
    "runtime": "lua_generated",
    "validatedClean": true,
    "warnings": []
  },
  "provider": "gateway",
  "model": "anthropic/claude-sonnet-4.5"
}
```

### POST /api/agent/stream
Same input shape. Returns a UI Message Stream Response (the AI SDK v6 streaming format). Used by the frontend AgentPanel. Compatible with `useChat` from `@ai-sdk/react`.

Both endpoints honor the existing bearer-token auth (`GRID_API_TOKEN`).
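A minimal client sketch for the non-streaming endpoint. The URL, body fields, and bearer header follow this document; `buildGenerateRequest` itself is a hypothetical helper — pass its output to `fetch`:

```typescript
// Build a request for POST /api/agent/generate with bearer-token auth.
// Endpoint and body shape are from this document; the helper is illustrative.
function buildGenerateRequest(
  baseUrl: string,
  token: string, // GRID_API_TOKEN
  prompt: string,
  opts: { provider?: string; model?: string; maxSteps?: number } = {},
) {
  return {
    url: `${baseUrl}/api/agent/generate`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ prompt, ...opts }),
    },
  };
}

// Usage: const { url, init } = buildGenerateRequest(base, token, prompt);
//        const res = await fetch(url, init);
```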
## Frontend Panel
The `AgentPanel` React component (`web/frontend/src/components/AgentPanel.tsx`) provides:

- A textarea for the prompt
- Provider/model dropdowns
- A "Generate" button (POSTs to `/api/agent/generate`)
- A result preview with explanation and warnings
- "Apply to editor" — pushes the generated source into the Monaco editor
- "Copy" — copies the source to the clipboard
The panel sits above the Monaco editor in the left pane.
## Tool Loop
Every agent invocation has access to seven tools:
| Tool | Purpose |
|---|---|
| `lookup_function({ name })` | Full signature + description for one function |
| `search_functions({ query, category?, limit })` | Match by name OR description, optional category filter |
| `get_doc_section({ topic })` | Fetch a language doc section by name |
| `get_canonical_example({ name })` | Fetch one of the seven canonical examples |
| `explain_error({ code, excerptLength? })` | Cause + recovery + errors.md excerpt for an error code |
| `validate_grid({ source })` | Parse + build + Lua-generate; returns ok/diagnostics |
| `evaluate_grid({ source, inputs?, requestSymbols? })` | Deploy + run in-process + return every cell's resolved value |
The agent typically follows this sequence:

1. Read the prompt.
2. (Optionally) `get_canonical_example` for a matching pattern.
3. (Optionally) `lookup_function` / `search_functions` for unfamiliar functions.
4. Draft the model.
5. `validate_grid` to check syntax and structure.
6. `evaluate_grid` to verify behavioral correctness (with sample inputs).
7. If diagnostics or unexpected error cells appear, fix and re-validate / re-evaluate.
8. Return structured output.
The loop is capped at 8 steps by default (`stopWhen: stepCountIs(8)`). Override with the `maxSteps` option or `--max-steps` flag.
### `evaluate_grid`: behavioral verification
The biggest difference from a "generate-and-pray" agent: every model the Grid agent emits is run through the actual `InMemoryGridRuntime` before being returned. The agent sees:

- Every cell's evaluated value (numbers, strings, dates, errors).
- The status of every external call (`queued`/`stale` is expected with no worker attached; `ready` means it resolved with a fallback).
- Per-cell error codes (`#DIV/0!`, `#VALUE!`, etc.).
- A list of cells in error state for quick triage.
This means a successful generation has both `validatedClean: true` (syntactically + structurally valid) and `evaluatedClean: true` (all cells either resolved or pending external).
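The `evaluatedClean` condition can be pictured as a predicate over per-cell statuses. The status names below mirror this document; the runtime's real types may differ:

```typescript
// A model is behaviorally clean when every cell either resolved or is a
// pending external call (queued/stale with no worker attached).
type CellStatus = "resolved" | "queued" | "stale" | "error";

const ACCEPTABLE: CellStatus[] = ["resolved", "queued", "stale"];

function evaluatedClean(cells: { name: string; status: CellStatus }[]): boolean {
  return cells.every((c) => ACCEPTABLE.includes(c.status));
}
```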
## Operational Notes
- Cost: each invocation makes 1-8 LLM calls (system prompt is ~16KB, plus tool results). Use the AI Gateway's cost dashboard to track spend per call.
- Latency: typically 5-20 seconds end-to-end for small models; longer if the agent iterates on validation diagnostics.
- Determinism: outputs are not deterministic. Run multiple times for difficult prompts and pick the cleanest validation.
- Disk I/O: language docs are loaded once at module init. The agent's per-call overhead is just the LLM round-trips and the in-process validation.
- Auth: the HTTP endpoints sit behind the same bearer-token auth as the rest of the API (`GRID_API_TOKEN`). The agent itself does not consume the bearer token; it uses the configured provider's credentials independently.
## Failure Modes
| Symptom | Likely cause | Fix |
|---|---|---|
| `Could not load credentials for provider gateway` | Missing `AI_GATEWAY_API_KEY` | Set the env var |
| `validatedClean: false` | Agent emitted source the parser rejected | Re-prompt with more constraints; check `output.warnings` for hints |
| Agent stopped after 8 steps without finishing | Prompt too complex for the loop budget | Increase `maxSteps` or break the request into pieces |
| Empty response or 400 | Provider rejected the request (rate limit, model unavailable) | Retry; switch provider via `--provider` |
| `Unknown function FOO in source` | Agent hallucinated a function | Re-prompt naming the closest real function explicitly |
## Example Sessions
### Quick: convert a notional amount
```sh
$ npm run agent -- "convert 100,000 USD to EUR using FX_RATE; show the result with 2 decimals"
```

Typical output:
```
MODEL "USD to EUR Conversion"
DESCRIPTION "Convert a notional amount from USD to EUR using the live FX rate."
RUNTIME "lua_generated"
VERSION "1.0.0"
AUTHOR "AI Agent"

amount_usd as currency = 100000
fx_usd_eur as fx_rate = FX_RATE("USD", "EUR") DEFAULT 0.93
amount_eur as currency = ROUND(amount_usd * fx_usd_eur, 2)
END MODEL
```

### Reactive alerting
```sh
$ npm run agent -- --provider anthropic "alert when CPU > 80% for more than 15 minutes"
```

The agent will call `get_canonical_example("05-rulebook-operations")` to anchor the WHEN/EVERY pattern, then draft a TS-runtime model.
### From a brief
```sh
$ cat > brief.md << 'EOF'
Build a treasury control plane that:
- Tracks five funding sources with haircuts
- Computes a coverage ratio against operating outflows
- Pages ops if coverage drops below 1.15
EOF
$ npm run agent -- --prompt-file brief.md --output treasury.grid --show-steps
```

## Extending The Agent
The agent is intentionally narrow in v1. Common extensions:

- Add a tool: Register a new tool in `src/agent/tools.ts` alongside the existing seven, then add it to `gridAgentTools`.
- Change the system prompt: Edit `docs/language/ai-agent-guide.md` — `prompts.ts` reads it at module init.
- Multi-turn chat: The current API is one-shot. To add chat, build on top of `agent.generate({ messages: [...] })` (the AI SDK v6 message format).
- Different output schema: Replace `gridAgentOutputSchema` in `agent.ts` with a different Zod schema; the agent will conform.
- Vector retrieval over a larger doc set: For docs beyond the v1 ten-file set, swap `get_doc_section` for a tool that does embedding search via `embed`/`embedMany` from the AI SDK.
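As a rough illustration of the "add a tool" path — the `{ description, execute }` object shape below is an assumption for the sketch; match whatever `src/agent/tools.ts` actually exports:

```typescript
// Hypothetical new tool. The shape is assumed for illustration; the real
// tools in src/agent/tools.ts may use a different registration API.
const listRuntimes = {
  description: "List the runtimes a Grid model can target.",
  execute: async () => ({
    // Values from GridAgentOutput.runtime in this document.
    runtimes: ["ts", "lua_generated", "lua_interpreter"],
  }),
};

// Merge it into the tools map the agent receives (existing tools elided):
const extendedTools = { list_runtimes: listRuntimes /* , ...gridAgentTools */ };
```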
## Testing
### Offline tests (always run)
Part of the regular `npm test` suite. Cover:

- Tool execute paths (each of the seven tools)
- System prompt assembly (compactness, required sections)
- Output schema validation (Zod)
- Provider resolution (env-var precedence, defaults)
- CLI argument parsing
- `evaluate_grid` against fabricated good and broken sources

Run: `npm test -- --run tests/agent`
### Live tests (opt-in, costs real money)
Live integration tests live in `tests/agent/live/` and use a separate vitest config (`vitest.live.config.ts`). They are NOT run by `npm test`, to avoid burning API credits in CI.
```sh
# Run all live tests against whichever provider has an env key
npm run test:live

# Run a single file
npx vitest run --config vitest.live.config.ts tests/agent/live/foundations.test.ts

# Force a specific provider via env (overrides preferred-provider)
GRID_AGENT_PROVIDER=anthropic npm run test:live
```

| File | Tests | Coverage |
|---|---|---|
| `foundations.test.ts` | 6 | Trivial sums, type tags, DEFAULT, input override, headers, output shape |
| `arrays.test.ts` | 5 | SUM/MAP/REDUCE/SCAN, AVERAGE/MIN/MAX |
| `text-and-logic.test.ts` | 6 | Concat, interpolation, regex, MATCH, CASE WHEN, THEN/ELSE |
| `external-and-rules.test.ts` | 8 | FX_RATE+DEFAULT, HTTP_JSON+WITH, ML_SCORE lazy, no-external-in-rule, WHEN/EVERY/AT |
| `recovery.test.ts` | 6 | Multi-line MATCH avoidance, runtime selection, CASE WHEN, error literal restrictions, full-column avoidance, hallucination guard |
| `providers.test.ts` | 3 | Per-provider smoke (gateway / anthropic / openai) |
Every live test:

- Generates a model via `runLive(prompt)`
- Independently re-validates the source via `validateGridSource`
- Independently re-evaluates via `evaluateGridSource`
- Asserts on actual cell values from the in-process runtime
So tests don't trust the agent's self-reported `validatedClean` — they verify against the real parser + runtime.
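That double-check pattern can be sketched as a pure predicate. The function names here are placeholders — wire in the real `validateGridSource` / `evaluateGridSource` in practice:

```typescript
// Never trust the agent's self-reported validatedClean: recompute cleanliness
// from the real parser and runtime. The callback result shapes are illustrative.
function independentlyVerified(
  source: string,
  validate: (src: string) => { ok: boolean },
  evaluate: (src: string) => { errorCells: string[] },
): boolean {
  return validate(source).ok && evaluate(source).errorCells.length === 0;
}
```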
### Setup for live tests
Set one or more of:
| Env var | Provider |
|---|---|
| `AI_GATEWAY_API_KEY` | Vercel AI Gateway (recommended) |
| `ANTHROPIC_API_KEY` | Direct Anthropic |
| `OPENAI_API_KEY` | Direct OpenAI |
Tests skip themselves when no key is configured for the chosen provider, so a partial keyset still runs the available subset.
Optional model overrides (use the latest model id for the provider):
```sh
export GRID_AGENT_LIVE_GATEWAY_MODEL="anthropic/claude-sonnet-4.5"
export GRID_AGENT_LIVE_ANTHROPIC_MODEL="claude-sonnet-4-5"
export GRID_AGENT_LIVE_OPENAI_MODEL="gpt-5.1"
```

Expected cost: ~$1.50 for the full suite (~34 tests). Test files run sequentially (`fileParallelism: false`) to avoid rate limits.
## See Also
- `ai-agent-guide.md` — the contract the agent follows when emitting source
- `functions.md` — the catalog `lookup_function` consults
- `grammar.json` — machine-readable grammar profile
- `docs/api/http_api.md` — full HTTP API
- `src/agent/` — implementation
- `tests/agent/` — offline tests
- `tests/agent/live/` — live integration tests (gated behind `npm run test:live`)