Grid AI Agent Runtime

Operator-facing reference for the AI agent that generates Grid models from natural-language prompts. The agent's behavior contract — the rules it follows when emitting source — is in ai-agent-guide.md. This document covers the runtime: how to configure providers, what surfaces expose the agent, and what to expect operationally.


Surfaces

The same agent is exposed through four surfaces. All of them share the same system prompt, tools, and structured output schema.

| Surface | Entry point | Use case |
| --- | --- | --- |
| Library | import { createGridAgent } from "../../src/agent/index.js" | Server-side codegen, custom workflows |
| CLI | npm run agent -- "<prompt>" | Local development, scripting, CI |
| HTTP (non-streaming) | POST /api/agent/generate | Server-to-server |
| HTTP (streaming) | POST /api/agent/stream | Real-time UI (used by the AgentPanel) |
| Frontend | AgentPanel component in the editor | Interactive authoring |

Provider Configuration

The agent supports three providers. Selection precedence (highest first):

  1. Explicit option (e.g. createGridAgent({ provider: "anthropic" }))
  2. GRID_AGENT_PROVIDER env var
  3. Default: gateway
| Provider | Required env | Default model |
| --- | --- | --- |
| gateway | AI_GATEWAY_API_KEY | anthropic/claude-sonnet-4.5 |
| anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| openai | OPENAI_API_KEY | gpt-5.1 |

Override the model id either via the model: "..." option or the GRID_AGENT_MODEL env var.

Recommendation. Default to gateway. The Vercel AI Gateway gives you provider failover, cost tracking, and a single string-based model namespace (anthropic/..., openai/..., google/...).
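
The precedence rules above can be sketched as a small resolver. This is an illustrative helper under assumed names (resolveProvider is not part of the actual src/agent/ API); the caller would pass process.env as the second argument:

```typescript
// Hypothetical sketch of provider selection: explicit option first,
// then the GRID_AGENT_PROVIDER env var, then the "gateway" default.
type Provider = "gateway" | "anthropic" | "openai";

function resolveProvider(
  explicit?: string,
  env: Record<string, string | undefined> = {}
): Provider {
  const pick = explicit ?? env.GRID_AGENT_PROVIDER ?? "gateway";
  if (pick !== "gateway" && pick !== "anthropic" && pick !== "openai") {
    throw new Error(`Unknown provider: ${pick}`);
  }
  return pick;
}
```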


Library Usage

import { createGridAgent, generateGrid } from "./src/agent/index.js";
 
// One-shot helper
const { output, steps } = await generateGrid(
  "build a portfolio model with EUR/USD FX conversion"
);
console.log(output.source);
console.log(`took ${steps} steps`);
 
// Or get the underlying ToolLoopAgent for full control
const agent = createGridAgent({ provider: "anthropic", maxSteps: 12 });
const result = await agent.generate({
  prompt: "alert when load > threshold",
  onStepFinish: async ({ stepNumber, toolCalls }) => {
    console.log(`step ${stepNumber}: ${toolCalls?.length ?? 0} tool calls`);
  }
});

The agent returns a structured object:

interface GridAgentOutput {
  source: string;            // the .grid model source
  explanation: string;       // 1-3 sentence summary
  runtime: "ts" | "lua_generated" | "lua_interpreter";
  validatedClean: boolean;   // true iff validate_grid passed on the final source
  warnings: string[];        // non-blocking notes
}
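
A caller will typically gate further use of the source on the validation flag. A minimal sketch (the acceptIfClean helper is illustrative, not part of the library API):

```typescript
// Illustrative consumer of GridAgentOutput: refuse unvalidated source,
// surface non-blocking warnings, and return the .grid text.
interface GridAgentOutput {
  source: string;
  explanation: string;
  runtime: "ts" | "lua_generated" | "lua_interpreter";
  validatedClean: boolean;
  warnings: string[];
}

function acceptIfClean(out: GridAgentOutput): string {
  if (!out.validatedClean) {
    throw new Error(`Refusing unvalidated source: ${out.warnings.join("; ")}`);
  }
  for (const w of out.warnings) console.warn(`[grid-agent] ${w}`);
  return out.source;
}
```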

CLI Usage

# Simplest case — print the model to stdout
npm run agent -- "build a portfolio model with EUR/USD FX conversion"
 
# Write to a file with explicit provider
npm run agent -- --output portfolio.grid --provider gateway "build a portfolio..."
 
# Read prompt from a file
npm run agent -- --prompt-file ./brief.md
 
# Show step-by-step tool-call progress on stderr
npm run agent -- --show-steps "..."
 
# Skip post-generation validation
npm run agent -- --no-validate "..."
 
# Override the loop budget (default 8)
npm run agent -- --max-steps 12 "..."

Flags:

| Flag | Default | Effect |
| --- | --- | --- |
| --prompt-file <path> | (none) | Read prompt from a file instead of argv |
| --output <path> | (stdout) | Write source to a file |
| --provider <name> | $GRID_AGENT_PROVIDER or gateway | gateway / anthropic / openai |
| --model <id> | provider default | Override model id |
| --max-steps <n> | 8 | Tool-loop step cap |
| --no-validate | (validate on) | Skip post-generation validation |
| --show-steps | off | Stream step events to stderr |

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | Generated successfully (and validated, if enabled) |
| 1 | Argument or transport error |
| 2 | Generation succeeded but post-validation failed |
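
In CI it can be useful to treat exit code 2 as a soft failure. A shell sketch of that mapping (describe_agent_exit is an illustrative helper, not part of the CLI):

```shell
# Illustrative mapping of the CLI exit codes above to CI outcomes.
describe_agent_exit() {
  case "$1" in
    0) echo "ok: generated (and validated, if enabled)" ;;
    1) echo "fail: argument or transport error" ;;
    2) echo "soft-fail: generated but post-validation failed" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}
```

Usage: run the agent, then inspect the code, e.g. npm run agent -- "..." > out.grid; describe_agent_exit $?.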

HTTP API

POST /api/agent/generate

One-shot. Returns the structured output once the agent finishes.

POST /api/agent/generate
Content-Type: application/json
 
{
  "prompt": "build a portfolio model",
  "provider": "gateway",        // optional
  "model": "anthropic/claude-sonnet-4.5",  // optional
  "maxSteps": 8                   // optional
}

Response:

{
  "output": {
    "source": "MODEL ...\n...",
    "explanation": "...",
    "runtime": "lua_generated",
    "validatedClean": true,
    "warnings": []
  },
  "provider": "gateway",
  "model": "anthropic/claude-sonnet-4.5"
}

POST /api/agent/stream

Same input shape. Returns a UI Message Stream Response (the AI SDK v6 streaming format). Used by the frontend AgentPanel. Compatible with @ai-sdk/react useChat.

Both endpoints honor the existing bearer-token auth (GRID_API_TOKEN).
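
Any HTTP client works; a TypeScript sketch that assembles the fetch options, including the bearer token (buildGenerateRequest is an illustrative helper, not part of the API):

```typescript
// Illustrative builder for the POST /api/agent/generate request body and
// headers, including the GRID_API_TOKEN bearer auth the endpoints require.
function buildGenerateRequest(
  prompt: string,
  token: string,
  opts: { provider?: string; model?: string; maxSteps?: number } = {}
) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ prompt, ...opts }),
  };
}
```

Usage: await fetch("https://your-host/api/agent/generate", buildGenerateRequest("build a portfolio model", token, { maxSteps: 12 })).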


Frontend Panel

The AgentPanel React component (web/frontend/src/components/AgentPanel.tsx) provides:

  • A textarea for the prompt
  • Provider/model dropdowns
  • A "Generate" button (POSTs to /api/agent/generate)
  • A result preview with explanation and warnings
  • "Apply to editor" — pushes the generated source into the Monaco editor
  • "Copy" — copies the source to the clipboard

The panel sits above the Monaco editor in the left pane.


Tool Loop

Every agent invocation has access to seven tools:

| Tool | Purpose |
| --- | --- |
| lookup_function({ name }) | Full signature + description for one function |
| search_functions({ query, category?, limit }) | Match by name OR description, optional category filter |
| get_doc_section({ topic }) | Fetch a language doc section by name |
| get_canonical_example({ name }) | Fetch one of the seven canonical examples |
| explain_error({ code, excerptLength? }) | Cause + recovery + errors.md excerpt for an error code |
| validate_grid({ source }) | Parse + build + Lua-generate; returns ok/diagnostics |
| evaluate_grid({ source, inputs?, requestSymbols? }) | Deploy + run in-process + return every cell's resolved value |

The agent typically follows this sequence:

  1. Read the prompt.
  2. (Optionally) get_canonical_example for a matching pattern.
  3. (Optionally) lookup_function / search_functions for unfamiliar functions.
  4. Draft the model.
  5. validate_grid to check syntax and structure.
  6. evaluate_grid to verify behavioral correctness (with sample inputs).
  7. If diagnostics or unexpected error cells appear, fix and re-validate / re-evaluate.
  8. Return structured output.

The loop is capped at 8 steps by default (stopWhen: stepCountIs(8)). Override with the maxSteps option or --max-steps flag.
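
The cap behaves like a bounded loop. A minimal, synchronous sketch of the idea (this is not the AI SDK's actual stepCountIs implementation; step stands in for one LLM round-trip plus tool calls):

```typescript
// Minimal sketch of a step-capped tool loop in the spirit of
// stopWhen: stepCountIs(maxSteps). step(n) returns true once the agent
// has produced its final structured output.
function runCappedLoop(
  step: (n: number) => boolean,
  maxSteps = 8
): { steps: number; finished: boolean } {
  for (let n = 1; n <= maxSteps; n++) {
    if (step(n)) return { steps: n, finished: true };
  }
  return { steps: maxSteps, finished: false };
}
```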

evaluate_grid: behavioral verification

The biggest difference from a "generate-and-pray" agent: every model the Grid agent emits is run through the actual InMemoryGridRuntime before being returned. The agent sees:

  • Every cell's evaluated value (numbers, strings, dates, errors).
  • The status of every external call (queued/stale is expected with no worker attached; ready means it resolved with a fallback).
  • Per-cell error codes (#DIV/0!, #VALUE!, etc.).
  • A list of cells in error state for quick triage.

This means a successful generation has both validatedClean: true (syntactically + structurally valid) and evaluatedClean: true (all cells either resolved or pending external).
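
The "evaluated clean" criterion can be stated as a predicate over per-cell results. The cell shape below is hypothetical (the runtime's real result type may differ); it only illustrates the rule that every cell must either resolve or be a pending external call, with no error codes:

```typescript
// Hypothetical per-cell result shape for illustration only.
interface CellResult {
  value?: unknown;
  errorCode?: string;                        // e.g. "#DIV/0!", "#VALUE!"
  external?: "queued" | "stale" | "ready";   // status of an external call
}

function isEvaluatedClean(cells: CellResult[]): boolean {
  return cells.every(
    (c) =>
      c.errorCode === undefined &&
      (c.value !== undefined || c.external === "queued" || c.external === "stale")
  );
}
```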


Operational Notes

  • Cost: each invocation makes 1-8 LLM calls (system prompt is ~16KB, plus tool results). Use the AI Gateway's cost dashboard to track spend per call.
  • Latency: typically 5-20 seconds end-to-end for small models; longer if the agent iterates on validation diagnostics.
  • Determinism: outputs are not deterministic. Run multiple times for difficult prompts and pick the cleanest validation.
  • Disk I/O: language docs are loaded once at module init. The agent's per-call overhead is just the LLM round-trips and the in-process validation.
  • Auth: the HTTP endpoints sit behind the same bearer-token auth as the rest of the API (GRID_API_TOKEN). The agent itself does not consume the bearer token; it uses the configured provider's credentials independently.

Failure Modes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Could not load credentials for provider gateway | Missing AI_GATEWAY_API_KEY | Set the env var |
| validatedClean: false | Agent emitted source the parser rejected | Re-prompt with more constraints; check output.warnings for hints |
| Agent stopped after 8 steps without finishing | Prompt too complex for the loop budget | Increase maxSteps or break the request into pieces |
| Empty response or 400 | Provider rejected the request (rate limit, model unavailable) | Retry; switch provider via --provider |
| Unknown function FOO in source | Agent hallucinated a function | Re-prompt naming the closest real function explicitly |

Example Sessions

Quick: convert a notional amount

$ npm run agent -- "convert 100,000 USD to EUR using FX_RATE; show the result with 2 decimals"

Typical output:

MODEL "USD to EUR Conversion"
DESCRIPTION "Convert a notional amount from USD to EUR using the live FX rate."
RUNTIME "lua_generated"
VERSION "1.0.0"
AUTHOR "AI Agent"
 
amount_usd as currency = 100000
fx_usd_eur as fx_rate = FX_RATE("USD", "EUR") DEFAULT 0.93
 
amount_eur as currency = ROUND(amount_usd * fx_usd_eur, 2)
 
END MODEL

Reactive alerting

$ npm run agent -- --provider anthropic "alert when CPU > 80% for more than 15 minutes"

The agent will call get_canonical_example("05-rulebook-operations") to anchor the WHEN/EVERY pattern, then draft a TS-runtime model.

From a brief

$ cat > brief.md << 'EOF'
Build a treasury control plane that:
- Tracks five funding sources with haircuts
- Computes a coverage ratio against operating outflows
- Pages ops if coverage drops below 1.15
EOF
 
$ npm run agent -- --prompt-file brief.md --output treasury.grid --show-steps

Extending The Agent

The agent is intentionally narrow in v1. Common extensions:

  • Add a tool: Register a new tool in src/agent/tools.ts alongside the seven existing ones, then add it to gridAgentTools.
  • Change the system prompt: Edit docs/language/ai-agent-guide.md; prompts.ts reads it at module init.
  • Multi-turn chat: The current API is one-shot. To add chat, build on top of agent.generate({ messages: [...] }) (the AI SDK v6 message format).
  • Different output schema: Replace gridAgentOutputSchema in agent.ts with a different Zod schema; the agent will conform.
  • Vector retrieval over a larger doc set: For docs beyond the v1 ten-file set, swap get_doc_section for a tool that does embedding search via embed/embedMany from the AI SDK.

Testing

Offline tests (always run)

These run as part of the regular npm test suite and cover:

  • Tool execute paths (each of the seven tools)
  • System prompt assembly (compactness, required sections)
  • Output schema validation (Zod)
  • Provider resolution (env-var precedence, defaults)
  • CLI argument parsing
  • evaluate_grid against fabricated good and broken sources

Run: npm test -- --run tests/agent

Live tests (opt-in, costs real money)

Live integration tests live in tests/agent/live/ and use a separate vitest config (vitest.live.config.ts). They are NOT run by npm test to avoid burning API credits in CI.

# Run all live tests against whichever provider has an env key
npm run test:live
 
# Run a single file
npx vitest run --config vitest.live.config.ts tests/agent/live/foundations.test.ts
 
# Force a specific provider via env (overrides preferred-provider)
GRID_AGENT_PROVIDER=anthropic npm run test:live

| File | Tests | Coverage |
| --- | --- | --- |
| foundations.test.ts | 6 | Trivial sums, type tags, DEFAULT, input override, headers, output shape |
| arrays.test.ts | 5 | SUM/MAP/REDUCE/SCAN, AVERAGE/MIN/MAX |
| text-and-logic.test.ts | 6 | Concat, interpolation, regex, MATCH, CASE WHEN, THEN/ELSE |
| external-and-rules.test.ts | 8 | FX_RATE+DEFAULT, HTTP_JSON+WITH, ML_SCORE lazy, no-external-in-rule, WHEN/EVERY/AT |
| recovery.test.ts | 6 | Multi-line MATCH avoidance, runtime selection, CASE WHEN, error literal restrictions, full-column avoidance, hallucination guard |
| providers.test.ts | 3 | Per-provider smoke (gateway / anthropic / openai) |

Every live test:

  1. Generates a model via runLive(prompt)
  2. Independently re-validates the source via validateGridSource
  3. Independently re-evaluates via evaluateGridSource
  4. Asserts on actual cell values from the in-process runtime

So tests don't trust the agent's self-reported validatedClean — they verify against the real parser + runtime.

Setup for live tests

Set one or more of:

| Env var | Provider |
| --- | --- |
| AI_GATEWAY_API_KEY | Vercel AI Gateway (recommended) |
| ANTHROPIC_API_KEY | Direct Anthropic |
| OPENAI_API_KEY | Direct OpenAI |

Tests skip themselves when no key is configured for the chosen provider, so a partial keyset still runs the available subset.
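
The skip behavior amounts to a key-presence check per provider. A sketch under assumed names (KEY_FOR and shouldSkip are illustrative; the tests would pass process.env as env):

```typescript
// Illustrative skip logic: a live test for a provider is skipped when
// that provider's API key is absent from the environment.
const KEY_FOR = {
  gateway: "AI_GATEWAY_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
} as const;

function shouldSkip(
  provider: keyof typeof KEY_FOR,
  env: Record<string, string | undefined>
): boolean {
  return !env[KEY_FOR[provider]];
}
```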

Optional model overrides (use the latest model id for the provider):

export GRID_AGENT_LIVE_GATEWAY_MODEL="anthropic/claude-sonnet-4.5"
export GRID_AGENT_LIVE_ANTHROPIC_MODEL="claude-sonnet-4-5"
export GRID_AGENT_LIVE_OPENAI_MODEL="gpt-5.1"

Expected cost: ~$1.50 for the full suite (~34 tests). Each test runs sequentially (fileParallelism: false) to avoid rate limits.


See Also

  • ai-agent-guide.md — the contract the agent follows when emitting source
  • functions.md — the catalog lookup_function consults
  • grammar.json — machine-readable grammar profile
  • docs/api/http_api.md — full HTTP API
  • src/agent/ — implementation
  • tests/agent/ — offline tests
  • tests/agent/live/ — live integration tests (gated behind npm run test:live)