Turn prompts into metrics.
Measure how well your agent actually performs.

Wrap your LLM calls with Agent Plasticity. We break down prompts into measurable metrics and score every response against them.

agent.ts
import { generateText } from "ai"
import { openai } from "@ai-sdk/openai"

const { text } = await generateText({
  model: openai("gpt-5.2"),
  system: "Be concise. Use markdown. Cite sources.",
  prompt: userQuery
})

// send to Agent Plasticity
plasticity.track({
  prompt: userQuery,
  output: text,
  model: "gpt-5.2"
})
Behavioral Scores

Conciseness: 8.2 (+0.4)
Markdown: 9.6 (+0.1)
Citations: 6.1 (-1.3)

How it works

FIG 0.1

Collect

Add one line to your agent. We collect every LLM call with zero latency impact.
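A minimal sketch of how a zero-latency collector can work, assuming a buffered, fire-and-forget design. The `Tracker` class and its methods here are illustrative, not the real Agent Plasticity client:

```typescript
// Illustrative sketch: a non-blocking tracker that buffers events
// in memory and flushes them in the background, so the agent's
// response path never waits on network I/O. Names are hypothetical.
type TrackEvent = { prompt: string; output: string; model: string }

class Tracker {
  private buffer: TrackEvent[] = []

  // track() only pushes to an in-memory buffer: O(1), nothing to await
  track(event: TrackEvent): void {
    this.buffer.push(event)
  }

  // flush() drains the buffer; a real client would POST the batch
  // to an ingestion endpoint off the hot path
  flush(): TrackEvent[] {
    const batch = this.buffer
    this.buffer = []
    return batch
  }
}

const tracker = new Tracker()
tracker.track({ prompt: "hi", output: "hello", model: "gpt-5.2" })
const batch = tracker.flush()
```

The key design point is that the caller never blocks: collection is a buffer push, and delivery happens later.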

FIG 0.2

Extract

We read your prompts and break them down into measurable metrics.
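As a rough sketch of the idea, a naive splitter that treats each instruction in a system prompt as a candidate metric. This is not the actual extraction (which is far more sophisticated); it only shows the shape of the transformation:

```typescript
// Naive illustration: split a system prompt into individual
// instructions, each one a candidate metric to score against.
function extractMetrics(systemPrompt: string): string[] {
  return systemPrompt
    .split(/[.!?]/)               // one instruction per sentence
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
}

const metrics = extractMetrics("Be concise. Use markdown. Cite sources.")
// metrics: ["Be concise", "Use markdown", "Cite sources"]
```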

FIG 0.3

Score

Every output is scored on every metric. See trends, catch regressions, improve.
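A toy example of what scoring a single metric could look like, using a crude markdown heuristic. Production scoring is model-based, so treat this purely as an illustration of "output in, number out":

```typescript
// Toy scorer: rates markdown compliance 0-10 by checking for a few
// structural signals. Purely illustrative; not the real scoring.
function scoreMarkdown(output: string): number {
  let score = 0
  if (/^#{1,6}\s/m.test(output)) score += 4  // has headings
  if (/^[-*]\s/m.test(output)) score += 3    // has bullet lists
  if (/`[^`]+`/.test(output)) score += 3     // has inline code
  return score
}

const sample = "# Title\n- point one\nRun `npm install` first."
const score = scoreMarkdown(sample)
// score: 10
```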

Every score maps back to your prompt

We score each metric against the instruction it came from: not generic quality, but your specific expectations.

7.8
Conciseness ("Be extremely concise")

Second paragraph restates the introduction. Could be 40% shorter.

9.6
Markdown Compliance ("Only answer in markdown")

Proper heading hierarchy, code blocks, and bullet lists throughout.

6.1
Source Citation ("Cite your data sources")

Two of four claims cite sources. BMI threshold is uncited.
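One way to picture the resulting data shape, with each score carrying the exact prompt instruction it was derived from. Field names here are hypothetical:

```typescript
// Hypothetical shape of one scored metric: the score is linked back
// to the prompt instruction it came from, plus a rationale.
type ScoredMetric = {
  metric: string       // e.g. "Conciseness"
  instruction: string  // the prompt text the metric came from
  score: number        // 0-10
  rationale: string    // why the output got this score
}

const example: ScoredMetric = {
  metric: "Source Citation",
  instruction: "Cite your data sources",
  score: 6.1,
  rationale: "Two of four claims cite sources. BMI threshold is uncited.",
}
```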

Your prompts already define what good looks like.

We turn that into something you can measure.

Get started free

Free to start. No credit card required.