TL;DR (controversial take to provoke discussion)
You’re using agents wrong.
Deploy Jaeger, instrument your dev runtime with OpenTelemetry, and update your agent prompt so it adds OTEL instrumentation and debugs via telemetry (using otel-mcp: https://github.com/krackenservices/otel-mcp) instead of leaning on stdout.
If a human wouldn’t merge unobserved code, neither should an agent.
(I threw otel-mcp together quickly to let Antigravity debug code more effectively at home, so your mileage may vary, but it worked for me)
We should push agent builders and model builders to make “prefer telemetry over logs” a default debugging behaviour so we don’t have to keep prompting for it.
——
Over Christmas I was writing code with Antigravity (seriously… wow).
As code flowed Matrix-style across my screen, one of my usual rants about mental maps clicked together with another gripe I'd been mulling over: observability.
That led to a new idea: Observable Agentic Development.
When code is written, modified, or debugged by AI agents, it should be observable in the same way we expect production systems to be observable.
To me, that means two principles:
- Agent-made changes must be validated using structured observability data — not unstructured logs or stdout.
- Agents need an observability surface they can query — including execution history that exists outside the chat/context the agent was operating in.
Principle 1: Telemetry over stdout
AI coding agents often:
- write code without executing it (or only much later)
- reason over intent, not behaviour
- operate asynchronously from runtime failures
Traditional debugging tends to rely on:
- stdout
- ad-hoc logging
- human intuition
- “linear” execution traces (which fall apart fast once async/concurrency enters the chat)
If agents write code, then humans and agents both need to verify it with evidence.
Agent-driven development should be observable by default, like production systems. An added bonus is that observability “follows the code” into production.
When an agent changes code, runtime behaviour is what answers the question: “Did the agent do what it was told?”
Why OpenTelemetry helps
OpenTelemetry provides:
- traces across services
- quantitative metrics
- structured, correlatable signals
- clear execution order, timing, and context via spans
Stdout gives you:
- lossy text
- weak/implicit causality
- no machine-readable structure
- confusing ordering under concurrency (race conditions can be invisible)
Telemetry replaces educated guesswork with evidence.
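To make the contrast concrete, here's a tiny sketch using the OpenTelemetry Python API. The function and attribute names are mine, not a standard, and without an SDK configured the API is a no-op, so it's safe to sprinkle in early:

```python
# pip install opentelemetry-api
from opentelemetry import trace

tracer = trace.get_tracer("dev-example")

def apply_discount(order_total: float, code: str) -> float:
    # stdout version: lossy text, no structure, ordering gets fuzzy under concurrency
    # print(f"applying discount {code} to {order_total}")

    # telemetry version: structured attributes on a span, correlated into a trace
    with tracer.start_as_current_span("apply_discount") as span:
        span.set_attribute("discount.code", code)
        span.set_attribute("order.total", order_total)
        discounted = order_total * 0.9  # placeholder business logic
        span.set_attribute("order.discounted_total", discounted)
        return discounted
```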
So: let the agent query traces/metrics directly (Jaeger, OTEL collectors, whatever) as it works. That means it can run tests, validate complex flows, and identify causal spans using data. (Data-Driven Development?? Another term, I'm on fire.)
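"Query traces directly" looks roughly like this against a local Jaeger all-in-one. Caveat: this goes through the Jaeger UI's internal HTTP API, which isn't a stable contract, so treat the parameters and response shape as assumptions to verify against your Jaeger version. It's also exactly the kind of lookup you'd rather expose to the agent over MCP than hand-roll every time:

```python
# pip install requests — assumes the Jaeger all-in-one UI on localhost:16686
import requests

def recent_error_spans(service: str, limit: int = 20):
    resp = requests.get(
        "http://localhost:16686/api/traces",
        params={"service": service, "limit": limit, "lookback": "1h"},
        timeout=5,
    )
    resp.raise_for_status()
    for found_trace in resp.json().get("data", []):
        for span in found_trace.get("spans", []):
            tags = {t["key"]: t["value"] for t in span.get("tags", [])}
            if tags.get("error") or tags.get("otel.status_code") == "ERROR":
                yield span["operationName"], tags

# "What actually broke after that last change?"
for op, tags in recent_error_spans("checkout-service"):
    print(op, tags.get("exception.message", ""))  # printing a summary for humans is still fine
```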
The workflow I’m moving to
- AI agent makes a code change
- runtime emits OpenTelemetry signals
- agent queries telemetry via MCP
- agent finds the causal spans/metrics
- agent fixes code based on evidence
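For step 2, the dev runtime needs somewhere to send signals. A minimal bootstrap sketch, assuming a local Jaeger all-in-one with OTLP enabled (e.g. the jaegertracing/all-in-one image, OTLP gRPC on :4317); the service name and endpoint are just my local defaults:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Everything instrumented with the OTEL API now ends up queryable in Jaeger.
provider = TracerProvider(resource=Resource.create({"service.name": "my-dev-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
```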
Stdout still exists and is useful — but telemetry gives a better feedback loop and scales as systems get distributed. It also helps with blast-radius debugging: if a change breaks another service via a contract mismatch, OTEL correlates it into a single causal story instead of “some other random break elsewhere”.
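The correlation comes from trace context propagation: the outgoing call carries a W3C traceparent header, so the other service's spans (and its failure) join the same trace. A sketch with hypothetical services and manual propagation shown explicitly (the HTTP auto-instrumentation libraries normally do this for you):

```python
# pip install opentelemetry-api requests
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("checkout-service")

def reserve_inventory(sku: str):
    with tracer.start_as_current_span("reserve_inventory"):
        headers = {}
        inject(headers)  # writes the traceparent header for the current span
        # The (hypothetical) inventory service extracts this header, so a contract
        # mismatch there shows up in the same trace, not as a random break elsewhere.
        return requests.get(f"http://localhost:8081/reserve/{sku}", headers=headers, timeout=5)
```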
So my working defaults for AI-assisted coding have become:
- instrument early
- treat telemetry as source of truth
- expose observability via MCP
Principle 2: Make agents observable too
This is about auditing and long-range debugging of agent behaviour:
- which prompts correlate with failures later in the dev tree?
- where did a constraint get introduced (“don’t do X”) that is now blocking progress?
- how do we reconstruct why an agent did what it did when the original context is gone?
Humans forget constraints. Agents forget context boundaries. You want traces/metadata to make that inspectable.
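I don't have a polished answer for this one yet, but as a sketch of the direction: if you own the agent loop, you can emit spans for agent steps the same way. Every name below is made up for illustration; it's not any framework's API:

```python
import hashlib

from opentelemetry import trace

tracer = trace.get_tracer("agent-audit")

def run_agent_step(prompt: str, constraints: list[str]) -> str:
    with tracer.start_as_current_span("agent.step") as span:
        # Hash the prompt rather than storing it verbatim if it might contain secrets.
        span.set_attribute("agent.prompt.sha256", hashlib.sha256(prompt.encode()).hexdigest())
        span.set_attribute("agent.constraints", constraints)  # e.g. ["don't do X"]
        result = call_model(prompt)  # placeholder for the real agent/model call
        span.set_attribute("agent.result.chars", len(result))
        return result

def call_model(prompt: str) -> str:
    return "stub response"  # stand-in so the sketch runs
```

Do that, and "where did that constraint come from?" becomes a trace query instead of archaeology through old chat logs.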