TL;DR (controversial take to provoke discussion)
You’re using agents wrong.
Deploy Jaeger, instrument your dev runtime with OpenTelemetry, and update your agent prompt so it adds OTEL instrumentation and debugs via telemetry (using otel-mcp: https://github.com/krackenservices/otel-mcp) instead of leaning on stdout.
If a human wouldn’t merge unobserved code, neither should an agent.
(I threw otel-mcp together quickly to let Antigravity debug code more effectively at home, so your mileage may vary, but it worked for me)
We should push agent builders and model builders to make “prefer telemetry over logs” a default debugging behaviour so we don’t have to keep prompting for it.
——
Over Christmas I was writing code with Antigravity (seriously… wow).
As code flowed Matrix-style across my screen, one of my usual rants about mental maps clicked together with another gripe I'd been mulling over: observability.
That led to a new idea: Observable Agentic Development.
When code is written, modified, or debugged by AI agents, it should be observable in the same way we expect production systems to be observable.
To me, that means two principles:
- Agent-made changes must be validated using structured observability data — not unstructured logs or stdout.
- Agents need an observability surface they can query — including execution history that exists outside the chat/context the agent was operating in.
Principle 1: Telemetry over stdout
AI coding agents often:
- write code without executing it (or only much later)
- reason over intent, not behaviour
- operate asynchronously from runtime failures
Traditional debugging tends to rely on:
- stdout
- ad-hoc logging
- human intuition
- “linear” execution traces (which fall apart fast once async/concurrency enters the chat)
If agents write code, then humans and agents both need to verify it with evidence.
Agent-driven development should be observable by default, like production systems. An added bonus is that observability “follows the code” into production.
When an agent changes code, runtime behaviour is what answers the question: “Did the agent do what it was told?”
Why OpenTelemetry helps
OpenTelemetry provides:
- traces across services
- quantitative metrics
- structured, correlatable signals
- clear execution order, timing, and context via spans
Stdout gives you:
- lossy text
- weak/implicit causality
- no machine-readable structure
- confusing ordering under concurrency (race conditions can be invisible)
Telemetry replaces educated guesswork with evidence.
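To make the contrast concrete, here's a tiny sketch using the OpenTelemetry Python API. The function and attribute names are mine, not a standard, and without an SDK configured the API is a no-op, so it's safe to sprinkle in early:

```python
# pip install opentelemetry-api
from opentelemetry import trace

tracer = trace.get_tracer("dev-example")

def apply_discount(order_total: float, code: str) -> float:
    # stdout version: lossy text, no structure, ordering gets fuzzy under concurrency
    # print(f"applying discount {code} to {order_total}")

    # telemetry version: structured attributes on a span, correlated into a trace
    with tracer.start_as_current_span("apply_discount") as span:
        span.set_attribute("discount.code", code)
        span.set_attribute("order.total", order_total)
        discounted = order_total * 0.9  # placeholder business logic
        span.set_attribute("order.discounted_total", discounted)
        return discounted
```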
So: let the agent query traces/metrics directly (Jaeger, OTEL collectors, whatever) as it works. That means it can run tests, validate complex flows, and identify causal spans using data. (Data-Driven Development?? Another term, I'm on fire.)
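"Query traces directly" looks roughly like this against a local Jaeger all-in-one. Caveat: this goes through the Jaeger UI's internal HTTP API, which isn't a stable contract, so treat the parameters and response shape as assumptions to verify against your Jaeger version. It's also exactly the kind of lookup you'd rather expose to the agent over MCP than hand-roll every time:

```python
# pip install requests — assumes the Jaeger all-in-one UI on localhost:16686
import requests

def recent_error_spans(service: str, limit: int = 20):
    resp = requests.get(
        "http://localhost:16686/api/traces",
        params={"service": service, "limit": limit, "lookback": "1h"},
        timeout=5,
    )
    resp.raise_for_status()
    for found_trace in resp.json().get("data", []):
        for span in found_trace.get("spans", []):
            tags = {t["key"]: t["value"] for t in span.get("tags", [])}
            if tags.get("error") or tags.get("otel.status_code") == "ERROR":
                yield span["operationName"], tags

# "What actually broke after that last change?"
for op, tags in recent_error_spans("checkout-service"):
    print(op, tags.get("exception.message", ""))  # printing a summary for humans is still fine
```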
The workflow I’m moving to
- AI agent makes a code change
- runtime emits OpenTelemetry signals
- agent queries telemetry via MCP
- agent finds the causal spans/metrics
- agent fixes code based on evidence
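For step 2, the dev runtime needs somewhere to send signals. A minimal bootstrap sketch, assuming a local Jaeger all-in-one with OTLP enabled (e.g. the jaegertracing/all-in-one image, OTLP gRPC on :4317); the service name and endpoint are just my local defaults:

```python
# pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Everything instrumented with the OTEL API now ends up queryable in Jaeger.
provider = TracerProvider(resource=Resource.create({"service.name": "my-dev-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
```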
Stdout still exists and is useful — but telemetry gives a better feedback loop and scales as systems get distributed. It also helps with blast-radius debugging: if a change breaks another service via a contract mismatch, OTEL correlates it into a single causal story instead of “some other random break elsewhere”.
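The correlation comes from trace context propagation: the outgoing call carries a W3C traceparent header, so the other service's spans (and its failure) join the same trace. A sketch with hypothetical services and manual propagation shown explicitly (the HTTP auto-instrumentation libraries normally do this for you):

```python
# pip install opentelemetry-api requests
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("checkout-service")

def reserve_inventory(sku: str):
    with tracer.start_as_current_span("reserve_inventory"):
        headers = {}
        inject(headers)  # writes the traceparent header for the current span
        # The (hypothetical) inventory service extracts this header, so a contract
        # mismatch there shows up in the same trace, not as a random break elsewhere.
        return requests.get(f"http://localhost:8081/reserve/{sku}", headers=headers, timeout=5)
```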
So my working defaults for AI-assisted coding have become:
- instrument early
- treat telemetry as source of truth
- expose observability via MCP
Principle 2: Make agents observable too
This is about auditing and long-range debugging of agent behaviour:
- which prompts correlate with failures later in the dev tree?
- where did a constraint get introduced (“don’t do X”) that is now blocking progress?
- how do we reconstruct why an agent did what it did when the original context is gone?
Humans forget constraints. Agents forget context boundaries. You want traces/metadata to make that inspectable.
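I don't have a polished answer for this one yet, but as a sketch of the direction: if you own the agent loop, you can emit spans for agent steps the same way. Every name below is made up for illustration; it's not any framework's API:

```python
import hashlib

from opentelemetry import trace

tracer = trace.get_tracer("agent-audit")

def run_agent_step(prompt: str, constraints: list[str]) -> str:
    with tracer.start_as_current_span("agent.step") as span:
        # Hash the prompt rather than storing it verbatim if it might contain secrets.
        span.set_attribute("agent.prompt.sha256", hashlib.sha256(prompt.encode()).hexdigest())
        span.set_attribute("agent.constraints", constraints)  # e.g. ["don't do X"]
        result = call_model(prompt)  # placeholder for the real agent/model call
        span.set_attribute("agent.result.chars", len(result))
        return result

def call_model(prompt: str) -> str:
    return "stub response"  # stand-in so the sketch runs
```

Do that, and "where did that constraint come from?" becomes a trace query instead of archaeology through old chat logs.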