How to Add LLM Tracing to Your Existing App Without a Rewrite
Add span-level LLM tracing to your existing pipeline without restructuring your code. Compare SDK wrapping, proxy, and OpenTelemetry approaches.
The Pain of Wanting Tracing Without Wanting a Rewrite
You built an LLM-powered application that works. Maybe it is a chatbot handling customer support tickets, a document summarizer plugged into your internal tools, or a multi-step agent that researches and drafts reports. The pipeline is stable. Users rely on it daily. Your team understands the code. And now you want to understand what is actually happening inside it at the model level.
That desire is completely reasonable. You want to see the prompts going out, the completions coming back, the token counts adding up, and the latency at each step. You want to know when the model refuses a request, when a tool call fails silently, or when a chain-of-thought step takes three times longer than it should. You want observability, the same kind of observability that backend engineers have enjoyed for years with distributed tracing in microservices. The difference is that your "service" is an LLM, and the inputs and outputs are natural language rather than structured data.
But here is where the friction starts. You search for "LLM tracing" and find solutions that require you to restructure your application around their framework. They want you to replace your OpenAI client with their custom client. They want you to rewrite your prompt chains using their proprietary abstractions. They want you to deploy a proxy server, configure OpenTelemetry collectors, or migrate to a specific orchestration library before you can see a single trace. Each of these approaches assumes you are willing to make significant changes to a system that already works.
For a greenfield project, that might be acceptable. For an existing application that is already in production, serving real users, and built on patterns your team understands, it is not. The cost of restructuring a working LLM pipeline just to add observability is disproportionate to the value. You do not want a new architecture. You want a window into the one you already have. You want to add span-level tracing to your existing LLM pipeline without a rewrite, and you want it done by the end of the day, not the end of the quarter.
This post walks through exactly how to do that. No framework migration, no proxy deployment, no restructuring of your business logic. The approach centers on SDK wrapping, a technique that instruments your existing API client transparently so every call is traced without changing how your application works. By the end, you will understand why most solutions demand rewrites, how SDK wrapping avoids that entirely, and how to get full span-level tracing running in under five minutes.
Why Most Tracing Solutions Require a Rewrite
The tracing ecosystem for LLM applications inherited many patterns from traditional distributed systems observability, but those patterns translate poorly to the way most developers build with language models. Understanding why most solutions demand significant code changes helps explain why a simpler approach exists and why it matters for teams that cannot afford weeks of integration work.
OpenTelemetry Setup Overhead
OpenTelemetry is the standard for distributed tracing, and it is excellent for microservices. But applying it to LLM pipelines means configuring collectors, exporters, span processors, and context propagation. You need to define custom span attributes for prompt content, token usage, and model parameters. You need to decide on a backend (Jaeger, Zipkin, a commercial vendor) and configure the exporter accordingly. The setup alone can take days, and maintaining the instrumentation as your pipeline evolves adds ongoing overhead. When a new LLM call is added to the codebase, someone has to remember to instrument it properly. When a model parameter changes, someone has to update the span attributes. For teams that just want to see what their LLM is doing, this level of infrastructure investment is overkill.
Framework Lock-In
Many LLM observability platforms are tightly coupled to specific orchestration frameworks. They work beautifully if you built your app with LangChain, LlamaIndex, or their own SDK. If you wrote plain OpenAI or Anthropic API calls (as many production applications do), you are out of luck unless you refactor. Some platforms offer "bring your own framework" support, but it usually means writing custom callbacks, implementing specific interfaces, or wrapping every call site manually. This creates an uncomfortable choice: adopt a framework you do not need just to get tracing, or go without visibility entirely. Neither option respects the investment you already made in your existing codebase.
Proxy-Based Approaches
Some solutions route your API calls through a proxy server that logs requests and responses. On paper, this sounds simple: change the base URL, and you get tracing. In practice, it adds a network hop to every LLM call, introducing latency that compounds in multi-step pipelines. It also means managing another piece of infrastructure, handling authentication forwarding, dealing with failure modes when the proxy goes down, and ensuring the proxy scales with your traffic. For streaming responses, proxy-based tracing gets especially complicated because the proxy needs to buffer and reassemble chunks while forwarding them in real time. If the proxy fails mid-stream, you lose both the trace and potentially the response itself.
Manual Instrumentation
The most basic approach is to manually add logging and tracing calls around every LLM interaction in your codebase. This works, technically, but it is tedious, error-prone, and creates maintenance burden. Every time you add a new LLM call, you need to remember to instrument it. Every time you change a prompt, you need to update the tracing context. Manual instrumentation also tends to drift: the logging format in one part of the codebase diverges from another, making it hard to build consistent dashboards or alerts. It is the most flexible option but also the most fragile one, and it scales poorly as the number of LLM interactions in your application grows.
The One-Line Approach: SDK Wrapping
SDK wrapping takes a fundamentally different approach. Instead of asking you to change how you call the LLM, it changes what happens inside the client you are already using. The technique works by creating a transparent proxy around your existing OpenAI, Anthropic, or other provider client. Every method on the original client still works exactly as before. The same parameters, the same return types, the same error handling. But behind the scenes, each call now generates a trace span with full context.
Think of it like wrapping a gift. The object inside does not change. The box around it simply adds a layer that captures information about what is happening. When you call the chat completions method on a wrapped client, the wrapper records the prompt, starts a timing span, forwards the call to the real API, captures the response (including token counts, finish reason, and model metadata), and closes the span. Your application code never knows the difference.
This is why the approach requires zero business logic changes. Your prompt templates stay the same. Your error handling stays the same. Your retry logic, your streaming handlers, your response parsing: all untouched. The only line that changes is the one where you initialize your client. Instead of using the client directly, you wrap it first.
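To make the wrapping idea concrete, here is a minimal sketch of the technique in plain Python. Everything here is illustrative: `FakeChatClient` stands in for a real provider SDK, and `TracedClient` is a toy transparent proxy, not Glassbrain's actual implementation. The point is that application code calls the wrapped client exactly as it called the original.

```python
import functools
import time

class FakeChatClient:
    """Stand-in for a provider SDK client (illustrative only)."""
    def complete(self, prompt):
        return {"text": f"echo: {prompt}", "tokens": len(prompt.split())}

class TracedClient:
    """Transparent proxy: delegates every attribute to the wrapped
    client, but records a span around each method call."""
    def __init__(self, client, spans):
        self._client = client
        self._spans = spans  # collected trace spans

    def __getattr__(self, name):
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr  # plain attributes pass straight through

        @functools.wraps(attr)
        def traced(*args, **kwargs):
            start = time.perf_counter()
            result = attr(*args, **kwargs)  # forward to the real client
            self._spans.append({
                "method": name,
                "input": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            })
            return result
        return traced

spans = []
client = TracedClient(FakeChatClient(), spans)

# Application code is unchanged: same method, same arguments, same return type.
response = client.complete("summarize this document")
print(response["text"])    # echo: summarize this document
print(spans[0]["method"])  # complete
```

The calling code never references the wrapper; only the line that constructs the client changes, which is exactly the property that makes the technique safe to retrofit.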
This is exactly the approach that Glassbrain uses with its wrapOpenAI function in JavaScript and wrap_openai in Python. For Anthropic users, the corresponding function is wrap_anthropic in Python, with an equivalent wrapper in JavaScript. One function call wraps your existing client, and every subsequent API call through that client is automatically traced with full span-level detail.
The beauty of this approach is that it is composable. You do not need to wrap every client in your application at once. You can start with a single client in a single module, verify the traces look correct, and then gradually wrap additional clients as confidence grows. There is no big-bang migration, no flag day, and no risk of breaking something in a part of the codebase you did not intend to touch.
What You Get Without Changing Your Code
Once the wrapper is in place, your application produces rich trace data on every LLM interaction. Here is what that includes, without any additional code changes beyond the initial wrap call.
Full Prompt Capture
Every message sent to the model is captured in the trace, including system prompts, user messages, assistant responses used as context, and any injected few-shot examples. You can see the exact payload that reached the API, which is invaluable for debugging when a model behaves unexpectedly. If you are experimenting with different system prompt strategies across endpoints, the trace shows you precisely which prompt variant was active for each request.
Response and Token Tracking
Each trace span records the full model response, the number of prompt tokens, completion tokens, and total tokens consumed. Over time, this gives you a clear picture of cost per request, cost per feature, and cost trends. You can identify which parts of your application consume the most tokens and target optimization efforts accordingly. When token costs spike after a prompt change, the trace history makes the cause immediately visible.
Parent-Child Span Trees
For multi-step pipelines, wrapping captures the hierarchical relationship between calls. You see a visual trace tree showing which calls are children of which parent operations, how long each step took, and where bottlenecks live. This is especially powerful for agent workflows where one LLM call decides what to do next, then triggers a chain of tool calls and follow-up completions. Without the tree view, debugging these flows means reading log lines in sequence and mentally reconstructing the call graph. The visual trace tree in Glassbrain eliminates that guesswork entirely.
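One way to picture how a wrapper maintains this hierarchy is with a span stack: a span opened while another span is active becomes its child. The sketch below is a hypothetical illustration of that bookkeeping, not Glassbrain's implementation; the names `TraceTree` and `agent.run` are invented for the example.

```python
import contextlib

class TraceTree:
    """Minimal span tree: spans opened while another span is active
    become its children (illustrative sketch, not a real SDK)."""
    def __init__(self):
        self.roots = []
        self._stack = []

    @contextlib.contextmanager
    def span(self, name):
        node = {"name": name, "children": []}
        parent = self._stack[-1]["children"] if self._stack else self.roots
        parent.append(node)
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()

tree = TraceTree()

# An agent-style flow: a planning call that triggers a tool call
# and a follow-up completion, each recorded as a child span.
with tree.span("agent.run"):
    with tree.span("llm.plan"):
        pass
    with tree.span("tool.search"):
        pass
    with tree.span("llm.answer"):
        pass

def render(node, depth=0):
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(tree.roots[0])))
# agent.run
#   llm.plan
#   tool.search
#   llm.answer
```

The rendered tree is the text equivalent of the visual trace tree described above: the call graph is recorded as it happens rather than reconstructed from log lines afterward.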
Tool Call Instrumentation
When models make tool calls (function calling), the wrapper captures the tool name, the arguments the model generated, and the result returned. This is critical for debugging agent loops where the model calls the wrong tool, passes malformed arguments, or enters an infinite retry cycle. You can see exactly what the model "thought" it was doing, compare that with what actually happened, and identify where the reasoning went wrong.
Error and Refusal Detection
Traces automatically flag API errors, rate limit hits, content policy refusals, and unexpected finish reasons. Glassbrain takes this further with AI-powered fix suggestions that analyze failed traces and recommend specific changes to resolve the issue. Instead of staring at a cryptic error code and searching documentation, you get a concrete suggestion tailored to the actual trace data.
Replay Without API Keys
One of the most powerful capabilities enabled by tracing is replay. With Glassbrain, you can re-run any traced request directly from the dashboard without needing your own API keys. This means a developer debugging a production issue can reproduce the exact scenario that caused a problem without setting up local credentials or burning their own API budget. It also makes it safe to share traces across team members who may not have direct API access.
Three Approaches Compared
The following table provides a side-by-side comparison of the three most common approaches to adding LLM tracing to an existing application. The differences in setup time, code impact, and ongoing maintenance are significant.
| Criteria | SDK Wrapping | Proxy Server | OpenTelemetry |
|---|---|---|---|
| Setup Time | Under 5 minutes | 1 to 4 hours | 1 to 3 days |
| Code Changes | 1 line (wrap call) | Endpoint URL swap plus config | Instrumentation across codebase |
| Latency Impact | Negligible (async export) | Added network hop per call | Negligible (async export) |
| Streaming Support | Transparent, no extra config | Complex buffering required | Manual span management |
| Trace Depth | Full prompt, response, tokens, tool calls | Request/response logging | Customizable but manual |
| Infrastructure | No self-hosting required | Requires proxy server | Requires collector and backend |
| Incremental Rollout | Wrap one client at a time | All-or-nothing URL swap | Per-call instrumentation |
| Best For | Existing apps needing fast observability | Multi-language polyglot environments | Teams with existing OTel infrastructure |
For teams that already have OpenTelemetry infrastructure in place, the OTel approach may make sense as a long-term investment. For teams running polyglot environments where not all services have SDK support, a proxy can provide uniform coverage. But for the majority of teams that want to add span-level tracing to an existing LLM pipeline without a rewrite, SDK wrapping is the fastest path with the lowest risk.
Adding Tracing to a Real App in 5 Minutes
Here is the step-by-step process for adding Glassbrain tracing to an existing application. The entire process takes less than five minutes and requires changing exactly one line of application code.
Step 1: Install the package. For JavaScript, install glassbrain-js via npm. For Python, install glassbrain via pip. Both packages are lightweight and have minimal dependencies, so they will not bloat your application bundle or introduce version conflicts.
Step 2: Import the wrap function. In the file where you initialize your OpenAI or Anthropic client, import wrapOpenAI (JavaScript) or wrap_openai (Python). For Anthropic users, the corresponding function is wrap_anthropic. The import is a single line.
Step 3: Wrap your existing client. Find the line where you create your client instance and pass it through the wrap function. The wrapped client is a drop-in replacement. It has the same type signature, the same methods, and the same behavior. Your existing code that uses the client does not need to change at all.
Step 4: Run your application. That is it for code changes. Every API call through the wrapped client now generates trace data. There are no additional configuration files, no collector processes to start, and no infrastructure to provision. The traces are exported asynchronously in the background.
Step 5: View your traces. Open the Glassbrain dashboard. You will see a visual trace tree for each request, showing the full prompt, the model response, token counts, latency, and any tool calls. The replay feature lets you re-run any traced request without needing your own API keys. The AI fix suggestions feature analyzes failed or slow traces and recommends specific improvements.
The free tier includes 1,000 traces per month with no credit card required. That is enough to trace a meaningful sample of production traffic and identify patterns before deciding whether to scale up.
Common Concerns When Adding Tracing
Teams evaluating tracing solutions typically raise a consistent set of concerns. Here are the most common ones, along with honest answers about how SDK wrapping handles each.
Will it slow down my app?
The wrapper captures trace data synchronously (it needs to see the request and response) but exports it asynchronously in the background. The per-call overhead is negligible compared to the latency of the LLM API call itself, which typically runs from hundreds of milliseconds to several seconds. The LLM call dominates the timing so completely that the wrapper's overhead is lost in the noise.
Does it capture streaming responses?
Yes. The wrapper handles streaming transparently. It intercepts the stream, captures each chunk as it arrives, and reassembles the full response for the trace record while still delivering chunks to your application in real time. Your streaming handlers, progress indicators, and token-by-token rendering all continue to work exactly as before. The trace simply records the complete assembled response along with timing data for the stream duration.
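The core of transparent streaming capture is a tee: a generator that yields each chunk to the caller the moment it arrives, while keeping a copy to assemble the full response for the trace. This is a hypothetical sketch of that pattern; `fake_stream` and `traced_stream` are invented stand-ins, not part of any SDK.

```python
def fake_stream():
    """Stand-in for a streaming API response (illustrative)."""
    yield from ["The ", "quick ", "brown ", "fox."]

def traced_stream(chunks, trace):
    """Yield each chunk to the application in real time while
    buffering a copy; record the assembled response when done."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        yield chunk  # the caller sees chunks with no added delay
    trace["response"] = "".join(buffer)

trace = {}
rendered = []
for chunk in traced_stream(fake_stream(), trace):
    rendered.append(chunk)  # e.g. token-by-token UI rendering

print("".join(rendered))  # The quick brown fox.
print(trace["response"])  # The quick brown fox.
```

Because the wrapper yields before buffering completes, the application's token-by-token rendering is unaffected; the trace record is finalized only after the stream closes.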
What about sensitive data?
Trace data includes prompts and responses, which may contain user information. This is a legitimate concern for any tracing solution. When evaluating tools, check the data retention policies, access controls, and whether the platform supports redaction or filtering of sensitive fields. Consider whether your compliance requirements (GDPR, HIPAA, SOC 2) impose constraints on where trace data can be stored and who can access it. Some teams start by tracing only internal or non-sensitive endpoints and expand coverage after validating the data handling meets their requirements.
Can I add it to just one endpoint first?
Absolutely. Because SDK wrapping operates at the client level, you can create a wrapped client for the endpoints you want to trace and use the unwrapped client everywhere else. This lets you roll out tracing incrementally, starting with the endpoints that matter most (or the ones causing the most production issues) and expanding from there. There is no requirement to instrument the entire application at once, and no risk that wrapping one client will affect the behavior of another.
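The client-level scoping can be shown in a few lines. In this hypothetical sketch, `Client` stands in for a provider SDK and `Wrapped` for a tracing wrapper; the wrapped and unwrapped instances coexist, and only calls through the wrapped one are recorded.

```python
class Client:
    """Stand-in provider client (illustrative only)."""
    def complete(self, prompt):
        return f"reply: {prompt}"

class Wrapped:
    """Trivial tracing wrapper: log the call, then delegate (sketch)."""
    def __init__(self, client, log):
        self._client, self._log = client, log

    def complete(self, prompt):
        self._log.append(prompt)
        return self._client.complete(prompt)

log = []
traced = Wrapped(Client(), log)  # used only by the endpoint under study
plain = Client()                 # the rest of the app is untouched

traced.complete("support ticket")  # recorded in the trace log
plain.complete("internal batch")   # not recorded

print(log)  # ['support ticket']
```

Nothing global changes when one client is wrapped, which is why rollout can proceed one module at a time.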
Do I need to self-host anything?
No. Glassbrain is a hosted service. There are no collectors to deploy, no databases to manage, and no infrastructure to maintain. The SDK handles all communication with the tracing backend. You install the package, wrap your client, and traces start appearing in the dashboard. This is a deliberate design choice: the goal is to remove barriers to adoption, not add new infrastructure requirements.
Real-World Scenarios Where This Approach Shines
SDK wrapping is especially valuable in scenarios where other approaches would require significant refactoring or introduce unacceptable risk.
Legacy LLM pipelines. If your application was built before the current wave of LLM frameworks, it probably uses direct API calls. Retrofitting a framework just for tracing makes no sense. SDK wrapping works with any code that uses the standard OpenAI or Anthropic client libraries.
Multi-model applications. Applications that use different models for different tasks (one model for classification, another for generation, another for summarization) can wrap each client independently. All traces flow into the same dashboard, giving you a unified view of the entire pipeline regardless of which provider handles each step.
Debugging production issues under time pressure. When something breaks in production and the team needs answers immediately, the last thing anyone wants is a multi-day integration project. SDK wrapping can be added, deployed, and producing useful traces within minutes. This makes it viable as a debugging tool even if you do not plan to keep it running permanently (though most teams do, once they see the data).
Validating before committing. Because the free tier includes 1,000 traces per month with no credit card, teams can evaluate the approach on real production data before making any purchasing decisions or long-term commitments. The five-minute setup means the evaluation cost is measured in minutes, not sprints.
Frequently Asked Questions
Do I need to modify my prompt templates to use tracing?
No. SDK wrapping captures prompts exactly as they are sent to the API. Your prompt templates, string formatting, variable injection, and message array construction all remain untouched. The wrapper observes the final payload that the client sends to the API, so it captures the fully rendered prompt regardless of how you constructed it.
Does SDK wrapping work with LangChain or other frameworks?
If the framework ultimately uses an OpenAI or Anthropic client under the hood (and most do), you can wrap that underlying client. The traces will capture the API-level interactions even though the calls were initiated by the framework. For applications that call the API directly without a framework, SDK wrapping works with no friction at all. The wrapper does not care how the call was initiated, only that it passes through the client it is wrapping.
How much does LLM tracing cost?
Glassbrain offers a free tier of 1,000 traces per month with no credit card required. The free tier includes all features: visual trace trees, replay without needing your own API keys, AI fix suggestions, and full prompt and response capture. There are no feature gates on the free plan. Paid plans are available for teams that need higher trace volumes.
Can I trace both OpenAI and Anthropic calls in the same application?
Yes. You can wrap multiple clients independently. Use wrapOpenAI (or wrap_openai) for your OpenAI client and wrap_anthropic for your Anthropic client. Both will send traces to the same dashboard, giving you a unified view across providers. This is particularly useful for applications that route different types of requests to different models based on complexity, cost, or capability requirements.
What happens if the tracing service is temporarily unavailable?
Well-designed SDK wrappers handle export failures gracefully. Your application continues to function normally because the LLM API calls are not dependent on the tracing export succeeding. The wrapper treats trace export as a best-effort operation. If the export fails, the LLM call still completes, your application still gets its response, and the only consequence is a gap in your trace data until the service recovers.
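The "best-effort" behavior amounts to isolating the export from the call path. This sketch shows the pattern with invented stand-ins (`FlakyExporter`, `call_llm`); it is a generic illustration of graceful export failure, not any particular SDK's code.

```python
class FlakyExporter:
    """Stand-in exporter whose backend is down (illustrative)."""
    def export(self, span):
        raise ConnectionError("tracing backend unreachable")

def call_llm(prompt):
    """Stand-in for the real LLM API call."""
    return f"answer to: {prompt}"

def traced_call(prompt, exporter):
    """Best-effort tracing: the LLM call always completes; a failed
    export is swallowed and leaves only a gap in the trace data."""
    response = call_llm(prompt)
    try:
        exporter.export({"prompt": prompt, "response": response})
    except Exception:
        pass  # never let a tracing failure break the application
    return response

print(traced_call("hello", FlakyExporter()))  # answer to: hello
```

In a production wrapper the export would also run off the hot path (batched and asynchronous), but the invariant is the same: the application's response never depends on the trace reaching the backend.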
Related Reading
- LLM Tracing Explained: How to Debug Prompts in Production
- LLM Tracing That Integrates With Your Existing Logging Stack
- How to Trace and Monitor Every LLM Request and Response in Your App
Add tracing to your LLM app in one line.
Try Glassbrain Free