LlamaIndex Integration

Trace your LlamaIndex queries end to end - from the initial query through retrieval, node processing, and response synthesis. Glassbrain integrates via the LlamaIndex callback system to capture the full execution flow of your RAG pipelines.

Pro plan and above. This integration requires the Pro plan or above. Upgrade your plan in the dashboard to enable it.

Installation

Install the Glassbrain SDK alongside LlamaIndex for your language.

JavaScript / TypeScript

Terminal
npm install @glassbrain/js llamaindex

Python

Terminal
pip install glassbrain llama-index

Quick Start

Register the Glassbrain callback handler with LlamaIndex to start tracing. Once registered, all query engine and index operations are traced automatically.

JavaScript / TypeScript

index.ts
import {
  VectorStoreIndex,
  SimpleDirectoryReader,
  Settings,
} from "llamaindex";
import { GlassbrainCallbackHandler } from "@glassbrain/js/llamaindex";

// Create and register the Glassbrain callback handler
const glassbrainHandler = new GlassbrainCallbackHandler({
  projectKey: process.env.GLASSBRAIN_PROJECT_KEY,
});

Settings.callbackManager.addHandler(glassbrainHandler);

// Load documents and build an index as usual
const documents = await new SimpleDirectoryReader().loadData("./data");
const index = await VectorStoreIndex.fromDocuments(documents);

// Query the index - the full pipeline is traced automatically
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query("What is the main topic?");

console.log(response.toString());

Python

main.py
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
from glassbrain.llamaindex import GlassbrainCallbackHandler

# Create the Glassbrain callback handler
glassbrain_handler = GlassbrainCallbackHandler(
    project_key=os.environ["GLASSBRAIN_PROJECT_KEY"]
)

# Register it with LlamaIndex
Settings.callback_manager = CallbackManager([glassbrain_handler])

# Load documents and build an index as usual
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query the index - the full pipeline is traced automatically
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")

print(response)

How It Works

The GlassbrainCallbackHandler implements the LlamaIndex callback interface. When you run a query, LlamaIndex fires callback events at each stage of execution: query start, retrieval, node postprocessing, synthesis, LLM calls, and query end. Glassbrain captures these events and organizes them into a hierarchical trace.

The callback handler traces the following LlamaIndex event types:

  • QUERY - Top-level query engine execution
  • RETRIEVE - Document and node retrieval
  • SYNTHESIZE - Response synthesis from retrieved nodes
  • LLM - Individual LLM calls
  • EMBEDDING - Embedding generation for queries and documents
  • NODE_PARSING - Document to node parsing
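
As a rough sketch of the callback mechanics described above, a handler pairs start and end events into timed spans, with each span remembering its parent. The class and method names below (`TraceHandler`, `on_event_start`, `on_event_end`) mirror the shape of the LlamaIndex callback interface but are illustrative only, not the Glassbrain or LlamaIndex source:

```python
import time
import uuid

# Illustrative sketch: pair start/end events into spans with
# parent links, the way a callback-based tracer assembles a
# hierarchical trace. Not the actual SDK implementation.
class TraceHandler:
    def __init__(self):
        self.open_spans = {}   # event_id -> span still in flight
        self.finished = []     # completed spans

    def on_event_start(self, event_type, payload, parent_id=None):
        event_id = uuid.uuid4().hex[:8]
        self.open_spans[event_id] = {
            "span_id": f"sp_{event_id}",
            "parent_span_id": f"sp_{parent_id}" if parent_id else None,
            "type": event_type,
            "input": payload,
            "_start": time.monotonic(),
        }
        return event_id

    def on_event_end(self, event_id, payload):
        span = self.open_spans.pop(event_id)
        span["duration_ms"] = round((time.monotonic() - span.pop("_start")) * 1000)
        span["output"] = payload
        self.finished.append(span)
        return span

handler = TraceHandler()
q = handler.on_event_start("query", {"query_str": "What is the main topic?"})
r = handler.on_event_start("retrieve", {"similarity_top_k": 3}, parent_id=q)
handler.on_event_end(r, {"nodes": ["node_001", "node_002"]})
handler.on_event_end(q, {"response": "..."})
# The retrieve span finishes first and carries the query span's id
# as its parent_span_id, producing the nesting shown below.
```

Because events nest (a RETRIEVE starts and ends inside a QUERY), this pairing is what lets Glassbrain render the trace as a tree rather than a flat event log.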

What Gets Traced

Each stage of the LlamaIndex pipeline produces a span with data specific to that stage. Here are the key span types.

Query engine span
{
  "span_id": "sp_query_001",
  "trace_id": "tr_li_456",
  "type": "query",
  "name": "VectorIndexQuery",
  "timestamp": "2026-04-03T12:00:00.000Z",
  "duration_ms": 4200,
  "status": "success",
  "input": {
    "query_str": "What is the main topic?"
  },
  "output": {
    "response": "The main topic discussed in the documents is...",
    "source_nodes": ["node_001", "node_002", "node_003"]
  },
  "children": ["sp_retrieve_001", "sp_synthesize_001"]
}
Retrieval span
{
  "span_id": "sp_retrieve_001",
  "parent_span_id": "sp_query_001",
  "type": "retrieve",
  "name": "VectorIndexRetriever",
  "duration_ms": 320,
  "input": {
    "query_str": "What is the main topic?",
    "similarity_top_k": 3
  },
  "output": {
    "nodes": [
      {
        "node_id": "node_001",
        "text": "The document covers...",
        "score": 0.92,
        "metadata": { "file_name": "report.pdf", "page": 1 }
      },
      {
        "node_id": "node_002",
        "text": "Additional context...",
        "score": 0.87,
        "metadata": { "file_name": "report.pdf", "page": 3 }
      }
    ]
  }
}
Synthesis span
{
  "span_id": "sp_synthesize_001",
  "parent_span_id": "sp_query_001",
  "type": "synthesize",
  "name": "CompactAndRefine",
  "duration_ms": 3100,
  "input": {
    "query_str": "What is the main topic?",
    "nodes_count": 3
  },
  "output": {
    "response": "The main topic discussed in the documents is..."
  },
  "children": ["sp_llm_001"]
}
Node processing span
{
  "span_id": "sp_node_001",
  "parent_span_id": "sp_query_001",
  "type": "node_parsing",
  "name": "SentenceSplitter",
  "duration_ms": 45,
  "input": {
    "documents_count": 1
  },
  "output": {
    "nodes_count": 12,
    "avg_node_length": 512
  }
}
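
Since every span carries a `parent_span_id`, the hierarchy can be rebuilt from a flat list of spans. The sketch below shows one way to do that; `build_tree` is a hypothetical helper for illustration, not part of the Glassbrain SDK:

```python
# Sketch: rebuild the span tree from a flat span list using
# parent_span_id. Span dicts follow the shape of the JSON
# examples above; build_tree is a hypothetical helper.
def build_tree(spans):
    by_id = {s["span_id"]: dict(s, children=[]) for s in spans}
    roots = []
    for span in by_id.values():
        parent_id = span.get("parent_span_id")
        if parent_id in by_id:
            by_id[parent_id]["children"].append(span)
        else:
            roots.append(span)  # no known parent: a top-level span
    return roots

spans = [
    {"span_id": "sp_query_001", "type": "query"},
    {"span_id": "sp_retrieve_001", "parent_span_id": "sp_query_001", "type": "retrieve"},
    {"span_id": "sp_synthesize_001", "parent_span_id": "sp_query_001", "type": "synthesize"},
    {"span_id": "sp_llm_001", "parent_span_id": "sp_synthesize_001", "type": "llm"},
]

tree = build_tree(spans)
# One root (the query span) with retrieve and synthesize as
# children; the LLM span nests under synthesize.
```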

RAG Pipeline Tracing

For complex RAG pipelines with custom retrievers, rerankers, and response synthesizers, Glassbrain captures every component in the pipeline. Here is an example with a more advanced setup.

rag_pipeline.py
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response_synthesizers import get_response_synthesizer
from glassbrain.llamaindex import GlassbrainCallbackHandler

# Set up Glassbrain tracing
glassbrain_handler = GlassbrainCallbackHandler(
    project_key=os.environ["GLASSBRAIN_PROJECT_KEY"]
)
Settings.callback_manager = CallbackManager([glassbrain_handler])

# Build the index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create a query engine with custom components
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ],
    response_synthesizer=get_response_synthesizer(
        response_mode="compact"
    ),
)

# Every component in this pipeline is traced:
# Query -> Retrieve (5 nodes) -> Postprocess (filter by score)
#       -> Synthesize -> LLM call -> Response
response = query_engine.query(
    "Summarize the key findings from the research papers"
)

print(response)

Advanced Configuration

Customize the callback handler with additional options.

config.py
glassbrain_handler = GlassbrainCallbackHandler(
    project_key=os.environ["GLASSBRAIN_PROJECT_KEY"],

    # Add custom metadata to every trace
    metadata={
        "environment": "production",
        "pipeline": "document-qa",
        "index_version": "v3",
    },

    # Control what gets captured
    capture_input=True,        # Set to False to skip logging queries
    capture_output=True,       # Set to False to skip logging responses
    capture_node_content=True, # Set to False to skip node text content

    # Sampling rate (0.0 to 1.0)
    sample_rate=1.0,

    # Maximum node content length to capture (chars)
    max_node_content_length=10000,
)
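
To make the semantics of the last two options concrete, here is a minimal sketch of how per-trace sampling and node-content truncation could behave client-side. This is illustrative only (the function names are hypothetical), not the SDK's actual implementation:

```python
import random

# Illustrative sketch of sample_rate and max_node_content_length
# semantics; should_sample and truncate_content are hypothetical
# helpers, not Glassbrain SDK functions.
def should_sample(sample_rate, rng=random.random):
    # sample_rate=1.0 traces every query, 0.0 traces none,
    # 0.1 traces roughly one query in ten.
    return rng() < sample_rate

def truncate_content(text, max_len=10000):
    # Cap captured node text at max_len characters so large
    # documents don't bloat the trace payload.
    return text if len(text) <= max_len else text[:max_len]
```

With `sample_rate=1.0` the check always passes (since `random.random()` is strictly below 1.0), and with `0.0` it never does; truncation only applies to text longer than the cap.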

Troubleshooting

No traces appearing after running queries

Verify that the callback handler is registered with Settings.callback_manager before creating your index or query engine. If you create the index before registering the handler, it will not be attached to the query engine. Also confirm that your project key is valid.

Feature not available error

The LlamaIndex integration requires the Pro plan or above. Check your current plan in the Glassbrain dashboard under Account Settings. If you recently upgraded, allow a few minutes for the change to propagate.

Retrieved nodes show empty content

Check that capture_node_content is set to True (the default). If node content is still missing, verify that your nodes have the text attribute populated. Some custom node types may store content in a different field.

Embedding spans are not captured

Embedding events are only fired during index construction and when the query engine generates a query embedding. If you built the index before registering the handler, index-time embeddings will not be traced. Re-register the handler and rebuild the index, or focus on query-time traces where the query embedding will be captured.