CS360 Engineering Blog

Insights, updates, and technical deep dives from the CS360 engineering team.

LangGraph Agents Are Token-Bound Systems

AI agents are increasingly built as stateful systems that reason across multiple steps rather than producing a single response. Frameworks like LangGraph formalize this behavior by modeling agents as graphs of nodes operating over shared state.

However, despite this architectural sophistication, every LangGraph agent ultimately executes inside a token constrained language model. No matter how complex the graph is, the agent’s understanding of the task is limited to what exists inside the token context passed to the model at each step.

To design reliable LangGraph agents, engineers must understand how agent context is constructed, mutated and constrained through tokens.


Agent Context in LangGraph

In LangGraph, agent state is explicitly defined and passed between nodes. This makes context management visible but not unlimited.

A simplified agent state might look like this:

from typing import TypedDict, List

class AgentState(TypedDict):
    messages: List[str]

Each node in the graph reads from and writes to this state. Eventually, the messages list is converted into a prompt and sent to the LLM.

Conceptually:

prompt = "\n".join(state["messages"])
response = llm.invoke(prompt)

Although LangGraph manages the flow, the language model only sees tokens, not structured state.


Tokens Are the True Execution State

Before the LLM processes the prompt, it is tokenized:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokens = tokenizer.encode(prompt)

print(f"Token count: {len(tokens)}")

This token sequence defines the entire cognitive universe of the agent at that step.

Even in LangGraph:

Everything collapses into tokens at execution time.


Context Growth Across LangGraph Nodes

Consider a LangGraph agent with multiple nodes:

Input → Router → Tool Node → Memory Node → Summary Node

Each node appends information to the shared state:

def tool_node(state: AgentState):
    tool_result = fetch_client_data()
    state["messages"].append(f"Tool output: {tool_result}")
    return state

This looks harmless but token growth is cumulative. Tool outputs, especially JSON responses, are one of the fastest ways to exhaust context windows.

Without intervention, agents eventually fail not because of logic errors but because important tokens are pushed out of context.


Token Limits as a Graph-Level Constraint

LangGraph does not automatically enforce token budgets. This responsibility falls on the engineer.

A typical failure pattern looks like this:

  1. Early system instructions define agent behavior
  2. Multiple tool nodes add verbose outputs
  3. Memory retrieval injects large text blocks
  4. The original instructions are truncated

From the agent’s perspective, the instructions simply no longer exist. Whether instructions are hard truncated or simply overwhelmed by later tokens, the effect is the same: loss of behavioral control.

This is why token constraints must be treated as graph level design constraints, not node-level details.


Token Aware State Management in LangGraph

A common best practice is to separate raw data from summarized context.

Instead of appending raw tool output:

state["messages"].append(str(tool_result))

Introduce a summarization step:

def summarize_tool_output(state: AgentState):
    raw_output = state["messages"][-1]
    summary = llm.invoke(f"Summarize for decision-making:\n{raw_output}")
    state["messages"][-1] = f"Tool summary: {summary}"
    return state

This keeps the context semantically dense but token efficient.


Memory in LangGraph: Retrieval Is Not Enough

LangGraph agents often integrate external memory (Redis, vector stores). However, memory retrieval alone does not guarantee better performance.

def memory_node(state: AgentState):
    memory = retrieve_similar_events(state["messages"][-1])
    state["messages"].append(f"Past context: {memory}")
    return state

This pattern frequently leads to context overload.

A better approach is selective rehydration:

def memory_node(state: AgentState):
    memory = retrieve_similar_events(state["messages"][-1])
    compressed = llm.invoke(f"Extract only constraints and preferences:\n{memory}")
    state["messages"].append(f"Relevant memory: {compressed}")
    return state

Memory becomes actionable context, not historical baggage.


Reasoning Tokens and Planning Nodes

LangGraph enables explicit planning nodes, but planning also consumes tokens.

def planner_node(state: AgentState):
    plan = llm.invoke(
        "Create a step-by-step plan based on the context:\n"
        + "\n".join(state["messages"])
    )
    state["messages"].append(f"Plan: {plan}")
    return state

This improves reliability but increases token pressure.

Well-designed LangGraph agents:

Reasoning must evolve, not accumulate.


Token Budgets as a Design Pattern

In production LangGraph systems, teams often introduce soft token budgets per stage:

Although LangGraph does not enforce this natively, engineers can add checks:

def enforce_token_limit(state: AgentState, max_tokens=6000):
    tokens = tokenizer.encode("\n".join(state["messages"]))
    if len(tokens) > max_tokens:
        state["messages"] = summarize_context(state["messages"])
    return state

This transforms token limits from a failure mode into a controlled mechanism.


A Mental Model for LangGraph Engineers

LangGraph Concept Token Reality
State Token sequence
Node output Token append
Memory Token rehydration
Planning Token consumption
Graph depth Context pressure

Once this mapping is understood, agent behavior becomes far more predictable.


Why This Matters in Production

Many LangGraph agent failures are misdiagnosed as:

In reality, they are context failures caused by unmanaged tokens.

Agents that manage tokens well:


Conclusion

LangGraph gives engineers powerful abstractions for building agentic systems, but it does not remove the fundamental constraint of token limited context.

Understanding agent context through tokens allows engineers to design graphs that are not just functional, but reliable and scalable. Tokens are not an implementation detail—they are the execution medium through which intelligence is reconstructed at every step.

The best LangGraph agents are not those with the most nodes, but those with the clearest, most disciplined context.