Why is token-by-token streaming not working after upgrading LangGraph?

Last updated: October 29, 2025

Context

After upgrading LangGraph from 0.3.x to 0.4.0 or later, token-by-token streaming may stop working: instead of receiving partial messages as the LLM generates tokens, you receive only complete messages at the end of the response. This most commonly affects ReAct agents that use stream_mode="messages".

Answer

This behavior change was introduced in LangGraph 0.4.0, which modified how streaming works with subgraphs. The following solutions restore token-by-token streaming:

Solution 1: Enable Subgraph Streaming

The most straightforward solution is to enable subgraph streaming in your configuration:

Add stream_subgraphs to your stream configuration and set it to true, alongside the stream modes you need:

"stream_mode": ["messages", "values"],
"stream_subgraphs": true
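
If you call a deployed agent from Python, the same two settings can be passed as keyword arguments. The following is a minimal sketch, assuming a client object shaped like the one returned by langgraph_sdk.get_client(), whose runs.stream(...) accepts stream_mode and stream_subgraphs. The helper names stream_with_subgraphs and collect_events are hypothetical and not part of any SDK.

```python
import asyncio
from typing import Any, AsyncIterator, List, Optional


async def stream_with_subgraphs(
    client: Any,
    thread_id: Optional[str],
    assistant_id: str,
    payload: dict,
) -> AsyncIterator[Any]:
    """Yield stream parts from a run with subgraph streaming enabled.

    Assumption: `client` exposes `runs.stream(...)` like the LangGraph
    Python SDK client. This wrapper is illustrative only.
    """
    async for part in client.runs.stream(
        thread_id,
        assistant_id,
        input=payload,
        stream_mode=["messages", "values"],  # same modes as the config above
        stream_subgraphs=True,  # also emit events produced inside subgraphs
    ):
        yield part


def collect_events(
    client: Any,
    thread_id: Optional[str],
    assistant_id: str,
    payload: dict,
) -> List[Any]:
    """Blocking convenience wrapper: run the stream to completion."""

    async def _collect() -> List[Any]:
        return [
            part
            async for part in stream_with_subgraphs(
                client, thread_id, assistant_id, payload
            )
        ]

    return asyncio.run(_collect())
```

In an application you would more likely consume stream_with_subgraphs directly with async for and render each part as it arrives; collect_events is only convenient for scripts and quick checks.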

Note that with subgraph streaming enabled, message types carry a suffix that identifies the subgraph, in the form <message_type>|react:xxx.
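
Because of the added |react:xxx part, client code that compares the message type against an exact string may stop matching once subgraph streaming is on. A minimal sketch of splitting the name before comparing, assuming the <message_type>|<namespace> shape described above; split_event_name and is_message_event are hypothetical helpers, and the sample names in the comments are made up for illustration.

```python
from typing import List, Tuple


def split_event_name(event: str) -> Tuple[str, List[str]]:
    """Split '<message_type>|ns1|ns2' into ('<message_type>', ['ns1', 'ns2'])."""
    message_type, *namespace = event.split("|")
    return message_type, namespace


def is_message_event(event: str) -> bool:
    """True for 'messages' and 'messages/...' types, with or without a suffix."""
    message_type, _ = split_event_name(event)
    return message_type == "messages" or message_type.startswith("messages/")


# Example: split_event_name("messages|react:xxx") gives ("messages", ["react:xxx"]),
# while a name without a suffix, such as "values", gives ("values", []).
```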

Solution 2: Update Frontend Packages (for UI Integration)

If you're using a frontend UI that connects to your agent, ensure your LangChain packages are compatible:

Update your frontend LangChain packages to version 1.0+:

npm install @langchain/core@latest @langchain/langgraph-sdk@latest

Restart your development server
Perform a hard refresh in your browser (Cmd+Shift+R on macOS, Ctrl+Shift+R on Windows and Linux)

Solution 3: Filter Intermediate Outputs (for create_agent)

If you're using create_agent from langchain.agents and experiencing duplicate content, filter out intermediate decision points:

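# Illustrative wrapper; `inputs` is whatever input you normally pass to the agent
async def stream_final_outputs(agent, inputs):
    # astream() must be called to obtain an async iterator for `async for`
    async for chunk in agent.astream(inputs):
        # Only stream final outputs, not intermediate decisions
        if chunk.get('metadata', {}).get('step_type') == 'final':
            yield chunk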
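
Another way to narrow the stream, sketched here under assumptions: with stream_mode="messages", LangGraph yields (message_chunk, metadata) tuples, and metadata["langgraph_node"] names the node that produced the chunk. The helper tokens_from_node below is hypothetical, and the node name you pass depends on how your agent graph is built, so check the metadata of your own stream before relying on a particular value.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def tokens_from_node(
    stream: Iterable[Tuple[Any, Dict[str, Any]]],
    node: str,
) -> Iterator[Any]:
    """Yield only the chunks whose metadata says they came from `node`.

    `stream` is any iterable of (chunk, metadata) pairs, for example the
    result of agent.stream(inputs, stream_mode="messages").
    """
    for chunk, metadata in stream:
        if metadata.get("langgraph_node") == node:
            yield chunk
```

This keeps output from a single node and drops everything else, such as tool results. It does not by itself distinguish a model's tool-call chunks from its final answer, so you may still need to skip chunks with empty content.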

Additional Resources

For more detailed guidance on streaming configurations, refer to the LangGraph streaming documentation.