Building a Research Chat App on LangChain Managed Deep Agents (With Human Approval Before Web Search)

May 21, 2026

Most “AI demos” are a text box wired to an LLM. That works until the model tries to search the web, read a URL, or spend money on tools without you noticing.

This project is different. It is a small but complete app: a React chat UI, a FastAPI backend, and an agent definition you keep in Git. The interesting part is not the chat bubbles—it is how the same UI talks to three different runtimes (cloud managed agent, local open-source agent, or your own LangGraph deployment) and how it pauses the agent until a human approves a web search.

If you have been following system design topics (timeouts, idempotency, backpressure, “who owns state?”), you will recognize the same questions here, just with agents instead of microservices.

What you are looking at

Open the app and you get a Research Assistant. You type a question. The agent can plan, take notes in a virtual filesystem, search the web, read pages, and even call a fact-checker subagent for specific claims.

GitHub: https://github.com/sysdr/langchain-echosystem

The repo is called langchain-echosystem. Layout:

langchain-echosystem/
├── agent/          ← what the agent *is* (instructions, tools, skills)
├── backend/        ← API + runtime switch
├── frontend/       ← chat UI + approval modal
└── langgraph.json  ← optional deploy to LangSmith

Three layers, one product:

| Layer | Job |
| --- | --- |
| agent/ | Personality, tools, when to ask a human |
| backend/ | Pick runtime, stream tokens, handle interrupts |
| frontend/ | Show chat, block input until you Approve/Reject |

The agent is just files in `agent/`

Managed Deep Agents let you define an agent from the repo instead of clicking around a dashboard. That matters for newsletters and teams: version control beats copy-paste.

Instructions (`agent/AGENTS.md`)

The agent is told to behave like a careful researcher:

Clarify vague questions
Search when facts need the outside world
Not invent citations
Use bullet points and links
Delegate doubtful claims to a fact-checker

There is also a /memories/preferences.txt convention—durable user prefs saved across chats. That is a simple pattern for “long-term memory” without a separate database in this demo.

Tools (`agent/tools.json`)

Two tools come from LangChain’s Fleet MCP server:

tavily_web_search
read_url_content

The important line is interrupt_config. Web search is set to require human approval; reading a URL does not:

"interrupt_config": {
  "https://tools.langchain.com::tavily_web_search::Fleet": true,
  "https://tools.langchain.com::read_url_content::Fleet": false
}

From a system design angle: this is policy as data. You are not hard-coding “if tool == search then pause” in Python; you declare it once and provision pushes it to the cloud agent.
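To make "policy as data" concrete, here is a minimal sketch that reads the fragment above and lists which tools are gated. Only the `interrupt_config` keys come from the article; the surrounding file schema and the helper names `tool_name` and `tools_requiring_approval` are hypothetical.

```python
import json

# Fragment copied from the article. The rest of the tools.json schema
# is not shown there, so nothing else is assumed about the file.
TOOLS_JSON_FRAGMENT = """
{
  "interrupt_config": {
    "https://tools.langchain.com::tavily_web_search::Fleet": true,
    "https://tools.langchain.com::read_url_content::Fleet": false
  }
}
"""


def tool_name(key: str) -> str:
    """Return the middle segment of a 'server::tool::namespace' key."""
    parts = key.split("::")
    return parts[1] if len(parts) == 3 else key


def tools_requiring_approval(config: dict) -> set[str]:
    """Collect the tool names whose policy flag is true."""
    policy = config.get("interrupt_config", {})
    return {tool_name(key) for key, gated in policy.items() if gated}


config = json.loads(TOOLS_JSON_FRAGMENT)
gated = tools_requiring_approval(config)
print(sorted(gated))
```

Changing the policy means editing one boolean in a file under version control, not touching any branching logic.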

Skills and subagents

agent/skills/research/SKILL.md — multi-step research workflow (plan → search → notes → synthesis).
agent/subagents/fact-checker.md — narrow job: verify claims, label Supported / Contradicted / Insufficient evidence, cite URLs.

Subagents are the agent equivalent of calling a specialist service instead of bloating one prompt.

One backend, three ways to run the brain

The backend does not assume you always have LangSmith preview access. AGENT_RUNTIME in .env can be auto, managed, local, or deployment.

backend/app/config.py resolves auto like this:

If LANGGRAPH_DEPLOYMENT_URL is set → deployment
Else if MANAGED_AGENT_ID + API key → managed
Else → local
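The resolution order above could be sketched as a small pure function. This is an illustration under assumptions, not the repo's actual config.py: the function name `resolve_runtime`, the `LANGSMITH_API_KEY` variable name, and the default of `auto` when `AGENT_RUNTIME` is unset are all hypothetical.

```python
from typing import Mapping

VALID_RUNTIMES = {"auto", "managed", "local", "deployment"}


def resolve_runtime(env: Mapping[str, str]) -> str:
    """Pick a runtime name from environment-style settings."""
    # Assumption: an unset AGENT_RUNTIME behaves like "auto".
    requested = env.get("AGENT_RUNTIME", "auto").strip().lower()
    if requested not in VALID_RUNTIMES:
        raise ValueError(f"Unknown AGENT_RUNTIME: {requested!r}")
    if requested != "auto":
        return requested  # an explicit choice always wins

    # "auto": prefer a self-hosted deployment, then managed, then local.
    if env.get("LANGGRAPH_DEPLOYMENT_URL"):
        return "deployment"
    # Assumption: the API key lives in LANGSMITH_API_KEY.
    if env.get("MANAGED_AGENT_ID") and env.get("LANGSMITH_API_KEY"):
        return "managed"
    return "local"


print(resolve_runtime({"MANAGED_AGENT_ID": "abc", "LANGSMITH_API_KEY": "key"}))
```

Taking a mapping instead of reading `os.environ` directly keeps the function easy to test; in an app you would pass `os.environ` in.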

get_runtime() in backend/app/runtime/__init__.py returns one of three classes with the same interface: create thread, stream chat, resolve interrupt, resume stream.

That is classic strategy pattern thinking: one API contract, pluggable implementations. Your frontend never branches on “are we local today?”
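One way to express that shared contract in Python is a `typing.Protocol`. The sketch below is hypothetical: the method names mirror the four operations listed above, but the signatures, the event dictionaries, and the toy `EchoRuntime` are assumptions made for illustration, not the repo's classes.

```python
from typing import Iterator, Protocol


class AgentRuntime(Protocol):
    """The four operations every runtime is expected to offer."""

    def create_thread(self) -> str: ...
    def stream_chat(self, thread_id: str, message: str) -> Iterator[dict]: ...
    def resolve_interrupt(self, thread_id: str, approved: bool) -> None: ...
    def resume_stream(self, thread_id: str) -> Iterator[dict]: ...


class EchoRuntime:
    """Toy implementation that echoes the message back word by word."""

    def __init__(self) -> None:
        self._count = 0

    def create_thread(self) -> str:
        self._count += 1
        return f"thread-{self._count}"

    def stream_chat(self, thread_id: str, message: str) -> Iterator[dict]:
        for word in message.split():
            yield {"event": "token", "data": word + " "}
        yield {"event": "done", "data": ""}

    def resolve_interrupt(self, thread_id: str, approved: bool) -> None:
        return None  # this toy runtime never interrupts

    def resume_stream(self, thread_id: str) -> Iterator[dict]:
        yield {"event": "done", "data": ""}


def ask(runtime: AgentRuntime, message: str) -> str:
    """Caller code depends only on the protocol, never on a concrete class."""
    thread_id = runtime.create_thread()
    events = runtime.stream_chat(thread_id, message)
    return "".join(e["data"] for e in events if e["event"] == "token")


print(ask(EchoRuntime(), "hello agent"))
```

Because `ask` is typed against the protocol, swapping in a managed or deployment runtime would not change the calling code.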

Managed mode (production-shaped)

ManagedRuntime talks to LangSmith’s /v1/deepagents API via DeepAgentsClient. It creates a thread, starts a streamed run, maps LangChain events to SSE:

messages → token chunks for the UI
values → if there is an interrupt, emit an interrupt event
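The mapping above might look roughly like the following. This is a sketch on plain dictionaries, not the `DeepAgentsClient` API: the `(mode, payload)` tuple shape, the `content` field, and the `__interrupt__` key are assumptions about what the stream carries, and `to_sse_events` is a hypothetical name.

```python
from typing import Any, Iterable, Iterator


def to_sse_events(stream: Iterable[tuple[str, Any]]) -> Iterator[dict]:
    """Translate (mode, payload) pairs into UI-facing events."""
    for mode, payload in stream:
        if mode == "messages":
            # Assumption: each message chunk exposes its text under "content".
            text = payload.get("content", "")
            if text:
                yield {"event": "token", "data": text}
        elif mode == "values":
            # Assumption: a pending interrupt shows up under "__interrupt__".
            interrupts = payload.get("__interrupt__")
            if interrupts:
                yield {"event": "interrupt", "data": interrupts}
                return  # the run is paused; stop without emitting "done"
    yield {"event": "done", "data": ""}


sample = [
    ("messages", {"content": "Let me search. "}),
    ("values", {"__interrupt__": [{"tool": "tavily_web_search"}]}),
]
for event in to_sse_events(sample):
    print(event["event"])
```

Ending the generator at the interrupt, without a `done` event, is a design choice in this sketch: it lets the client distinguish "paused" from "finished".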

You provision the cloud agent once:

make provision

backend/scripts/provision_agent.py reads everything under agent/, builds a JSON payload (instructions, tools, subagents, skills), POSTs or PATCHes the Managed Deep Agents API, and writes MANAGED_AGENT_ID back into .env. Change AGENTS.md, run provision again—the cloud agent updates. Git is the source of truth.

REQUIRE_HITL_APPROVAL env toggles whether web search needs approval at provision time—useful for demos vs stricter prod.
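As a rough illustration of the "read files, build payload" step, the sketch below assembles a dictionary from an `agent/`-shaped directory and applies an approval toggle. It is not `provision_agent.py`: the payload field names, the `build_payload` helper, and the way the toggle rewrites `interrupt_config` are assumptions, and the HTTP call is left out entirely.

```python
import json
import tempfile
from pathlib import Path


def build_payload(agent_dir: Path, require_hitl: bool = True) -> dict:
    """Collect instructions, tools, skills, and subagents into one dict."""
    tools = json.loads((agent_dir / "tools.json").read_text(encoding="utf-8"))
    if not require_hitl:
        # Assumption: turning approval off clears every interrupt flag.
        tools["interrupt_config"] = {
            key: False for key in tools.get("interrupt_config", {})
        }
    return {
        "instructions": (agent_dir / "AGENTS.md").read_text(encoding="utf-8"),
        "tools": tools,
        "skills": {
            path.parent.name: path.read_text(encoding="utf-8")
            for path in sorted(agent_dir.glob("skills/*/SKILL.md"))
        },
        "subagents": {
            path.stem: path.read_text(encoding="utf-8")
            for path in sorted(agent_dir.glob("subagents/*.md"))
        },
    }


# Demo on a throwaway directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "skills" / "research").mkdir(parents=True)
    (root / "subagents").mkdir()
    (root / "AGENTS.md").write_text("Be a careful researcher.", encoding="utf-8")
    (root / "tools.json").write_text(
        json.dumps({"interrupt_config": {"search": True, "read": False}}),
        encoding="utf-8",
    )
    (root / "skills" / "research" / "SKILL.md").write_text("plan", encoding="utf-8")
    (root / "subagents" / "fact-checker.md").write_text("verify", encoding="utf-8")
    strict = build_payload(root, require_hitl=True)
    relaxed = build_payload(root, require_hitl=False)

print(strict["tools"]["interrupt_config"])
print(relaxed["tools"]["interrupt_config"])
```

Keeping payload construction separate from the network call makes it easy to diff what would be pushed before anything is sent.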

Local mode (laptop-friendly)

LocalRuntime uses open-source deepagents with an in-memory checkpointer. It still loads AGENTS.md as the system prompt, but web search is a stub that tells you to use managed mode for real search.

Good for UI work and backend tests without cloud keys. Bad for “did it really find that paper?”—by design.

Deployment mode (your own graph)

backend/agent.py defines a LangGraph-compatible graph with create_deep_agent. langgraph.json points at it for langgraph up. Point LANGGRAPH_DEPLOYMENT_URL and LANGGRAPH_ASSISTANT_ID at that deployment and AGENT_RUNTIME=deployment.

Same agent instructions file; different hosting. Useful when you want your infra and observability, not only the managed API.

How a message travels through the system

Here is the happy path in managed mode:

You (browser)
  → POST /api/conversations (new thread_id)
  → POST /api/chat/stream (SSE: tokens + maybe interrupt)
  → [optional] POST resolve-interrupt
  → [optional] POST resume-stream (more SSE tokens)

SSE (Server-Sent Events) means the server pushes many small events on one HTTP response. The frontend’s api.ts parses event: and data: lines—no WebSocket server required. For token streaming, that is often enough and simpler to operate behind proxies.

backend/app/routes/chat.py wraps the runtime iterator in EventSourceResponse. Event types include token, interrupt, error, and done.
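For readers who have not seen the wire format, here is a small Python sketch of what an SSE frame looks like and how a client can split a response back into events. The helpers `format_sse` and `parse_sse` are hypothetical and simplified; in the app itself, `EventSourceResponse` handles the formatting and `api.ts` does the parsing.

```python
def format_sse(event: str, data: str) -> str:
    """Serialize one event: an 'event:' line, 'data:' lines, a blank line."""
    lines = [f"event: {event}"]
    # Multi-line payloads become one 'data:' line per line of text.
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def parse_sse(raw: str) -> list[tuple[str, str]]:
    """Split a response body into (event, data) pairs."""
    events = []
    for frame in raw.split("\n\n"):
        event, data = "message", []  # "message" is the default event type
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Drop the single optional space after the colon.
                data.append(value[1:] if value.startswith(" ") else value)
        if data:  # frames without any data line are ignored
            events.append((event, "\n".join(data)))
    return events


body = (
    format_sse("token", "Hello")
    + format_sse("interrupt", "tool: tavily_web_search\nneeds approval")
    + format_sse("done", "")
)
print(parse_sse(body))
```

The blank line is the only frame delimiter, which is why a plain HTTP response is enough to carry many events.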

On the React side (App.tsx):

User sends message → append user bubble + empty assistant bubble.
streamChat feeds tokens into the assistant bubble (markdown via react-markdown).
If onInterrupt fires → show InterruptPrompt modal, disable composer.
Approve → resolveInterrupt then resumeStream continues the same assistant message.
Reject → run stops; status says tool rejected.

The modal (InterruptPrompt.tsx) is deliberately plain: tool name, description, Approve / Reject. No mystery about what the agent wanted to do.

System design takeaway: the interrupt is a synchronization point. The agent’s run is not “failed”; it is blocked until an external decision arrives—like waiting on a human task in a workflow engine, or a payment authorization hold.
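The UI flow and the synchronization point can be written down as a tiny state machine. The sketch below is in Python for consistency with the backend and is only an illustration: the state names, the event names, and the choice to return to idle after a rejection are assumptions, not a transcription of App.tsx.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    STREAMING = "streaming"
    AWAITING_APPROVAL = "awaiting_approval"


# (current state, event) -> next state. Anything not listed is invalid.
TRANSITIONS = {
    (State.IDLE, "send"): State.STREAMING,
    (State.STREAMING, "interrupt"): State.AWAITING_APPROVAL,
    (State.STREAMING, "done"): State.IDLE,
    (State.AWAITING_APPROVAL, "approve"): State.STREAMING,
    # Assumption: a rejected tool call ends the run and frees the composer.
    (State.AWAITING_APPROVAL, "reject"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Advance the machine, refusing transitions that make no sense."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value}") from None


def composer_enabled(state: State) -> bool:
    """The text box accepts input only when nothing is in flight."""
    return state is State.IDLE


state = State.IDLE
for event in ["send", "interrupt", "approve", "done"]:
    state = step(state, event)
    print(event, "->", state.value, "| composer enabled:", composer_enabled(state))
```

Note that there is no "failed" state on the interrupt path: the run simply waits in `AWAITING_APPROVAL` until a decision arrives.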

Human-in-the-loop in one paragraph

Without HITL, an agent can issue searches you did not intend (wrong query, leaked context, cost). With HITL:

Agent decides it needs tavily_web_search.
Runtime surfaces an interrupt in the stream.
UI stops; user approves or rejects.
resolve_interrupt tells the API the decision.
resume-stream continues generation.

That is fail-safe by default for the risky tool only. Reading URLs stays automatic—policy choice, not universal slowdown.
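The semantics of a per-tool gate can be illustrated locally in a few lines. This is not how the managed platform enforces approval; `call_tool`, the stub tools, and the approver callbacks are hypothetical. The sketch also goes one step further than the article by gating any tool that is missing from the policy, which is an assumption about what "fail-safe" should mean.

```python
from typing import Any, Callable

Approver = Callable[[str, dict], bool]


def call_tool(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., Any]],
    requires_approval: dict[str, bool],
    approve: Approver,
) -> dict:
    """Run a tool, asking for approval first when the policy says so."""
    # Assumption: tools absent from the policy are treated as gated.
    needs_gate = requires_approval.get(name, True)
    if needs_gate and not approve(name, args):
        return {"status": "rejected", "tool": name}
    return {"status": "ok", "tool": name, "result": tools[name](**args)}


def fake_search(query: str) -> str:
    return f"results for {query}"


def fake_read(url: str) -> str:
    return f"contents of {url}"


def deny_all(name: str, args: dict) -> bool:
    return False


def approve_all(name: str, args: dict) -> bool:
    return True


tools = {"tavily_web_search": fake_search, "read_url_content": fake_read}
policy = {"tavily_web_search": True, "read_url_content": False}

print(call_tool("read_url_content", {"url": "https://example.com"}, tools, policy, deny_all))
print(call_tool("tavily_web_search", {"query": "sse"}, tools, policy, deny_all))
print(call_tool("tavily_web_search", {"query": "sse"}, tools, policy, approve_all))
```

The ungated tool runs even when the approver denies everything, which is the "policy choice, not universal slowdown" point in miniature.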

For interviews: relate this to circuit breakers, approval workflows, and least privilege. The agent does not get unfettered egress; it gets egress after a human gate for the sensitive action.

Frontend: small surface, clear states

The UI is one main component plus the interrupt overlay. State that matters:

health — from /api/health: which runtime, is it ready?
threadId / agentId — conversation scope
interrupt — blocks send until resolved
loading / resolving — button and textarea disabled appropriately

The header shows Managed Deep Agents vs Local vs LangSmith Deployment so you are never confused about which brain is answering.

Sample prompts on the empty state nudge system-design-style questions (“tradeoffs in agent memory”, “LangGraph durable execution”, “RAG vs long context”)—aligned with what your readers care about.

Docker and Makefile: how you actually run it

cp .env.example .env
make install
make provision   # managed mode
make backend     # :8000
make frontend    # :5173

Or make docker-up → frontend on 3000, backend on 8000, healthcheck on the API before the UI container starts. Compose wires CORS for local and container hostnames.

The Makefile is thin on purpose: install, provision, run, docker. No hidden magic.

What I would tell a system design reader

Separate “agent definition” from “runtime.” Files in agent/ vs Python runtimes—same product, different ops models.
Stream tokens; don’t buffer the whole answer. SSE keeps latency honest and UX responsive.
Treat tool calls as side effects. Search is a side effect; gate it with HITL config, not hope.
Subagents bound blast radius. Fact-checking is a focused delegate, not a bigger main prompt.
Provision script = deployment pipeline for agents. CI could run provision_agent.py on every merge to agent/.

This is not a billion-user architecture. It is a correct end-to-end slice: auth via API keys in env, threaded conversations, streaming, interrupts, multi-runtime fallback, and deploy hooks. That is exactly what you want before scaling traffic: get the state machine right first.

Try it yourself

Clone the repo, set keys per .env.example, run make provision if you have LangSmith managed access, then ask something that needs the web. When the approval modal appears, you are seeing the interrupt_config from tools.json in action, not a mock.

If you only have model API keys, stay on local runtime: the UI and streaming still work; search returns the stub message until you point at managed or deployment.

Closing thought

Agents are moving from “chat completion” to systems: tools, memory, subgraphs, pauses, resumes. This app is a readable map of that shift—files for behavior, a router for where the graph runs, SSE for the wire, and a modal for the one tool call you refused to automate.

For System Design Interview Roadmap readers, the interview question is no longer only “design Twitter.” It is increasingly “design a worker that can call external APIs—who approves, where is state, what happens on retry?” This codebase is one honest answer.
