Zestminds

FastAPI + LangGraph Integration Guide: Build AI Agent Workflows

Want to build a practical AI agent workflow with FastAPI and LangGraph?

This guide walks through how to connect LangGraph’s stateful workflow orchestration with FastAPI’s API layer so you can expose AI agents, route requests, support human review, and prepare the system for production deployment.

It is written for CTOs, AI engineers, backend developers, startup founders, and technical teams that want to move beyond a simple LLM API call and build more reliable AI workflows with clear routing, state management, and deployment structure.

By Shivam Sharma · Updated April 29, 2026

What You'll Learn

  • What LangGraph does and when it is useful for AI agent workflows
  • Why FastAPI works well as the API layer for LangGraph workflows
  • How to structure a FastAPI + LangGraph project
  • How to expose a LangGraph workflow through a FastAPI endpoint
  • How to run and test the workflow locally
  • How to think about deployment, streaming, human-in-the-loop review, and production readiness
  • Where this architecture fits in real-world AI workflow automation projects

Why LangGraph Works Well for Stateful AI Workflows

LangGraph is useful when an AI workflow needs structure, state, branching, and control instead of one simple request-response call.

LangGraph is a graph-based orchestration framework for building AI workflows. Instead of treating an AI system as one long chain, it lets you define the workflow as a set of states, nodes, and edges.

This makes it useful when your workflow needs to:

  • Maintain state: Keep track of user input, intermediate outputs, decisions, and final responses.
  • Control flow: Route requests based on intent, confidence, risk, or business rules.
  • Use multiple steps: Combine LLM calls, retrieval, tools, validation, and human review.
  • Handle exceptions: Add fallback paths, retries, and escalation logic where needed.
  • Support human-in-the-loop: Pause or route sensitive cases for manual review before continuing.

For simple AI features, a direct FastAPI endpoint calling an LLM may be enough. LangGraph becomes more useful when the workflow has branching logic, memory, multiple tools, or review steps.

For deeper technical details, you can review the official LangGraph Graph API documentation.

Why Pair LangGraph with FastAPI?

LangGraph handles the workflow layer. FastAPI handles the API layer.

This separation is important because production AI systems usually need more than a graph. They need authentication, request validation, API routes, response models, background tasks, logging, deployment, and integration with frontend or backend systems.

FastAPI is a strong fit when you need to expose LangGraph workflows through clean API endpoints. If you need help building production-ready APIs around AI workflows, our FastAPI development services team can support the backend architecture, implementation, testing, and deployment.

  • FastAPI: Receives requests, validates inputs, exposes endpoints, and returns responses.
  • LangGraph: Manages the AI workflow, state, routing, tools, and decision paths.
  • LLM provider: Generates responses, classifications, summaries, or decisions.
  • Vector database or knowledge base: Adds retrieval when the agent needs business-specific context.
  • Queue or worker layer: Handles longer-running workflows when synchronous execution is not enough.

If you are using response models in FastAPI, the FastAPI response model documentation is a useful reference.

FastAPI + LangGraph Architecture Overview

A practical FastAPI + LangGraph setup usually follows this flow:

  1. User or frontend sends a request to a FastAPI endpoint.
  2. FastAPI validates the request using a schema.
  3. The endpoint passes the input into a compiled LangGraph workflow.
  4. LangGraph routes the request through nodes such as classify, retrieve, generate, review, or escalate.
  5. The workflow returns a final state.
  6. FastAPI returns the final answer or status to the caller.

For a demo, this can run in one FastAPI app. For production, you may separate the API, graph logic, tools, memory, workers, and observability layers.

What You'll Build

We'll use a simple customer support AI agent as the example.

The workflow will:

  1. Accept a support query through a FastAPI endpoint.
  2. Pass the query into a LangGraph workflow.
  3. Classify the request.
  4. Draft a response using agent logic.
  5. Route sensitive or complex cases to human review.
  6. Return the final response through the API.

This example is intentionally simple, but the same structure can support document workflows, lead qualification, internal knowledge assistants, support triage, and AI workflow automation systems.

Step 1: Install Your Stack

Start with the core packages needed for a basic FastAPI + LangGraph project.

pip install fastapi uvicorn langgraph pydantic

If your workflow uses an LLM provider, vector database, or observability tool, add those packages separately. For production, keep API keys and secrets in environment variables instead of hardcoding them in the application.
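
As a minimal sketch of the environment-variable approach, the loader below reads settings once at startup and fails fast when a secret is missing. The variable names LLM_API_KEY and LLM_TIMEOUT_SECONDS are assumptions for illustration, not names required by FastAPI or LangGraph.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    llm_api_key: str        # hypothetical secret for your model provider
    request_timeout: float  # seconds to wait on provider calls


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables and fail fast if a secret is missing."""
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is not set")
    return Settings(
        llm_api_key=api_key,
        request_timeout=float(env.get("LLM_TIMEOUT_SECONDS", "30")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.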

Production-Ready FastAPI + LangGraph Project Structure

For a small proof of concept, one or two files may be enough. For a real product, separate the API layer, graph logic, tools, configuration, and tests.

app/
  main.py
  api/
    support.py
  graph/
    workflow.py
    state.py
  agents/
    support_agent.py
  tools/
    knowledge_base.py
  memory/
    store.py
  config/
    settings.py
  tests/
    test_support_workflow.py
Dockerfile
.env.example
requirements.txt
pyproject.toml
README.md

This structure makes it easier to test graph nodes, swap model providers, add new tools, and deploy the API without turning the project into a hard-to-maintain script.

Step 2: Define Your FastAPI API Layer

FastAPI should handle request validation and response formatting. Keep the endpoint clean and let the LangGraph workflow handle the orchestration logic.

The endpoint below assumes the compiled support_graph is imported from your graph module. The graph itself is shown in the next step.

from fastapi import FastAPI
from pydantic import BaseModel
from app.graph.workflow import support_graph

app = FastAPI()

class SupportRequest(BaseModel):
    query: str

class SupportResponse(BaseModel):
    reply: str
    route: str

@app.post("/support", response_model=SupportResponse)
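def support_handler(payload: SupportRequest):
    # Declared with plain `def` so FastAPI runs this blocking graph call in
    # its thread pool instead of on the event loop.
    result = support_graph.invoke({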
        "query": payload.query,
        "category": "",
        "draft": "",
        "final_answer": "",
        "route": ""
    })

    return SupportResponse(
        reply=result["final_answer"],
        route=result["route"]
    )

For short workflows, direct invocation can work. For long-running tasks, connect the endpoint to a queue or worker so the API does not block while the workflow runs.
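
The queue idea can be sketched with a submit-then-poll pattern. This version uses an in-process thread pool and a dictionary as stand-ins for a real queue and result store, and run_workflow is a placeholder for the graph call. All of these names are assumptions for illustration.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# In-memory stand-ins for a real queue and result store (sketch only).
executor = ThreadPoolExecutor(max_workers=4)
jobs: dict[str, Future] = {}


def run_workflow(query: str) -> dict:
    """Placeholder for support_graph.invoke(...); replace with the real call."""
    return {"final_answer": f"Processed: {query}", "route": "automated"}


def submit_job(query: str) -> str:
    """Start the workflow in the background and return an id the client can poll."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(run_workflow, query)
    return job_id


def get_job(job_id: str) -> dict:
    """Report the job status, including the result once the workflow has finished."""
    future = jobs.get(job_id)
    if future is None:
        return {"status": "not_found"}
    if not future.done():
        return {"status": "running"}
    if future.exception() is not None:
        return {"status": "failed", "error": str(future.exception())}
    return {"status": "done", "result": future.result()}
```

A POST endpoint would call submit_job and return the id, and a GET endpoint would call get_job. An in-process pool loses jobs on restart, so a production system would typically swap in an external queue and a durable store.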

Step 3: Define the LangGraph Workflow

LangGraph lets you define a shared state and route it through different nodes. In this example, the workflow classifies a query, drafts a response, and decides whether the request needs human review.

from typing import Literal, TypedDict
from langgraph.graph import StateGraph, START, END

class SupportState(TypedDict):
    query: str
    category: str
    draft: str
    final_answer: str
    route: str

def classify_request(state: SupportState) -> dict:
    query = state["query"].lower()

    if "refund" in query or "billing" in query:
        return {"category": "billing"}

    return {"category": "general"}

def draft_response(state: SupportState) -> dict:
    if state["category"] == "billing":
        return {
            "draft": "This looks like a billing request. A support specialist should review it before a final reply is sent."
        }

    return {
        "draft": "Thanks for your question. Here is a helpful first response based on the available support workflow."
    }

def route_request(state: SupportState) -> Literal["human_review", "final"]:
    if state["category"] == "billing":
        return "human_review"

    return "final"

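def human_review(state: SupportState) -> dict:
    # Placeholder: a real system would create a review task here and wait
    # for a person's decision before setting the final answer.
    return {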
        "final_answer": state["draft"],
        "route": "human_review"
    }

def final_response(state: SupportState) -> dict:
    return {
        "final_answer": state["draft"],
        "route": "automated"
    }

builder = StateGraph(SupportState)

builder.add_node("classify_request", classify_request)
builder.add_node("draft_response", draft_response)
builder.add_node("human_review", human_review)
builder.add_node("final_response", final_response)

builder.add_edge(START, "classify_request")
builder.add_edge("classify_request", "draft_response")

builder.add_conditional_edges(
    "draft_response",
    route_request,
    {
        "human_review": "human_review",
        "final": "final_response"
    }
)

builder.add_edge("human_review", END)
builder.add_edge("final_response", END)

support_graph = builder.compile()

This is a minimal workflow. In a real system, the nodes may call an LLM, search a vector database, check CRM records, validate policy rules, or create a ticket for human review.
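
When a node does call an external service, it helps to decide up front what happens on failure. The sketch below builds a draft node around a hypothetical generate function (standing in for an LLM call) and adds simple retries plus a safe fallback draft. The helper name, retry count, and fallback text are all assumptions.

```python
import time
from typing import Callable

FALLBACK_DRAFT = (
    "We could not generate a reply automatically. "
    "A support specialist will follow up."
)


def make_draft_node(
    generate: Callable[[str], str],
    max_attempts: int = 3,
    delay: float = 0.0,
):
    """Build a graph node that calls `generate` with retries and a fallback draft."""

    def draft_response(state: dict) -> dict:
        for attempt in range(1, max_attempts + 1):
            try:
                return {"draft": generate(state["query"])}
            except Exception:
                if attempt < max_attempts:
                    time.sleep(delay * attempt)  # simple linear backoff
        # All attempts failed: return a safe draft instead of raising.
        return {"draft": FALLBACK_DRAFT}

    return draft_response
```

The returned function has the same shape as the other nodes, so it could be registered with builder.add_node in place of the hardcoded draft_response.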

Step 4: Run and Test Locally

Run the FastAPI app locally with Uvicorn.

uvicorn app.main:app --reload

Then test the endpoint with curl or Postman.

curl -X POST http://localhost:8000/support \
  -H "Content-Type: application/json" \
  -d '{"query":"Can I get a refund for my order?"}'

The response should include the agent reply and the route selected by the workflow.

{
  "reply": "This looks like a billing request. A support specialist should review it before a final reply is sent.",
  "route": "human_review"
}

FastAPI + LangGraph Integration Example Flow

The important part of the integration is not just calling LangGraph from FastAPI. It is keeping responsibilities clear.

  • FastAPI should validate the request and return a predictable response.
  • LangGraph should manage the workflow state and routing.
  • LLM/tool calls should live inside graph nodes or service functions.
  • Production concerns such as auth, rate limits, retries, logging, and queues should be designed in from the start, not bolted on later.

A basic integration flow looks like this:

Request
  → FastAPI endpoint
  → Pydantic validation
  → LangGraph compiled workflow
  → classify_request node
  → draft_response node
  → conditional route
  → final_response or human_review
  → FastAPI response

This keeps the API predictable while allowing the AI workflow to become more advanced over time.

Deploy LangGraph as a FastAPI Service

Once the local workflow works, package the application for deployment. A simple Dockerfile is enough for a small service.

FROM python:3.11-slim

WORKDIR /app

COPY . /app

RUN pip install --no-cache-dir -r requirements.txt

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

For production, also plan for:

  • Environment variables: Store model keys, database URLs, and secrets outside the codebase.
  • Health checks: Add a simple endpoint so deployment platforms can verify the service is alive.
  • Workers or queues: Use a queue when workflows are long-running or depend on external tools.
  • Logging: Log workflow route, execution time, errors, and fallback paths.
  • Observability: Track graph execution and model behavior so failures are easier to debug.
  • Rate limits: Protect expensive AI endpoints from abuse or accidental spikes.
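
For the logging item, one lightweight option is a decorator that wraps each graph node and records its name, duration, and any error. This is a sketch using only the standard library. The logger name and log format are assumptions, and a tracing tool could replace it later.

```python
import functools
import logging
import time

logger = logging.getLogger("support_workflow")


def logged_node(func):
    """Wrap a graph node so each call logs its name, duration, and any error."""

    @functools.wraps(func)
    def wrapper(state):
        start = time.perf_counter()
        try:
            update = func(state)
        except Exception:
            logger.exception("node=%s failed", func.__name__)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "node=%s elapsed_ms=%.1f keys=%s",
            func.__name__, elapsed_ms, sorted(update),
        )
        return update

    return wrapper


@logged_node
def classify_request(state: dict) -> dict:
    return {"category": "billing" if "refund" in state["query"].lower() else "general"}
```

Logging only the updated keys, not their values, keeps customer text out of the logs by default.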

If your team needs help moving a prototype from local testing to AWS, containers, workers, and monitoring, Zestminds can support the AI deployment and cloud engineering side of the build.

Streaming LangGraph Responses Through FastAPI

Streaming is useful when the AI response may take time and the user interface should start receiving progress or output before the full workflow finishes.

FastAPI can stream responses from an endpoint. If your model provider or graph execution exposes chunks or events, you can pass those chunks through the API. The FastAPI StreamingResponse documentation is a useful reference for this pattern.

import asyncio

from fastapi.responses import StreamingResponse

@app.get("/support/stream")
async def support_stream(query: str):
    async def event_stream():
        yield "Workflow started\n"

        # Run the blocking graph call in a worker thread so the event loop
        # stays free while the workflow executes.
        result = await asyncio.to_thread(support_graph.invoke, {
            "query": query,
            "category": "",
            "draft": "",
            "final_answer": "",
            "route": ""
        })

        yield result["final_answer"] + "\n"

    return StreamingResponse(event_stream(), media_type="text/plain")

This simple example streams a status message and then the final output. For real chat-style streaming, connect LangGraph or model event streaming to an async generator and pass chunks through FastAPI as they are produced.

Human-in-the-Loop Workflow with LangGraph and FastAPI

Human-in-the-loop review is useful when an AI workflow should not make every decision automatically. Examples include billing issues, medical support, legal review, compliance checks, account actions, and high-value customer requests.

In a FastAPI + LangGraph setup, the common pattern is:

  • LangGraph detects that a request needs review.
  • The workflow routes the state to a review path.
  • FastAPI exposes endpoints for review, approval, rejection, or resume actions.
  • The system stores enough state to continue the workflow safely.

For simple workflows, a conditional route may be enough. For advanced workflows, use persistent state and a proper interrupt/resume pattern so the workflow can pause and continue later.

def route_request(state: SupportState) -> Literal["human_review", "final"]:
    if state["category"] == "billing":
        return "human_review"

    return "final"

The goal is not to add human review everywhere. The goal is to add it where risk, ambiguity, or business rules require it.

Production Checklist for AI Agent Workflows

Before moving a FastAPI + LangGraph workflow into production, check the basics carefully.

  • Input validation: Every endpoint should validate request data before it reaches the workflow.
  • Clear state schema: Keep graph state predictable and easy to test.
  • Node-level testing: Test classification, retrieval, generation, and routing separately.
  • Fallback paths: Define what happens when the model fails, tools time out, or confidence is low.
  • Human review: Route sensitive cases to a review path instead of forcing automation.
  • Logging and tracing: Track graph route, latency, errors, and final outcomes.
  • Secrets management: Keep API keys and database credentials out of source code.
  • Queue strategy: Use workers for long-running workflows or high-volume jobs.
  • Deployment plan: Decide how the service will scale, restart, and recover from failures.
  • Cost controls: Monitor model usage, retries, token volume, and repeated calls.
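
Node-level testing is straightforward because nodes are plain functions that take a state and return an update. The sketch below repeats the classification and routing rules from the support example so it runs on its own, then tests them in the pytest style. The test names are assumptions.

```python
def classify_request(state: dict) -> dict:
    # Same keyword rule as the support example, repeated so the sketch is self-contained.
    query = state["query"].lower()
    if "refund" in query or "billing" in query:
        return {"category": "billing"}
    return {"category": "general"}


def route_request(state: dict) -> str:
    return "human_review" if state["category"] == "billing" else "final"


def test_classify_request():
    assert classify_request({"query": "I need a REFUND"}) == {"category": "billing"}
    assert classify_request({"query": "Where is my order?"}) == {"category": "general"}


def test_route_request():
    assert route_request({"category": "billing"}) == "human_review"
    assert route_request({"category": "general"}) == "final"
```

Tests like these run without a model provider or a running API, which keeps them fast enough to run on every commit.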

Real-World Use Cases

FastAPI + LangGraph is useful when the AI workflow has multiple steps, decisions, or integrations.

  • AI-powered support desk: Classify tickets, answer simple questions, escalate sensitive requests, and log outcomes.
  • Lead qualification chatbot: Ask qualifying questions, score intent, and sync useful leads with a CRM.
  • Document workflow automation: Extract information, summarize documents, route exceptions, and request human review.
  • Internal knowledge assistant: Search company data, draft answers, and escalate low-confidence responses.
  • Generative marketing assistant: Draft content from structured inputs, apply approval rules, and send reviewed outputs forward.

For teams planning similar workflows, our AI workflow automation services focus on turning repetitive, multi-step business processes into reliable AI-assisted systems.

Real-World Example: AI-Powered Support Desk in Healthcare

A support workflow like this can be especially useful in regulated or sensitive environments where not every response should be fully automated.

We have worked on an AI-powered support desk pattern for a hospital system where the workflow helped triage patient support requests and route cases to either knowledge sources or human staff where needed.

Read the AI-powered support desk case study to see how this kind of workflow can support real operational needs without removing human oversight from sensitive cases.

When to Use LangGraph vs LangChain vs Simple FastAPI Logic

Not every AI feature needs LangGraph. Use the simplest architecture that can safely handle the workflow.

  • Use simple FastAPI logic when the feature is a basic request-response AI call with no branching or state.
  • Use LangChain-style chains when the workflow is mostly linear and needs reusable model, prompt, or retrieval components.
  • Use LangGraph when the workflow needs state, branching, retries, multiple nodes, human review, or tool coordination.

If the product will handle real users, business data, compliance-sensitive decisions, or high-volume workflows, it is worth designing the architecture before writing too much code.

Ready to Build a Production AI Workflow?

A FastAPI + LangGraph prototype can be built quickly. The harder part is making it reliable enough for real users.

Before scaling, review whether your workflow has clear state, safe routing, error handling, observability, deployment planning, and human review where needed.

If you are building an AI workflow with FastAPI, LangGraph, RAG, streaming, or human-in-the-loop routing, Zestminds can help you review your AI workflow architecture before you scale it.

FAQs

How do you integrate LangGraph with FastAPI?

Use FastAPI as the API layer and LangGraph as the workflow layer. FastAPI receives the request, validates input, calls the compiled LangGraph workflow, and returns the response.

Can you build a FastAPI + LangGraph AI agent example?

Yes. A common example is a support agent where FastAPI accepts the user query, LangGraph routes it through workflow nodes, and the final answer is returned through an API endpoint.

What should a production-ready FastAPI + LangGraph project structure include?

It should separate API routes, graph logic, agents, tools, config, memory, tests, and deployment files. This makes the workflow easier to test, maintain, and deploy.

How do you deploy LangGraph as a FastAPI service?

Package the FastAPI app with Docker, expose the LangGraph workflow through API endpoints, store secrets in environment variables, and deploy it to your preferred cloud platform.

Can you stream LangGraph responses through FastAPI?

Yes. FastAPI can stream responses from an endpoint while LangGraph handles the workflow execution. This is useful for chat interfaces, progress updates, and longer-running AI tasks.

How do you add human-in-the-loop review with LangGraph and FastAPI?

Add a conditional routing step that sends sensitive or low-confidence cases to a review path. FastAPI can expose endpoints for review, approval, rejection, or resume actions.

When should you use LangGraph instead of simple FastAPI logic?

Use LangGraph when your AI workflow needs state, branching, retries, tool calls, memory, or human review. Simple FastAPI logic is enough for basic request-response AI calls.


About the Author

With over 13 years of experience in software development, I am the Founder, Director, and CTO of Zestminds, an IT agency specializing in custom software solutions, AI innovation, and digital transformation. I lead a team of skilled engineers, helping businesses streamline processes, optimize performance, and achieve growth through scalable web and mobile applications, AI integration, and automation.
