Engineering

LangGraph: when the complexity actually pays off

LangGraph is the most powerful and most painful agent framework. A walk through when state machines and checkpoints earn their cost, and when you should just use the Claude Agent SDK and move on.

14 May 2026 · 10 min read · Krypto Forge

LangGraph is the framework we reach for least and rely on most when we reach for it. The complexity is real. The payoff is also real. The trick is knowing which side of the line you're on before you start, not after a week of fighting node signatures.

This post is the heuristic we use, written down.

The shape of LangGraph, in 90 seconds

LangGraph models an agent as an explicit graph. Nodes are functions. Edges are conditional routes. State is a typed dictionary that flows through the graph; each node returns updates that are merged into it. Every step can be checkpointed to storage, then replayed, paused, resumed, and inspected.

That last sentence is the whole pitch. You get a state machine instead of an opaque agent loop.

In exchange you write more code. Node functions. State types. Edge conditions. Checkpoint configuration. A typical Claude Agent SDK loop is maybe 30 lines. A LangGraph version of the same loop is closer to 150. That's not because LangGraph is verbose for verbosity's sake. It's because you're explicitly designing the state machine instead of letting one happen.
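To make the node/edge/state/checkpoint vocabulary concrete, here is a library-free sketch of the mechanics in plain Python. It is not LangGraph's API: `run_graph`, `NODES`, `ROUTES`, and the two example nodes are hypothetical. Nodes return partial updates that get merged into the state, each router returns the name of the next node, and a snapshot is saved after every step so the run can be inspected afterwards.

```python
from copy import deepcopy
from typing import Callable

END = "__end__"

# Nodes: take the state, return only the keys they change (hypothetical steps).
def parse(state: dict) -> dict:
    return {"items": state["message"].split(",")}

def count(state: dict) -> dict:
    return {"n_items": len(state["items"])}

NODES: dict[str, Callable[[dict], dict]] = {"parse": parse, "count": count}

# Edges: take the state, return the name of the next node.
ROUTES: dict[str, Callable[[dict], str]] = {
    "parse": lambda state: "count",
    "count": lambda state: END,
}

def run_graph(state: dict, start: str, checkpoints: list) -> dict:
    node = start
    while node != END:
        state = {**state, **NODES[node](state)}      # merge the node's update
        checkpoints.append((node, deepcopy(state)))  # snapshot after every step
        node = ROUTES[node](state)
    return state

history: list = []
final = run_graph({"message": "shirts,sarees"}, "parse", history)
```

The extra lines come from that explicitness: every node, every route, and every state key is written down instead of living implicitly inside a loop.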

The case where it earns its cost

Here's a concrete one from our work. A textile order arrives via WhatsApp. The agent has to:

1. Parse the WhatsApp message (sometimes a voice note, sometimes a photo of a paper chit).
2. Identify the customer, the products, the quantities, the delivery date.
3. Check inventory and supplier capacity.
4. If supplier capacity is short, draft an outreach message and wait for a human to confirm before sending it to the supplier.
5. Once stocked, generate a GST invoice draft.
6. Wait for the office manager to approve the invoice.
7. On approval, push to Tally, generate a payment link via Razorpay, send to the customer on WhatsApp.
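Written down as data, those seven steps become a small graph with one branch and two human pause points. This is a plain-Python sketch with hypothetical node names (step 7 is collapsed into one `push_and_collect` node), not LangGraph code; enumerating the paths is a quick way to see every route an order can take before writing any node.

```python
# Each node maps to the nodes it can route to (hypothetical names).
EDGES = {
    "parse_message": ["identify_order"],
    "identify_order": ["check_inventory"],
    "check_inventory": ["draft_supplier_outreach", "draft_invoice"],
    "draft_supplier_outreach": ["await_outreach_approval"],
    "await_outreach_approval": ["draft_invoice"],
    "draft_invoice": ["await_invoice_approval"],
    "await_invoice_approval": ["push_and_collect"],
    "push_and_collect": [],
}
# Nodes where the run stops and waits for a person.
HUMAN_PAUSES = {"await_outreach_approval", "await_invoice_approval"}

def all_paths(node: str, prefix: tuple = ()) -> list[list[str]]:
    """Enumerate every route from `node` to a terminal node (the graph is acyclic)."""
    path = [*prefix, node]
    if not EDGES[node]:
        return [path]
    return [p for nxt in EDGES[node] for p in all_paths(nxt, tuple(path))]

for p in all_paths("parse_message"):
    pauses = [n for n in p if n in HUMAN_PAUSES]
    print(len(p), "steps,", len(pauses), "human pause(s)")
# 8 steps, 2 human pause(s)
# 6 steps, 1 human pause(s)
```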

Three properties make LangGraph the right call here.

Pauses that might last days. Step 4 might sit for two days waiting for the supplier to confirm. Step 6 might sit overnight waiting for the office manager to be at her desk. The process has to survive a server restart, a deploy, or the workstation going to sleep. LangGraph's checkpoints handle this natively. The Claude Agent SDK does not, and faking it gets ugly fast.
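Stripped to its core, surviving a multi-day pause means persisting the state plus the name of the next node, letting the process exit, and reloading both later. A minimal sketch with a JSON file standing in for the checkpoint store; `save_pause`, `resume`, and the node and field names are hypothetical. In LangGraph this role is played by the checkpointer together with its interrupt/resume support.

```python
import json
import tempfile
from pathlib import Path

def save_pause(path: Path, state: dict, next_node: str, waiting_for: str) -> None:
    """Persist everything needed to continue later; the process can then exit."""
    path.write_text(json.dumps(
        {"state": state, "next_node": next_node, "waiting_for": waiting_for}))

def resume(path: Path, human_input: dict) -> tuple[dict, str]:
    """Run in a fresh process: reload the pause and merge in the human's answer."""
    saved = json.loads(path.read_text())
    return {**saved["state"], **human_input}, saved["next_node"]

# Demo: pause before sending supplier outreach, "restart", then resume.
checkpoint = Path(tempfile.mkdtemp()) / "order-1042.json"
save_pause(checkpoint, {"order_id": 1042, "outreach_draft": "Need 200 m cotton"},
           next_node="send_supplier_outreach", waiting_for="human approver")
state, next_node = resume(checkpoint, {"outreach_approved": True})
```

Nothing about the pause lives in process memory, which is exactly what makes a deploy in the middle of it harmless.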

Branching that's actually distinct. The "stocked" path and the "needs supplier" path do genuinely different things. The "amount over ₹50,000" path needs an extra approver. These aren't conversational branches the model can handle inline. They're real workflow paths with different stakeholders. Modelling them as edges in a graph makes the system inspectable.
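A branch like the ₹50,000 rule becomes a named edge instead of an instruction buried in a prompt. A sketch of that edge condition, assuming the state carries an `invoice_total_inr` field; the field and node names are hypothetical.

```python
APPROVAL_THRESHOLD_INR = 50_000  # the "amount over ₹50,000" rule from the text

def route_after_invoice_draft(state: dict) -> str:
    """Edge condition: invoices over the threshold go to an extra approver first."""
    if state["invoice_total_inr"] > APPROVAL_THRESHOLD_INR:
        return "await_extra_approval"
    return "await_invoice_approval"
```

Because the decision returns a node name, it shows up in every trace as a distinct path rather than as a sentence in a model's output.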

Recovery from partial failure. If Razorpay's API is down at step 7, the workflow shouldn't restart from step 1. It should resume from step 7. Checkpoints make that one line of code. Without them you're writing your own state machine, badly.
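Resuming from the failed step only requires remembering where the run stopped. A plain-Python sketch, assuming the resume index is stored with the state in the checkpoint; `run_from`, the step names, and the simulated outage are hypothetical, and the real Tally and Razorpay calls are not shown.

```python
class TransientError(Exception):
    """Stand-in for an outage such as a payment API being down."""

def run_from(steps: list, state: dict, start: int, done: list) -> int:
    """Run steps[start:] in order, recording each completed step in `done`.

    Returns the index to resume from: the failing step, or len(steps) if finished.
    """
    for i in range(start, len(steps)):
        name, fn = steps[i]
        try:
            state.update(fn(state))
        except TransientError:
            return i  # persisted alongside the state; the next attempt starts here
        done.append(name)
    return len(steps)

# Demo: the payment-link step fails once, then succeeds on the retry.
calls = {"push_invoice": 0, "payment_link": 0}
gateway_down = True

def push_invoice(state: dict) -> dict:
    calls["push_invoice"] += 1
    return {"invoice_pushed": True}

def payment_link(state: dict) -> dict:
    calls["payment_link"] += 1
    if gateway_down:
        raise TransientError("payment gateway unavailable")
    return {"payment_link": "https://example.invalid/pay/1042"}

steps = [("push_invoice", push_invoice), ("payment_link", payment_link)]
state, done = {}, []
resume_at = run_from(steps, state, 0, done)          # stops at index 1
gateway_down = False
resume_at = run_from(steps, state, resume_at, done)  # finishes: index 2
```

The invoice is pushed exactly once; only the failed step runs again.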

When all three properties are present, LangGraph pays for itself in the first production incident.

The case where it doesn't

Same studio, different engagement. A client wants an agent that reads customer support emails and drafts a reply. The agent reads the email, looks up the customer's recent orders, and drafts a response; a human edits it and sends it.

This does not need LangGraph. There are no long pauses (the human is going to act within minutes). There's no real branching (every email goes through the same three steps). Failure recovery is "show the email again, the model is cheap".

We used the Claude Agent SDK. Thirty lines. Shipped in a day. Done.

The trap is reaching for LangGraph because the architecture diagram looks more impressive. The problem underneath usually isn't a graph. It's one prompt, one tool layer, and a UI.

The three-question test

Before starting a project in LangGraph, we ask:

Does this workflow need to survive a restart? If the answer is "yes, it might pause for hours or days", that's a strong LangGraph signal.

Does this workflow have genuinely distinct paths that involve different humans or systems? Branching inside a prompt is fine. Branching that means "wait for this specific person to click yes" is not something a prompt can hold.

Will I need to inspect or replay this workflow's execution after the fact? Compliance, audit, debug. If yes, the explicit graph is worth its weight.

Two out of three is a yes. Fewer than that and we use a lighter framework.
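Written as code, the rule is just a vote. A trivial sketch with hypothetical parameter names, scored against the two engagements described above as we read them:

```python
def should_use_langgraph(survives_restart: bool,
                         distinct_human_paths: bool,
                         needs_replay: bool) -> bool:
    """The three-question test: two or more yeses means LangGraph."""
    return sum([survives_restart, distinct_human_paths, needs_replay]) >= 2

# The WhatsApp order workflow: multi-day pauses, approver branches, resumable steps.
order_workflow = should_use_langgraph(True, True, True)
# The support-email drafter: minutes-long waits, one path, cheap to redo.
support_drafter = should_use_langgraph(False, False, False)
```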

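from typing import TypedDict

class OrderState(TypedDict, total=False):
    parsed_order: dict
    inventory_check: list
    needs_supplier: bool

# A LangGraph node, roughly. `inventory` is assumed to be a client defined elsewhere.
def check_inventory(state: OrderState) -> dict:
    items = state["parsed_order"]["items"]
    availability = inventory.check(items)
    # Return only the keys this node changes; LangGraph merges them into the state.
    return {
        "inventory_check": availability,
        "needs_supplier": any(not a.in_stock for a in availability),
    }

# An edge condition: returns the name of the next node.
# Wired up with builder.add_conditional_edges("check_inventory", route_after_inventory)
def route_after_inventory(state: OrderState) -> str:
    if state["needs_supplier"]:
        return "draft_supplier_outreach"
    return "draft_invoice"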

That's the shape. Every node is a plain function that takes the state and returns an update to it. Every edge condition is a function from state to the name of the next node. It looks bureaucratic. It is also the reason you can debug a five-step workflow at 2 a.m.

The pieces nobody mentions until later

A few things that matter when you actually try to ship LangGraph in production.

Checkpoint storage choice. The default in-memory checkpointer is fine for development and useless for prod. You need Postgres or Redis. Postgres is easier for inspection and replay. We use it.
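Part of why a SQL store is easy to inspect is that checkpoints end up as rows you can query. A stdlib sketch of that idea, using sqlite3 as a stand-in for Postgres; the table layout and the `save`/`latest`/`history` helpers are hypothetical and are not the schema LangGraph's Postgres checkpointer creates.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in; production would point at Postgres
conn.execute("""
    CREATE TABLE checkpoints (
        thread_id TEXT    NOT NULL,
        step      INTEGER NOT NULL,
        node      TEXT    NOT NULL,
        state     TEXT    NOT NULL,  -- JSON snapshot taken after `node` ran
        PRIMARY KEY (thread_id, step)
    )""")

def save(thread_id: str, step: int, node: str, state: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
                     (thread_id, step, node, json.dumps(state)))

def latest(thread_id: str) -> tuple[str, dict]:
    node, state = conn.execute(
        "SELECT node, state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1", (thread_id,)).fetchone()
    return node, json.loads(state)

def history(thread_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT node FROM checkpoints WHERE thread_id = ? ORDER BY step", (thread_id,))
    return [node for (node,) in rows]

save("order-1042", 0, "parse_message", {"items": ["cotton"]})
save("order-1042", 1, "check_inventory", {"items": ["cotton"], "needs_supplier": True})
```

`latest` is what a resume needs; `history` is what you want at 2 a.m.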

State typing discipline. TypedDict or a Pydantic model. Don't use a plain dict. The day you change a key name and a six-month-old checkpoint stops loading, you'll thank yourself.
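A typed schema is what makes a rename visible. One way to handle it is a small migration step when loading old checkpoints. This is a hypothetical plain-Python sketch of the idea, not a LangGraph feature, and the key names are invented.

```python
from typing import TypedDict

class OrderState(TypedDict):
    customer_id: str
    delivery_date: str  # hypothetically renamed from "deliver_by" in a later release

# Old key -> new key, applied when loading checkpoints written by older code.
RENAMED_KEYS = {"deliver_by": "delivery_date"}

def load_state(raw: dict) -> OrderState:
    migrated = {RENAMED_KEYS.get(key, key): value for key, value in raw.items()}
    missing = OrderState.__required_keys__ - set(migrated)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    return OrderState(**migrated)

old_checkpoint = {"customer_id": "C-17", "deliver_by": "2026-05-20"}
state = load_state(old_checkpoint)
```

Failing loudly on a missing key beats resuming a six-month-old order with a silently empty field.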

Human-in-the-loop needs its own surface. The pause is built in, but you still have to wire up the place where a human approves. We've used Slack interactivity, web forms, and WhatsApp reply parsing. Each one is a few hours of work.
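For the WhatsApp surface, one piece of that work is turning a free-text reply into an unambiguous decision before resuming the paused run. A hypothetical sketch; the keyword sets are illustrative, and anything that isn't a bare yes or no is sent back to a human rather than guessed.

```python
import re
from typing import Optional

APPROVE = {"yes", "y", "approve", "approved", "ok", "okay"}
REJECT = {"no", "n", "reject", "rejected"}

def parse_approval(reply: str) -> Optional[bool]:
    """True = approve, False = reject, None = unclear (ask again, don't guess)."""
    words = re.findall(r"[a-z]+", reply.lower())
    if len(words) == 1 and words[0] in APPROVE:
        return True
    if len(words) == 1 and words[0] in REJECT:
        return False
    return None  # longer or unrecognised replies go back to a person
```

Being strict here is deliberate: "ok, but change the date" must not approve an invoice.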

Observability matters more than you'd expect. LangGraph traces well to LangSmith, OpenTelemetry, or your own logger. Set this up before you ship. The whole point of the framework is inspectability, and you'll need the traces the first time something goes wrong.
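The "your own logger" option can start very small. A minimal sketch using only the standard library, assuming nodes are plain functions that return partial updates; `traced` and `check_stock` are hypothetical names, and this is not LangGraph's tracing integration.

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("workflow.trace")

def traced(node_fn):
    """Wrap a node so every call logs its name, duration, and the keys it updated."""
    @functools.wraps(node_fn)
    def wrapper(state: dict) -> dict:
        start = time.perf_counter()
        update = node_fn(state)
        log.info(json.dumps({
            "node": node_fn.__name__,
            "ms": round((time.perf_counter() - start) * 1000, 2),
            "updated_keys": sorted(update),
        }))
        return update
    return wrapper

@traced
def check_stock(state: dict) -> dict:
    return {"needs_supplier": state["qty"] > state["on_hand"]}
```

One JSON line per node is enough to reconstruct what a run did, and it can be shipped to a real tracing backend later.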

If you can't draw the workflow on a whiteboard in under five minutes, you're not ready to build it in LangGraph. The graph is the contract, not a clever pattern.

The honest comparison

Versus Claude Agent SDK: LangGraph is much heavier and much more powerful. The SDK is the right call when the workflow is short, conversational, and lives inside one model. LangGraph is the right call when the workflow is long, structural, and crosses humans.

Versus CrewAI/AutoGen: LangGraph treats the agent as a state machine. CrewAI/AutoGen treats it as a team. Different mental models, different problems. We use LangGraph when the structure matters more than the metaphor.

Versus rolling your own: don't. You'll end up with LangGraph minus the tests, plus your own bugs. If you genuinely don't want a graph framework, you probably want a workflow framework (Temporal, Inngest). That's a different choice with its own tradeoffs.

The takeaway

LangGraph is not a default. It's a tool you pick on purpose, when the workflow has long pauses, real branching, and a need for inspection. When it fits, nothing else is close. When it doesn't fit, lighter tools win.

The studio rule: don't pick LangGraph because it sounds serious. Pick it because the graph is the actual product.

The framework is the right choice when it makes the workflow easier to reason about, not harder.