AI Agents

What agentic AI actually looks like in production

Most autonomous workflow demos collapse the moment money or compliance enters the loop. The realistic 2026 default is a hybrid, and the boundary line is the product.

20 May 2026 · 9 min read · Krypto Forge

Most "agentic" demos look great in a YouTube clip and fall apart in production. The moment an autonomous workflow has to bill a real customer, push a real GST invoice, or cancel a real order, the conversation stops being about model benchmarks and starts being about who gets fired when it goes wrong.

That conversation, not the benchmark, is what 2026 has been about.

What "agentic" means once it leaves the demo

A useful definition: an agent is a loop that chooses its own next tool call. That's it. Not a chat interface. Not RAG. Not a workflow with one LLM step buried inside. The defining feature is that the model gets to decide, on its own, what to do next.

That's exactly the property that makes agents productive, and exactly the property that makes them dangerous in production. In a deterministic workflow you can reason about every branch. In an agent you can reason about the goal, the tools, the guardrails, and not much else.
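A minimal sketch of that definition, in Python, may make the point concrete. Everything here is hypothetical: the two tools, their names, and `choose_next_step`, which stands in for the model call and is hard-coded only so the example runs. The part that matters is the shape: the loop does not know in advance which tool comes next.

```python
from typing import Callable

# Hypothetical tools: plain functions the loop is allowed to call.
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: packed, not shipped"

def draft_reply(text: str) -> str:
    return f"DRAFT: {text}"

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "draft_reply": draft_reply,
}

def choose_next_step(goal: str, history: list) -> tuple:
    """Stand-in for the model call. Returns (tool_name, argument) or ("done", "").

    In a real agent this decision comes from the model, not from if-statements.
    """
    if not history:
        return ("lookup_order", goal)
    if len(history) == 1:
        # Feed the previous tool result into the next tool.
        return ("draft_reply", history[-1][2])
    return ("done", "")

def run_agent(goal: str, max_steps: int = 5) -> list:
    """Run the loop and return the (tool, argument, result) history."""
    history = []
    for _ in range(max_steps):  # hard cap so the loop can never run unbounded
        tool, arg = choose_next_step(goal, history)
        if tool == "done":
            break
        result = TOOLS[tool](arg)
        history.append((tool, arg, result))
    return history
```

Calling `run_agent("A123")` walks through a lookup and then a draft. Swap the stub for a real model and the sequence is no longer something you can read off the code, which is the whole problem.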

So the work, in 2026, is no longer "can we build an agent". The work is "where do we draw the line".

The hybrid is winning, and it isn't a compromise

Across the engagements we've run this past year, the boring pattern keeps showing up. Routine, reversible decisions are made by the agent. Critical, irreversible decisions need a human. The boundary between those two is the actual product.

A few patterns we keep using:

Approve-before-act for any side effect over a money threshold. Generating a draft invoice is fine. Pushing it to GSTN is a button click by a human.
Read freely, write narrowly. Agents can query anything. Mutations live behind a tool that asks for confirmation or writes to a staging table first.
Tier the autonomy by reversibility. Sending a "your order is being prepared" WhatsApp is recoverable. Issuing a credit note is not. Treat them differently.
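The three patterns above can be sketched as a single gate that every proposed side effect passes through. This is an illustration under assumptions, not a reference implementation: the `Action` fields, the `gate` function, the action names, and the threshold value are all made up for the example.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ACT = "act"                                # agent may proceed on its own
    STAGE_FOR_APPROVAL = "stage_for_approval"  # write a draft, wait for a human

@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    amount: float = 0.0  # money moved by the side effect; 0 for non-financial actions

# Hypothetical threshold; the real number is a business decision.
MONEY_THRESHOLD = 1000.0

def gate(action: Action) -> Decision:
    """Decide whether the agent acts directly or stages the action for a human."""
    if not action.reversible:
        # Tier by reversibility: irreversible actions always need a human.
        return Decision.STAGE_FOR_APPROVAL
    if action.amount > MONEY_THRESHOLD:
        # Approve-before-act above the money threshold.
        return Decision.STAGE_FOR_APPROVAL
    return Decision.ACT
```

Under these assumptions, `gate(Action("send_order_update", reversible=True))` lets the agent act, while `gate(Action("issue_credit_note", reversible=False, amount=500.0))` stages the action regardless of the amount.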

The teams that are getting agents into production aren't the ones with the smartest model. They're the ones who figured out which 80% of the workflow is safe to automate and put a guardrail in front of the other 20%.

This is also where most of the writing on this topic gets it wrong. The hybrid isn't a stepping stone toward "full autonomy". For most business workflows it's the destination. A textile factory owner doesn't want an agent that decides on its own to refund a customer. They want an agent that prepares the refund, checks GSTN, computes the credit note, and lines it up for a one-tap approval at 9 a.m.

Governance has become tooling, not memos

A year ago, "AI governance" usually meant a policy deck. A document nobody read. In 2026, the teams shipping real agents have moved governance into the code path.

That looks like:

Audit logs of every tool call with inputs, outputs, model name, prompt version, and the decision the agent made.
Prompt versions in git with hashes. When something goes wrong on Tuesday, you can rebuild Monday's behaviour.
Scope tokens at the tool layer. The agent has a tool called send_payment_link, but the tool itself checks that the requesting agent holds a token scoped to this customer, this currency, and this limit.
Escalation channels. When the agent is unsure, or hits a tool error twice, it doesn't keep retrying. It writes the state to a queue and pings a human.
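A rough sketch of the first three items, assuming an in-memory list in place of a real append-only log store. The `ScopeToken` fields, the `audit` helper, the model name, the prompt text, and the returned link format are hypothetical; only the tool name `send_payment_link` comes from the list above.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical prompt. Hashing the text gives a version id that can be logged
# next to every call and matched against the file tracked in git.
PROMPT_TEXT = "You are a customer-ops agent. Draft, do not send."
PROMPT_VERSION = hashlib.sha256(PROMPT_TEXT.encode("utf-8")).hexdigest()[:12]

@dataclass(frozen=True)
class ScopeToken:
    customer_id: str
    currency: str
    limit: float

AUDIT_LOG: list = []  # stand-in for an append-only store

def audit(tool: str, inputs: dict, output: object, decision: str,
          model: str = "hypothetical-model-v1") -> None:
    """Record one tool call as a JSON line."""
    AUDIT_LOG.append(json.dumps({
        "tool": tool, "inputs": inputs, "output": output,
        "decision": decision, "model": model,
        "prompt_version": PROMPT_VERSION,
    }, sort_keys=True))

def send_payment_link(token: ScopeToken, customer_id: str,
                      currency: str, amount: float) -> str:
    """The tool, not the agent, enforces the scope before doing anything."""
    inputs = {"customer_id": customer_id, "currency": currency, "amount": amount}
    allowed = (token.customer_id == customer_id
               and token.currency == currency
               and amount <= token.limit)
    if not allowed:
        audit("send_payment_link", inputs, None, "denied")
        raise PermissionError("scope token does not cover this request")
    link = f"payment-link:{customer_id}:{currency}:{amount:.2f}"  # placeholder value
    audit("send_payment_link", inputs, link, "allowed")
    return link
```

The design choice worth noting is that denied calls are logged too. When something goes wrong, the refused attempts are often more informative than the successful ones.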

None of that is novel. It's just SRE for non-deterministic systems. The teams who already had on-call playbooks and structured logging adapted immediately. The teams who didn't are still figuring out why their agent burned through a month's API budget at 3 a.m.
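The escalation rule is the piece that prevents the 3 a.m. budget burn, so it is worth a sketch of its own. This assumes an in-memory `deque` in place of a durable queue, and `notify_human` is a placeholder for whatever pager or chat hook a team actually uses; both names are hypothetical.

```python
from collections import deque
from typing import Callable, Optional

ESCALATION_QUEUE: deque = deque()  # stand-in for a durable queue
MAX_TOOL_ERRORS = 2                # after the second failure, stop and hand over

def notify_human(item: dict) -> None:
    """Placeholder: park the state where a human will see it."""
    ESCALATION_QUEUE.append(item)

def call_with_escalation(tool: Callable[[dict], str], state: dict) -> Optional[str]:
    """Try the tool at most MAX_TOOL_ERRORS times, then escalate instead of retrying."""
    errors = []
    for _ in range(MAX_TOOL_ERRORS):
        try:
            return tool(state)
        except Exception as exc:  # broad on purpose: any tool failure counts
            errors.append(repr(exc))
    # Bounded retries exhausted: save the state and the errors, ping a human.
    notify_human({"state": state, "errors": errors})
    return None
```

A `None` return tells the caller the work has been parked, not completed. The retry count is a constant rather than something the model can talk its way around.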

A concrete shape

Here's the operating model we keep using on India SMB engagements. A WhatsApp inbox flows into an agent. The agent classifies, drafts a response, queries inventory, and prepares an action. The action is the surface where humans live.

agent: customer-ops
triggers:
  - new_whatsapp_message
  - missed_call
tools:
  - read: orders, inventory, customer_history
  - write_drafts: invoice_draft, refund_draft, message_draft
  - act_directly: send_acknowledgement
human_approval_required:
  - send_invoice
  - issue_refund
  - cancel_order
escalate_if:
  - amount > 50000
  - customer_tier == "key_account"
  - agent_confidence < 0.7

That YAML is the entire product surface. The agent is a worker. The shape of what it can do, when it needs help, and who sees what is a design decision the studio makes upfront. The model is interchangeable.
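To show how little code sits behind that config, here is one way the routing could be enforced. The policy is transcribed by hand into a Python dict to keep the example dependency-free. Two things are assumptions the config does not state: that the escalation rules take precedence over the approval list, and that any action not listed is refused. The `route` function and its return labels are hypothetical.

```python
# Hand-transcribed from the config above; key names here are illustrative.
POLICY = {
    "act_directly": {"send_acknowledgement"},
    "human_approval_required": {"send_invoice", "issue_refund", "cancel_order"},
    "max_amount": 50000,
    "escalate_tiers": {"key_account"},
    "min_confidence": 0.7,
}

def route(action: str, amount: float = 0, customer_tier: str = "standard",
          agent_confidence: float = 1.0) -> str:
    """Return 'escalate', 'needs_approval', 'act', or 'deny' for a proposed action."""
    # Assumption: escalation rules are checked first and override everything else.
    if (amount > POLICY["max_amount"]
            or customer_tier in POLICY["escalate_tiers"]
            or agent_confidence < POLICY["min_confidence"]):
        return "escalate"
    if action in POLICY["human_approval_required"]:
        return "needs_approval"
    if action in POLICY["act_directly"]:
        return "act"
    # Assumption: default-deny for anything the policy does not name.
    return "deny"
```

For example, `route("issue_refund", amount=1200)` lines the refund up for approval, while the same refund at `amount=75000` goes to escalation. Nothing in the function mentions a model, which is the sense in which the model is interchangeable.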

What still doesn't work

We are honest about this with every client.

Long-horizon planning with no tool feedback is still bad. Agents that need to make ten consecutive correct decisions, each depending on the last, with no environment to test against, still drift. We don't deploy those. We break the task up.

Open-ended judgement calls are still bad. "Decide whether this customer is being abusive" is not a thing we ship to production yet. Humans do that. Agents tee it up.

And cost discipline is still a problem teams forget. An agent that uses the top reasoning model for every step can be five to ten times more expensive than one that triages with a cheap model and escalates only when needed. We'll write more on that pattern soon.
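The arithmetic behind that kind of multiple is simple enough to sketch. The prices and the escalation rate below are hypothetical relative units, not measurements from any provider or engagement, and the model assumes every step pays for a cheap triage call with only the escalated steps paying for the top model as well.

```python
def blended_cost_per_step(cheap: float, top: float, escalation_rate: float) -> float:
    """Average cost per step when a cheap model triages and sometimes escalates.

    Every step pays for the cheap call; a fraction also pays for the top model.
    """
    return cheap + escalation_rate * top

def cost_ratio(cheap: float, top: float, escalation_rate: float) -> float:
    """How many times more expensive 'top model on every step' is than triage."""
    return top / blended_cost_per_step(cheap, top, escalation_rate)

# Hypothetical relative prices: the top model costs 20x the cheap one per step,
# and one step in ten is escalated.
CHEAP, TOP, RATE = 1.0, 20.0, 0.10
triaged = blended_cost_per_step(CHEAP, TOP, RATE)
ratio = cost_ratio(CHEAP, TOP, RATE)
```

With these made-up inputs the triaged agent costs about 3 units per step against 20, a ratio between six and seven. The ratio is sensitive to the escalation rate: if half the steps escalate, most of the saving disappears.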

The takeaway

If you're evaluating where to put agents in 2026, the question isn't "what's the best model". It's "where is my workflow tolerant of mistakes". Find the parts that are reversible, observable, and high-volume. Put the agent there. Put a human at the boundary where the cost of a mistake gets real.

That's the whole job.

The interesting design problem isn't the agent. It's the seam between the agent and the rest of the company.