How Kastle ships compliance-grade voice AI to FDIC-insured banks with Cekura

Our agents are graphs, not prompts. Cekura is how we test each state and then end-to-end. It has become a critical part of our development pipeline; we now don't ship any agent to production without first aggressively testing it on Cekura.

Nitish Poddar

Kastle CTO

Company

About

Kastle's voice agents handle the full borrower lifecycle: payment collection, right-party contact, identity verification, escrow, payoff, and loss mitigation. They run omnichannel across voice, SMS, email, and chat, and execute workflows directly inside ICE Mortgage Technology's MSP. The deployment results speak for themselves: 70% lower cost-per-call, 40% lower handle time, 90%+ CSAT, and over $100M in cash transactions processed.

Industry

Fintech & Consumer Lending AI

Company size

11–50

The Challenge

Kastle ships voice agents into FDIC-insured banks, where every borrower call is auditable and the wrong disclosure is a regulatory event, not a UX bug. Their agents were already built as graphs of states with state-specific compliance obligations, and three structural problems made it hard to iterate safely with off-the-shelf eval tools.

Graph-based agent design: Each agent was a directed graph of states such as Gatekeeper, Verify DOB, Process Payment, Capture Promise-to-Pay, Negotiation, Language Routing, and Payoff. Any state could regress independently when the team iterated.
Conversation tests were too coarse: When a call failed, the team couldn't tell which state broke. Iteration needed a pass/fail signal for each state, not a single verdict on a 30-turn transcript.
Compliance lived at the state level: NACHA recitals happen during payment processing. The Mini-Miranda disclosure happens at the gatekeeper transition. Averaging compliance across a full call hid where each obligation actually lived, and a regulator would not accept that view.

The Solution

Kastle built their testing on Cekura around node-based simulation. Every state has its own dedicated test suite, run in chat for coverage and in voice for realism.

Per-node scenario libraries: Every state in every agent has its own scenario library and evaluators, with each scenario tagged to the node it targets. When a state changes, only that state's tests fire.
Chat first for coverage, then voice end-to-end: Kastle runs the same agents in chat mode first to get broad scenario coverage quickly. Once a node passes in chat, the same scenarios run end-to-end in voice to validate real behavior under audio.
Compliance metrics scoped to the right node: RESPA, NACHA, fair-housing, mini-Miranda, hallucination detection, and customer-specific rules are layered into a unified rubric. Each metric evaluates only at the node where the regulatory obligation lives.
Scenarios seeded from production calls: Real borrower call transcripts are turned into scenarios tagged to the affected node of that customer's deployed agent. The regression library for each customer grows directly from the traffic it is meant to protect.
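The per-node tagging and metric scoping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cekura's API or Kastle's implementation: the Scenario and Metric classes, every scenario and metric name, and the build_run_plan helper are hypothetical. It shows the two ideas together: a change to one state selects only the scenarios tagged to that state, and each selected scenario is graded only by the metrics scoped to its node.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; these names are not from Cekura.
@dataclass(frozen=True)
class Scenario:
    name: str
    node: str  # the graph state this scenario targets


@dataclass(frozen=True)
class Metric:
    name: str
    nodes: frozenset  # the states where this obligation applies


# Illustrative scenario library, each scenario tagged to one node.
SCENARIOS = [
    Scenario("wrong_party_answers", "Gatekeeper"),
    Scenario("dob_mismatch", "Verify DOB"),
    Scenario("ach_payment_one_time", "Process Payment"),
    Scenario("partial_payment_promise", "Capture Promise-to-Pay"),
]

# Illustrative rubric: each metric lists the nodes where it is evaluated.
METRICS = [
    Metric("mini_miranda_disclosed", frozenset({"Gatekeeper"})),
    Metric("nacha_recital_read", frozenset({"Process Payment"})),
    Metric(
        "no_hallucination",
        frozenset({"Gatekeeper", "Verify DOB", "Process Payment", "Capture Promise-to-Pay"}),
    ),
]


def select_scenarios(changed_nodes, scenarios=SCENARIOS):
    """Return only the scenarios tagged to a node that changed."""
    changed = set(changed_nodes)
    return [s for s in scenarios if s.node in changed]


def metrics_for(node, metrics=METRICS):
    """Return only the metrics scoped to the given node."""
    return [m for m in metrics if node in m.nodes]


def build_run_plan(changed_nodes, mode="chat"):
    """Pair each selected scenario with its node-scoped metrics for one mode."""
    return [
        (mode, s.name, [m.name for m in metrics_for(s.node)])
        for s in select_scenarios(changed_nodes)
    ]
```

For example, `build_run_plan(["Process Payment"])` yields a single chat run of `ach_payment_one_time` graded by `nacha_recital_read` and `no_hallucination`, while the Mini-Miranda check is never applied there. Calling the same function with `mode="voice"` reuses the identical scenarios, mirroring the chat-first, then voice end-to-end flow.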

The Results

Kastle's voice agents now ship with regression coverage that matches the regulator's view of the call.

Compliance graded at the node level: Every release is scored against the full rubric at the state where each obligation lives, before any borrower traffic reaches it.
Quality lifts on every node: Each state is validated against thousands of edge cases before going live.
Coverage and realism, layered: Chat-first testing catches issues cheaply and broadly. Voice end-to-end testing then verifies real behavior under audio. Both modes run on the same scenarios.