Knowledge Base Connectors and RAG: Agentic Retrieval for Voice AI Agents

Written by: Lavish Gulati, Sidhant Kabra

Reviewed by: Sidhant Kabra

Last updated: Apr 25, 2026 · 7 min read

Introduction

Modern AI systems need access to dynamic, up-to-date information. Static knowledge bases quickly become stale, and manual updates don't scale. This post explores how to build production-grade knowledge base connectors that automatically sync external data sources, and how to implement agentic retrieval patterns using the Claude Agent SDK.

We'll dive into real production architecture from Cekura's voice AI testing platform, examining:

  • Multi-source knowledge base connector architecture
  • Async syncing with scheduled refreshes
  • Agentic retrieval patterns for intelligent context loading
  • Security considerations (SSRF protection, credential encryption)

Part 1: Knowledge Base Connector Architecture

The Problem

AI agents need fresh knowledge from multiple sources:

  • Data warehouses (e.g. BigQuery) - Analytics data, customer metrics, historical trends
  • Websites - Public documentation, blog posts, knowledge bases
  • File uploads - Internal documents, reports, and reference material in any format

Each source has different:

  • Authentication mechanisms (service accounts for data warehouses, optional headers for websites)
  • Data formats (JSON from queries, HTML/Markdown from pages, raw content from uploaded files)
  • Update frequencies (hourly to daily syncing)
  • Security requirements (SSRF prevention for websites, credential encryption for all)

Core Architecture

The production schema uses two main models:

KnowledgeBaseFile - Stores individual knowledge base files in S3/Supabase:

  • Links to AI agents
  • Accepts files in any format — drop in whatever your team works with
  • Tracks file metadata (name, type, size, timestamps)

KnowledgeBaseConnector - Manages external data source syncing:

  • Connects to AI agents
  • Supports multiple connector types (data warehouse queries, website scraping, and more)
  • Separates credentials (encrypted) from configuration (public)
  • Integrates with Celery Beat for automatic scheduling
  • Tracks sync status (pending, syncing, success, failed)
  • Uses ManyToMany relationship with files (website scrapers create multiple files per sync)

Key Design Decisions:

  1. Separation of Concerns: Credentials vs. configuration split keeps sensitive data encrypted while allowing public configuration inspection
  2. ManyToMany Files: Website scrapers create one file per page; data warehouse connectors create consolidated files
  3. Celery Beat Integration: Automatic scheduling without custom cron jobs
  4. Status Tracking: Observability for sync failures and debugging

Connector Implementations

Data Warehouse Connector (e.g. BigQuery)

Executes SQL queries and stores results as JSON files. BigQuery, for example, uses Google Cloud service account authentication. Configuration includes the SQL query to execute, with support for parameterized queries. Results are converted to JSON with unique timestamped filenames. The same pattern extends to other data warehouses and query engines your stack relies on.
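The results-to-JSON step might look like the helper below. It is a minimal sketch: `rows_to_kb_file` and its arguments are hypothetical names, and the BigQuery call in the docstring is only indicative of where the rows would come from.

```python
import json
from datetime import datetime, timezone

def rows_to_kb_file(rows, connector_name):
    """Serialize query results to a JSON payload with a unique, timestamped filename.

    `rows` is a list of dicts, e.g. produced from a client-library query such as
    `[dict(r) for r in client.query(sql).result()]` (shape illustrative).
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    filename = f"{connector_name}_{ts}.json"
    # default=str handles values JSON can't encode natively (dates, decimals)
    payload = json.dumps(rows, indent=2, default=str)
    return filename, payload
```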

Website Scraper Connector

The website scraper fetches and converts web content to knowledge base articles:

  • SSRF protection: blocks private IPs, localhost, link-local, and multicast addresses
  • Converts HTML to markdown format
  • Supports crawling multiple pages with pagination
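The SSRF check can be implemented with the standard library alone. This is a simplified sketch of the pattern (function names are ours); a hardened version would also re-validate on redirects and pin the resolved IP for the actual fetch.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked_ip(ip_str: str) -> bool:
    """Block private, loopback, link-local, multicast, reserved, and unspecified addresses."""
    ip = ipaddress.ip_address(ip_str)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)

def validate_scrape_url(url: str) -> None:
    """Resolve the hostname and refuse to fetch if any resolved IP is blocked."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    # Check every address the name resolves to, not just the first
    for info in socket.getaddrinfo(parsed.hostname, None):
        if is_blocked_ip(info[4][0]):
            raise ValueError(f"blocked address for {parsed.hostname}")
```

Validating every resolved address matters because an attacker-controlled DNS name can return both a public and a private IP.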

Async Syncing with Celery

Syncing runs asynchronously via Celery tasks to avoid blocking the main application:

  • Executes connector-specific data fetching (data warehouse queries, website scraping, or other source types)
  • Stores fetched content in S3 and links to the connector
  • Tracks sync status with timestamps for observability
  • Implements automatic retry with exponential backoff on failures
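The retry behavior can be expressed in a few lines. Below is a plain-Python sketch of the exponential backoff pattern; in a real Celery deployment you would instead lean on Celery's built-in task options, roughly `@app.task(autoretry_for=(TransientError,), retry_backoff=True, max_retries=4)` (where `TransientError` is your own exception class).

```python
import time

def sync_with_retry(fetch, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry a connector's fetch with exponential backoff on failures.

    `sleep` is injectable so the backoff schedule can be tested without waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # permanent failure: give up and surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
```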

API Endpoints

A REST API provides full CRUD operations for managing connectors, including:

  • Creating connectors (triggers immediate sync)
  • Listing and filtering by agent
  • Manual sync triggering
  • Updating configuration and credentials

Security features include credential exclusion from list views and duplicate sync prevention.

Part 2: Agentic Retrieval with Claude Agent SDK

Now that we have knowledge bases automatically syncing, how do we intelligently retrieve the right context for each agent call?

The Challenge

Consider a customer support bot with:

  • 1,000+ KB articles
  • 50+ product documentation pages
  • Real-time metrics from a data warehouse
  • User conversation history

Problem: We can't send all 1,000 articles to Claude on every request (context window limits, latency, cost).

Solution: Agentic retrieval - let Claude decide what knowledge it needs and fetch it dynamically.

Agentic Retrieval Architecture

KB Sync ──▶ Download all KB files to local filesystem
                      │
                      ▼
          Claude Agent SDK runs in sandbox environment
                      │
                      ▼
          Agent has read access to KB directory
                      │
                      ▼
User Request ──▶ Claude uses Read tool to access relevant KB files
                      │
                      ▼
          Claude responds with KB context

Knowledge Base Manager

A singleton manager handles KB lifecycle:

  • Fetches and caches KB files locally from the backend API
  • Automatically refreshes stale content (24-hour TTL)
  • Cleans content for optimal context (removes links, formatting noise)
  • Initializes lazily without blocking the application
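A minimal version of that manager, assuming a local cache directory and an injectable `fetch` callable standing in for the backend API call (all names here are illustrative):

```python
import re
import time
from pathlib import Path

def clean_content(text: str) -> str:
    """Reduce context noise: collapse markdown links down to their visible text."""
    return re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)

class KnowledgeBaseManager:
    """Singleton-style manager that caches KB files locally with a TTL."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, cache_dir: str = "./kb_cache", ttl_seconds: float = 24 * 3600):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl_seconds
        if not hasattr(self, "_fetched_at"):
            self._fetched_at = {}

    def get(self, name, fetch):
        """Return cached content; call `fetch()` (the backend API) when stale or missing."""
        path = self.cache_dir / name
        fresh = path.exists() and time.time() - self._fetched_at.get(name, 0) < self.ttl
        if not fresh:
            path.write_text(clean_content(fetch()))
            self._fetched_at[name] = time.time()
        return path.read_text()
```

Because `fetch` is only invoked on a cache miss or TTL expiry, initialization stays lazy and never blocks application startup.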

Building a Claude Agent with KB Access

The Claude Agent SDK provides built-in filesystem tools that give Claude direct access to the knowledge base:

Read - Read contents of specific KB files

Glob - Find files matching patterns (e.g., **/*.txt to find all text files)

Grep - Search file contents for keywords or topics

Workflow Example: User asks: "How do I set up voice AI testing with custom prompts?"

Behind the scenes, Claude:

  1. Uses Grep to search KB files for keywords: "voice AI testing", "custom prompts"
  2. Identifies relevant files from search results
  3. Uses Read to load full content of matching files
  4. Synthesizes answer using KB content
  5. Returns response: "Based on our Voice AI Setup Guide, here's how to..."
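Steps 1–3 above are roughly equivalent to the local search-then-read below. This is a plain-Python simulation of what the agent's Grep and Read tool calls accomplish, not the SDK itself; the function names and the `.txt`-only glob are our assumptions.

```python
from pathlib import Path

def grep_kb(kb_dir, keywords):
    """Steps 1-2: find KB files whose content mentions any keyword (what Grep does)."""
    hits = []
    for path in sorted(Path(kb_dir).glob("**/*.txt")):
        text = path.read_text().lower()
        if any(kw.lower() in text for kw in keywords):
            hits.append(path)
    return hits

def read_kb(paths):
    """Step 3: load the full content of the matching files (what Read does)."""
    return {p.name: p.read_text() for p in paths}
```

The agent then performs steps 4–5 itself: the loaded file contents land in its context, and it synthesizes the answer from them.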

Cekura integrates natively with Retell, VAPI, ElevenLabs, LiveKit, Pipecat, Bland, Synthflow, Cisco, Twilio, Plivo, and SIP — so KB retrieval works out of the box with whichever voice stack you're running.

Security Best Practices

Credential Encryption

All sensitive credentials (API keys, service accounts, tokens) are encrypted at rest using industry-standard encryption and never exposed in logs or API responses.

SSRF Protection

Website scrapers validate URLs to prevent Server-Side Request Forgery attacks by blocking access to private networks, localhost, and internal IP ranges.

Rate Limiting

API endpoints implement throttling to prevent abuse, with different limits for anonymous users, authenticated users, and sync operations.
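Per-key throttling of this kind is often a token bucket. A minimal sketch (the class and its parameters are illustrative, not Cekura's implementation; frameworks like DRF ship throttle classes that do this for you):

```python
import time

class TokenBucket:
    """Allow up to `capacity` requests per `period` seconds for each key."""

    def __init__(self, capacity, period):
        self.capacity = capacity
        self.period = period
        self.state = {}  # key -> (tokens remaining, last refill time)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.capacity / self.period)
        if tokens < 1:
            self.state[key] = (tokens, now)
            return False
        self.state[key] = (tokens - 1, now)
        return True
```

Different limits per audience fall out naturally: instantiate one bucket for anonymous users, a looser one for authenticated users, and a strict one for sync-trigger endpoints.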

File Validation

Uploaded files are scanned for safety on ingestion, ensuring that accepting any format doesn't come at the cost of security.

Compliance

Cekura is SOC 2 Type II audited and supports HIPAA-scoped deployments under BAA. GDPR-compliant with DPA available. Reports available under NDA — request via sales.

Performance Optimization

Parallel Downloads

Knowledge base files are fetched concurrently to minimize sync time, with timeout protection and graceful error handling.
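Concurrent fetching with graceful per-file error handling can be done with a thread pool. A sketch under the assumption that `fetch` is a blocking callable taking a URL (the production code may well be asyncio-based instead):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(urls, fetch, max_workers=8, timeout=30):
    """Fetch KB files concurrently; individual failures are collected, not fatal."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        # as_completed raises if the whole batch exceeds `timeout`
        for fut in as_completed(futures, timeout=timeout):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                errors[url] = exc  # one bad file doesn't abort the sync
    return results, errors
```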

Caching

Frequently accessed KB articles are cached in-memory with TTL-based expiration to reduce disk I/O and improve response times.

Retry Logic

Async tasks implement exponential backoff for transient failures while quickly failing on permanent errors to avoid wasted resources.

Observability

Sync Status Dashboard

An admin interface provides visibility into connector health, sync history, and failure debugging, with filtering and search capabilities.

Metrics & Alerts

The system tracks sync durations, success/failure rates, and automatically alerts on repeated failures for proactive issue resolution.

Agent Tracing

Structured logging captures tool usage, processing times, and errors for debugging and performance optimization.

Conclusion

Building production-grade knowledge base connectors requires:

  • Flexible architecture supporting multiple data sources and file formats
  • Async syncing with retry logic and status tracking
  • Security hardening (SSRF prevention, credential encryption, file validation)
  • Performance optimization (parallel downloads, caching, efficient storage)
  • Observability (metrics, alerts, trace logging)

Agentic retrieval with the Claude Agent SDK enables:

  • Dynamic context loading - Claude uses filesystem tools to access relevant KB articles
  • Tool-based KB access - Built-in Read, Glob, and Grep tools for intelligent retrieval
  • Flexible deployment - Runs in sandbox environments with KB directory access

Next Steps

Try building your own:

  1. Start with a data warehouse connector (BigQuery is a great example) for analytics data
  2. Add website scraping with SSRF protection
  3. Drop in existing files — any format works out of the box
  4. Set up async syncing with Celery
  5. Build a Claude agent with filesystem tools for KB access
  6. Deploy in a sandbox environment for production use

Ready to ship voice agents fast?

Book a demo