# Kyra — Full Context for LLMs

> AI Workforce Platform for Agencies — built by Conversion System

This is the extended version of /llms.txt. It contains the full text of every blog post and detailed information about the platform. Use this for accurate citations and comprehensive answers about Kyra.

## Company

Kyra is built by [Conversion System](https://conversionsystem.com), founded by Angel Castro. The platform gives digital agencies white-label AI workers they can deploy to client accounts — handling calls, booking appointments, qualifying leads, and running customer support 24/7.

## Platform Overview

- **What it does:** Deploys autonomous AI workers for agencies, builds SEO-optimized websites, provides CRM, and offers multi-channel communication (voice, SMS, web chat, WhatsApp, email)
- **Architecture:** Powered by OpenClaw (open-source AI gateway), Next.js frontend, Supabase backend, deployed on Vercel
- **Pricing:** Lite $99/mo (3 clients) → Pro $299/mo (10 clients) → Scale $499/mo (20 clients). Free tier available.
- **Key differentiator:** Self-hosted AI means client data never leaves the agency's infrastructure — critical for regulated industries

## Key Pages

- [Homepage](https://kyra.conversionsystem.com): Platform overview and signup
- [Pricing](https://kyra.conversionsystem.com/pricing): Plans and feature comparison
- [Solo](https://kyra.conversionsystem.com/solo): Free tier for individual business owners
- [AI For Industries](https://kyra.conversionsystem.com/ai-for): 50+ industry-specific AI worker templates
- [AI Workers](https://kyra.conversionsystem.com/workers): Browse pre-built AI worker types
- [AI Readiness Quiz](https://kyra.conversionsystem.com/tools/ai-readiness): Interactive assessment tool
- [Blog](https://kyra.conversionsystem.com/blog): Guides and playbooks
- [RSS Feed](https://kyra.conversionsystem.com/feed.xml): Blog syndication feed
- [Changelog](https://kyra.conversionsystem.com/changelog): Product updates

## Industry Templates

Kyra has pre-built AI worker templates for 50+ industries:

- [Plumber AI](https://kyra.conversionsystem.com/ai-for/plumbing): AI receptionist for plumbing companies. Books service calls, provides estimates, handles emergency dispatch.
- [Dental Office AI](https://kyra.conversionsystem.com/ai-for/dental): AI front desk for dental practices. Schedules cleanings, handles new patient intake, answers insurance questions.
- [Real Estate Agent AI](https://kyra.conversionsystem.com/ai-for/real-estate): AI assistant for real estate agents. Qualifies leads, schedules showings, answers property questions.
- [Med Spa AI](https://kyra.conversionsystem.com/ai-for/medspa): AI concierge for med spas and beauty clinics. Books treatments, answers pricing questions, handles consultation scheduling.
- [Law Firm AI](https://kyra.conversionsystem.com/ai-for/law-firm): AI intake specialist for law firms. Qualifies leads, schedules consultations, collects case details.
- [Auto Repair AI](https://kyra.conversionsystem.com/ai-for/auto-repair): AI service advisor for auto shops. Books diagnostics, provides pricing, handles parts questions.
- [Gym & Fitness AI](https://kyra.conversionsystem.com/ai-for/gym): AI sales rep for gyms and fitness studios. Converts inquiries to free trials, handles objections, books tours.
- [Restaurant AI](https://kyra.conversionsystem.com/ai-for/restaurant): AI host for restaurants. Handles reservations, large party inquiries, catering, and menu questions.
- [HVAC AI](https://kyra.conversionsystem.com/ai-for/hvac): AI dispatcher for HVAC companies. Handles emergency calls 24/7, books maintenance, provides troubleshooting tips.
- [Photography Studio AI](https://kyra.conversionsystem.com/ai-for/photography): AI booking assistant for photographers. Handles wedding inquiries, checks availability, presents packages.
- [Dispensary AI](https://kyra.conversionsystem.com/ai-for/cannabis): AI budtender for cannabis dispensaries. Handles product recommendations, order status, and menu questions while maintaining compliance.
- [Insurance Agency AI](https://kyra.conversionsystem.com/ai-for/insurance): AI agent for insurance offices. Handles quote requests, policy questions, and claims assistance.
- [Veterinary Clinic AI](https://kyra.conversionsystem.com/ai-for/veterinary): AI receptionist for veterinary clinics. Books appointments, handles pet emergency triage, and sends vaccination reminders.
- [Salon & Barbershop AI](https://kyra.conversionsystem.com/ai-for/salon): AI receptionist for hair salons and barbershops. Handles booking, stylist matching, and service information.
- [Cleaning Service AI](https://kyra.conversionsystem.com/ai-for/cleaning): AI booking agent for cleaning companies. Provides instant quotes, handles scheduling, and manages recurring service.
- [Roofing & Contractor AI](https://kyra.conversionsystem.com/ai-for/roofing): AI for roofing and general contractors. Handles free estimate requests, storm damage inquiries, and financing questions.
- [Moving Company AI](https://kyra.conversionsystem.com/ai-for/moving): AI for moving companies. Provides instant quote estimates, handles booking, and collects inventory details.
- [Accounting & Tax AI](https://kyra.conversionsystem.com/ai-for/accounting): AI for accounting firms. Schedules tax prep appointments, manages document collection, and sends deadline reminders.
- [Tutoring Center AI](https://kyra.conversionsystem.com/ai-for/tutoring): AI for tutoring centers. Matches students with tutors, handles scheduling, and provides subject availability info.
- [Travel Agency AI](https://kyra.conversionsystem.com/ai-for/travel): AI for travel agencies. Assists with trip planning, quote requests, and booking inquiries.
- [Landscaping AI](https://kyra.conversionsystem.com/ai-for/landscaping): AI for landscaping companies. Handles seasonal service inquiries, provides estimates, and manages recurring maintenance schedules.
- [Pest Control AI](https://kyra.conversionsystem.com/ai-for/pest-control): AI for pest control companies. Handles emergency service requests, pest identification, and treatment plan scheduling.
- [Chiropractic AI](https://kyra.conversionsystem.com/ai-for/chiropractic): AI for chiropractic offices. Handles new patient intake, insurance verification, and adjustment scheduling.
- [E-Commerce AI](https://kyra.conversionsystem.com/ai-for/ecommerce): AI for online stores and retail businesses. Handles order tracking, product recommendations, and returns processing.
- [Property Management AI](https://kyra.conversionsystem.com/ai-for/property-management): AI for property management companies. Handles maintenance requests, lease inquiries, and tenant screening.
- [Electrician AI](https://kyra.conversionsystem.com/ai-for/electrician): AI dispatcher for electrical contractors. Handles service calls, emergency electrical issues, and estimate scheduling.
- [Wedding Planner AI](https://kyra.conversionsystem.com/ai-for/wedding-planner): AI assistant for wedding planners. Handles initial consultations, venue inquiries, and package presentations.
- [Mortgage Broker AI](https://kyra.conversionsystem.com/ai-for/mortgage): AI assistant for mortgage brokers. Pre-qualifies leads, explains loan options, and schedules consultations.
- [Pet Grooming & Boarding AI](https://kyra.conversionsystem.com/ai-for/pet-services): AI receptionist for pet grooming and boarding facilities. Books grooming, handles boarding reservations.
- [Home Remodeling AI](https://kyra.conversionsystem.com/ai-for/home-remodeling): AI assistant for general contractors and remodeling companies. Handles project inquiries, schedules consultations.
- [Car Dealership AI](https://kyra.conversionsystem.com/ai-for/car-dealership): AI sales assistant for car dealerships. Handles inventory inquiries, trade-in questions, and schedules test drives.
- [Daycare & Childcare AI](https://kyra.conversionsystem.com/ai-for/daycare): AI enrollment assistant for daycare centers and preschools. Handles waitlist, tours, and parent questions.
- [Senior Care AI](https://kyra.conversionsystem.com/ai-for/senior-care): AI intake assistant for home health and senior care agencies. Handles family inquiries, care assessments, and scheduling.
- [Towing Company AI](https://kyra.conversionsystem.com/ai-for/towing): AI dispatcher for towing companies. Handles roadside assistance, accident towing, and ETA updates.
- [Locksmith AI](https://kyra.conversionsystem.com/ai-for/locksmith): AI dispatcher for locksmith services. Handles emergency lockouts, rekeying requests, and security upgrades.
- [Pool Service AI](https://kyra.conversionsystem.com/ai-for/pool-service): AI assistant for pool maintenance companies. Handles weekly service signups, equipment repairs, and seasonal openings.
- [Painting Company AI](https://kyra.conversionsystem.com/ai-for/painting): AI assistant for painting contractors. Handles estimate requests, color consultations, and project scheduling.
- [Solar Company AI](https://kyra.conversionsystem.com/ai-for/solar): AI sales assistant for solar installation companies. Qualifies leads, explains savings, and schedules site assessments.
- [Personal Trainer AI](https://kyra.conversionsystem.com/ai-for/personal-trainer): AI assistant for personal trainers and fitness coaches. Handles client intake, books sessions, shares program info.
- [Yoga & Pilates Studio AI](https://kyra.conversionsystem.com/ai-for/yoga-studio): AI front desk for yoga and pilates studios. Handles class bookings, membership inquiries, and new student welcome.
- [Martial Arts Academy AI](https://kyra.conversionsystem.com/ai-for/martial-arts): AI enrollment assistant for martial arts schools. Handles trial class signups, program info, and belt rank questions.
- [Music Lessons AI](https://kyra.conversionsystem.com/ai-for/music-lessons): AI enrollment assistant for music schools and private instructors. Matches students with teachers, books trial lessons.
- [Tattoo Shop AI](https://kyra.conversionsystem.com/ai-for/tattoo): AI assistant for tattoo studios. Handles consultation requests, pricing inquiries, and aftercare info.
- [Dry Cleaning AI](https://kyra.conversionsystem.com/ai-for/dry-cleaning): AI assistant for dry cleaners and laundry services. Handles pickup scheduling, pricing, and special garment care.
- [Catering AI](https://kyra.conversionsystem.com/ai-for/catering): AI assistant for catering companies. Handles event inquiries, menu planning, and quote requests.
- [Physical Therapy AI](https://kyra.conversionsystem.com/ai-for/physical-therapy): AI front desk for PT clinics. Schedules evaluations, handles insurance verification, and answers rehab questions.
- [Flooring Company AI](https://kyra.conversionsystem.com/ai-for/flooring): AI assistant for flooring companies. Handles estimate requests, material questions, and installation scheduling.
- [Construction Company AI](https://kyra.conversionsystem.com/ai-for/construction): AI assistant for construction companies. Handles project inquiries, bid requests, and subcontractor coordination.
- [Therapy & Counseling AI](https://kyra.conversionsystem.com/ai-for/therapy): AI intake assistant for therapists and counseling practices. Handles new client screening, scheduling, and insurance questions.
- [Staffing & Recruiting AI](https://kyra.conversionsystem.com/ai-for/staffing): AI intake assistant for staffing and recruiting agencies. Screens candidates, handles job inquiries, and schedules interviews.

## Blog Posts (Full Content)

### AI Agent Memory Systems in 2026: How OpenClaw Workspaces, SOUL.md, and Context Compaction Actually Work

- URL: https://kyra.conversionsystem.com/blog/ai-agent-memory-systems-openclaw-2026
- Published: 2026-05-04
- Category: AI Infrastructure
- Read time: 13 min

Last updated: May 4, 2026

AI agent memory is the system an autonomous agent uses to retain identity, knowledge, and unfinished work across context boundaries. It combines files on disk, the live context window, and long-term storage. In 2026 the concept has split into three distinct layers. The workspace layer is plain Markdown on disk. The runtime layer lives inside the model context window. The long-term layer survives compaction or restart through a memory tool or external store. Confusing the three is the most common reason agency-deployed agents start strong and quietly degrade after a week.

This post walks through the memory architecture that ships in OpenClaw v2026.4.26 today, how it interacts with Anthropic’s April 2026 memory tool and Sonnet 4.6 automatic context compaction, what to put in each file, and the anti-patterns that turn a working agent into a forgetful one. The aim is a clear mental model you can take into a real client deployment, not a tour of every API.
#### Key takeaways

- Agent memory in 2026 is three layers: workspace files on disk (SOUL.md, MEMORY.md, AGENTS.md), runtime context inside the model window, and long-term storage via Anthropic’s memory tool or a vector store.
- OpenClaw assembles its workspace into the system prompt at session start. Sonnet 4.6 holds a 1M-token window and runs automatic compaction once it fills, so older turns are summarized server-side rather than dropped.
- Anthropic announced its memory tool on April 23, 2026. It writes Markdown into a /memories directory the model can read, write, and delete via tool calls, making memory exportable and editable instead of opaque.
- Workspace memory and the memory tool are complementary. Workspace files set identity and rules; the memory tool stores running facts the agent learned during a task.
- Most degrading agents are degrading because their MEMORY.md is unbounded. Capping it at 200 lines and summarizing older notes into a CHANGELOG section keeps performance steady.
- Vector RAG is still useful, but only as the third layer for genuinely large knowledge bases. For a single client’s conversational history, plain files compress better and cite cleaner.

#### What "AI agent memory" actually means in 2026

The phrase "agent memory" papers over three different technologies that operate on different timescales. Treating them as one thing is what produces the "my agent forgets after a few hours" complaint that fills Reddit threads in 2026.

The three layers, ordered from shortest to longest timescale:

1. **Runtime memory.** Whatever fits inside the active context window during the current call. With Sonnet 4.6 that is up to one million tokens, but practical sessions stay below 200K because cost and latency scale linearly. Runtime memory dies when the session ends.
2. **Workspace memory.** Plain Markdown files on disk (SOUL.md, AGENTS.md, MEMORY.md, USER.md, TOOLS.md, HEARTBEAT.md, IDENTITY.md) that the gateway concatenates into the system prompt every time the agent starts. Workspace memory persists for the life of the workspace folder and is editable by humans.
3. **Long-term memory.** Information the agent decides to keep beyond the current task. In 2026 there are two production-grade options: Anthropic’s memory tool (announced April 23, 2026), which writes files to a /memories directory via tool calls, and external vector stores (pgvector, Pinecone, Weaviate) for arbitrarily large bodies of text.

A well-built agent uses all three. Identity and operating rules live in the workspace. Today’s task uses runtime context. Anything the agent learned that should outlive the task gets written to the memory tool or a vector store. None of the three is optional. Missing any one of them produces a specific failure mode.

#### The OpenClaw workspace: seven files, one ontology

OpenClaw treats the workspace folder as the agent’s filesystem of record. The seven files are read in a strict precedence order at session start, concatenated, and injected into the system prompt before any user message. The set:

| File | Purpose | Edit cadence | Typical length |
| --- | --- | --- | --- |
| SOUL.md | Personality, values, voice, immutable principles | Rarely (quarterly) | 30 to 80 lines |
| IDENTITY.md | Name, role, who this specific deployment serves | Once at provisioning | 10 to 30 lines |
| AGENTS.md | Operating rules, sub-agent routing, escalation policy | Weekly | 50 to 150 lines |
| USER.md | What the agent knows about the human or client | On change | 20 to 100 lines |
| TOOLS.md | Tool allowlist, denylist, usage notes | Per release | 30 to 100 lines |
| HEARTBEAT.md | Scheduled tasks in plain English | Per task | 10 to 60 lines |
| MEMORY.md | Running notes, facts learned, recent decisions | Continuously | Cap at 200 lines |

The mental model worth holding: SOUL.md and IDENTITY.md are who the agent is. AGENTS.md is how it behaves. USER.md and TOOLS.md are what it works with. HEARTBEAT.md is when it acts. MEMORY.md is what it remembers.

OpenClaw v2026.4.26 ships releases roughly every two days, and the workspace files have been stable since the late-March 2026 security cleanup.
Names and ordering are unlikely to shift before the 0.15 series. The official reference lives at docs.openclaw.ai/concepts/memory, and the heartbeat docs are at docs.openclaw.ai/gateway/heartbeat.

#### How context compaction extends memory beyond the window

Anthropic’s context-compaction-2026-02-01 beta turned on automatic server-side summarization for Sonnet 4.5 and stayed on by default for Sonnet 4.6 and Opus 4.7. When a conversation gets within roughly 80% of the model’s context limit, older turns are replaced with a summary the model itself produces. The new condensed history takes the place of the original, freeing room for new turns.

For an agency operator that means three practical things:

1. **Long-running sessions stop dying.** A multi-hour task that previously hit the limit and reset now keeps going past the boundary. The gateway sees a continuous conversation. The model sees a compacted one.
2. **Specific details can disappear into the summary.** Compaction is lossy by design. A client’s account number mentioned 30 turns ago might survive or might not. Anything the agent must remember after compaction belongs in MEMORY.md or in the memory tool, not in conversation.
3. **Cost grows roughly linearly with elapsed time, not with raw turn count.** Compaction collapses old turns into shorter form, so the running input cost stops doubling indefinitely.

The interaction with the memory tool is the part most teams miss. Compaction summarizes; the memory tool persists. They are not redundant. A long-horizon agent should write any commitment, deadline, account, or user preference to /memories the moment it learns it, on the assumption that the conversation it appeared in will be compressed away within the hour.

#### Set up workspace memory in OpenClaw, step by step

This is the path most agencies follow when provisioning a new client. Each step assumes you already have a running OpenClaw gateway and you are SSH’d into the host or running commands inside the per-client container.

1. Create the workspace folder.

```bash
mkdir -p /opt/openclaw/workspaces/acme-dental
cd /opt/openclaw/workspaces/acme-dental
```

2. Initialize the seven kernel files.

```bash
touch SOUL.md IDENTITY.md AGENTS.md USER.md TOOLS.md HEARTBEAT.md MEMORY.md
```

3. Write SOUL.md. Keep it short: voice, values, refusal policy.

```bash
cat > SOUL.md << 'EOF'
# Soul
Voice: warm, brief, never salesy.
Values: protect the patient's time. Confirm before booking.
Refuses: medical advice, prescription questions, billing disputes.
EOF
```

4. Write IDENTITY.md.

```bash
cat > IDENTITY.md << 'EOF'
Name: Sage
Role: Front-desk assistant for Acme Dental.
Hours: Mon-Fri 7am-7pm Eastern.
Languages: English, Spanish.
EOF
```

5. Write AGENTS.md. Routing rules and escalation.

```bash
cat > AGENTS.md << 'EOF'
- Booking requests: confirm patient name, date of birth, and reason. Use the gohighlevel.book_appointment tool.
- Insurance questions: collect payer name, route to a human via slack.send_to_billing.
- After 3 unanswered clarifications: hand off with summary.
EOF
```

6. Configure HEARTBEAT.md. Scheduled tasks in plain language; OpenClaw checks the file every 30 minutes.

```bash
cat > HEARTBEAT.md << 'EOF'
Every weekday at 8am Eastern: Pull tomorrow's appointments from GHL and send confirmation SMS.
Every Monday at 9am Eastern: Summarize last week's missed calls. Post to #front-desk in Slack.
EOF
```

7. Bound MEMORY.md. The agent will write to this file; set a shape now so unbounded growth is not the default.

```bash
cat > MEMORY.md << 'EOF'
# Memory (most recent first, prune below 200 lines)

## Recent decisions
-

## Facts learned
-

## Changelog (auto-summarized weekly)
-
EOF
```

8. Restart the gateway and confirm load.

```bash
openclaw gateway restart
openclaw workspace status acme-dental
```

The status command prints the seven files and the line count of each. If any are missing or any are over the recommended cap, the gateway warns at startup. From here, the agent reads the workspace at every session start. Humans can edit any file at any time; changes apply on the next session.
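The assembly behavior described above (read seven files in precedence order, concatenate, inject before any user message) can be sketched in a few lines. This is an illustrative sketch, not OpenClaw's actual loader: the function name and the HTML-comment section markers are assumptions; only the file names and their precedence order come from the docs.

```python
from pathlib import Path

# Precedence order from the workspace table; identity first, memory last.
WORKSPACE_ORDER = [
    "SOUL.md", "IDENTITY.md", "AGENTS.md", "USER.md",
    "TOOLS.md", "HEARTBEAT.md", "MEMORY.md",
]

def assemble_system_prompt(workspace: Path) -> str:
    """Concatenate the workspace files in precedence order.

    Missing files are skipped; a comment header marks each section so a
    human reviewing the merged prompt can see which file a rule came from.
    """
    sections = []
    for name in WORKSPACE_ORDER:
        path = workspace / name
        if path.exists():
            sections.append(f"<!-- {name} -->\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```

Because the merge is deterministic, it is also cheap to pre-bake and cache, which matters for the high-concurrency case discussed later in the post.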
#### OpenClaw vs Anthropic memory tool vs Claude Code CLAUDE.md vs vector RAG

Four memory technologies dominate 2026 deployments. They are often confused, partly because the file conventions overlap. The differences matter in production.

| System | Storage | Persistence | Best for | Failure mode |
| --- | --- | --- | --- | --- |
| OpenClaw workspace | Markdown files on disk | Permanent until edited | Identity, rules, scheduled tasks | Drift if files are not curated |
| Anthropic memory tool | Files in /memories via tool calls | Across sessions on the platform | Facts the agent learned in a task | Unbounded growth without pruning |
| Claude Code CLAUDE.md | Single Markdown file in repo | Per project, version controlled | Coding rules, repo conventions | Compliance degrades past 200 lines |
| Vector RAG (pgvector, Pinecone) | Embeddings in a database | Permanent, searchable | Large reference corpora | Noisy retrieval, citation gaps |

For a typical agency client, the right starting stack is the OpenClaw workspace plus the memory tool. CLAUDE.md is for engineering teams using Claude Code as a developer tool, not for client-facing agents. Vector RAG enters the picture only when a client has a body of reference material (product manuals, legal documents, internal wikis) that genuinely will not fit even with compaction.

#### Memory anti-patterns that quietly break agents in production

Six failure modes show up repeatedly in agency deployments. Each has a fix that is more boring than its root cause.

1. **Unbounded MEMORY.md.** The agent writes a note every interaction and never prunes. By week two the system prompt is 4,000 lines and the agent forgets its instructions. Fix: cap at 200 lines and run a weekly summarization cron that compresses older notes into a CHANGELOG section.
2. **Identity drift.** SOUL.md is edited mid-session by a well-meaning operator. The agent's voice changes, and customers notice. Fix: treat SOUL.md as read-only outside a quarterly review. Track changes in git.
3. **Memory tool used as a journal.** The agent writes a long entry to /memories every turn, and token cost on the next session balloons. Fix: only write atomic facts (one fact per file, short title, dated). Read selectively, not in bulk.
4. **Compaction-blind context engineering.** The system prompt assumes the user's first message will still be visible 50 turns later. After compaction it is not. Fix: re-state task-critical context every 10 to 20 turns, or persist it to memory.
5. **Vector RAG without metadata.** Embeddings retrieve passages with no source, no date, no author, and the agent cites the wrong document. Fix: always store source URL, last-modified date, and author in the metadata column. Filter on those before semantic ranking.
6. **Per-client containers sharing a memory store.** Two clients see each other's data. Fix: isolate the /memories directory per container, and audit at provisioning. The OpenClaw gateway enforces this when configured correctly. Verify, do not assume.

#### When workspace memory is not for you

This pattern is wrong for a few specific cases, worth naming so nobody force-fits it.

- **Stateless one-shot agents.** A classifier that scores a single email has no need for SOUL.md or MEMORY.md. Pass instructions in the system prompt, return JSON, exit. Workspace memory adds startup cost for no benefit.
- **Real-time chat at consumer scale.** If you are running thousands of concurrent end-user sessions per second, the per-session disk read becomes a bottleneck. Use an in-memory cache layer in front of the workspace files, or pre-bake the merged system prompt.
- **Strict regulatory environments where every prompt token must be auditable.** Compaction summaries are model-generated. If a regulator demands the exact tokens shown to the model, run with compaction disabled and architect for shorter sessions.
- **Workloads dominated by retrieval over reasoning.** A pure search assistant over a 10-million-document corpus is better served by a vector index plus a small model than by an agent with a workspace.
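The fix for the first anti-pattern (cap MEMORY.md at 200 lines, roll older notes into a changelog) is a small rotation script. A minimal sketch, assuming MEMORY.md keeps the most recent notes at the top as in the template shown earlier; a production rotation would summarize the overflow with a model rather than truncate each note to one line.

```python
from pathlib import Path

MAX_ACTIVE_LINES = 200  # cap recommended in the post

def rotate_memory(path: Path, keep: int = MAX_ACTIVE_LINES) -> None:
    """Keep the newest `keep` lines of MEMORY.md; compress the overflow
    into one-line entries under a Changelog section at the bottom.
    """
    lines = path.read_text().splitlines()
    if len(lines) <= keep:
        return  # under the cap, nothing to do
    active, overflow = lines[:keep], lines[keep:]
    # Squash each overflow note to a single changelog line (naive: strips
    # the bullet marker and keeps the text as-is).
    changelog = [f"- {l.lstrip('- ').strip()}" for l in overflow if l.strip()]
    path.write_text(
        "\n".join(active + ["", "## Changelog (rotated)"] + changelog) + "\n"
    )
```

Run it from the same weekly cron that posts the heartbeat summaries, and MEMORY.md stays bounded without anyone thinking about it.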
For everything in between (the long tail of agency deployments handling appointments, qualification, follow-up, support, and internal ops), workspace memory plus the memory tool is the 2026 default. The cost of building it is one afternoon. The cost of not building it shows up four weeks later, when the agent stops behaving the way it did at launch.

#### Frequently asked questions

**Does Sonnet 4.6's 1M context window mean I do not need workspace memory?**

No. The 1M window holds runtime context for the current session, but every new session starts empty. Workspace memory is what the agent reads at the start of every session to know who it is. Without it, the model is a blank slate at session zero, regardless of how much context the window can hold.

**Should I use Anthropic's memory tool or write to a database?**

Start with the memory tool. It was announced April 23, 2026, ships as files in a /memories directory, and is exportable, editable, and inspectable through the Claude Console or the API. A database is the right answer once you need cross-agent search, structured queries, or volumes that exceed a few hundred files. Until then, the file-based memory tool is simpler and cheaper.

**How does compaction interact with the memory tool?**

They run in different layers. Compaction summarizes the active conversation when it nears the context limit. The memory tool persists named files across sessions. A practical pattern: at the end of every task, the agent writes a short summary to /memories with a clear name (for example, 2026-05-04-acme-policy-update.md). Even after compaction or a session reset, that file remains and can be re-read on the next task.

**Can I edit MEMORY.md while the agent is running?**

Yes, with one caveat: the change applies at the next session start, not mid-session. If the agent is in a long-running heartbeat run, your edit is picked up after the current run ends. For urgent changes (a wrong fact, leaked PII), restart the workspace to force an immediate reload.
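The naming convention in that answer (date-stamped, one atomic fact per file) is easy to enforce with a small helper. A sketch only; the function name and slug rules are illustrative, not part of Anthropic's memory tool API:

```python
from datetime import date
from pathlib import Path
import re

def write_memory_fact(memories_dir: Path, title: str, fact: str) -> Path:
    """Write one atomic fact as a dated Markdown file, producing names
    like 2026-05-04-acme-policy-update.md. One fact per file keeps the
    next session's selective reads cheap.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = memories_dir / f"{date.today().isoformat()}-{slug}.md"
    path.write_text(f"# {title}\n\n{fact}\n")
    return path
```

The same shape works whether the files land in a tool-managed /memories directory or a plain folder you sync yourself.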
**What about CLAUDE.md from Claude Code? Is that the same thing?**

Same idea, different surface. CLAUDE.md is a single Markdown file Claude Code reads at the top of every session in a repository, used to teach the coding agent your conventions. It is the right tool for engineering use cases. OpenClaw's seven-file workspace is the same idea generalized to non-coding agents (front-desk assistants, sales qualifiers, support workers), where identity, scheduled tasks, and per-client knowledge matter as much as coding rules.

**How big should MEMORY.md actually get?**

Cap it at 200 lines. Beyond 200, model compliance with the rest of the system prompt starts to degrade; Anthropic's own Claude Code guidance reaches the same number for CLAUDE.md. When the file approaches the cap, run a weekly summarization that pushes older notes into a "Changelog" section as one-line entries, then deletes them from the active section. Treat MEMORY.md the way a good ops engineer treats a logfile: rotate, summarize, archive.

#### Pulling the layers together

An agent that survives a year in production has all three layers configured: a small, stable workspace that defines identity and rules; a runtime context that benefits from compaction without depending on it; and a long-term memory that is written to selectively, pruned regularly, and audited per client. None of this is exotic infrastructure. It is the boring discipline of treating an agent like a long-running service rather than a one-off prompt.

If you are deploying for clients and want the workspace, memory tool, and per-client isolation already wired up, Kyra ships it as the default. The same architecture is open source under OpenClaw if you would rather run it yourself; the public docs live at docs.openclaw.ai/concepts/memory, the source is on GitHub, and Anthropic's official memory tool reference is at platform.claude.com. Anthropic's context management notes are at anthropic.com/news/context-management.
For a deeper companion read, the architecture-level walkthrough in what is OpenClaw covers the gateway side, and the first Claude Skill guide shows what to build once memory is in place. Vertical examples live at AI for dental practices.

---

### Self-Hosted AI Cost vs Cloud LLM Bills in 2026: The Honest Math for Agencies

- URL: https://kyra.conversionsystem.com/blog/self-hosted-ai-cost-vs-cloud-2026
- Published: 2026-05-03
- Category: AI Infrastructure
- Read time: 16 min

Last updated: May 3, 2026

Self-hosted AI cost is the all-in monthly bill an agency or operator pays to run AI workers on its own infrastructure instead of routing every request to a cloud LLM provider. In 2026 that bill includes compute (CPU or GPU), bandwidth, storage, monitoring, the people-hours to keep the stack running, and a much smaller line item for inference itself when bring-your-own-key (BYOK) routing is used. Compared in isolation, cloud LLM pricing looks cheap. Compared at scale across dozens of clients with high token volume, the same cloud bill quietly turns into the largest single operating expense in the business.

This post breaks down the real cost math for 2026: what cloud LLMs actually charge after caching and batch discounts, what a self-hosted gateway plus a GPU VPS actually costs, where the break-even point sits for an agency running 10, 50, or 200 clients, and the hybrid setup most operators land on once they run the numbers honestly.

#### Key takeaways

- Anthropic's 2026 rate card: Haiku 4.5 at $1/$5, Sonnet 4.6 at $3/$15, Opus 4.7 at $5/$25 per million input/output tokens. Prompt caching cuts cached input by 90% and batch processing halves the rest, so an optimized request can land at roughly 5% of the headline rate.
- A 32 GB Hetzner CPU VPS at about EUR 60 per month comfortably hosts an OpenClaw gateway and 18 to 20 client workspaces. GPU VPS rates start at around $0.20 per hour for an RTX 4090 spot instance on RunPod, or $144 per month at full uptime.
- Self-hosted is decisively cheaper above roughly 50 million tokens per day. One published 36-month TCO comparison puts heavy-tier self-hosting at $391,707 against $540,000 for the same workload routed entirely to Anthropic.
- Cloud is decisively cheaper below roughly 5 million tokens per month per agency, where a self-hosted GPU sits idle most of the time and burns its rental fee.
- The 2026 majority pattern is hybrid: a self-hosted OpenClaw gateway, BYOK to a region-pinned model endpoint, prompt caching always on, and batch processing for non-interactive workloads.
- The hidden costs that flip the math are engineering hours for setup, ongoing patching, and the cost of an outage, not the rate card itself.

#### What you actually pay for in an AI worker stack

Agency operators usually look at the OpenAI or Anthropic dashboard and decide cost based on the per-million-token number. That number is real, but it is one of seven separate line items in a multi-tenant AI worker stack. Knowing the others is what separates a back-of-napkin estimate from a number that survives twelve months of growth.

The seven line items, in roughly the order they bite:

1. **Inference.** The per-token charge for the model itself, paid to Anthropic, OpenAI, OpenRouter, Together, or your own GPU.
2. **Compute.** The CPU and RAM to run the gateway, the per-client containers, the queue, and the dashboard. Even the cheapest cloud-only stack still pays for this somewhere.
3. **Storage.** Workspace files, conversation history, embeddings, and audit logs. Quiet at first, very loud after a year.
4. **Bandwidth.** Egress fees on hyperscalers can quietly exceed compute. Hetzner and OVH include generous egress; AWS and GCP do not.
5. **Channels.** Twilio for SMS and voice, Vonage for WhatsApp, Stripe for billing, all per-message or per-minute fees that scale with usage.
6. **Engineering.** The hours to install, patch, monitor, and debug the stack. At an agency owner's blended rate this is usually the second-largest line item after inference.
7. **Insurance and downtime.** The cost of an outage during business hours, multiplied by the probability over a year. Usually invisible until it isn't.

Cloud-first stacks roll items 2 to 4 into the per-token rate, which is convenient but obscures the true cost of growth. Self-hosted stacks pay each item visibly, which feels more expensive at small scale and turns out to be cheaper at large scale once volume amortizes the fixed compute and engineering cost.

#### Cloud LLM pricing in 2026: the headline rates and the real ones

Anthropic's published rate card as of May 2026, after the April 16 launch of Claude Opus 4.7:

| Model | Input ($/MTok) | Output ($/MTok) | Context window | Best fit |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | 200K | High-volume routing, classification, light chat |
| Sonnet 4.6 | $3.00 | $15.00 | 1M | Default agent workloads, tool use, RAG |
| Opus 4.7 | $5.00 | $25.00 | 1M | Long-horizon reasoning, autonomous tasks |

Two discounts move the real bill significantly below the rate card.

**Prompt caching** cuts cached input cost by 90%. A cache hit on Sonnet 4.6 costs $0.30 per million input tokens instead of $3.00. That matters because most agency workloads carry the same system prompt, the same skill instructions, and the same tool definitions on every call. With sticky session routing the cache hit rate runs in the 70 to 95 percent range for chat workloads.

**Batch processing** halves the per-token cost on every request that does not need a response inside 24 hours. Lead enrichment, nightly summaries, embedding generation, scheduled outreach, anything event-driven rather than user-facing, is a candidate.

Stack both and a cached batch request on Sonnet 4.6 lands at about $0.15 per million input tokens and $7.50 per million output tokens. On the input side that is roughly 5% of the rate card; output, which caching does not discount, is still halved. If you are paying anywhere near sticker price in 2026 you have left money on the table.

One footnote that quietly inflates real bills: Opus 4.7 ships with a new tokenizer that produces up to 35% more tokens for the same input text.
The per-token rate did not change in April; the per-request invoice did, by a meaningful margin. Test on your own workload before you swap Sonnet for Opus across the fleet.

#### Self-hosted infrastructure pricing in 2026

"Self-hosted" splits cleanly into two cases. The simple case is self-hosting the gateway, the workspace, and the audit trail while routing inference to a managed model endpoint with BYOK. The hard case is self-hosting the model itself on your own GPU.

For the simple case, a single 32 GB CPU VPS is enough. Hetzner's CCX33 (8 vCPU, 32 GB RAM, 240 GB NVMe) lists at about EUR 52 per month including roughly 20 TB of egress. That comfortably runs an OpenClaw gateway and 18 to 20 per-client containers. Doubling the box to CCX43 (16 vCPU, 64 GB RAM) handles 40 to 50 clients and lists around EUR 100 per month. OVH's similar bare-metal range is competitive and adds dedicated NVMe for workloads that need fast disk.

For the hard case, a GPU VPS adds a separate line item. As of May 2026 the relevant references are:

- RunPod RTX 4090 community pods at $0.29 per hour, or $212 per month at 24/7 uptime
- RunPod RTX 4090 spot at $0.20 per hour, or $144 per month, with preemption risk
- Hetzner GPU monthly lock from EUR 159 per month for a single mid-tier GPU
- Lambda Labs H100 at $2.49 per hour, or $1,820 per month at 24/7, best for short bursts
- Reserved CoreWeave H100 at roughly $2.80 per hour, or $2,016 per month for continuous operation

An RTX 4090 with vLLM running an open-weight 70B-class model serves comfortably above 50 tokens per second, which is enough for a small fleet of agents. An H100 is overkill for almost every agency workload and only earns its keep when you genuinely need long-context low-latency throughput on a large open-weight model.

#### Step by step: calculating your true monthly bill

The fastest way to get a defensible number is to start with measured token counts from a sample week, then layer in the fixed costs.
Here is the reproducible workflow against a running OpenClaw gateway.

1. **Pull a week of token usage from the gateway.** OpenClaw v2026.4.27 ships an audit log with per-request token counts. The CLI exposes a usage report.

```bash
openclaw usage report \
  --from 2026-04-26 \
  --to 2026-05-02 \
  --group-by client \
  --format json > usage.json
```

2. **Project monthly tokens per client.** Multiply weekly tokens by 4.345 (weeks per month). Most agencies see a long-tail distribution where one or two heavy clients drive 60% of total volume.

```bash
jq '[.clients[] | {client: .name, monthly_in: (.input_tokens * 4.345), monthly_out: (.output_tokens * 4.345)}]' usage.json
```

3. **Compute the cloud bill at three discount tiers.** Headline (no discount), cached (system prompt cache hit on every call), and cached plus batch (only for batchable workloads). A small projection script makes this reproducible.

```bash
npx tsx scripts/cost-projection.ts \
  --usage usage.json \
  --model sonnet-4-6 \
  --cache-hit-rate 0.85 \
  --batch-share 0.30
```

4. **Compute the self-hosted bill.** Add the fixed monthly costs (VPS, monitoring, backup, off-site replication), then divide by the number of clients to get per-client cost. Compare to the cloud bill from step 3.

5. **Decide per workload, not per agency.** The output of this exercise is rarely "go all-in on cloud" or "go all-in on self-hosted." It is "route interactive Sonnet calls to BYOK with caching, batch the nightly enrichment to a discount provider, and keep the gateway and the workspace self-hosted." That mixed posture saves the most money in 2026 with the least operational risk.

#### Three-tier cost comparison: light, medium, heavy

To make the trade-offs concrete, here is a side-by-side at three realistic agency scales. Numbers assume Sonnet 4.6 with an 85% prompt cache hit rate, 30% of workload batchable, plus standard infrastructure.
| Tier | Volume | Cloud (BYOK + caching) | Self-hosted gateway, BYOK inference | Self-hosted gateway + open-weight GPU |
|---|---|---|---|---|
| Light (5 clients) | ~3M tokens/day | ~$280/mo inference + $0 infra | ~$280/mo inference + $60/mo VPS | ~$60/mo VPS + $144/mo GPU = $204/mo |
| Medium (25 clients) | ~15M tokens/day | ~$1,400/mo inference + $0 infra | ~$1,400/mo + $100/mo VPS | ~$100/mo VPS + $212/mo GPU = $312/mo |
| Heavy (100 clients) | ~80M tokens/day | ~$7,500/mo inference + $0 infra | ~$7,500/mo + $200/mo VPS | ~$200/mo VPS + $1,820/mo H100 = $2,020/mo |

Three observations from the table. First, the gateway VPS is rounding error at every tier; the question is never "can I afford the gateway." Second, BYOK to a managed endpoint beats a self-hosted GPU at every realistic agency volume up to roughly 80 million tokens per day, where the open-weight GPU finally undercuts the BYOK bill. Third, the published independent 36-month TCO study that put heavy-tier self-hosting at $391,707 against $540,000 for Anthropic is consistent with the table once you scale the heavy tier up another 4 to 5 times to enterprise volume.

#### Hidden costs that flip the math

A clean TCO model still misses three categories of cost that matter at the agency scale.

**Engineering hours for setup.** A clean OpenClaw deployment with monitoring, backups, and a per-client provisioner takes roughly 16 to 24 hours of senior engineer time the first time, dropping to 1 to 2 hours per new client thereafter. At a $150 per hour blended rate the first deployment is a $2,400 to $3,600 one-time cost. That number disappears at 100 clients but dominates at 5.

**Ongoing maintenance.** Patching the operating system, rotating tokens, refreshing model pricing tables, dealing with provider deprecations. Industry estimates put this at 1 to 6 hours per month per agency depending on tier. Often forgotten in cloud comparisons because the cloud provider absorbs it silently inside the per-token price.

**Outage risk.**
A self-hosted gateway with one VPS and no failover hits roughly 99.5% practical uptime, which is 3.6 hours of downtime per month. For a chat workload that is invisible. For a missed-call follow-up workload at a dental practice it is a real revenue hit. Multi-region failover, even a warm standby, can double the infrastructure line. Most agencies accept the single-region risk and document it; some do not have that luxury.

Add these three to the model and the break-even point shifts. Cloud is the right answer below about $300 per month in inference. Self-hosted is the right answer above about $2,500 per month. Between those two figures it is a judgment call about engineering capacity, operational appetite, and risk tolerance.

#### The hybrid pattern most operators land on in 2026

The cleanest 2026 architecture has four parts, and it shows up in roughly the same shape across most agencies that have done the math:

1. **Gateway, memory, and audit self-hosted on a single CPU VPS.** OpenClaw, per-client containers, append-only audit log, off-site backup nightly.
2. **BYOK to a managed model endpoint for interactive workloads.** Region-pinned, prompt caching always on, sticky session routing to maximize cache hits.
3. **Batch routing for non-interactive workloads** through Anthropic's Message Batches API or an equivalent. Halves the bill on every job that can wait 24 hours.
4. **Optional open-weight GPU** for high-volume embedding generation, classification, or sensitive workloads that cannot leave the perimeter. Skipped at light and medium tiers.

This pattern keeps inference cheap and predictable, keeps prompts and customer data inside the agency's perimeter, and keeps the engineering surface small. It is also what the OpenClaw daemon was designed to support: a single gateway that fronts every channel and every client regardless of whether the model behind it is hosted by Anthropic, OpenAI, OpenRouter, or your own GPU.
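The arithmetic behind the caching and batch claims in this pattern is worth making explicit. Here is a minimal sketch in Python, assuming Sonnet 4.6's $3 per million input-token headline rate, the 90% cache discount and 50% batch discount discussed earlier, and the 85% cache-hit / 30% batch-share figures from the tier table (the function is illustrative, not part of any OpenClaw or Anthropic API):

```python
# Effective Sonnet 4.6 input rate under the hybrid pattern.
# Rates are the May 2026 figures quoted above; the blending is a sketch.
SONNET_INPUT = 3.00     # $/MTok headline input rate
CACHED_INPUT = 0.30     # cache hits billed at 10% of the headline rate
BATCH_DISCOUNT = 0.50   # batch API halves whatever remains

def blended_input_rate(cache_hit_rate: float, batch_share: float) -> float:
    """$/MTok input cost after caching, then batch, discounts."""
    after_cache = (cache_hit_rate * CACHED_INPUT
                   + (1 - cache_hit_rate) * SONNET_INPUT)
    return (batch_share * after_cache * BATCH_DISCOUNT
            + (1 - batch_share) * after_cache)

rate = blended_input_rate(cache_hit_rate=0.85, batch_share=0.30)
print(f"${rate:.2f}/MTok")  # $0.60/MTok
```

At those settings the blended input rate works out to roughly $0.60 per million tokens, about a fifth of the sticker price, which is where the "caching always on" discipline earns its keep.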
#### When self-hosted isn't for you

Three honest situations where self-hosted is the wrong answer in 2026. If any of these apply, stay on cloud-first inference until they don't.

- **You have fewer than 5 paying AI clients.** The fixed cost of running and maintaining infrastructure isn't worth recovering across a small base. Spend the engineering time on selling, not on a gateway you barely use.
- **You have no Linux operator on the team.** Self-hosted means somebody is on the hook when a kernel update breaks the network bridge at 2am. If that role is unfilled, a managed deployment partner or a pure-cloud architecture is honestly safer.
- **Your token volume is bursty and unpredictable.** A self-hosted GPU sitting idle 22 hours a day is the most expensive way to serve traffic. Cloud auto-scales for free; the GPU does not. Workloads with sharp peaks and long valleys are the textbook case for staying on managed inference.

#### Frequently asked questions

**Is self-hosted AI always cheaper than cloud?**

No. At light token volume cloud is decisively cheaper because the fixed cost of a VPS, monitoring, and engineering hours has nowhere to amortize. The crossover for an agency stack typically sits between 5 and 25 paying clients, depending on per-client token usage and how much engineering time the operator already has available. Above that point self-hosted gateways with BYOK inference run roughly 25 to 40 percent cheaper than pure cloud at the same workload.

**How much does an OpenClaw gateway cost to run per month?**

For most agencies, between $60 and $200 per month in raw infrastructure. A single Hetzner CCX33 (about EUR 52 per month) handles up to about 20 clients comfortably; a CCX43 (around EUR 100 per month) covers 40 to 50. Add about $20 per month for backup storage and another $10 to $30 for monitoring (Better Stack, Grafana Cloud free tier, or Uptime Kuma self-hosted). That figure does not include inference, which lands separately on the BYOK model bill.
**Does prompt caching really cut my Anthropic bill by 90%?**

It cuts cached input by 90%. The realized saving on the total bill depends on cache hit rate and the input/output ratio of the workload. A typical agency chat workload with a long system prompt and a fixed skill loadout sees a 60 to 75 percent reduction in the total monthly bill once caching is enabled and sticky session routing keeps the same provider endpoint serving the same conversation. Pair caching with batch processing on non-interactive jobs and the saving climbs further.

**What happens to my cost if I move from Sonnet 4.6 to Opus 4.7?**

The rate card jumps from $3/$15 to $5/$25 per million tokens. That looks like a 67% increase. The real increase is larger because Opus 4.7 ships with a new tokenizer that can produce up to 35% more tokens for the same input. Test on a representative workload before switching the whole fleet. Many agency workloads do not need Opus, and Sonnet 4.6 with caching is the value sweet spot in May 2026.

**Can I run an AI worker on a CPU-only VPS?**

For the gateway, the workspace, and the audit log, yes. For inference, only if you are doing classification or short generation on a small open-weight model and can tolerate single-digit tokens per second. Almost every interactive agent in 2026 routes inference to a GPU somewhere, either yours or the model provider's. The win in self-hosting the gateway is operational control, not inference cost.

**How long until self-hosted pays for itself?**

For a medium-tier agency (25 active clients) the typical payback period is 3 to 6 months from the initial setup investment. Light agencies often never break even on a self-hosted GPU but do break even on a self-hosted gateway with BYOK inference, usually within a year. Heavy agencies (100+ clients) typically pay back the entire setup in under a quarter. Track inference spend monthly during the trial period and revisit the model after each new client onboarding.
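The payback math in that last answer is a one-line division, but writing it down keeps the inputs honest. A sketch using illustrative figures from this post (16 hours of setup at a $150 blended rate, and roughly 30% saved on a medium-tier $1,400 monthly inference bill; the helper is hypothetical, not a Kyra or OpenClaw utility):

```python
def payback_months(setup_cost: float, monthly_savings: float) -> float:
    """Months until a one-time setup investment is recovered."""
    return setup_cost / monthly_savings

# Medium-tier illustration: $2,400 one-time setup (16 h x $150/h),
# saving ~30% of a ~$1,400/mo inference bill after self-hosting the
# gateway and turning on caching and batch routing.
months = payback_months(2400, 0.30 * 1400)
print(round(months, 1))  # 5.7
```

Swap in your own measured savings from the step-by-step projection earlier in the post; the point is to track the number monthly rather than trust a one-time estimate.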
#### The honest bottom line on AI cost in 2026

Cloud LLMs in 2026 are cheaper per token than they have ever been, and prompt caching plus batch processing has compressed the gap further still. For most agencies starting out, the right first move is not a self-hosted GPU but a self-hosted gateway with BYOK inference: a single CPU VPS, an OpenClaw daemon, region-pinned model endpoints, and the discipline to enable caching on every prompt. That setup keeps customer data inside your perimeter, keeps the inference bill tied directly to revenue, and stays an order of magnitude cheaper than enterprise SaaS workforce platforms billed per seat.

If you want that hybrid stack standing up without spending a fortnight on infrastructure plumbing, that is the architecture Kyra deploys for you on day one: gateway, per-client isolation, BYOK routing, and an audit trail that survives a procurement review.

For deeper reading on the building blocks, the OpenClaw gateway explainer covers the daemon itself, the container isolation breakdown covers the multi-tenant security side, and the dental practice playbook shows what an end-to-end deployment looks like in a regulated industry. The two external references worth bookmarking are the Anthropic prompt caching documentation for getting the rate card down and the openclaw/openclaw GitHub repository for the gateway itself.

The cost story in 2026 is no longer about choosing between cloud and self-hosted. It is about knowing which workload belongs where, and building a single stack that holds both with a straight face.
---

### Per-Client AI Container Isolation in 2026: How Agencies Run 50+ AI Workers Without Cross-Contamination

- URL: https://kyra.conversionsystem.com/blog/per-client-ai-container-isolation-2026
- Published: 2026-05-01
- Category: AI Infrastructure
- Read time: 12 min

Last updated: May 1, 2026

Per-client AI container isolation is a deployment pattern that runs each client's AI worker in a separate hardened sandbox so prompts, files, credentials, and tool calls from one client cannot reach another. For agencies operating 10, 50, or 500 AI workers from the same dashboard, it is the difference between a controllable platform and a single bad prompt that quietly leaks every tenant's data. This guide breaks down the four isolation models that matter in 2026, the threat model each one addresses, and a step-by-step setup using OpenClaw and Docker. By the end you will know how to draw a hard boundary around every client without paying enterprise SaaS prices to do it.

#### Key takeaways

- Per-client container isolation gives every AI worker its own filesystem, network, credentials, and process tree. A prompt injection inside client A cannot reach client B.
- Four isolation models matter in 2026: shared process, Docker container, gVisor sandbox, and Firecracker microVM. Each trades isolation for startup time and overhead.
- OpenClaw v2026.4.27 ships a per-session Docker sandbox runtime out of the box. Anthropic's Claude Managed Agents (launched April 8, 2026) uses gVisor at $0.08 per session-hour.
- For agencies running 50+ clients, self-hosted Docker isolation costs roughly $0.003 per worker-hour in compute. Managed alternatives cost about 25x more in runtime alone.
- By the end of 2026, 40% of enterprise applications will have embedded task-specific agents, up from less than 5% in early 2025. Multi-tenant isolation is now table stakes.
- Isolation only holds if the boundaries hold. Default-deny network egress, scoped filesystems, and proxy-injected credentials are non-negotiable.
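The "about 25x" runtime gap in those takeaways falls straight out of per-worker-hour arithmetic. A minimal sketch, assuming a $40/month VPS holding 18 workers, a 730-hour month, and the $0.08 per session-hour Managed Agents rate quoted in this post (the helper function is illustrative):

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month

def per_worker_hour(vps_monthly_cost: float, workers_per_box: int) -> float:
    """Runtime cost per worker-hour when one VPS hosts many workers."""
    return vps_monthly_cost / (workers_per_box * HOURS_PER_MONTH)

self_hosted = per_worker_hour(40, 18)   # ~$0.003 per worker-hour
managed = 0.08                          # Managed Agents session-hour rate
print(f"{self_hosted:.4f}")             # 0.0030
print(f"{managed / self_hosted:.0f}x")  # 26x
```

Round numbers, which is why the post says "about 25x"; the exact multiple moves with VPS pricing and worker density per box.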
#### Why container isolation matters for multi-tenant AI agents in 2026

An AI agent is not a chatbot. It runs tool calls, writes files, hits APIs, executes shell commands. When the same process serves multiple clients, a single prompt injection in one tenant becomes a path into every other tenant on the box. The blast radius scales with your client list.

This is not hypothetical. By the end of 2026, 40% of enterprise applications will have embedded task-specific agents, up from less than 5% in early 2025. Most of those agents are deployed by agencies and platforms running multi-tenant infrastructure. If the architecture cannot draw a hard boundary at the tenant level, every new client expands the attack surface of every existing client.

The 2026 isolation question is therefore not "should we isolate?" but "where do we draw the boundary?" The boundary can sit at the application layer, at the container, at the kernel syscall layer, or at the virtual machine. Each option costs more and isolates more. Agencies need to know which one fits which client, and how to operate the boundary so it actually holds.

#### The four isolation models that matter in 2026

There are four production-grade approaches. Picking the right one depends on data sensitivity, the agent's tool surface, and how much you can pay per worker-hour.

1. **Shared process, logical isolation.** All clients run inside one process. Each request carries a tenant ID. The application enforces the boundary in code. Cheapest to run, weakest to compromise. Suitable only when the agent has read-only tools and no shell access.

2. **Per-client Docker container.** Each client gets its own container with namespaced filesystem, network, processes, and limited Linux capabilities. Standard containers share the host kernel, but with hardening (user namespace remapping, dropped capabilities, no-new-privs, resource limits) they block the most common attacks. This is the OpenClaw default for non-main sessions.

3. **gVisor sandbox.**
gVisor sits between the container and the host kernel. It intercepts syscalls in user space and re-implements them, so a kernel exploit inside the container cannot directly hit the host. This is the model Anthropic chose for Claude Managed Agents. Slightly slower than raw Docker, dramatically harder to escape.

4. **Firecracker microVM.** A hardware-virtualized VM, but stripped to the bone. Boots in roughly 125 ms with under 5 MiB overhead per VM. A single host can launch up to 150 microVMs per second. This is the technology AWS uses behind Lambda. The strongest isolation; the highest infrastructure complexity.

Here is how the four compare in practice for an agency running per-client AI workers:

| Model | Boundary | Startup | Overhead | Escape difficulty | Best for |
|---|---|---|---|---|---|
| Shared process | Application code | 0 ms | ~5 MB | Trivial if injection works | Read-only chat, no tools |
| Docker container | Linux namespaces | 200–800 ms | ~30 MB | Hard with hardening | Most agency clients |
| gVisor sandbox | Syscall interception | 500–1500 ms | ~50 MB | Very hard | Regulated workloads |
| Firecracker microVM | Hypervisor | ~125 ms | <5 MiB | Hardware-grade | Untrusted code execution |

#### Threat model: what cross-contamination actually looks like

Before locking down the architecture, name the attacks you are defending against. There are five concrete failure modes a multi-tenant AI platform must block.

**Prompt injection across tenants.** Client A pastes a malicious instruction. The agent calls a shared tool. The tool returns content from client B's workspace. Logical isolation alone does not stop this; the agent process must not be able to reach client B's files in the first place.

**Credential leakage.** Each client has API keys for GoHighLevel, Stripe, Twilio, OpenAI, and so on. If those keys live in a shared environment variable, any agent with shell access can read them. The fix is per-client credential scoping, ideally injected by a proxy outside the agent's view.

**Filesystem traversal.** The agent writes a file. The path is constructed from user input.
Without a chrooted or namespaced filesystem, the file lands somewhere it should not. Docker namespaces close this off by default; shared-process deployments do not.

**Resource exhaustion.** One client triggers a fork bomb, a memory leak, or a runaway loop. Without per-tenant CPU and memory limits, every other client on the box slows down or dies. Containers fix this with cgroups; bare processes do not.

**Network pivoting.** An agent gets compromised, then uses outbound HTTP to exfiltrate data or scan the internal network. Default-deny egress, with an explicit allowlist per client, blocks this entire class of attack at the network layer.

#### Step-by-step: per-client OpenClaw container setup

OpenClaw v2026.4.27 ships a Docker sandbox runtime that runs each non-main session inside its own container. Below is the setup for an agency that wants every client on its own isolated worker. Assume you have a VPS with Docker, 8 GB RAM, and root access. Each client container runs around 1.5 GB. A 32 GB box comfortably holds 18–20 production clients before swap pressure starts.

1. **Pull the OpenClaw image and create a per-client directory.**

```bash
docker pull openclaw/openclaw:v2026.4.27
mkdir -p /srv/openclaw/clients/acme-dental
cd /srv/openclaw/clients/acme-dental
```

2. **Generate a per-client config.** Each client gets its own openclaw.json, auth-profiles.json, and workspace. Never share these across tenants.

```bash
cat > openclaw.json <<'EOF'
{
  "gateway": { "auth": { "type": "token" }, "trustedProxies": ["10.0.0.0/8"] },
  "agents": { "defaults": { "model": "openai/gpt-4o-mini" } },
  "channels": ["whatsapp", "web"]
}
EOF
```

3. **Create a per-client Docker network.** Each client gets its own bridge so containers cannot see each other's traffic.

```bash
docker network create --driver bridge acme-net
```

4. **Launch the container with hardening flags.** The flags below drop capabilities, prevent privilege escalation, set resource limits, and pin the user to a non-root UID inside the container.
```bash
docker run -d \
  --name openclaw-acme-dental \
  --user 1000:1000 \
  --memory=1536m --cpus=1.0 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  -v /srv/openclaw/clients/acme-dental:/workspace \
  --network=acme-net \
  -p 127.0.0.1:8001:8080 \
  openclaw/openclaw:v2026.4.27
```

5. **Inject credentials via a proxy, not env vars.** Run a small reverse proxy in front of the container that adds the client's API keys to outbound requests. The agent calls http://proxy/ghl/contacts; the proxy maps that to the real GoHighLevel API with the right token. The agent never reads the key.

6. **Verify isolation.** Exec into the container and try to reach a sibling. It should fail.

```bash
docker exec -it openclaw-acme-dental sh -c 'wget -T 3 http://openclaw-other-client:8080'
# expected: "could not resolve host"
```

Repeat steps 1–6 for every new client. Automate the loop with a thin provisioning script and you have a multi-tenant platform.

#### Network, filesystem, and credential boundaries

Containers do most of the work, but three boundaries deserve special attention because that is where most leaks happen in production.

**Network.** Default-deny is the rule. Inside each container, allow outbound traffic only to the specific domains the agent needs: the LLM API, the client's CRM, and a logging endpoint. Anthropic's Managed Agents default to deny-all egress with explicit allowlists, and there is no reason your self-hosted setup should be looser. Implement this with an egress proxy or with iptables rules in the container's network namespace.

**Filesystem.** Mount the container read-only and put any writable area in a tmpfs or a per-client volume. Two paths matter: a writable /workspace for the agent's state, and a read-only mount for shared skills and templates. This is the same split Anthropic uses (/workspace writable, /source read-only). It works because it is simple to reason about and easy to audit.

**Credentials.** Never bake API keys into the container image.
Never inject them as environment variables either, because any process inside the container can read them. The right pattern is a credential proxy: the agent makes a request to a local sidecar, the sidecar injects the right key for the right tenant, and the response flows back. The agent sees URLs and bodies, never tokens.

#### The real cost of running 50+ isolated AI workers

Isolation has a price. The question is whether you pay it in compute or in dollars.

A self-hosted Docker setup on a $40-per-month VPS with 32 GB RAM holds around 18 clients comfortably. That works out to roughly $0.003 per client-hour for the runtime. Add the LLM tokens and the all-in cost per client per hour stays under $0.05 for most agency workloads.

Anthropic's Claude Managed Agents, launched April 8, 2026, charges $0.08 per session-hour for runtime alone, on top of standard token rates. A worker running 24/7 costs around $58 per month in runtime before tokens. For 50 clients running continuously, that is roughly $2,900 per month just for the gVisor sandboxes. The same workload on three self-hosted VPSes costs around $120 in compute.

The trade is not money for nothing. Managed Agents gives you gVisor isolation, automatic patching, and a credential proxy out of the box. Self-hosting gives you control, BYO keys, and predictable cost. Most agencies running 10+ clients land on self-hosted Docker with optional gVisor for the regulated tenants.

| Setup | 50 clients runtime / month | Isolation strength | Operational burden |
|---|---|---|---|
| Shared process on 1 VPS | ~$40 | Logical only | Low (one process) |
| Self-hosted Docker (3 VPSes) | ~$120 | Container | Medium |
| Self-hosted gVisor | ~$200 | Syscall | High |
| Claude Managed Agents | ~$2,900 | gVisor | None (you pay for it) |

#### Operational practices that keep isolation real

Containers do not isolate themselves. The boundary holds only if the operator keeps it tight. Five practices separate platforms that survive their first audit from those that do not.

**Patch on a schedule.** Run docker pull against the OpenClaw image weekly.
The April 27, 2026 release alone shipped a long tail of reliability and security fixes. Old images accumulate CVEs.

**Restart cadence.** Containers should be cattle, not pets. Restart each client's container nightly so a leaked credential or compromised process has a finite life. Use a cron entry, a hook, or a systemd timer.

**Per-client logs.** Pipe container stdout to a per-tenant log stream. Never log into a shared file. If client A reads a log line that came from client B, you have re-introduced cross-contamination at the observability layer.

**Capability audits.** Every quarter, list which Linux capabilities each container holds. Drop the ones that are unused. Most AI workers do not need NET_ADMIN, SYS_PTRACE, or DAC_OVERRIDE. Confirm by inspecting the running container's effective capabilities with `capsh --print`.

**Allowlist drift.** Each client's egress allowlist grows over time as the agent picks up new tools. Review the list quarterly. Anything not actively used gets removed. The fewer destinations the agent can reach, the smaller the exfiltration surface.

#### When per-client isolation is not for you

Container isolation is not always the right call. Three situations make it overkill.

- **You only have one client.** A single tenant on a single box does not need cross-contamination defenses, because there is nothing to contaminate. Run the agent in a normal process, harden the host, and save the operational overhead.
- **The agent has no tools.** A pure chat assistant that only calls an LLM and returns text has almost no attack surface. The worst a prompt injection can do is generate weird output. Containerization here is theater.
- **You can pay for managed.** If you have ten clients and $300 of margin per client, paying Anthropic $58 per worker-month for Claude Managed Agents is fine. You trade money for the headache of running infrastructure. For agencies still building margin, that math flips fast.

Everywhere else, per-client isolation is the default.
The cost of getting it wrong is one breach away from being existential.

#### Frequently asked questions

**Does Docker container isolation actually stop prompt injection attacks?**

It does not stop the injection itself. The malicious prompt still runs. What it stops is the blast radius. A compromised agent inside a hardened container cannot read another tenant's files, cannot call another tenant's APIs, and cannot exfiltrate to arbitrary domains. The injection becomes a contained incident instead of a platform-wide breach.

**How is gVisor different from a standard Docker container?**

A standard Docker container shares the host's Linux kernel. A kernel exploit inside the container can hit the host directly. gVisor sits between the two, intercepts syscalls in user space, and re-implements the dangerous ones. The agent never talks to the real kernel, so a kernel CVE in the container does not become a host compromise. The trade is a small performance hit and slightly slower startup, which is why Anthropic chose this model for Claude Managed Agents.

**Why not just use a virtual machine per client?**

You can. Firecracker microVMs are designed for exactly this and boot in around 125 ms with under 5 MiB of overhead per VM. The reason most agencies do not is operational complexity: networking, image management, and orchestration are all heavier than Docker. Use microVMs when you are running untrusted code, hosting code interpreters, or operating in a regulated industry where a hypervisor boundary is the audit requirement.

**How many isolated AI workers can one VPS run?**

Take total RAM, subtract about 2 GB for the host, and divide by 1.5 GB per worker. A 32 GB VPS holds 18–20 OpenClaw workers comfortably. CPU is rarely the bottleneck since most agent time is spent waiting on LLM responses. If your workers are heavy on local tool calls or RAG over local files, drop the density to 12–15 per box.

**Can I mix isolation levels for different clients?**

Yes, and this is what most mature platforms do.
Standard SMB clients run on hardened Docker. Healthcare and finance tenants run on gVisor or Firecracker. Trial users run on shared-process with read-only tools. The orchestrator picks the runtime per client based on a sensitivity tier in the database. The key is that the tier is set at provisioning time and never silently downgraded.

**What happens if I just run everything on Anthropic's Claude Managed Agents?**

You get production-grade gVisor isolation without operating any infrastructure. You also pay $0.08 per session-hour plus token costs, with batch discounts disabled. For an agency with a handful of clients and high margin, this is a reasonable trade. For an agency running 50+ clients on tight margins, the math does not work and you will end up self-hosting the workers and using Managed Agents only for the regulated tenants.

#### Closing thought

Per-client AI container isolation is the architectural decision that separates a platform from a script. Get it right and you can scale to hundreds of clients on commodity hardware while sleeping at night. Get it wrong and your first prompt injection becomes a disclosure email to every tenant on the box.

The technology to do this well, at agency prices, is sitting in the OpenClaw repository today. The hard part is not the runtime. The hard part is the operational discipline to keep the boundaries tight as the platform grows.

If you would rather skip the Docker yak-shave and ship hardened per-client AI workers in a few minutes, Kyra deploys this exact architecture for agencies, with per-client isolation, BYO keys, and a credential proxy already wired in. We use the same OpenClaw 2026.4.27 runtime described above and follow the same isolation rules. Pair it with our AI data sovereignty guide and the OpenClaw gateway explainer if you want the full architectural picture.
For canonical sources on isolation primitives, see the OpenClaw repository on GitHub, the OpenClaw Docker documentation, and Anthropic's secure agent deployment guide.

---

### AI Data Sovereignty in 2026: Why Self-Hosted Is Winning Regulated Industries

- URL: https://kyra.conversionsystem.com/blog/ai-data-sovereignty-self-hosted-2026
- Published: 2026-04-27
- Category: AI Infrastructure
- Read time: 12 min

Last updated: April 27, 2026

AI data sovereignty is the legal and technical guarantee that the prompts, completions, embeddings, and logs produced by an AI workload stay under the jurisdiction and control of the organization that owns the data. It is the difference between an AI feature that runs inside your security perimeter and one that quietly ships every customer message to a server in another country. With the EU AI Act reaching its main application date on August 2, 2026, penalties of up to 7% of global annual turnover for prohibited practices, and 95% of senior executives now describing sovereign AI as a mission-critical priority, the question every regulated business is being forced to answer is the same one: where, exactly, does your AI data live, and who can be compelled to hand it over?

Key takeaways

- Data sovereignty in 2026 is a technical sovereignty question, not just a residency one. Where the bytes sit matters less than who controls the stack.
- The EU AI Act becomes broadly applicable on August 2, 2026, with non-compliance penalties of up to 35 million EUR or 7% of global turnover.
- The US CLOUD Act lets US authorities compel American providers to hand over data even when the servers are in Frankfurt or Sydney. EU residency does not fix this on its own.
- Anthropic earned SOC 2 Type II and HIPAA certification in March 2026, but only specific products are covered by a BAA. Default Claude routes still touch US infrastructure.
- Self-hosted AI gateways like OpenClaw 2026.4.24 give you the audit trail, key custody, and tenant isolation regulators ask about.
- The trade-off is roughly 40% more engineering effort than a managed service.
- The right answer for most regulated buyers in 2026 is hybrid: self-host the gateway and the memory, bring your own keys to a region-pinned model endpoint.

What AI data sovereignty actually means in 2026

Five years ago, "data sovereignty" mostly meant data residency: tick a box, pick a region, store the database in Frankfurt instead of Virginia. That definition has aged badly. The 2026 conversation is about technical sovereignty, which the EU's own guidance now defines as the verifiable ability to control where data is processed, who can access it, and which legal regime applies when a regulator or a foreign court comes knocking.

For an AI workload that distinction is sharp. A customer-support agent built on a US-headquartered cloud provider can absolutely be configured to store its vector database in Frankfurt. The provider's parent company is still subject to the US CLOUD Act, which means a US warrant can compel disclosure of data held abroad. Residency without sovereignty is a paper guarantee. The auditor will notice. The Data Protection Authority will notice. Increasingly, the customer in the procurement call will notice too.

Sovereignty has three practical layers in 2026. The data layer covers prompts, completions, embeddings, and logs. The control layer covers the keys, the model endpoints, and the orchestration runtime. The legal layer covers which jurisdiction binds the company holding any of those things. A workload is sovereign when all three layers stay inside a single legal perimeter you actually control.

Why the rules tightened in 2026

The pressure on AI deployments is coming from four directions at once.

The first is the EU AI Act. The regulation entered into force on August 1, 2024, and the bulk of its obligations become applicable on August 2, 2026.
Some high-risk system deadlines have shifted to late 2027 and 2028 under the proposed Digital Omnibus reforms, but the core data-governance and transparency rules land this summer. High-risk AI systems must produce risk assessments, activity logs, and human-oversight records on demand. Penalties scale with company size: up to 35 million EUR or 7% of worldwide annual turnover for prohibited practices, up to 15 million EUR or 3% for other infringements, and up to 7.5 million EUR or 1% for supplying misleading information. Those numbers exceed GDPR's caps.

The second is GDPR plus its sector-specific siblings. DORA in financial services, NIS2 in critical infrastructure, and the proposed European Health Data Space all add data-flow obligations that an opaque cloud AI pipeline struggles to satisfy. Auditors now ask which model processed which prompt, in which region, under which contract.

The third is the US CLOUD Act. Passed in 2018, it remains the cleanest example of why residency alone fails. The Act lets US authorities compel any US-headquartered provider to disclose data it controls, regardless of where the servers physically sit. For European buyers, this is the recurring objection in every AI procurement cycle. A growing share of EU regulators now treat any non-EU-controlled processor as a residual risk that must be documented even when a Standard Contractual Clause is in place.

The fourth is sector regulation in the US itself. HIPAA for healthcare, GLBA for financial services, FedRAMP for federal workloads, and CJIS for law enforcement all assume the operator can produce a clean chain of custody for the data the AI sees. A vendor whose Business Associate Agreement covers only a subset of its products, which is the situation for most major AI vendors today, leaves the buyer to fill the gap.
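The size-scaled penalty caps above reduce to one rule: each tier is the greater of a fixed amount and a percentage of worldwide annual turnover. A quick sketch of that arithmetic (the tier labels are shorthand, not official terminology):

```python
def ai_act_max_penalty(worldwide_turnover_eur: float, tier: str) -> float:
    """Upper bound of an EU AI Act fine: the greater of the fixed cap
    and the turnover percentage for that infringement tier."""
    tiers = {
        "prohibited": (35_000_000, 0.07),      # prohibited practices
        "other": (15_000_000, 0.03),           # most other infringements
        "misleading-info": (7_500_000, 0.01),  # supplying misleading information
    }
    fixed_cap, turnover_pct = tiers[tier]
    return max(fixed_cap, turnover_pct * worldwide_turnover_eur)
```

For a company with 2 billion EUR turnover, the prohibited-practices cap is 7% of turnover (140 million EUR) because that exceeds the 35 million EUR floor; for a 100 million EUR company, the fixed 35 million EUR cap dominates.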
Where cloud AI quietly breaks for regulated workloads

Most AI features ship today on a default that looks like this: the application calls a managed model endpoint, the prompt and completion are logged for abuse monitoring, retention is governed by the provider's standard policy, and the data may be routed across regions based on capacity. For a marketing chatbot that is fine. For a hospital intake assistant or a financial advice agent it is the start of a compliance problem.

The breakage points are predictable. Prompt content often contains regulated data the engineer did not realize was regulated, such as a customer reference number that maps to a patient identifier upstream. Provider-side logging means a copy of that prompt now exists in a system the customer has no read access to. Cross-region routing during peak load means the same prompt may be processed by a model instance outside the contracted region for a few hours. Sub-processor chains add second and third parties the customer never directly reviewed.

None of these are bugs. They are the design of a managed service optimized for uptime and cost. They become a problem only when a regulator or a customer asks for an exact accounting of where a single message went. Self-hosted infrastructure removes most of those questions because the answer is "it stayed inside our virtual network and we have the logs to prove it." That is what regulators mean when they say sovereignty.

What self-hosted AI gives you (and what it costs)

Running the gateway, the orchestration runtime, and the memory store on infrastructure you control changes the audit story in a specific way. Every prompt has a single, observable processing path. Every key, including the model API key and the database credentials, sits in a vault you own. Every log line is generated by a process you operate, on a host you patch, in a region you chose.
When the auditor asks for the chain of custody for a particular conversation, you can produce it from one log file.

A self-hosted gateway also unlocks a few capabilities that managed AI platforms still struggle with. Per-tenant key isolation, which lets each client of an agency run on a different model API key, becomes a first-class feature instead of a workaround. Pluggable model endpoints let you point the same gateway at Anthropic's API today, a Vertex AI endpoint in Frankfurt tomorrow, and a fully on-prem GPU cluster the day after that, without touching application code. Custom retention policies can be enforced by the gateway itself rather than negotiated with a vendor.

The honest cost is engineering time. A 2026 benchmark widely cited in regulated-industry write-ups put the engineering effort for self-hosted LLM stacks at roughly 40% above the equivalent managed setup. Patching, monitoring, certificate rotation, and the specific work of building a defensible audit pipeline do not happen for free. The right comparison is not self-hosted versus cloud in the abstract, it is self-hosted versus the cost of explaining to a regulator why a sub-processor in another jurisdiction touched a customer record.

Step-by-step: deploy a sovereign AI stack with OpenClaw

This walkthrough takes a fresh Linux VPS in your chosen region from zero to a sovereign AI gateway you can put in front of WhatsApp, Slack, web chat, or any of the other channels OpenClaw supports. It targets OpenClaw 2026.4.24 or later, which is the release that introduced the localModelLean profile, the Model Auth status card, and cloud-backed LanceDB for memory indexes.

1. Provision the host inside your legal perimeter. Pick a VPS or bare-metal host whose provider sits in your target jurisdiction. For an EU workload this typically means Hetzner, OVH, or Scaleway in an EU region. Patch the OS, set up a non-root user, and put the host behind your existing firewall.
```bash
ssh sovereign-host
sudo adduser openclaw
sudo ufw allow 22/tcp
sudo ufw allow 18789/tcp
sudo ufw enable
```

2. Install the OpenClaw daemon. The MIT-licensed daemon ships as a single binary plus a config directory. The install script binds the gateway to its default port 18789 and creates a systemd unit.

```bash
curl -sSf https://install.openclaw.ai | sh
openclaw init --profile sovereign
sudo systemctl enable --now openclaw
```

3. Pin the model endpoint to a region you control. Edit ~/.openclaw/config.yml so the gateway calls a region-pinned endpoint instead of the default. For Anthropic via Vertex AI in Frankfurt the relevant block looks like this.

```yaml
model:
  provider: vertex
  region: europe-west3
  endpoint: https://europe-west3-aiplatform.googleapis.com
  apiKeyEnv: GOOGLE_APPLICATION_CREDENTIALS
session:
  dmScope: per-channel-peer
  keyFormat: "tenant-${tenant_id}-${channel}:${peer}"
logging:
  retention: 30d
  destination: /var/log/openclaw/audit.log
```

4. Bring your own keys and store them in a vault. Never commit a model key to disk. Mount a HashiCorp Vault, AWS Secrets Manager, or a plain Linux keyring into the systemd unit and reference the secret by env var. The gateway reads it at start time and never writes it back.

5. Turn on per-tenant isolation. If you operate for multiple clients, prefix every session key with a tenant identifier. The OpenClaw session key system already supports tenant-prefixed keys natively, which means one daemon can serve fifteen clients with provably separate context stores.

6. Verify the audit trail. Send a test message through one channel, then run the gateway's audit command and confirm the message appears with the expected tenant, channel, peer, region, and model in a single line. This is the artifact you will hand an auditor.

```bash
openclaw audit --since 1h --format json | jq '.[0]'
```

The whole sequence is roughly an afternoon of work for an engineer who has done it once.
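Step 6's audit line is also easy to gate in CI. A minimal sketch, assuming the JSON record carries tenant, channel, peer, region, and model keys; the exact field names in OpenClaw's audit output are an assumption here:

```python
import json

# Assumed field names; OpenClaw's real audit schema may differ.
REQUIRED_FIELDS = {"tenant", "channel", "peer", "region", "model"}

def check_audit_record(line: str, expected_region: str = "europe-west3") -> bool:
    """Return True when one audit log line carries every field an
    auditor will ask for and the processing region matches the pin."""
    record = json.loads(line)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"audit record missing fields: {sorted(missing)}")
    return record["region"] == expected_region

# A record shaped like the step-6 output (values illustrative)
sample = ('{"tenant": "acme", "channel": "whatsapp", '
          '"peer": "+15551234567", "region": "europe-west3", '
          '"model": "claude"}')
```

Wiring this into the deployment pipeline turns the region pin from a config intention into an enforced invariant: a record processed outside the contracted region fails the build.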
The full reference is in the official OpenClaw gateway security documentation, and the daemon source lives at the openclaw/openclaw repository on GitHub.

Self-hosted vs cloud AI: a 2026 head-to-head

The trade-offs are easier to see when laid out side by side. The table below summarizes the differences that show up most often in regulated-industry procurement reviews.

| Dimension | Cloud AI (default managed setup) | Self-hosted AI (OpenClaw or equivalent) |
| --- | --- | --- |
| Data residency control | Region selection, but provider may route across regions during peak load | Bytes never leave the host you operate unless you explicitly send them |
| Subject to US CLOUD Act | Yes, if the provider is US-headquartered, regardless of server location | No, when the host operator and the model endpoint are both outside US jurisdiction |
| Prompt and completion logging | Provider-side logging on by default for abuse monitoring | You decide what is logged, where, and for how long |
| Audit trail | Limited to what the provider's console exposes | Full, line-by-line, in your own log infrastructure |
| Per-tenant key isolation | Usually one provider key per organization; tenants share the same key | One key per tenant is a first-class config option |
| Engineering effort | Baseline | Roughly 40% above the managed equivalent in 2026 benchmarks |
| Time to first deploy | Hours, sometimes minutes | An afternoon for the gateway, longer for the audit pipeline |
| BAA / DPA scope | Often covers only a subset of the provider's product line | One agreement, one operator, one scope: yourself |
| Best fit | Marketing, internal productivity, low-sensitivity public chat | Healthcare, financial services, government, multi-client agencies |

Reading across the rows, the pattern is consistent. Cloud AI optimizes for time-to-first-deploy and operational simplicity. Self-hosted AI optimizes for control and the artifacts a regulator wants to see. Neither is universally correct.
When self-hosted AI is not for you

The honest answer is that most companies do not need full sovereignty for most workloads. A small marketing agency running a chat widget on a brochure site can use a managed AI endpoint, accept the standard data processing addendum, and ship in a day. The cost of running a sovereign stack for a workload that processes no regulated data is real, and the audit benefit is hypothetical.

Self-hosted AI is the wrong choice when the team has no platform engineer, when the workload processes only public or pseudonymous data, when the volume is so low that the fixed cost of a VPS exceeds a year of metered API usage, or when speed-to-market dominates every other consideration. It is also the wrong choice when the team would self-host poorly: a misconfigured self-hosted gateway with a public log directory is worse than a properly configured managed service.

The right framing is workload-by-workload. Sovereign infrastructure for the regulated workload, managed AI for the marketing site, a single gateway in front of both so the operational story stays manageable. That hybrid pattern is what most of the regulated buyers we talk to are converging on in 2026.

Frequently asked questions

Is data residency the same as data sovereignty?

No. Residency is about where the bytes sit. Sovereignty is about who controls the stack and which legal regime can compel disclosure. Data stored in Frankfurt by a US-headquartered provider is EU-resident but not EU-sovereign, because the US CLOUD Act still applies to the parent company. Real sovereignty needs the operator, the keys, and the legal entity all inside one jurisdiction.

Does the EU AI Act ban cloud AI?

No. The Act does not require self-hosting. It requires that high-risk AI systems produce risk assessments, activity logs, and human-oversight records, and that providers and deployers can answer specific questions about training data, processing, and decisions.
Cloud AI can satisfy those obligations when the contract and the audit trail are strong enough. Self-hosted AI usually satisfies them more cheaply because the operator already has the logs.

Can I run a HIPAA-compliant AI workload on Anthropic's Claude?

Sometimes. Anthropic earned SOC 2 Type II and HIPAA certification in March 2026, and offers a Business Associate Agreement that covers specific products including the first-party API and a HIPAA-ready Enterprise plan. The BAA scope is product-specific, so you need to confirm that the exact endpoint your workload calls is covered. For workloads where the BAA does not extend, the standard pattern is a self-hosted gateway in front of a region-pinned model endpoint, with a custom audit pipeline.

Does self-hosted mean running the model itself on my own GPUs?

Not necessarily. The most common 2026 architecture is a self-hosted gateway and memory store with a managed model endpoint pinned to a region inside your jurisdiction. You get sovereignty for prompts, completions, embeddings, and logs without the capex of a GPU cluster. Running open-weight models on your own hardware is a further step, useful when even the inference call cannot leave your perimeter, but it is not the default.

How long does sovereign AI migration usually take?

Industry analyst write-ups in 2026 estimate three to four years for a full sovereign migration of a regulated enterprise's AI workloads. That figure is dominated by organizational work, not technology: workload classification, contract renegotiation, vendor swaps, and operator training. The first sovereign workload, by contrast, can be live in a few weeks. Most teams ship the highest-risk workload first and migrate the rest on a rolling basis.

What is the smallest credible sovereign AI stack?

One Linux host, the OpenClaw daemon, a region-pinned model endpoint, a vault for the model key, and an append-only audit log shipped to your existing SIEM.
That is enough to satisfy most regulated-industry initial reviews. Everything else, including multi-region failover, dedicated GPUs, and bring-your-own-cloud deployments, is an extension of the same pattern.

Sovereignty is an architecture decision, not a checkbox

The instinct in 2025 was to treat data sovereignty as a procurement field. Pick the region, sign the addendum, move on. The 2026 reality is that sovereignty is the architecture: which runtime, which keys, which jurisdiction, which audit pipeline. The companies that get it right are the ones that decide early which workloads cannot leave their perimeter and design the stack around that decision instead of bolting it on later. The ones that get it wrong are the ones that discover, during an audit, that "EU-resident" was not the same as "EU-controlled."

If you want a sovereign OpenClaw gateway running on your own infrastructure, with per-tenant isolation, a region-pinned model endpoint, and an audit trail wired into your existing logging without weeks of platform work, that is what Kyra sets up for you. For the broader picture of how a single gateway holds dozens of channels and clients together, the OpenClaw architecture explainer covers the building blocks, and there are industry-specific starting points for dental practices and other regulated workloads. The two external references worth keeping bookmarked are the Anthropic developer documentation for the model side and the openclaw/openclaw GitHub repository for the gateway side.

Sovereignty is harder than ticking a region selector, but it is also a long way from impossible. Pick the workload, pick the perimeter, and build outward from there.
---

### WhatsApp AI Agent with OpenClaw: The 2026 Agency Setup Guide After Meta's Chatbot Crackdown

- URL: https://kyra.conversionsystem.com/blog/whatsapp-ai-agent-openclaw-setup-2026
- Published: 2026-04-26
- Category: AI Infrastructure
- Read time: 12 min

Last updated: April 26, 2026

A WhatsApp AI agent built on OpenClaw is a self-hosted assistant that links to a WhatsApp number through the multi-device protocol and replies to inbound messages with a Claude-powered, tool-using AI worker. The setup uses the same QR code WhatsApp Web uses, so the agent acts as a linked companion device on the user's phone rather than a Meta Cloud API number. That distinction got loud in January 2026, when Meta banned open-ended AI chatbots from the official WhatsApp Business Cloud API and forced the entire ecosystem to rethink which path to take. OpenClaw's WhatsApp channel is the most popular workaround in the agency world right now, and the 2026.4.22 release made it considerably more useful for multi-tenant deployments.

Key takeaways

- OpenClaw connects to WhatsApp through Baileys, the open-source implementation of the WhatsApp Web multi-device protocol. No Meta Cloud API account is needed for inbound conversational AI.
- Meta banned mainstream AI chatbots from the Cloud API on January 15, 2026. Multi-device companion devices are the practical path for 1:1 reply-driven agents.
- OpenClaw 2026.4.22 added a replyToMode option, per-group and per-direct system prompts, and a fix for duplicate messages on reconnect.
- A working setup is one channel block in config.yml, one QR scan, and a persistent volume mount for the Baileys session keys.
- Cloud API still wins for high-volume template broadcast (over 50K/day). Multi-device wins for conversational, reply-first AI workers.
- The per-channel-peer dmScope default keeps every WhatsApp peer in their own isolated context, which is what regulated industries actually need.
What an OpenClaw WhatsApp AI agent actually does

The job is narrow and it is concrete. A user sends a WhatsApp message to a phone number. The OpenClaw gateway, running on a VPS or a home server, receives that message through the Baileys WebSocket connection. The gateway resolves a session key, loads the right context for that conversation, hands the message to a Claude or Grok or local model, and writes the agent's reply back to WhatsApp. The user sees a normal chat thread, with read receipts and typing indicators that look like every other WhatsApp conversation.

The interesting part is what happens between the message arriving and the reply going out. The agent has access to the full OpenClaw tool surface: it can read a calendar, query a CRM, run a Stripe lookup, hit any MCP connector the gateway has wired up, and call into per-client skills. A dental front desk agent answers a "do you take Delta Dental" question by checking a Supabase row. A real estate agent answers "what's the price on 412 Maple" by hitting a custom MLS skill. The reply that comes back to WhatsApp is grounded in real data, not hallucinated.

This is the difference between a chatbot and a worker. The chatbot reads a message and produces a string. The worker reads a message, looks something up, takes an action, and then produces a string. WhatsApp is just the channel. The intelligence lives in the gateway.

Why 2026 changed the WhatsApp AI landscape

Three shifts hit the WhatsApp AI world inside a single quarter, and they pushed serious agencies toward the multi-device path.

The first was the AI chatbot ban. On January 15, 2026, Meta updated its WhatsApp Business Cloud API policy to require that automated bots have "clear, predictable results associated with business messaging." Open-ended AI chat is out. Support flows, booking flows, and order flows are in.
If an agency built its product on top of the official Cloud API and routed every inbound message to Claude or GPT for a freeform reply, that product became a policy violation overnight.

The second was the pricing change. Meta moved to a per-template-message model on April 1, 2026, scrapping the old "free first 1,000 conversations per month" tier and pricing utility, authentication, and marketing templates separately by destination country. For an agency running an AI worker that holds dozens of multi-turn conversations per client per day, the per-message economics quickly stop working.

The third was the rollout of WhatsApp usernames, scheduled to begin in test countries in June 2026, with a new business-scoped user identifier (BSUID) replacing phone numbers in webhooks. That is a longer-arc change that complicates how Cloud API integrations resolve identity, while multi-device companion devices keep working unchanged because they ride on the existing WhatsApp Web protocol.

Add it up and the shape of the problem is clear. If your agency is running 1:1 conversational AI on WhatsApp in 2026, the multi-device protocol is the path with the fewest landmines. OpenClaw's WhatsApp channel was already built that way, which is why it became the default option for the kind of work agencies and GHL resellers were trying to do.

Cloud API vs multi-device: which path is yours

The honest answer is that they are different products with different jobs. Cloud API is a transactional broadcast pipe optimized for templates: shipping notifications, OTP codes, appointment reminders sent at scale. Multi-device is a conversational pipe optimized for replies inside an existing thread, the way a human would respond from their phone.

If your use case is "I need to send 250,000 marketing templates a month," Cloud API is correct, and you should accept the policy and pricing constraints.
If your use case is "I need to be the AI front desk for fifteen dental practices, each with a phone that already has WhatsApp," multi-device is correct, and OpenClaw is the cleanest way to wire it up. Most Kyra-style agencies live entirely in the second world, which is why this guide focuses there. For the broader picture of how the same gateway handles other channels, the session keys deep dive walks through how Slack, Discord, and Telegram fit alongside WhatsApp in the same install.

Step-by-step: connect OpenClaw to WhatsApp in fifteen minutes

The walkthrough below assumes you already have OpenClaw 2026.4.22 or later installed and the gateway listening on its default port. If you do not, the "What is OpenClaw" overview and the official WhatsApp channel docs cover the install side.

1. Confirm the gateway is running and reachable.

```bash
openclaw gateway status
# expected: listening on :18789, 0 active sessions
```

2. Create the persistent volume for Baileys session keys. This is the most common reason new installs lose the QR scan and ask you to scan again. Baileys writes the multi-device handshake material into a folder, and that folder must survive container restarts.

```bash
mkdir -p ~/.openclaw/whatsapp/auth
chmod 700 ~/.openclaw/whatsapp/auth
```

3. Add the WhatsApp channel block to your gateway config.

```bash
$EDITOR ~/.openclaw/config.yml
```

```yaml
channels:
  - type: whatsapp
    sessionPath: ~/.openclaw/whatsapp/auth
    replyToMode: smart
    dmScope: per-channel-peer
    systemPrompt: |
      You are the front desk for Acme Dental. Answer scheduling questions
      and route insurance questions to a human if Delta Dental is mentioned.
    groups:
      enabled: true
      systemPromptOverrides:
        - groupId: "120363045xxx@g.us"
          prompt: |
            You are the staff coordinator for Acme Dental.
            Reply only when explicitly @mentioned.
```

4. Reload the gateway.

```bash
openclaw gateway reload
```

5. Trigger the QR scan and link the device.
```bash
openclaw whatsapp link --print-qr
# scan the QR with: WhatsApp > Settings > Linked Devices > Link a device
```

The QR code appears in the terminal as ASCII art. On the phone, open WhatsApp, navigate to Settings, then Linked Devices, then "Link a device," and point the camera at the terminal. The link completes in a few seconds and the gateway prints a "session bound" line.

6. Send a test message from a second WhatsApp account. Open WhatsApp on a different phone, send "hello" to the linked number, and watch the gateway log.

```bash
openclaw logs --channel whatsapp --tail
# expected: session resolved wa:+15551234567
# expected: agent reply dispatched in 1.4s
```

7. Verify isolation with a second peer. Send a different message from a third phone number. The two conversations must produce two different session keys (one per peer) and the agent must not leak context between them.

```bash
openclaw sessions list --channel whatsapp
# expected: two rows, one per peer phone number
```

That is the full setup. Seven commands, one config block, one QR scan. Every additional WhatsApp number is one more channel block with a different sessionPath, which is how an agency runs fifteen client phones on a single gateway.

What is new in OpenClaw 2026.4.22 for WhatsApp

The April 22 release was a small one in line count and a large one in practical impact. Three changes are worth knowing.

The replyToMode option controls whether the agent quotes the original message when it replies. Three values: off never quotes, always always quotes, and smart quotes only in groups or when the reply is to a message that arrived more than 60 seconds earlier. smart is the right default for almost every deployment because it keeps DMs feeling like a normal one-on-one chat while still pinning replies in noisy group threads.

The per-group and per-direct system prompts let one WhatsApp connection serve multiple conversational personas.
The agent can be a clinical front desk in the patient DM thread and a concise staff coordinator in the internal staff group, with two completely different system prompts loaded based on the session key. Before this release, you needed two separate WhatsApp connections to do that.

The duplicate message fix closed a long-standing reconnect bug. When the Baileys WebSocket dropped and reconnected, pending message queues were re-driven by both the old and new connection, occasionally producing duplicate replies. The 2026.4.22 release adds an in-memory "active delivery claim" that prevents two reconnects from racing the same queue entry. The visible fix is that the agent stops sending the same reply twice during flaky internet.

Session keys and per-peer isolation on WhatsApp

Every inbound WhatsApp message arrives at the gateway with a small metadata bundle: the linked-device account that received it, the peer phone number that sent it, and whether the message landed in a DM or a group. The session manager turns that bundle into a deterministic string. A typical WhatsApp DM produces a key like wa:+15551234567; a WhatsApp group produces something like wa:120363045xxx@g.us.

The default dmScope setting is per-channel-peer, which means each peer phone number gets its own session and its own conversational memory. A patient asking about an appointment last Tuesday gets the context from last Tuesday's chat, not the context from a different patient who happens to share the same dental practice. Group chats always get their own session regardless of dmScope, because every group is a first-class room with its own membership and its own privacy expectations.

This is not optional infrastructure. Regulated industries on WhatsApp (dental, legal, medical, financial) need the agent to never confuse one peer's data with another's, and the session key boundary is what enforces that at the routing layer before the model ever sees the prompt.
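The deterministic metadata-to-key mapping described above can be sketched as a toy store. This is an illustration of the routing rule, not OpenClaw's implementation; the real derivation also folds in the linked-device account identifier, and the example keys mirror the ones in this section:

```python
from collections import defaultdict

def whatsapp_session_key(peer_id: str) -> str:
    """Deterministic key sketch: a DM is keyed by the peer phone number,
    a group by its group JID — both arrive here as peer_id."""
    return f"wa:{peer_id}"

class SessionStore:
    """Toy per-key context store showing the isolation boundary: two
    peers can never read each other's history because their keys never
    collide."""

    def __init__(self) -> None:
        self._contexts: dict[str, list[str]] = defaultdict(list)

    def append(self, peer_id: str, message: str) -> str:
        """Record a message under the peer's session key; return the key."""
        key = whatsapp_session_key(peer_id)
        self._contexts[key].append(message)
        return key

    def history(self, peer_id: str) -> list[str]:
        """Return only the conversation history for this peer's key."""
        return list(self._contexts[whatsapp_session_key(peer_id)])
```

Two different phone numbers always hash to two different keys, so the patient asking about last Tuesday's appointment can only ever be answered from last Tuesday's chat with that same number.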
The deeper mechanics are spelled out in the session keys explainer if you want the full algorithm.

Comparison: WhatsApp AI deployment patterns in 2026

Four patterns dominate the 2026 landscape for putting an AI agent on a WhatsApp number. They are not interchangeable, and the right pick depends on whether you care more about volume, conversation quality, compliance, or operational simplicity.

| Pattern | Connection | Typical use case | 2026 reality |
| --- | --- | --- | --- |
| OpenClaw + Baileys | Multi-device companion (QR scan) | Conversational AI worker, agency multi-tenant | Default for replies; no per-message fees |
| Meta Cloud API + classic flow | Official Business API, template-driven | Notifications, OTP, scheduled broadcasts | Required for >50K/day; AI-chat banned since Jan 15, 2026 |
| Cloud API via BSP middleware | Twilio/MessageBird/360dialog wrappers | Mid-volume mixed flows | Same Meta policies; vendor markup added |
| WhatsApp MCP server | Local Baileys + MCP protocol | Personal assistant for one operator | Great for solo use, weak for multi-tenant ops |

The two paths agencies should evaluate first are OpenClaw + Baileys for inbound conversational work and Meta Cloud API for outbound transactional templates. Many production deployments end up running both side by side: OpenClaw answers replies, Cloud API ships the appointment reminder that started the thread. The two systems do not conflict because each handles a different message direction.

When OpenClaw on WhatsApp is not for you

The multi-device path is not the right answer for every shop, and pretending otherwise is how agencies blow up a client account.

You need to send more than 50,000 outbound templates per day. Multi-device companion devices are rate-limited the same way a human user is. If your job is high-volume broadcast, you want Cloud API or a BSP, and you should accept the per-template pricing as the cost of the channel.

Your client is contractually required to use the Meta Business Platform.
Some healthcare networks, financial institutions, and regulated brands require an officially provisioned Cloud API number with a green checkmark. Multi-device companion devices do not appear in the Business Platform dashboard, and audits will fail.

**You need WhatsApp's official Flows or list-message templates.** Some interactive UI primitives (carousels, native list pickers, structured forms) are exclusive to the Cloud API. Baileys can fake a lot of the experience with text and quick replies, but if your client demands the native flow widgets, the multi-device path will not deliver them.

**You are unwilling to keep the linking phone online.** Multi-device sessions stay alive even with the linking phone off, but the phone must occasionally come online to refresh the link. If the linked phone is permanently unavailable (lost, stolen, formatted), the session breaks and the next inbound message fails until you scan a new QR. Cloud API has no such constraint.

Outside those four conditions, OpenClaw on WhatsApp is the path most agencies should default to in 2026.

#### Frequently asked questions

**Does OpenClaw need a verified Meta Business account?** No. The OpenClaw WhatsApp channel uses the multi-device protocol via Baileys, which is the same protocol WhatsApp Web uses. You scan a QR code with the phone that owns the number and the gateway acts as a linked companion device. There is no Meta Business onboarding, no display name approval, and no template review process. If the phone has WhatsApp installed and can scan a QR code, the agent can connect.

**Will Meta ban my number for using a Baileys-based AI agent?** Meta does not publish enforcement criteria for multi-device clients, and bans do happen for accounts that send spam or template-style broadcast through the personal protocol. Conservative usage (replying to inbound messages, normal conversational pace, no mass cold outreach) has a long track record of staying inside the lines.
Use the channel for replies, not for cold blast campaigns, and keep the per-day outbound volume in human ranges.

**What happens if the Baileys session expires?** The gateway logs an "auth state invalidated" line and pauses the channel. Run `openclaw whatsapp link --print-qr` again, scan with the same phone, and the channel resumes. Existing conversations keep their session keys, so the agent's memory survives the relink. The persistent volume mount on `sessionPath` is what makes this safe; without it, every restart is a relink.

**Can one OpenClaw gateway run multiple WhatsApp numbers?** Yes. Add one channel block per number, each with its own `sessionPath` directory and its own `systemPrompt`. The session key includes the linked-device account identifier, so messages from different numbers route to different conversations and never overlap. This is the standard pattern for agencies hosting fifteen or twenty client numbers on a single gateway.

**How does this interact with Anthropic's Claude Code Channels?** Anthropic's Claude Code Channels shipped in early 2026 with native Discord and Telegram support, but no first-party WhatsApp connector. The community is expected to build one through the open MCP standard, and OpenClaw already covers the WhatsApp side directly today. If your stack is Claude Code-centric and you need WhatsApp now, OpenClaw is the bridge that exists.

**Does the agent see WhatsApp end-to-end encrypted messages in plaintext?** Yes, by design. A linked companion device is an authorized endpoint inside WhatsApp's E2EE model, just like the user's phone or laptop. Messages are decrypted on the OpenClaw host and the model receives plaintext. This is what enables the agent to reply at all. It also means the host should be treated as sensitive infrastructure: full disk encryption, locked-down SSH, and no sharing the box with untrusted workloads.

#### The smallest piece of infrastructure your agency probably underestimates

WhatsApp is not exotic.
It is the channel a billion small businesses already use, and the AI agent that lives inside it does not have to be exotic either. One config block, one QR scan, one persistent volume, and a Claude model on the back end is genuinely the whole thing. The work that used to take a Meta Business onboarding, a template approval queue, and a BSP contract now takes fifteen minutes. The 2026 changes to the official Cloud API made multi-device the right default for conversational AI, and OpenClaw 2026.4.22 made it cleaner to operate.

If you would rather skip the VPS and the QR refresh dance and just hand a working WhatsApp number to a client, Kyra runs the OpenClaw gateway, the Baileys session storage, and the per-client isolation on a managed host with the same defaults this guide describes. Industry-specific starting templates are ready for dental practices and real estate agencies, and the underlying primitives are documented in the OpenClaw WhatsApp reference and the OpenClaw repository on GitHub.

WhatsApp is where most of your clients' customers already are. Putting an AI worker there in 2026 is no longer the hard part of the project.

---

### OpenClaw Session Keys Explained: How One Gateway Keeps 24 Channels Separate in 2026

- URL: https://kyra.conversionsystem.com/blog/openclaw-session-keys-explained-2026
- Published: 2026-04-20
- Category: AI Infrastructure
- Read time: 12 min

Last updated: April 20, 2026

An OpenClaw session key is the unique identifier the gateway attaches to every incoming message so the AI agent knows which conversation it belongs to and which context to load. Each channel, each user, and each thread produces a different key. That one string is what lets a single OpenClaw gateway run a WhatsApp DM, a Slack thread, a Discord guild channel, and the browser WebChat widget side by side without any of them reading the others' memory.
OpenClaw 2026.4.15 ships 24 supported channel integrations out of the box, and the session key is the quiet machinery that keeps every one of those conversations isolated from every other one.

#### Key takeaways

- Session keys are generated automatically from channel, user, and thread metadata. You normally never type one by hand, but you can override the format for multi-tenant apps.
- The `session.dmScope` setting has three modes (`main`, `per-channel-peer`, `per-peer`) and each one trades continuity for isolation differently.
- Groups and threads always get their own session. That part is not configurable, and it is the right default for group privacy.
- The OpenClaw gateway listens on port 18789 by default and resolves the session key before it ever dispatches a message to the agent runtime.
- A prefixed session key like `kyra-user-42` is how multi-tenant platforms separate thousands of clients without spinning up a container per user.
- Session keys solve the isolation problem at the context layer, not the infrastructure layer. One daemon, one agent pool, N clean conversations.

#### What an OpenClaw session key actually is

Every message that arrives at an OpenClaw gateway carries a small bundle of metadata: which channel adapter it came from (Slack, WhatsApp, Discord, Matrix, and so on), which account sent it, and whether it landed in a DM, a group, or a thread. The gateway runs that bundle through its session manager, which deterministically produces a string. That string is the session key.

The key does two jobs at once. First, it tells the agent runtime which conversation's memory, history, and working files to load before replying. Second, it tells the gateway where to route the reply once the agent is done thinking. A message with key `slack:T12345:C98765:U00001` loads and writes to a different context than a message with key `wa:+15551234567`, even if both messages come from the same human being on the same day.
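The three `dmScope` modes listed in the takeaways can be pictured as three key-derivation rules for direct messages. A minimal sketch in Python, with deliberately simplified key shapes (the real format encodes more segments; this is an assumption for illustration, not OpenClaw's implementation):

```python
# Illustrative sketch (not OpenClaw source) of how the three dmScope modes
# trade continuity for isolation. Key shapes are simplified assumptions.
def dm_session_key(scope: str, channel: str, peer: str) -> str:
    if scope == "main":
        return "main"               # every DM, any channel -> one shared session
    if scope == "per-channel-peer":
        return f"{channel}:{peer}"  # one session per channel + sender pair
    if scope == "per-peer":
        return f"peer:{peer}"       # channel dropped, peer identity only
    raise ValueError(f"unknown dmScope: {scope}")

# The same person DMs the agent from Slack and from WhatsApp:
a = dm_session_key("per-channel-peer", "slack", "alice")
b = dm_session_key("per-channel-peer", "wa", "alice")
assert a != b  # airtight rooms per channel

c = dm_session_key("main", "slack", "alice")
d = dm_session_key("main", "wa", "alice")
assert c == d  # one continuous conversation everywhere
```

The choice between the branches is exactly the continuity-versus-isolation trade described in this post: the less metadata the rule keeps, the more surfaces collapse into one memory.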
This is the core primitive behind OpenClaw's "one gateway, many channels" promise. Without the session key, every incoming message would either collapse into a single global chat (disastrous for privacy) or require a separate daemon per channel (disastrous for cost and ops). The key is the cheap, composable middle path.

#### Why one gateway needs many sessions

A realistic deployment for a small agency looks like this. One OpenClaw gateway on a VPS. One Anthropic API key. Fifteen client businesses, each with their own Slack workspace, WhatsApp number, or web widget. Hundreds of end users across those businesses. Every one of those users expects the AI to remember their last conversation and nothing else.

If you tried to do that with fifteen separate daemons, your ops surface would triple: fifteen systemd units to patch, fifteen logs to tail, fifteen sets of secrets to rotate. If you tried to do it with one daemon and one global conversation, the first time User A's medical question leaked into User B's session you would lose every client at once.

Session keys let you stay in the middle. One daemon still runs. One agent pool still handles the load. But every message lives in its own keyed context, and the gateway enforces the boundary before the model ever sees the prompt. This is the same pattern Cowork, Claude Code's IDE extensions, and most production OpenClaw deployments use, and it is exactly how the broader OpenClaw gateway architecture was designed to scale.

#### The anatomy of a session key

Session keys are plain strings, so you can look at them in a log file and read them. The default format encodes the channel, the workspace or server, the channel or room, and the user, joined with colons.
Real examples from a production gateway:

- `slack:T09ABC123:C04XYZ789:U07DEF456` — a DM inside a specific Slack workspace
- `discord:guild_742:channel_9981:user_4412` — a DM-style channel in a Discord guild
- `wa:+15551234567` — a WhatsApp DM, keyed by the peer's phone number
- `telegram:chat_5839201` — a Telegram chat
- `webchat:anon_9d2f1c` — an anonymous browser visitor on the WebChat widget
- `matrix:!room_abc:server:@user:server` — a Matrix room with full federated identity

You can also prefix your own namespace. This is how multi-tenant hosts structure keys. A self-serve platform might force every agent's session keys to begin with `tenant-42-`, giving you strings like `tenant-42-slack:T09ABC:C04XYZ:U07DEF`. The gateway treats the prefix as opaque, but it means one tenant's Slack DM and a different tenant's identical Slack DM never collide in the session store.

The specific format is documented in the OpenClaw gateway security reference. You can override it in config.yml under `session.keyFormat`, but the default is good enough for 95% of deployments and you should only change it if you know exactly why.

#### dmScope: three ways to isolate direct messages

Groups and threads are easy. Every group gets its own session key, every thread gets its own key, and nobody disagrees about whether that is correct. DMs are the interesting case. A user who chats with the same AI persona across Slack, WhatsApp, and WebChat might want those three surfaces to feel like one continuous conversation, or they might want them to feel like three airtight rooms. OpenClaw exposes this choice through the `session.dmScope` setting, which has exactly three valid values.

**dmScope: main (the default).** In main mode every direct message the user sends, from any channel, resolves to the same shared session. The agent remembers everything they ever said in a DM regardless of which app delivered it.
This is the warmest setting and the right one for a single-user personal assistant: the OpenClaw founder chatting with their own agent from Slack in the morning and WhatsApp in the evening does not want two separate memories.

**dmScope: per-channel-peer (the secure default).** In per-channel-peer mode every unique combination of channel plus sender produces its own session. Slack DMs from user Alice get one session. WhatsApp DMs from the same Alice get a different one. Discord DMs get a third. This is the right default when you deploy for other people rather than yourself. An employee who messages the AI from work Slack and personal WhatsApp probably expects those to feel like two different contexts, and HR auditors definitely expect it.

**dmScope: per-peer.** In per-peer mode the channel is dropped from the key and only the peer identity matters. Alice's DMs collapse into one session across every channel of the same type, but different channel types still stay separate. This is the rarest setting and is usually only useful when the underlying identity system is strong enough to trust across surfaces, for example a Matrix-federated deployment where every user has one canonical MXID.

The practical rule: start with per-channel-peer for any multi-user deployment, switch to main for a personal bot, and only reach for per-peer when a specific compliance or UX requirement demands it.

#### Groups, threads, and why they always get their own session

The dmScope setting only governs direct messages. Group channels and threads are treated as first-class conversations in their own right, every time, without exception. A Slack channel #general gets one session key shared by every member. A Slack thread inside that channel gets a different session key shared by every participant in the thread. The same rule applies to Discord threads, Matrix spaces, Telegram group chats, and WhatsApp groups.

This design matters for two reasons.
First, groups have social norms: a message posted to #engineering is readable by the whole team, so the agent can safely load prior #engineering context when replying. Second, threads are mini-rooms: they exist specifically because the participants want a scoped side conversation, and the agent should respect that scope. OpenClaw bakes that expectation into the session layer so the developer cannot accidentally violate it.

The upshot is that dmScope only needs to worry about direct messages because everything else already has the right behavior by default. If you need to see the full precedence order the gateway uses to derive a key, the security reference lays out the full resolution algorithm.

#### Step-by-step: configure session keys for a multi-client agency

Here is a minimal working setup that takes an OpenClaw gateway from vanilla install to multi-client isolation. It assumes you have already installed OpenClaw and bound it to its default port. The walkthrough targets OpenClaw 2026.4.15 or later.

1. Confirm the gateway is running.

   ```
   openclaw gateway status
   # expected: listening on :18789, 0 active sessions
   ```

2. Open your gateway config.

   ```
   $EDITOR ~/.openclaw/config.yml
   ```

3. Set the DM scope and enable tenant prefixing. This is the single most important block for multi-client deployments. Every session key will now carry the tenant prefix, and DMs will isolate per channel plus peer.

   ```yaml
   session:
     dmScope: per-channel-peer
     keyFormat: "tenant-{tenant_id}-{channel}:{server_id}:{channel_id}:{user_id}"
     ttlDays: 30
     storage: sqlite
     storagePath: ~/.openclaw/sessions.db
   ```

4. Add two channel adapters for a first smoke test. Slack and WebChat are the fastest to wire up because neither requires a verified phone number.

   ```yaml
   channels:
     - type: slack
       botToken: ${SLACK_BOT_TOKEN}
       signingSecret: ${SLACK_SIGNING_SECRET}
       tenant_id: "42"
     - type: webchat
       publicUrl: https://chat.example.com
       tenant_id: "42"
   ```

5. Reload the gateway without dropping active sessions.

   ```
   openclaw gateway reload
   ```

6. Send one test message from each surface. Slack DM first, then a WebChat visit from an incognito browser. Then run the session listing command.

   ```
   openclaw sessions list --tenant 42
   # expected: two rows, one slack session, one webchat session,
   # both prefixed with tenant-42-
   ```

7. Verify isolation with a one-line check. Ask the agent in Slack what it remembers. Ask the same question in WebChat. The answers must differ. If they don't, your dmScope is wrong or your keyFormat is collapsing the two keys into one.

That is it. The config file and the seven commands above are a complete per-tenant session isolation setup. Every client you add later is one more entry in the `channels:` array with a different `tenant_id`, plus whatever per-client skills or memory you want to layer on top. For the broader walkthrough that includes MCP connectors and Claude Skills, those two companion guides pick up where this one stops.

#### Comparison: session isolation strategies across AI gateways

Session isolation is not an OpenClaw-specific idea, but the shape of the implementation varies widely across the ecosystem. Here is how the four patterns most teams will encounter in 2026 actually differ.

| Strategy | What gets isolated | Typical implementation | Cost per extra user |
| --- | --- | --- | --- |
| Session key (OpenClaw) | Conversation context, memory, working files | One daemon, keyed context store | A few KB of session state |
| Container per tenant | Everything, including CPU and filesystem | One container per client, orchestrator on top | 50–200 MB RAM minimum per user |
| Thread per request (classic chatbot) | Nothing beyond one turn | Stateless API call, memory pushed to a DB | Round-trips to external memory on every turn |
| Claude Managed Agents | Sandboxed execution, long-running sessions | Anthropic-hosted infrastructure (public beta, April 2026) | Per-session metered pricing |

Session keys give you conversation-level isolation at near-zero marginal cost, which is why they dominate multi-tenant self-hosted deployments.
Containers give you infrastructure-level isolation, which matters when you run untrusted code and you are willing to pay the RAM bill. Anthropic's Claude Managed Agents, launched in public beta on April 8, 2026, sit at the other end of the spectrum: you pay Anthropic to host the isolation boundary and stop worrying about it yourself. Most Kyra-style deployments pick the session key path because it keeps the stack thin.

#### When session keys aren't the right answer for you

Session keys solve context isolation. They do not solve everything, and there are three situations where reaching for a different primitive is the correct move.

**You run untrusted code on behalf of users.** If your agent executes arbitrary Python or shell that a user can control, context isolation is necessary but not sufficient. You want a real sandbox boundary: a container, a Firecracker VM, or Claude Managed Agents. Session keys stop a user from reading another user's conversation, but they do not stop one user's shell command from reading the daemon's filesystem.

**You have strict regulatory isolation requirements.** Some HIPAA, GDPR, or FedRAMP deployments contractually require that two tenants' data never share a process, period. Session keys share a process by design. If your compliance officer is in the conversation, plan for a container-per-tenant or a dedicated gateway-per-tenant architecture from day one.

**You want a stateless protocol.** The MCP working group is actively evolving the protocol toward stateless requests in 2026, for the same load-balancer and horizontal-scaling problems that session state creates. If your deployment sits behind a round-robin load balancer across many stateless server instances, OpenClaw's keyed session store assumes the load balancer is sticky (or assumes a shared session DB). Pick accordingly, and check the MCP 2026 roadmap if you are making long-range architecture bets.
For the other 80% of deployments (agencies with tens to low hundreds of clients, founders running a personal assistant across every app they use, GHL resellers adding an AI worker per client), session keys are boringly correct.

#### Frequently asked questions

**How does OpenClaw decide which agent handles a given session key?** The gateway reads a `routing:` block in the config that maps channel patterns to agent names. A route like `slack:T09ABC*:* -> agent-acme` sends every Slack message from workspace T09ABC to the Acme agent, regardless of which user or channel triggered it. Session keys are derived first, then routing picks the agent, then the agent loads the session's context. All three steps happen before the model sees a single token.

**Can I share a session key across two users intentionally?** Yes. Set `session.keyFormat` to a value that ignores the user portion, for example `"{channel}:{server_id}:{channel_id}"`. Every user in that channel will then write to the same context. This is useful for a shared workspace assistant where the team expects a continuous thread of memory. Use it on purpose, not by accident.

**What happens when a session key is rotated or expired?** Sessions have a TTL (default 30 days). When the TTL lapses, the gateway archives the conversation and returns a cold context for the next message with that key. The user sees the AI "forget" the old conversation. To keep the memory forever, set `session.ttlDays` to 0 and budget for your session store to grow over time.

**Can I look up a session key from a user's name?** Yes, through the admin API. `openclaw sessions find --channel slack --user U07DEF456` returns the full session key and metadata. This is how support teams pull up a user's conversation history when they file a ticket. Access is gated by the gateway's admin token, not by the agent, so users cannot read each other's sessions even if they have agent-level tool access.

**Does this work the same way in a Cowork deployment?** Yes.
Cowork adds a workspace layer on top of the gateway, but the session key primitive is the same. A Cowork workspace contributes one more segment to the key (the workspace ID), which is how a single physical gateway can host many Cowork tenants without any of them seeing each other's sessions.

**How do I debug a session that is routing to the wrong agent?** Run the gateway with `OPENCLAW_LOG=debug` and watch the "session resolved" and "routing matched" log lines for the incoming message. Nine times out of ten the issue is a missing or wrong `tenant_id` on the channel, or a routing pattern that matches more aggressively than you expected. The tenth time it is an outdated cache, which `openclaw gateway reload` fixes.

#### The small idea that makes multi-channel AI practical

Session keys are a small idea and they do a surprising amount of work. They are why one OpenClaw daemon can run 15 clients without cross-talk. They are why a personal assistant can feel continuous across Slack and WhatsApp or crisply separated, depending on a single config line. They are why agencies can charge recurring fees for AI workers without standing up a container farm to host them. The format is five minutes of reading in the docs, and the implications show up in every architectural decision downstream.

If you want the OpenClaw gateway, the session key defaults, and the per-client isolation wired up for you rather than configured by hand, Kyra runs the whole stack on your own domain with tenant prefixes and per-channel-peer isolation turned on from the first install. For industry-specific starting points there are ready-made worker templates for dental practices and real estate agencies, and for the architectural picture behind all of this, the OpenClaw repository on GitHub and the gateway security reference are the two most useful places to keep bookmarked.
Session keys are the kind of primitive you only notice when they fail, and when they are set up right you should never have to think about them again.

---

### Write Your First Claude Skill for OpenClaw: A 2026 Step-by-Step Guide

- URL: https://kyra.conversionsystem.com/blog/write-your-first-claude-skill-openclaw-2026
- Published: 2026-04-19
- Category: AI Infrastructure
- Read time: 12 min

Last updated: April 19, 2026

A Claude Skill is a folder of markdown and optional scripts that teaches an AI agent one specific, repeatable workflow and loads itself into context only when the agent detects a matching request. Anthropic announced Skills on October 16, 2025, published the 32-page Complete Guide to Building Skills for Claude on January 29, 2026, and made Agent Skills an open standard on December 18, 2025. The same format now works across Claude Code, Claude Desktop, Cursor, and OpenClaw. By February 28, 2026 the public OpenClaw registry (ClawHub) was already carrying 13,729 community-built skills, and the format had become the default way agent builders package reusable capability.

#### Key takeaways

- A Skill is a directory with a SKILL.md file. The file has YAML frontmatter (`name`, `description`, optional `allowed-tools`) and markdown instructions below it.
- The `description` field decides whether the agent ever loads the skill. Write it as a clear "what it does + when to use it" sentence.
- Skills use progressive disclosure: only the frontmatter metadata sits in context until a matching request triggers a full load. Fifty skills cost roughly the same idle tokens as one.
- OpenClaw loads skills from three roots, in precedence order: bundled, local (`~/.openclaw/skills`), and per-workspace. Later roots override earlier ones by name.
- Agent Skills became an open standard in December 2025. A skill written for Claude Code drops into OpenClaw, Cursor, or any compatible runtime with no rewrite.
- Skills are the right answer for repeatable procedural knowledge. Tools, MCP servers, and sub-agents solve different problems and are explained below.

#### What a Claude Skill actually is

Before Skills, agent builders had three choices. You could stuff long instructions into the system prompt and pay the token cost on every turn. You could write a custom tool for each workflow and maintain the glue code forever. Or you could train a sub-agent per task and juggle routing logic by hand.

A Skill replaces all three for the case that covers most real work: the agent already knows how to do the thing in principle, but it needs the house-specific recipe. How does our agency format a client onboarding report? Which fields go into a GHL appointment webhook? What is the exact SQL migration pattern this codebase uses? Those answers are short, procedural, and worth reusing. That is a Skill.

The physical artifact is almost embarrassingly simple. A folder. Inside the folder, a file named SKILL.md. Optional subfolders named `scripts/`, `references/`, and `assets/` for bundled code, docs, and templates. That is the whole spec. The agent indexes every skill's frontmatter at startup. When a user request matches the description, the agent pulls in the full markdown and any referenced files, runs the procedure, and unloads it again. Progressive disclosure keeps idle context cheap and the skill library big.

#### Skills vs tools vs MCP servers vs sub-agents

Agent builders new to the ecosystem routinely confuse these four primitives. They solve overlapping but distinct problems, and picking the wrong one creates architecture pain that is hard to undo later.
| Primitive | What it encodes | When to reach for it | Cost model |
| --- | --- | --- | --- |
| Skill | Procedural knowledge and templates | A repeatable workflow your agent does often, with house-specific steps | Metadata-only when idle, full markdown on trigger |
| Tool | A deterministic function the agent can call | Reading a file, sending a message, running a shell command | Schema in context always, invoked on demand |
| MCP server | A remote bundle of tools and resources from one data source | Any external integration you want to share across agents | Subprocess or HTTP service, schema injected into context |
| Sub-agent | A separate agent loop with its own context window | Long research, parallel exploration, isolated failure domains | A fresh full conversation per invocation |

The mental model that keeps teams unstuck: a Skill tells the agent how to do something. A Tool or MCP server lets the agent do something. A Sub-agent delegates the whole job to a new context. Most production OpenClaw workspaces end up using all four, but they start with Skills because Skills are cheap, versionable, and readable in plain markdown. For a deeper look at the MCP half of that picture, see the companion post on MCP connectors in OpenClaw.

#### The SKILL.md anatomy

Every Skill file has two parts. YAML frontmatter on top, markdown body below. The frontmatter tells the agent when to load the skill. The body tells the agent what to do once loaded. A minimal example for a fictitious "summarize-client-call" skill:

```markdown
---
name: summarize-client-call
description: Turns a raw call transcript into a structured client summary with action items. Use this whenever the user pastes a call transcript, attaches an audio transcript file, or asks to summarize a client conversation.
allowed-tools: [read_file, write_file]
---

# Summarize Client Call

## When to use
The user has a call transcript (text, VTT, or pasted dialogue) and wants a structured summary for their CRM.

## Steps
1. Read the transcript. Identify the client name, the agency owner's name, and the call date.
2. Extract the top three outcomes the client wanted from the call.
3. Extract every action item, with the owner and a due date if mentioned.
4. Write the result to `./out/<client>-<date>.md` using the template in `./references/template.md`.

## Output contract
The summary file must contain five H2 sections: Attendees, Context, Outcomes, Action items, Next call.
```

Three details matter more than they look. First, `description` is the only string the agent sees before loading the skill, so it has to contain both what the skill does and the exact trigger phrases a user might say. Anthropic's own skill-creator plugin writes the description last for exactly this reason. Second, `allowed-tools` is a safety rail: even if the agent has twenty tools available, only the listed ones can fire while this skill is active. Third, the body uses short numbered steps, not prose. Agents follow checklists reliably. They rewrite prose.

#### Step-by-step: write and install your first OpenClaw skill

This walkthrough produces a working skill in about fifteen minutes on a fresh OpenClaw install. Any Linux, macOS, or WSL box with Node.js 22 or newer will do.

1. Install OpenClaw and start the gateway if you have not already:

   ```
   npm install -g @openclaw/cli
   openclaw init
   openclaw gateway start
   ```

2. Create the skill directory inside your local skills root:

   ```
   mkdir -p ~/.openclaw/skills/summarize-client-call/references
   cd ~/.openclaw/skills/summarize-client-call
   touch SKILL.md references/template.md
   ```

3. Write the SKILL.md file. Paste the example from the anatomy section above, or use the scaffold the CLI ships with:

   ```
   openclaw skills new summarize-client-call \
     --description "Turns a raw call transcript into a structured client summary." \
     --allowed-tools read_file,write_file
   ```

   The CLI writes a templated SKILL.md, an empty references/ directory, and an assets/ folder for any images or sample files.

4. Add a template to references. Drop a markdown file at references/template.md that holds the exact output shape you want.
The agent will read it at runtime, so you avoid duplicating the template inside the main SKILL.md:

```markdown
## Attendees
- <name> (role)

## Context
One paragraph.

## Outcomes
1. ...
2. ...
3. ...

## Action items
| Owner | Action | Due |
| --- | --- | --- |
| ... | ... | ... |

## Next call
Date, channel, goal.
```

5. Validate the skill locally:

   ```
   openclaw skills validate summarize-client-call
   ```

   The validator checks frontmatter syntax, warns on missing description triggers, and runs a dry-load against the current gateway.

6. Reload the daemon and list loaded skills:

   ```
   openclaw gateway reload
   openclaw skills list --agent my-first-agent
   ```

   You should see summarize-client-call in the output, tagged with its source path. The gateway only loads frontmatter at this point, so startup time is unaffected.

7. Fire it. Send the agent a message that matches the trigger in the description:

   ```
   openclaw chat my-first-agent \
     "Here's a call transcript from today with Acme Dental. Summarize it."
   ```

The agent matches the description, loads the full SKILL.md plus the template, produces the summary, and writes it to `./out/acme-dental-2026-04-19.md`. If you watch the gateway logs you will see a single "skill:load" event and a "skill:unload" right after the response is returned.

That is the full loop: write markdown, validate, reload, call. No compile step, no deployment, no redeploys across clients. The same skill dropped into a teammate's workspace works identically because the contract is files on disk.

#### Progressive disclosure and why the token math works

Progressive disclosure is the reason skills scale. At agent startup, OpenClaw reads every SKILL.md and indexes only the YAML frontmatter. Typical frontmatter is under 200 tokens. Fifty skills cost around 10,000 idle tokens in context, which is tolerable on any modern model. When the agent decides to invoke a skill, it pulls the full markdown, any files referenced from the body, and any scripts the body explicitly calls out.
Once the turn ends, that material is dropped from the working context. The next unrelated turn starts clean. This is the same pattern that makes big codebases tractable for agents: keep the index in memory, load the file only when the query matches. It is also why Anthropic's engineering team has said Skills pair cleanly with MCP rather than replacing it. MCP handles discovery and tool invocation for live systems; Skills handle the procedural knowledge the agent applies once a tool is available.

#### How OpenClaw resolves bundled, local, and workspace skills

OpenClaw loads skills from three roots. Understanding the precedence rules prevents a whole class of "why is my skill not firing" support tickets.

- **Bundled skills** ship inside the `@openclaw/cli` package. You can point the gateway at a pinned bundled directory using `OPENCLAW_BUNDLED_PLUGINS_DIR`. These are the defaults everyone gets on install.
- **Local skills** live at `~/.openclaw/skills`. Anything here applies to every agent on the machine and overrides a bundled skill of the same name. Use this for your own reusable workflows.
- **Workspace skills** live at `~/.openclaw/agents//skills`. Anything here applies only to that agent and overrides both bundled and local skills with the same name. Use this for client-specific or project-specific customizations.

The agent logs the source path next to each loaded skill so you can tell at a glance which copy won. For white-label agencies running thirty clients on one gateway, the common pattern is: bundled for the baseline, local for the agency house style, workspace for per-client variants. The file layout itself enforces isolation. To see how this slots into the larger gateway picture, read what OpenClaw actually is.

#### Testing, versioning, and shipping a Skill

Skills are plain files in a git repository. Treat them as code. A team shipping skills to clients usually ends up with a shape like this:

- A `skills/` directory at the root of the agency repo, one subdirectory per skill.
- A `tests/` directory next to each skill holding sample inputs and expected outputs. The CLI can run these against a lightweight agent loop: `openclaw skills test summarize-client-call`.
- Pull requests that change a skill's behaviour bump the `version` field in frontmatter. The registry surfaces this version in the UI, so clients can see when a skill changes underneath them.
- A CI job that validates every skill on every commit. The validator is fast (under a second per skill) so it stays in the pre-commit hook too.

For publishing, the open OpenClaw skills documentation describes packaging for ClawHub. For private teams, the simplest approach is to keep skills in a shared git repo and install them into `~/.openclaw/skills` via `openclaw skills add`. The CLI handles the clone, checkout, and symlink steps so that updates are a single `git pull`.

#### When a Skill is not the right tool

Skills are great at one thing and bad at the opposite of that thing. They encode reusable procedural knowledge. They are the wrong answer when:

- **The work is one-off.** If you only need the agent to do it once, put the instructions in the chat. A Skill adds friction when it will never be reused.
- **The work is mostly a live system call.** If the core of the job is "hit this API and summarize the response", write an MCP server or a tool. Skills should orchestrate tools, not replace them.
- **The work branches deeply based on state.** If your procedure has more than two or three decision points that require loading different playbooks, promote each branch to its own skill and route between them with a top-level skill, or reach for a sub-agent.
- **The instructions are longer than the task.** If a skill's SKILL.md is 3,000 words long and the output is a two-line confirmation, the skill is doing too much. Break it up or convert the instructions to a real tool.

The honest test: if you would not write a Notion doc to describe the procedure, you probably should not write a skill either. Skills reward clarity, not volume.
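The sample-input/expected-output convention from the testing section can be sketched as a tiny harness. Everything here is hypothetical: the per-case layout (`input.md` plus `expected.md` in one folder) and the `run_skill` callable are stand-ins for however you invoke the agent, not the CLI's actual test runner.

```python
from pathlib import Path

def run_case(case_dir, run_skill) -> bool:
    """One test case = input.md plus expected.md in the same folder.
    `run_skill` is whatever produces the agent's output for an input
    file (stubbed in tests); we only diff the result against expected."""
    case = Path(case_dir)
    actual = run_skill(case / "input.md")
    expected = (case / "expected.md").read_text()
    return actual.strip() == expected.strip()

def run_all(tests_dir, run_skill) -> dict:
    """Run every case directory under tests/ and report pass/fail."""
    return {
        case.name: run_case(case, run_skill)
        for case in sorted(Path(tests_dir).iterdir())
        if (case / "input.md").exists()
    }
```

The point of the shape is that a CI job can loop over `run_all` and fail the build on any `False`, which is all "treat skills as code" really requires.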
#### Frequently asked questions

**Do Claude Skills work in OpenClaw without modification?** Yes. Skills follow the Agent Skills open standard that Anthropic published in December 2025. The SKILL.md contract is identical across Claude Code, Claude Desktop, Cursor, and OpenClaw. A skill you wrote against Claude Code drops into `~/.openclaw/skills` and loads without changes. Skills that call `allowed-tools` only work if the host agent actually has those tools, but that is a runtime concern, not a format one.

**How many skills can one agent have before context costs blow up?** In practice, several hundred. Frontmatter is usually 100 to 200 tokens per skill. Progressive disclosure means the full body never sits in context unless the skill fires. Anthropic's own skills repo at the time of writing ships dozens of skills and runs on every Claude plan. The practical ceiling is discoverability: past 30 or 40 skills, the agent starts to have a harder time picking the right one, so invest in tighter description fields rather than a bigger library.

**Is it safe to install community skills from ClawHub or agentskills.io?** Treat them like npm packages. Read the SKILL.md before installing. Check what scripts the skill ships and what `allowed-tools` it requests. The gateway isolates tool execution inside per-agent workspaces, but a skill that runs a shell script still runs with your user's permissions on the host. For anything touching a client workspace, fork the skill into your own repo, audit it, and pin the commit.

**What is the difference between a Skill and a prompt template?** A prompt template is a string you paste into a conversation. The agent has no awareness of it beyond that one turn. A Skill is a persistent artifact the agent decides to load based on request intent, can reference files from, and can scope tool access inside. Skills are to prompts what functions are to snippets.

**Can a Skill call another Skill?** Indirectly.
A skill's markdown can instruct the agent to invoke another skill by name, and the agent will match the second skill's description and load it for the next turn. There is no direct programmatic import. This is intentional: the agent stays in charge of which skills are active, which keeps context use predictable and matches how progressive disclosure is supposed to work.

**How do I debug a skill that is not firing?** Three checks. First, run `openclaw skills list --agent --verbose` and confirm the skill loaded at startup. Second, check that the user message contains at least one phrase from the `description` field. Third, rerun the turn with `OPENCLAW_LOG=debug` to see the skill-match decision trace. Most "not firing" issues trace back to a description that is too vague or too narrow.

#### Ship the skill, then pick your next one

Skills are the lowest-friction way to teach an agent a repeatable job that lives in your head today. Write the markdown, validate it, reload the gateway, call it once, correct the description if the match was weak. That loop takes an afternoon. The compounding effect is that a year of those afternoons produces a skill library that becomes your agency's actual operating system.

If you want the OpenClaw gateway, the bundled skill library, and the ClawHub integration wired up for you rather than built by hand, Kyra deploys the whole stack on your own domain in about ten minutes. For industry-specific starter skills, see the dental practice template. For the broader integration picture, Anthropic's own Claude Code skills documentation, the open-source anthropics/skills repository, and the OpenClaw skills reference are the three sources worth bookmarking first. Skills are a format, not a feature, and formats outlast the companies that invent them.
---

### MCP Connectors Explained: How OpenClaw Plugs 10,000+ Tools Into Any AI Agent (2026 Guide)

- URL: https://kyra.conversionsystem.com/blog/mcp-connectors-openclaw-guide-2026
- Published: 2026-04-17
- Category: AI Infrastructure
- Read time: 14 min

Last updated: April 17, 2026

A Model Context Protocol (MCP) connector is a standardized bridge that lets any AI model read from, write to, and trigger actions on any external tool using a single common wire format. You write the connector once. Every MCP-aware agent can use it. That is why, by March 2026, Anthropic reported over 10,000 public MCP servers and 97 million monthly SDK downloads across Python and TypeScript. MCP turned the mess of one-off API integrations into a lingua franca for agentic software, and OpenClaw plugs directly into the entire ecosystem on day one.

#### Key takeaways

- MCP is an open protocol Anthropic released in November 2024 and donated to the Linux Foundation (via the Agentic AI Foundation) in December 2025.
- An MCP connector exposes tools, resources, and prompts from a data source to any MCP-aware agent over stdio or HTTP/SSE transport.
- OpenClaw is both an MCP client (it consumes MCP servers) and an MCP server (it exposes OpenClaw tools to outside clients like Claude Desktop).
- Setup is under 10 minutes: add the server to `mcp.json`, restart the daemon, list the new tools, and the agent can call them.
- Security boils down to allowlists, deny-lists, OAuth where supported, and running untrusted servers in sandboxed containers.
- By March 2026, every major AI provider shipped MCP support. Gartner projects 40% of enterprise apps will embed task-specific AI agents by end of 2026.

#### What Model Context Protocol actually is

Before MCP, every AI integration looked the same: someone wrote a custom adapter for their favorite tool, then rewrote it three more times for OpenAI, Anthropic, and whatever internal framework was fashionable that quarter.
The adapter would break when the API changed, nobody shared code, and every agency running AI for clients maintained its own private stack of tape-and-glue integrations.

MCP replaced that pattern with a single specification. The protocol defines three primitives a server can expose: tools the model can call, resources the model can read, and prompts the model can receive as structured templates. A compliant client speaks this protocol once. Any compliant server plugs in without extra work. Agents, IDEs, assistants, and orchestration layers can all consume the same server.

The protocol itself is transport-agnostic. The two official transports are stdio (the client spawns the server as a subprocess and talks to it over standard input and output) and Streamable HTTP with optional SSE for streaming. Stdio is the default for local tools. HTTP is the default for anything remote or multi-tenant.

#### Why MCP exploded in 2025 and 2026

MCP shipped in November 2024 as a small Anthropic experiment. By April 2025 it was at 8 million cumulative SDK downloads. June 2025 hit 35 million monthly. March 2026 crossed 97 million monthly downloads and 10,000 active public servers. That is one of the fastest adoption curves any developer protocol has ever recorded.

Three forces drove the growth. First, every major AI vendor adopted MCP in the same 12-month window. Second, Anthropic donated the protocol to the Agentic AI Foundation in December 2025, a directed fund under the Linux Foundation co-founded with Block and OpenAI. That removed vendor-control fears and unlocked enterprise procurement. Third, the MCP spec kept shipping: new maintainers like Clare Liguori and Den Delimarsky joined in April 2026, and the 2026 roadmap prioritizes stateless transport, scalable session handling, and multi-server discovery via Server Cards.

The practical effect on agencies and builders: the integration work that used to be the hard part is now table stakes.
Hundreds of SaaS vendors ship official MCP servers. Thousands more are community-maintained. Notion, Linear, GitHub, Stripe, Slack, Postgres, Google Drive, Figma, Salesforce, Zendesk, Intercom, Sentry, AWS, Cloudflare, every major CRM, and most databases have a public MCP server you can wire up in minutes.

#### How OpenClaw uses MCP (client and server)

OpenClaw is unusual in the ecosystem because it plays both roles.

As an MCP client, an OpenClaw agent can consume any external MCP server — the same way Claude Desktop or Cursor does. You add the server to an `mcp.json` file, the gateway boots it, and the tools appear in the agent's tool list automatically. The agent decides when to call them.

As an MCP server, OpenClaw exposes its own internal tools to outside clients. There are two common patterns. First, a loopback bridge lets background Claude CLI runs reach the same tools the main OpenClaw agent has. Second, a remote bridge (the freema/openclaw-mcp project is one implementation) exposes routed channel conversations over MCP so that a Claude Desktop user can talk to a self-hosted OpenClaw assistant with OAuth2 authentication. See the OpenClaw MCP CLI docs for the current reference.

Being both sides of the protocol matters because it lets a single OpenClaw gateway act as the integration layer for a whole agency. Thirty clients, each with their own session, all share the same pool of MCP connectors. You maintain one CRM connector, not thirty. For how this fits into the broader gateway design, read what OpenClaw actually is.

#### Stdio vs HTTP/SSE: picking a transport

Most teams get this wrong the first time.

Stdio connectors spawn a subprocess on the same machine as the gateway and communicate over pipes. They are fast, private, and have no network surface. But they cannot be shared across gateways, they die when the gateway restarts, and they are awkward to scale horizontally.

HTTP/SSE connectors run as independent services anywhere on the network.
They can be load-balanced, authenticated with OAuth, and shared by every gateway in an estate. But they add latency and require actual ops work. The rough rule:

- Use stdio for local-first tools: reading the filesystem, shelling out to a CLI, hitting a local Postgres, scraping a page, controlling a local Chrome instance.
- Use HTTP/SSE for anything multi-tenant, anything with OAuth, anything that already runs as a service, and anything that needs to be reachable from more than one OpenClaw node.

The MCP maintainer team is actively reshaping the HTTP transport in the 2026 roadmap to behave correctly behind load balancers and survive server restarts without losing session state. That work is tracked in the official MCP roadmap.

#### Step-by-step: add your first MCP server to OpenClaw

This walkthrough wires up a common example: a GitHub MCP server so the agent can open issues, review pull requests, and search code. It takes about eight minutes on a fresh OpenClaw install.

1. Install OpenClaw if you have not already. Any Linux, macOS, or WSL box with Node.js 22 or newer works:

   ```
   npm install -g @openclaw/cli
   openclaw init
   openclaw gateway start
   ```

2. Open your agent workspace. Every agent has a workspace directory at `~/.openclaw/agents/`. The MCP config lives there:

   ```
   cd ~/.openclaw/agents/my-first-agent
   ls mcp.json 2>/dev/null || echo '{ "servers": {} }' > mcp.json
   ```

3. Add the server block. Edit `mcp.json` to register the GitHub server:

   ```json
   {
     "servers": {
       "github": {
         "transport": "stdio",
         "command": "npx",
         "args": ["-y", "@modelcontextprotocol/server-github"],
         "env": {
           "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token_here"
         }
       }
     }
   }
   ```

4. Reload the agent. The gateway picks up new connectors on reload:

   ```
   openclaw agents reload my-first-agent
   openclaw mcp list my-first-agent
   ```

   You should see the GitHub tools appear: search_code, create_issue, list_pull_requests, and a dozen more.

5. Test in the agent.
Send a message over any connected channel: "Find open issues in openclaw/openclaw labeled good-first-issue and summarize them." The agent will call `github.list_issues` with the label filter, read the results, and reply with a summary. That is a full MCP round trip on a live production connector.

6. Lock down permissions. By default the agent can call any tool the MCP server exposes. For safety in client-facing deployments, add an allowlist in the agent's config:

   ```
   openclaw agents edit my-first-agent \
     --allow "github.search_code,github.list_issues,github.list_pull_requests" \
     --deny "github.delete_repository"
   ```

   Now the agent can read your repositories but cannot delete anything. The full permission model is documented in the OpenClaw docs, and if you want an end-to-end example using the same pattern for a law firm, see our law firm AI worker template.

#### MCP connectors vs custom plugins vs raw API calls

Agencies shipping AI to clients have three realistic ways to plug an agent into an external tool. Each has a place. Picking the wrong one wastes weeks.

| Dimension | MCP connector | Custom OpenClaw plugin | Raw API call from agent |
| --- | --- | --- | --- |
| Setup time | Minutes (if server exists) | Hours to days | Hours (prompt-engineered) |
| Reusable across agents | Yes, any MCP client | Yes, within OpenClaw | No, per-prompt |
| Works outside OpenClaw | Yes (Claude Desktop, Cursor, Windsurf) | No | No |
| Schema discovery | Automatic | Defined in plugin manifest | Manual, in prompt |
| Auth pattern | OAuth / env / headers | OpenClaw vault | Whatever you inject |
| When to use | Standard SaaS tools | Agency-specific business logic | One-off prototypes |

The short answer most agencies land on: MCP connectors for anything vendors already ship (CRMs, databases, dev tools, file storage), custom plugins for the one or two pieces of logic that are actually your moat, and raw API calls only for quick prototypes. The longer you run an agency, the more of your stack ends up as stock MCP connectors — and that is the healthy outcome.
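The automatic schema discovery the comparison above mentions has a concrete wire shape: MCP messages are JSON-RPC 2.0, and a client discovers a server's tools with `tools/list` before invoking one with `tools/call`. A minimal Python sketch of the request shapes — illustrative only: the tool name and arguments here are made up, and a real client also performs an `initialize` handshake first.

```python
import json

def rpc_request(method: str, params=None, id: int = 1) -> str:
    """Build one JSON-RPC 2.0 request, the framing MCP uses on both
    the stdio and HTTP transports."""
    msg = {"jsonrpc": "2.0", "id": id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask the server what tools it exposes. This is the "automatic" part:
# the tool schemas come back from the server, not from your prompt.
discover = rpc_request("tools/list")

# Call one of the discovered tools by name with structured arguments
# (hypothetical tool name and filter, echoing the GitHub example above).
invoke = rpc_request(
    "tools/call",
    {"name": "list_issues", "arguments": {"labels": ["good-first-issue"]}},
    id=2,
)
```

The same two methods work regardless of what the server wraps, which is why one client implementation covers the whole 10,000-server ecosystem.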
#### The MCP connectors every agency should know

Out of 10,000 public servers, a short list covers 80% of real agency work. These are the ones worth wiring up on day one:

- **GitHub.** Issues, pull requests, code search, file contents. The starter connector for every technical team.
- **Postgres / MySQL / SQLite.** Read-only query access to your operational database. Pair with a strict allowlist.
- **Slack.** Post messages, read channels, react to threads. Not a chatbot replacement — an action surface.
- **Stripe.** Customer lookup, charge history, subscription status. Gold for billing support agents.
- **Notion or Linear.** Plan work, file tickets, update docs. Most agencies pick one.
- **Google Drive or Dropbox.** Read-only access to client docs for retrieval-augmented answering.
- **Browser (via Chrome MCP).** Fills the long tail when no first-party server exists. OpenClaw v2026.3.23 shipped major reliability fixes to the browser attach path.
- **GHL / HubSpot / Salesforce.** Contact records, deal pipelines, workflow triggers. Read our GHL AI worker guide for the end-to-end pattern.

Wiring all eight takes an afternoon. The same eight connectors, shared across every client agent on the same gateway, is the backbone of a production agency stack in 2026.

#### Security: the part most people get wrong

MCP gives an agent tools. Tools execute real actions. Treating that casually is how data leaks happen. Four rules cover most of the risk.

1. **Allowlist by default, not denylist.** It is easier to add a tool an agent needs than to remove one it should not have. Start with zero tools allowed, add the specific tool names the use case requires.
2. **Never run untrusted servers in the main gateway process.** A community MCP server is arbitrary code. If you do not know the author, run it in an isolated container with its own network, no filesystem access outside a scratch directory, and no access to gateway secrets.
3. **Use OAuth where the server supports it.** Static tokens in `mcp.json` are fine for local dev, terrible for production.
OAuth means the user authorizes exactly the scopes the agent needs, and tokens can be revoked without redeploying.

4. **Monitor every tool call.** OpenClaw writes every MCP call to the agent's session JSONL. Forward those logs somewhere you will actually read them. The cheap version is a nightly cron that greps for high-risk tool names (anything with delete, write, or transfer) and emails the summary.

For regulated industries this is not optional. A dental practice running an AI receptionist over MCP connectors needs a documented permission posture before the HIPAA officer will sign off. Our guide on AI for dental practices covers that specific posture.

#### When MCP is not the right choice

MCP is not the answer to every integration question. Three cases where reaching for it is the wrong instinct.

- **Real-time, sub-second control loops.** MCP is request-response with some streaming. If you are building a trading system or a realtime game agent, the protocol overhead will dominate. Talk directly to the underlying API.
- **High-volume background data movement.** If the job is moving a million rows a night from Salesforce to Snowflake, write an ETL pipeline. Do not have an agent iterate through it.
- **Logic that is genuinely yours.** If the "tool" is 40 lines of your company's pricing logic, that belongs in a custom OpenClaw plugin or a small internal service. MCP adds no value for something only your agent will ever call.

A clean mental model: MCP is the right choice when the tool already exists or could reasonably exist as a general-purpose product. It is the wrong choice when the tool is really just a piece of your business logic wearing a protocol costume.

#### Frequently asked questions

**Is MCP production-ready in 2026?** Yes, with care. The spec is stable, the major SDKs are reliable, and enterprise adoption is real. The sharp edges are still in the HTTP transport (stateful sessions behind load balancers) and server discovery, both of which are explicitly on the 2026 roadmap.
Stdio-based local connectors have been production-ready since early 2025.

**Do I need OpenClaw to use MCP?** No. MCP is an open protocol. Claude Desktop, Cursor, Windsurf, and dozens of other clients consume MCP servers. OpenClaw is specifically useful when you want one gateway serving many clients and channels, rather than a single developer using a single IDE.

**Can one MCP server be used by many agents at once?** Yes. That is the point. A single GitHub or Postgres MCP server on your network can be a tool source for every agent on every gateway in your estate. Stdio servers are per-process; HTTP servers are shared.

**How does MCP handle authentication?** Three patterns. Static tokens in environment variables (simplest, least secure). Header-based auth on HTTP transport (good for service-to-service). OAuth 2.1 with the DPoP extension being proposed in 2026 (best for user-authorized access). Pick based on who the agent is acting on behalf of.

**What happens if an MCP server goes down?** The tool disappears from the agent's available tool list until the server returns. A well-written agent handles the missing tool gracefully — it falls back to asking the user or using a different path. A poorly written prompt pretends the tool is still there and hallucinates the result. Test the degraded-mode path before shipping.

**Does MCP work with non-Anthropic models?** Yes. The protocol is model-agnostic. OpenAI, Google, Mistral, and open-weights models via tool-calling all consume MCP servers through the same OpenClaw client layer. The "Model" in "Model Context Protocol" refers to any language model, not Anthropic specifically.

#### The bottom line

Two years ago the hard part of agentic software was the integration work. Every agency had its own sprawling collection of API adapters, and every client engagement started with another adapter rewrite. In 2026 that problem is mostly solved. MCP turned integration into a shared commons.
OpenClaw turned it into something an agency can operate at scale: one gateway, one pool of connectors, every client isolated, every tool reusable.

If you want to run this stack yourself, the openclaw/openclaw repo is the right starting point, and the official MCP roadmap is worth subscribing to. If you want the same architecture without the ops work, Kyra runs a hosted OpenClaw gateway with a curated MCP connector library, session isolation per client, and permission presets that match real agency workflows. Our solo plan is free during beta so you can wire up your first three MCP connectors without a credit card. Our team has deployed this pattern across agencies ranging from two-person shops to 50-client white-label operators, and the playbook works the same in both places.

---

### 6 Things an OpenClaw AI Agent Can Do That a Chatbot Can't (2026 Guide)

- URL: https://kyra.conversionsystem.com/blog/openclaw-agent-vs-chatbot-capabilities
- Published: 2026-04-17
- Category: AI Infrastructure
- Read time: 13 min

Last updated: April 17, 2026

An OpenClaw AI agent is an open-source, self-hosted AI worker that can do six things a typical chatbot cannot: browse the live web, read and write files, execute real code, search memory from past conversations, fire emails and webhooks, and delegate complex work to sub-agents. These capabilities ship out of the box — no plugins, no custom code, no orchestration layer. This guide explains each one, shows a real example of what it enables, and walks through the setup in under fifteen minutes.

#### Key takeaways

- A chatbot responds to text. An AI agent has six tool categories that let it actually do the work.
- OpenClaw ships with 60+ built-in tools covering web, files, code, memory, actions, and multi-agent coordination.
- The community ClawHub registry adds thousands of additional skills contributed by agencies running this in production.
- Setup takes under 15 minutes on any machine that runs Node.js 22 or later.
- The architecture is MIT-licensed, self-hosted, and works with Claude, GPT, Gemini, and 50+ other models.

#### Why "chatbot" and "AI agent" are different things

Most software called "AI" in 2026 is still a text interface in front of a language model. You type a question, the model replies, the interaction ends. The model does not open a browser, does not touch your files, does not run any code, and does not remember you the next time you come back. It is a chatbot.

An AI agent is the same language model connected to tools. Those tools let the model take actions in the real world: read a webpage, write a file, execute a query, send an email, call another agent. Without tools, a language model is a very articulate pattern-matcher. With tools, it is a worker.

OpenClaw is an open-source framework that gives any language model the full toolkit. It runs as a single daemon on your hardware, connects to messaging channels like WhatsApp, Slack, and Discord, and routes user messages to the agent along with access to all its tools. For the full architecture, see our guide on what OpenClaw actually is.

The six capability categories below are what that toolkit enables. Every one is built in. You do not install anything to get them.

#### 1. Browse the web and pull live data

The first thing a real AI agent does that a chatbot cannot: look at the live internet. Language models are frozen at their training cutoff. OpenClaw agents have a built-in browser tool powered by Chromium plus a web-search tool that integrates with more than ten search providers — Google, Bing, Brave, Kagi, SerpAPI, and others. The agent can open a page, read its content, fill out a form, click a button, or extract structured data.

Real examples:

- Pull today's competitor pricing from three different websites, compare, and report the deltas every morning at 6am.
- Verify a stat before citing it in a reply. If the claim is wrong, the agent says so.
- Read a news article the user just linked, summarize the relevant points, and pull out action items.
- Check if a lead's business is still operating before the sales team calls.
- Scrape a product page and extract specs, warranty info, and shipping details.

The key difference from a generic "search plugin": the agent chooses when to search, what to search, and how to interpret the result. It does not blindly forward your query to Google. It reasons about whether live data is needed, fetches it, and integrates it into the answer.

#### 2. Read, write, edit, and analyze files

An OpenClaw agent has four file tools — read, write, edit, and apply_patch — that operate on files inside a workspace directory on your machine. The workspace is the agent's cwd: a folder you control, where your files live, and where the agent's outputs land.

Real examples:

- Summarize a 40-page PDF contract and flag clauses that need legal review.
- Clean up a messy CSV export — drop duplicates, fix encoding, normalize column names — and save a clean version.
- Rewrite a long proposal document in a different tone, saving as a new version without touching the original.
- Compare two versions of a document and list the diffs.
- Analyze a folder of raw data files and generate a weekly report.

Critically, your files stay on your disk. OpenClaw is self-hosted: nothing is uploaded to a cloud service for processing. The agent reads from local paths and writes to local paths. That matters for regulated businesses — dental, legal, medical, financial — where shipping patient or client data to a third-party SaaS violates compliance requirements.

#### 3. Run code and return real results

This is the capability that separates a chatbot from an analyst. OpenClaw agents have two execution tools: one for shell commands, one for sandboxed Python. When the agent needs to compute something, it does not guess. It writes code, runs it, and returns the actual output.
Real examples:

- Run a Python script against a CSV and return summary stats — mean, median, percentiles — with actual numbers.
- Execute a SQL query against your database and report the result.
- Call your API endpoint and tell you what response it received, including the status code and headers.
- Transform a spreadsheet with pandas and write the cleaned output to disk.
- Test a regex against your sample inputs to confirm it catches what you want.

The difference this makes is qualitative, not quantitative. When a chatbot says "your conversion rate is approximately 3.2%", that number was generated by pattern-matching — it may or may not be correct. When an agent says it, the number came from running the calculation. The agent can show you the code it ran, the data it read, and the output it got.

For sensitive commands, the execution tool respects a permissions system. You can restrict the agent to a specific set of commands, a specific working directory, or require explicit approval for anything destructive. The OpenClaw documentation covers tool allow and deny lists in detail.

#### 4. Search memory from past conversations

The single biggest reason chatbots feel broken is that they forget you the second the conversation ends. OpenClaw agents do not. Every conversation is stored as a session file in JSONL format at `~/.openclaw/agents//sessions/.jsonl`. The agent has two memory tools — memory_search and memory_get — that search across all past sessions.

Real example conversation:

> Customer (one week later): Hey, I'm back.
>
> Agent: Welcome back, Sarah. Last time you were weighing the 3-bedroom versus the 2-bedroom with the garage. Did you decide?

That is not a scripted flow. The agent searched its memory for prior sessions with this user, found the relevant thread, extracted the open decision, and brought it up. Session management, memory search, and recall all work out of the box.
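Because sessions are plain JSONL on disk, the simplest possible version of memory search is just a scan over those files. A sketch under assumptions: the per-line record shape (`{"role": ..., "text": ...}`) is invented here, and the real memory_search tool does ranked retrieval rather than substring matching.

```python
import json
from pathlib import Path

def search_sessions(sessions_dir, query: str) -> list:
    """Naive memory search: scan every session .jsonl in a directory and
    return (filename, turn) pairs whose text mentions the query."""
    hits = []
    for path in sorted(Path(sessions_dir).glob("*.jsonl")):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            turn = json.loads(line)  # one conversation turn per line (assumed shape)
            if query.lower() in turn.get("text", "").lower():
                hits.append((path.name, turn))
    return hits
```

Even this flat-file scan is enough to make the "welcome back, Sarah" moment work; the built-in tools add ranking and compaction on top of the same storage.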
There is also an optional active memory sub-agent that runs before every reply, searches memory for anything relevant, and surfaces it so the main agent can reference it naturally. Six prompt modes are available — balanced, strict, recall-heavy, precision-heavy, contextual, preference-only — so you can tune how aggressively memory gets injected. Older turns in very long sessions are automatically compacted into summary entries (the built-in compaction system), which keeps the token bill down without losing continuity. The architecture guide goes deeper into this.

#### 5. Send emails, book calendar slots, and trigger webhooks

Where most chatbots stop, an agent finishes the job. After a conversation ends, an OpenClaw agent can fire any number of post-conversation actions: email a summary, book a calendar slot, drop a message in another channel, fire a webhook into Zapier or n8n, update a CRM, schedule a follow-up via the built-in cron tool.

Real workflow: A patient texts a dental practice at 9pm asking about insurance and availability. The agent answers the insurance question (by searching the uploaded coverage document), offers three appointment times based on Google Calendar availability, and confirms the booking. After the conversation ends, the agent emails the office manager a summary, creates the appointment on the shared calendar, tags the patient in the CRM as "new-booking," and schedules a reminder for the next morning.

None of that requires an external automation tool. It is all built into the OpenClaw messaging, webhook, and cron systems. For the full automation stack, see the docs on hooks, cron, and tasks. For how this workflow plays out in a real dental practice deployment, see our dental AI guide.

#### 6. Call sub-agents to split complex tasks

One agent can only hold so much context at once. When a task has multiple parallel parts, OpenClaw lets the main agent spawn specialist sub-agents, each with its own context window, tool allowlist, and workspace.
The main agent orchestrates. The sub-agents focus.

Real workflow: a patient asks a dental practice AI: "Does my insurance cover a cleaning this week, and can anyone do an emergency filling today?" Instead of juggling three lookups in one context, the main agent delegates:

- Sub-agent 1: look up insurance coverage across the practice's 12 supported providers.
- Sub-agent 2: check today's calendar for any emergency slot.
- Sub-agent 3: pull the current pricing for cleanings and fillings, including any active promotions.

All three run in parallel. They report back. The main agent composes a single clean reply with all three answers.

This pattern scales: an agency that handles support, sales, and ops from one inbox can use sub-agent routing to give each lane its own specialist brain without running three separate chatbots. Each sub-agent has isolated context (no cross-contamination), its own tool permissions (the support sub-agent does not have access to the billing API), and its own session (audit trails stay clean).

#### The 6 capabilities side-by-side

| Capability | Tools used | What it enables |
| --- | --- | --- |
| Browse the web | browser, web_search | Live data, competitor checks, stat verification |
| Read/write/edit files | read, write, edit, apply_patch | Document work on your own disk, self-hosted |
| Run code | exec, code_execution | Real computation, SQL, API calls, data transforms |
| Search memory | memory_search, memory_get | Returning-customer recognition, persistent context |
| Actions & webhooks | message, cron, webhook tools | Email, calendar, CRM updates, scheduled follow-ups |
| Sub-agents | sessions_spawn, subagents | Parallelized specialist workflows |

#### How to set up an OpenClaw agent with all 6 capabilities in 15 minutes

The full six-capability toolkit is available the moment the gateway starts. There is no "install tool pack" step. Here is the minimum-viable setup.

Step 1. Install OpenClaw

`npm install -g openclaw@latest`

Requires Node.js 22.14 or later. The recommended version is Node 24.

Step 2.
Run the onboarding wizard

`openclaw onboard --install-daemon`

This prompts for a model provider API key (Anthropic, OpenAI, Google, OpenRouter, Ollama, and fifty-plus others supported), creates your workspace at ~/.openclaw/workspace, and installs the daemon as a system service (launchd on macOS, systemd on Linux, Scheduled Task on Windows).

Step 3. Verify the built-in tools

`openclaw cli tools list`

You should see 60+ tools listed. All six capability categories are represented: browser, file I/O, exec, memory, messaging, sub-agent spawning, plus tools for media generation, cron, and more.

Step 4. Open the dashboard and test

`openclaw dashboard`

This launches the Control UI at http://127.0.0.1:18789. Send the agent a test message that exercises a tool — "What's the current weather in San Francisco?" forces a web search. "Run ls ~/Documents" forces a shell exec (with appropriate permissions). You will see the tool invocations in the agent's reasoning trace.

Step 5. Connect a messaging channel

Telegram is the fastest channel to configure: create a bot with @BotFather, paste the token into ~/.openclaw/openclaw.json under channels.telegram.botToken, add your username to channels.telegram.allowFrom, and restart the gateway. The agent will start replying to your messages from your phone within seconds.

#### Frequently asked questions

Do I need to know how to code to use this? No. The built-in tools work through natural-language instructions. You tell the agent what you want ("pull the competitor prices this morning") and it picks the right tools to accomplish it. You only need to edit config files — no programming.

What does this cost to run? OpenClaw itself is free (MIT licensed, open source). The only cost is the model API token usage for your chosen provider. A busy agent running on Claude Sonnet typically costs $5–$30 per month in API fees at moderate conversation volume. If you use a local model via Ollama, that cost is zero.

Is this secure for regulated industries?
The gateway binds to loopback (127.0.0.1) by default, meaning only your local machine can talk to it. For remote access, the recommended pattern is Tailscale or an SSH tunnel, not public internet ingress. Files stay on your disk. Sessions stay on your disk. The full security model uses MITRE ATLAS terminology and is documented in the project's threat model.

Can I run multiple agents with different tool permissions? Yes. OpenClaw supports multi-agent deployment on one gateway. Each agent has its own workspace, its own tool allow/deny lists, its own sessions, and its own routing bindings. A customer support agent can have browser and memory access but no shell exec. A personal productivity agent can have full access. They run on the same gateway without cross-contamination.

How does this compare to building on the OpenAI Assistants API or similar? OpenAI's Assistants API gives you a hosted agent runtime tied to OpenAI's models and infrastructure. OpenClaw gives you a self-hosted agent runtime with model-agnostic design — you can swap between Claude, GPT, Gemini, local models, and others with a config change. You control where the data lives, what tools are available, and how the agent is deployed.

What about the skills ecosystem? The six capabilities above are built-in tools. On top of those, OpenClaw supports Skills — markdown instruction files that teach the agent repeatable workflows. The community ClawHub registry hosts thousands of published skills covering ads management, CRM automation, research workflows, and more. Skills load per-workspace, per-user, or globally, and you can write your own by dropping a markdown file in the skills folder.

#### When OpenClaw is probably overkill for you

Not every use case needs a self-hosted agent. OpenClaw is the wrong choice if:

- You just need a FAQ chatbot on one page of your website. A lighter tool will do.
- You have no model API key and no interest in getting one.
- You are not comfortable editing a config file or running a command-line install.
- You need zero-setup, click-and-deploy with no configuration at all.

For that last group, a managed platform that wraps OpenClaw makes more sense than running it directly. That is the space Kyra occupies: agencies use it to deploy isolated OpenClaw containers for each of their clients without touching infrastructure. Each client gets their own agent, their own workspace, their own memory — and the agency manages everything from one dashboard. The architecture is identical; the operational overhead is zero.

#### The bigger point

The gap between "AI chatbot" and "AI worker" is exactly this toolkit. A chatbot responds. An agent executes. The difference is not the model. It is what the model can reach.

OpenClaw ships the toolkit free and open source. You get six capability categories, sixty-plus tools, and a community registry of thousands of additional skills — all in fifteen minutes of setup.

Most businesses are still running chatbots. The ones that switched to agents are closing tickets, booking appointments, qualifying leads, and running reports while their teams sleep. The technology gap is real. The setup gap is small.

Want the full breakdown of what OpenClaw is before you install it? Start with our guide on what OpenClaw actually is. Ready to deploy it for clients without the DevOps burden? Start with Kyra Solo — free, no credit card, first agent live in under two minutes.

External references: OpenClaw on GitHub (MIT licensed) · Official OpenClaw documentation · Model Context Protocol (MCP) specification · Anthropic Claude documentation.

---

### What Is OpenClaw?
The Open-Source AI Gateway That Connects Every Messaging App to Your AI Agent

- URL: https://kyra.conversionsystem.com/blog/what-is-openclaw-ai-gateway-explained
- Published: 2026-04-16
- Category: AI Infrastructure
- Read time: 13 min

Last updated: April 17, 2026

**Key takeaways**

- OpenClaw is an open-source, MIT-licensed AI gateway that runs as a single daemon on your hardware.
- It connects 24+ messaging channels (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, and more) to a single AI agent.
- Supports 50+ model providers including Claude, GPT, Gemini, Ollama, and OpenRouter.
- Setup takes under 10 minutes on any machine with Node.js 22 or later.
- Your data stays on your hardware. No vendor lock-in.

Most of the AI chat tools on the market today are closed black boxes. You sign up, you hand over your data, you pay per seat, and you pray the vendor doesn't change their pricing next quarter. Your conversations sit on someone else's server. Your customers get answers from the same shared infrastructure as everyone else. If the service goes down, your business goes down with it.

There is a different path. It is called OpenClaw, and it is quietly becoming the backbone of serious AI deployments in 2026. This guide explains what OpenClaw actually is, what problem it solves, how the architecture works, and exactly how to set it up — even if you have never run a server before. By the end of this article, you will understand why agencies, solo operators, and regulated businesses are moving off shared chatbot platforms onto self-hosted AI gateways — and why OpenClaw is the one they are choosing.

#### What Is OpenClaw? The One-Sentence Definition

OpenClaw is an open-source, self-hosted AI gateway that runs as a single daemon on your machine or server and connects your messaging apps — WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Microsoft Teams, Matrix, and more — to an AI agent that you fully control.

That definition packs a lot in, so let us unpack it.
Open-source: MIT licensed. The code is on GitHub at github.com/openclaw/openclaw. You can read every line. You can fork it. You can contribute. There is no vendor to go out of business and take your bot with them.

Self-hosted: OpenClaw runs on your hardware. Your laptop, a Mac Mini in a closet, a Raspberry Pi, a cheap VPS, a dedicated server, a Docker container — wherever you want. Your data lives in ~/.openclaw/ on your disk. Nothing is sent to a cloud service unless you explicitly configure it.

AI gateway: this is the important word. A gateway is not a chatbot. It is not a workflow automation tool. It is a bridge — a single process that sits between your messaging channels on one side and an AI model on the other, routing messages, managing sessions, invoking tools, and keeping state.

Single daemon: one background process. One port. One config file. You do not have to stitch together seven different services, manage a Kubernetes cluster, or learn four new languages. You install Node, run one command, and it is live.

#### What OpenClaw Replaces

OpenClaw is most interesting when you look at what it makes obsolete. Four categories of tools disappear the moment you deploy it.

1. Zapier-style automation for AI. Most businesses glue AI into their stack with Zapier, Make, or n8n. It works — barely — until you hit a rate limit, a per-task fee, or a broken trigger at 2am. OpenClaw has built-in cron jobs, event hooks, background tasks, and multi-step task flows. They run inside the gateway, tied to your agent, with no per-task billing and no external scheduler to fail.

2. Shared chatbot platforms. If you are using a SaaS chatbot tool, your client's conversations are likely sitting on a shared server with thousands of other businesses. Their data, their prompts, their patient intake forms — mixed with a random e-commerce store in another industry. For regulated businesses (dental, legal, medical, financial), this is not a feature. It is a liability.
OpenClaw runs on your machine. Every client can have their own isolated container with their own data, their own personality, and their own knowledge base.

3. Custom-built bots for every channel. If you have ever tried to ship a WhatsApp bot, a Telegram bot, a Slack bot, and a Discord bot as separate projects, you know the pain. Four codebases. Four auth flows. Four message formats. Four deploy pipelines. OpenClaw collapses this into one process. You write the agent once. It speaks every channel. When a message comes in on Telegram, the reply goes to Telegram. When it comes in on Slack, the reply goes to Slack. The routing is deterministic and configurable.

4. Prompt chains that break. Handcrafted prompt chains are brittle. One new product update, one odd customer question, one edge case — and the whole chain falls apart. OpenClaw agents use persistent sessions, structured memory, built-in tool use, and automatic context compaction. The agent remembers what it learned yesterday. It can search the web. It can read files. It can write to a CRM. It does not forget your customer after every message.

#### 24+ Channels, One Gateway

OpenClaw ships with first-party integrations for the channels real businesses use every day. Here is the list as of 2026.

Built-in channels: WhatsApp (via Baileys with QR pairing), Telegram (via bot token — the fastest setup), Discord (with guild routing, threads, and slash commands), Slack (via the Bolt SDK in socket mode or HTTP webhooks), Signal (via signal-cli bridge), iMessage (via Mac or BlueBubbles), Google Chat, IRC, and WebChat (an embeddable widget for any website).

Bundled plugin channels: Matrix (with end-to-end encryption support), Microsoft Teams (with full Graph API integration), Feishu, LINE, Mattermost, Nextcloud Talk, Nostr, QQ Bot, Synology Chat, Tlon, Twitch, Zalo, and Zalo Personal.

That is more than twenty-four channels. Every one of them runs from the same gateway.
You add a channel by editing a config file or running a CLI command. You do not write a new bot for each one.

And the replies route intelligently. If a customer messages your WhatsApp number, the reply goes to WhatsApp. If a teammate pings your agent in a Slack thread, the reply goes into that thread. Session state is isolated per channel, per group, per user — so conversations never cross-contaminate.

#### The Core Architecture in Plain English

You do not need to be a systems engineer to use OpenClaw, but it helps to understand the moving parts. Here is the picture.

The Gateway: a single long-lived daemon. It opens one port (default 18789, loopback only by default) and listens for WebSocket connections from channels, clients, and nodes. It is the single source of truth for sessions, routing, and channel connections.

The Agent Runtime: embedded inside the gateway. When a message arrives, the gateway hands it to the agent runtime, which assembles a context, calls the language model, invokes tools if needed, streams the response back, and persists the conversation transcript.

The Workspace: a directory on your disk (default ~/.openclaw/workspace). Inside it, a handful of markdown files define how your agent behaves. SOUL.md is the personality file — tone, voice, boundaries. AGENTS.md is operating rules and memory. USER.md is who you are. TOOLS.md is your notes on how to use specific tools. These files inject into the agent's context at the start of every new session.

Sessions: every conversation is a session, stored as a JSONL file. Sessions reset on a schedule (default 4am local) or when they go idle. Old tool results are pruned in memory to save tokens. When context fills up, older messages are summarized into a single compact entry — a process called compaction — so the conversation can continue indefinitely.

Tools: the agent has more than sixty built-in tools. It can execute shell commands. It can read and write files.
It can search the web through ten different providers. It can drive a Chromium browser. It can send messages across channels. It can generate images, audio, and video. It can spawn sub-agents for complex tasks. You control which tools it can use through simple allow and deny lists.

Skills: reusable markdown instruction files that teach the agent specific workflows. Write a skill once — "generate a weekly client report" — and the agent will follow those steps forever. Skills load from six locations with clear precedence, so you can ship skills per-workspace, per-user, or bundled with the install.

#### How to Set Up OpenClaw in 10 Minutes

This is the part everyone wants. Here is the exact, step-by-step installation for a typical developer or power user. Total time, start to first message: under ten minutes.

Step 1. Check Your Node Version

OpenClaw recommends Node 24, but it works on Node 22.14 or later. Check what you have:

`node --version`

If you do not have Node, install it from nodejs.org or via a version manager like nvm. This is the only real dependency.

Step 2. Install OpenClaw Globally

`npm install -g openclaw@latest`

This puts the openclaw CLI on your path. Takes about thirty seconds on a reasonable internet connection.

Step 3. Run the Onboarding Wizard

`openclaw onboard --install-daemon`

The wizard walks you through three things. First, it asks for an API key from a model provider. Claude from Anthropic is the default recommendation, but OpenClaw supports more than fifty providers including OpenAI, Google Gemini, Mistral, Groq, DeepSeek, OpenRouter, and local models via Ollama. Pick whichever you have credentials for. Second, it creates your workspace at ~/.openclaw/workspace and seeds it with template files. Third, it installs the daemon as a service so it starts automatically when your computer boots. On macOS this is launchd. On Linux it is systemd. On Windows it is a Scheduled Task.

Step 4.
Customize Your Agent's Personality

Open ~/.openclaw/workspace/SOUL.md in any text editor. Replace the default content with who you want your agent to be. For example:

> You are a professional customer service assistant for a dental practice. You are warm, clear, and patient. You answer questions about scheduling, insurance, and services. You never speculate about medical conditions. If a patient sounds distressed, you offer to connect them with a human immediately. You respond in short sentences. You avoid jargon. You confirm every appointment time and date twice before booking.

Save the file. The next conversation your agent has will use this personality.

Step 5. Add Your First Channel

Telegram is the fastest channel to set up because it only requires a bot token. Create a bot by messaging @BotFather on Telegram and following the prompts. Copy the token it gives you. Open ~/.openclaw/openclaw.json and add:

```json
{
  "channels": {
    "telegram": {
      "enabled": true,
      "botToken": "YOUR_TOKEN_HERE",
      "allowFrom": ["your_telegram_username"]
    }
  }
}
```

The allowFrom list is your first line of defense. Only listed users can message your agent. Remove it later once you have pairing or broader access policies configured.

Step 6. Restart and Message Your Agent

`openclaw gateway restart`

Open Telegram. Find your bot. Say hello. You should get a reply within a couple of seconds, in the voice you defined in SOUL.md, coming from your own hardware, using your own API key.

That is a working AI gateway. From here you can add more channels, more tools, more skills, and more agents. The gateway is already doing the heavy lifting.

Step 7. Open the Dashboard

`openclaw dashboard`

This opens the Control UI at http://127.0.0.1:18789/. It is a browser dashboard for managing sessions, inspecting logs, configuring channels, and chatting with your agent directly. For most power users this becomes the main interface alongside the CLI.

#### Common Questions About OpenClaw

Is OpenClaw really free? Yes.
The code is MIT licensed. There is no subscription, no per-message fee, no paid tier. The only thing you pay for is the AI model you connect it to — and you bring your own API key. If you use a local model through Ollama, even that cost disappears.

What does it run on? Any machine that can run Node.js. Many users run it on a Mac Mini, an old laptop, or a cheap virtual server. Memory footprint is modest. The gateway itself is lightweight; the heavy lifting is the model call, which happens on the provider's infrastructure or your local GPU.

Is it secure? The gateway binds to loopback by default, meaning only your local machine can talk to it. For remote access, the recommended pattern is Tailscale or an SSH tunnel rather than public ingress. Every channel connection uses pairing — a challenge-signed device identity that must be explicitly approved on first connect. Non-local connections still require explicit approval. The full security model uses MITRE ATLAS terminology and is documented in the project's threat model.

Can I run multiple agents on one gateway? Yes. Multi-agent routing is a first-class feature. Each agent gets its own workspace, its own sessions, its own skills, and its own routing bindings. You can point different channels at different agents, or split one channel by guild, role, or peer. One gateway can host a support agent, a sales agent, and a personal assistant at the same time without any cross-contamination.

What about enterprise deployments? OpenClaw includes a delegate architecture for agents that act on behalf of organizational principals. It supports three capability tiers — read-only, send-on-behalf, and autonomous — each with hardening requirements including tool allow and deny lists, sandbox isolation, and audit trails. It integrates with Microsoft 365 and Google Workspace with minimum-privilege delegation scopes.

How does it handle memory? Session transcripts live on your disk as JSONL.
Daily memory summaries can be written to markdown files in the workspace. An optional active memory sub-agent surfaces relevant memories before each reply. Compaction automatically summarizes older turns when context fills up. Prompt cache pruning reduces token cost without losing context. All of this works out of the box.

#### When OpenClaw Makes Sense, and When It Does Not

Self-hosted AI is not the right choice for every situation. Here is the honest take.

OpenClaw makes sense if:

- You care about data sovereignty — regulated industries, sensitive intake forms, confidential business workflows
- You want multi-channel AI without writing four separate bots
- You have more than a handful of clients or teams and need isolation between them
- You want predictable costs — pay for the model tokens you use, not per-seat licensing
- You want to build skills and automation your agent runs repeatedly
- You are comfortable editing a config file or running a CLI command

OpenClaw might be overkill if:

- You only need a basic chatbot on a single channel and have never managed a server
- You do not have any API keys and do not want to get any
- You want a zero-setup, click-and-deploy experience with no configuration

For that second group, there is an easier path.

#### The Easier Path: Deploy OpenClaw Without Managing Infrastructure

OpenClaw vs. alternative AI deployment paths:

| Approach | Data location | Channel coverage | Per-seat pricing | Lock-in risk |
| --- | --- | --- | --- | --- |
| ChatGPT / Claude web app | Vendor cloud | Web only | Yes | High |
| OpenAI Assistants API | Vendor cloud | Custom integration per channel | Usage + model cost | High (API tied to one vendor) |
| Shared SaaS chatbot | Vendor cloud, shared infra | Channel dependent | Yes | Medium |
| OpenClaw (self-hosted) | Your hardware | 24+ built-in channels | None | None (MIT licensed) |

OpenClaw is powerful. It is also, for most agency owners and non-technical operators, more setup than they want to do for every client.
Installing Node, editing config files, managing daemons, paying for a VPS, renewing TLS certificates — it adds up.

For agencies who want the OpenClaw architecture without the infrastructure work, managed platforms exist that wrap this runtime in a complete service layer — per-client isolation, ready-to-configure industry templates, integrated billing, and an onboarding flow measured in minutes rather than hours. The underlying technology is identical to self-hosted OpenClaw.

#### Start Here

If you are technical and curious, install OpenClaw. It is free, it is open source, and ten minutes of your time gets you an agent that runs on your hardware and speaks through every channel you use.

If you are an agency owner or business operator who wants the OpenClaw architecture without the infrastructure work, start with Kyra Solo. It is free to try, no credit card required, and your first AI worker goes live in under two minutes.

Either way, the era of shared chatbot platforms is ending. The era of self-hosted, agent-native, multi-channel AI is beginning. The tools are open source, the architecture is proven, and the setup is fast. The only question is whether you want to run it yourself or let a platform run it for you.

Want to read more? See our guide on building a white-label AI business, the GoHighLevel AI worker setup guide, or our breakdown of the 6 capabilities an AI agent has that a chatbot doesn't.

External references: OpenClaw on GitHub (MIT licensed) · Official OpenClaw documentation · Model Context Protocol (MCP) specification · Anthropic Claude documentation.
---

### GoHighLevel AI Worker: The Complete Guide for GHL Agencies (2026)

- URL: https://kyra.conversionsystem.com/blog/ghl-ai-employee-complete-guide
- Published: 2026-02-23
- Category: GHL Integration
- Read time: 13 min

Last updated: April 17, 2026

A GoHighLevel AI worker is an autonomous AI agent connected to a GHL sub-account via a Private Integration Token. It reads inbound conversations across all seven GHL channels (SMS, WhatsApp, Instagram, Facebook Messenger, Live Chat, email, and Google My Business), replies within 60 seconds, books appointments, updates CRM tags and pipeline stages, and escalates urgent or complex situations to the agency team. This guide walks through exactly how it works, how to deploy one in 10 minutes, and how agencies typically price it.

**Key takeaways**

- Connection uses GHL Private Integration Tokens — no marketplace listing, no OAuth app, no review process required.
- All seven GHL conversation channels are covered from a single unified inbox.
- Typical retainer pricing: $500 (restaurant, basic) to $2,000 (real estate, premium) per month per client.
- Proactive outreach is the highest-impact feature: the AI contacts new leads within ~60 seconds of creation.
- Every conversation auto-updates GHL tags, pipeline stages, and contact notes — no manual CRM maintenance.

If you're running a GoHighLevel agency, you've already heard the buzzword: "AI." GHL has started integrating AI features, and every agency is trying to figure out what to do with them. But most GHL agencies are doing AI wrong — and leaving serious money on the table.

This guide explains how to add a real AI worker to every GHL sub-account — not a workflow automation, not a keyword chatbot, but a conversational AI that responds to every inbound SMS within 60 seconds, 24/7.

#### The Difference Between GHL Automations and a Real AI Worker

GHL automations are powerful. You can trigger SMS sequences, send follow-ups, move contacts through pipelines — all automatically.
But automations are scripts. They match conditions and fire responses. They can't handle:

- Questions they weren't explicitly programmed for
- Natural conversation flow that goes off-script
- Emotional or frustrated customers who need nuance
- Open-ended questions like "what do you recommend?"

A real AI worker uses a large language model to understand what the customer is asking, then composes a contextually appropriate response. It reads the CRM, knows the contact's history, and replies like a trained team member would.

#### How the AI Worker Connects to GHL

The AI worker connects to any GHL sub-account using a Private Integration Token — no marketplace approval, no waiting, no OAuth setup. You create the token inside the sub-account settings in about 2 minutes.

Once connected, the AI worker:

- Polls the GHL inbox for new inbound messages every 60 seconds
- Reads the contact's tags, pipeline stage, and recent notes for context
- Composes and sends a reply via the GHL conversations API
- Auto-updates the CRM: tags, pipeline stage, and notes after every conversation
- Escalates frustrated customers to your team via Slack/email webhook

This works across all 7 GHL channels: SMS, WhatsApp, Instagram, Facebook, Live Chat, Email, and Google My Business.

#### What GHL Channels Does the AI Worker Cover?

The AI worker uses GHL's unified conversations API, which means the AI sees messages from all channels in one inbox. The response goes back through whichever channel the customer used. Here's the channel map:

| GHL Channel | Coverage |
| --- | --- |
| SMS | ✅ Full support |
| WhatsApp | ✅ Full support |
| Instagram DM | ✅ Full support |
| Facebook Messenger | ✅ Full support |
| Live Chat | ✅ Full support |
| Email | ✅ Full support |
| Google My Business | ✅ Full support |

#### Setting Up a GHL AI Worker in 10 Minutes

Here's the exact process:

1. Create your Kyra agency account at kyra.conversionsystem.com/signup/agency (free, no credit card)
2. Add a client — pick the industry (dental, real estate, auto, etc.)
and the AI personality is pre-built
3. Customize the personality — add the business name, AI name, pricing, FAQs, booking link
4. Generate the GHL Private Integration Token — in the sub-account: Settings → Integrations → Private Integration Tokens → Create
5. Paste the token into the platform dashboard — the AI goes live instantly

From that point, every inbound message to that GHL sub-account will be handled by the AI within 60 seconds.

#### How Much Should You Charge?

Most GHL agencies are charging $500–$2,000/month per AI worker. The pricing depends on your client's industry and volume:

- Dental/Med Spa: $750–$1,500/mo (high ticket, high volume, high impact)
- Real Estate: $1,000–$2,000/mo (high lead value)
- Auto Dealership: $1,000–$1,500/mo (high-volume, high-value leads)
- Cannabis Dispensary: $500–$1,000/mo (compliance requirements = premium pricing)
- Restaurant: $300–$600/mo (lower AOV but steady volume)

Your platform cost: $99/month for up to 5 clients. That's a gross margin of $2,400–$9,900/month on the Starter plan alone.

#### The Proactive Outreach Feature

One underrated feature: the AI worker watches for new contacts in GHL and proactively reaches out — even without an inbound message. Within ~60 seconds of a new lead being created, the AI sends a personalized greeting via SMS. This is the equivalent of your best salesperson immediately calling every new lead the moment they come in. For most clients, this alone recovers 20–30% of leads that would have gone cold.

#### CRM Automation That Happens Automatically

Every AI conversation updates the GHL CRM automatically. After each reply, the AI worker:

- Adds a CRM note summarizing the conversation
- Tags the contact based on what they asked (e.g., "appointment-interest", "price-question")
- Moves them to the appropriate pipeline stage (e.g., "AI Qualified" → "Ready to Book")

This means your clients get a cleaner CRM and better pipeline visibility — without any manual data entry.
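The reply-note-tag-advance cycle described above can be sketched as pure logic, with a stub standing in for both the model call and the GHL conversations API; every name in this snippet is hypothetical, not part of any real SDK.

```python
from dataclasses import dataclass, field

@dataclass
class Contact:
    """A toy stand-in for a GHL contact record (hypothetical fields)."""
    name: str
    tags: list = field(default_factory=list)
    stage: str = "New Lead"
    notes: list = field(default_factory=list)

def handle_inbound(contact: Contact, message: str, compose_reply) -> str:
    """One cycle of the AI worker loop: reply, then update the CRM record."""
    reply = compose_reply(contact, message)           # an LLM call in a real deployment
    contact.notes.append(f"AI handled: {message!r}")  # CRM note summarizing the exchange
    if "appointment" in message.lower():              # trivial intent check for illustration
        contact.tags.append("appointment-interest")
        contact.stage = "AI Qualified"
    return reply

# Demo with a canned reply function standing in for the model.
sarah = Contact("Sarah")
reply = handle_inbound(sarah, "Can I book an appointment tomorrow?",
                       lambda c, m: f"Hi {c.name}, happy to help with that!")
print(reply, sarah.tags, sarah.stage)
```

In production the intent check is the model's job and the writes go through the GHL API, but the shape of the loop is the same: every inbound message leaves the CRM richer than it found it.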
#### Common Questions from GHL Agencies

Does the AI worker replace GHL automations? No — they complement each other. GHL automations handle rule-based sequences (appointment reminders, review requests, etc.). The AI worker handles conversational replies that require understanding.

What if a client already has GHL workflows set up? The AI worker only responds to inbound messages — it doesn't interfere with your outbound automations. They work in parallel.

Can I white-label this? Yes. The AI personality is fully configurable — you name it, set its personality, and it represents the client's business. Nothing in client-facing messages reveals the underlying platform.

#### Troubleshooting the First 48 Hours

Most first-deployment issues fall into one of four buckets. Here's how to triage them:

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| AI isn't replying to test messages | Wrong or expired Private Integration Token | Regenerate the token in GHL Settings → Integrations |
| AI replies but tone is off | Personality file too generic | Tighten the SOUL.md-equivalent personality brief |
| AI books appointments wrong | Calendar not connected or time zone mismatch | Confirm GHL calendar sync and the agent's configured time zone |
| AI gives wrong pricing | Knowledge base out of date | Update the uploaded pricing document; rebuild embeddings |

#### How to Measure and Report AI Worker ROI to GHL Clients

The AI worker runs silently. Clients see fewer missed conversations, a cleaner CRM, and more booked appointments — but they may not connect those outcomes to the AI without a clear report. Monthly performance reporting is the single biggest retention tool after the AI itself.

Four metrics that every client report should include:

1. Conversations handled. The total number of inbound messages the AI responded to during the period. This is the baseline volume number. A client who receives 180 conversations per month and sees 178 AI-handled conversations immediately understands the coverage they are getting.

2. Median response time.
Most AI workers reply within 30 to 90 seconds. Put this number in the report and compare it to industry averages. A dental practice that previously returned calls the next business day will notice a 2-minute response time as a qualitative leap.

**Appointments booked by the AI.** This is the most persuasive metric for practices that sell via scheduling. Track how many appointments the AI confirmed without any human intervention. Three bookings at $150 per cleaning is $450 your client would not have captured after hours.

**Escalations triggered.** Every time the AI flagged a conversation for human follow-up — because of an urgent keyword, a frustrated tone, or an out-of-scope question — counts as an escalation. A low escalation count means the AI handled the conversation cleanly. A high escalation count might mean the personality file needs tightening or the knowledge base is missing key information.

The GHL conversations API logs timestamps, channel, and contact ID for every interaction. Most AI worker platforms expose these metrics in a reporting dashboard, usually at a URL like /report/[clientId] or as a CSV export. Pull the report at the same time every month so the comparison period is consistent.

One agency tactic that consistently reduces churn: send the performance report before the client invoice. When the client sees 200 conversations handled and 8 appointments booked before they see the $800 charge, the math is obvious. They are not renewing a software subscription — they are renewing a result.

For clients who want deeper CRM analytics, the pipeline stage distribution is a useful secondary metric. If 60 percent of AI-handled conversations end with the contact tagged "appointment-scheduled" versus "inquiry-only," that tells the practice which conversation flows are working and which need refinement. Over six months, that distribution shifts as the knowledge base improves — and the trend line becomes a retention story in itself.
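The four report metrics above are straightforward to compute from a conversations export. A sketch, assuming each record carries a reply flag, a response time, a booking flag, and an escalation flag (the field names are assumptions about what a CSV export would contain):

```python
from statistics import median

def monthly_report(conversations: list[dict]) -> dict:
    """Compute the four client-report metrics from a list of conversation records."""
    handled = [c for c in conversations if c["ai_replied"]]
    return {
        "conversations_handled": len(handled),
        "median_response_seconds": median(c["response_seconds"] for c in handled),
        "appointments_booked": sum(1 for c in handled if c["booked"]),
        "escalations": sum(1 for c in conversations if c["escalated"]),
    }

# Tiny illustrative sample: two AI-handled conversations, one escalated without a reply.
sample = [
    {"ai_replied": True, "response_seconds": 45, "booked": True, "escalated": False},
    {"ai_replied": True, "response_seconds": 62, "booked": False, "escalated": True},
    {"ai_replied": False, "response_seconds": 0, "booked": False, "escalated": True},
]
report = monthly_report(sample)
```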
#### Frequently asked questions

**What's the difference between a Private Integration Token and a GHL marketplace app?** A marketplace app requires OAuth approval, a listing, and a review process from GHL. A Private Integration Token is a single credential generated inside the sub-account that grants API access. For agency use cases where you're already inside your client's sub-account, Private Integration Tokens are dramatically faster to deploy (2 minutes per client) and require no app approval.

**Will the AI worker step on my existing GHL workflows?** No. The AI only responds to inbound conversation messages. Your outbound workflows (appointment reminders, review requests, nurture sequences) continue running untouched. They run in parallel.

**Can multiple AI workers share one GHL account?** Each GHL sub-account maps to exactly one AI worker. If you manage 10 sub-accounts, you deploy 10 AI workers — one per sub-account. Each has its own personality, knowledge base, and escalation rules.

**Does the AI handle payments or sensitive data?** The AI does not process payments directly. For payment collection, the AI hands off to a GHL payment link or Stripe Checkout URL. The AI never sees or stores credit card data. For other sensitive data (SSN, medical records), the agent is configured to refuse and escalate.

**How does billing work between me (the agency) and the platform?** Agencies pay a single platform subscription (flat monthly, tiered by client count) plus model API costs. Clients pay the agency directly for the AI worker service. The platform has no billing relationship with end clients — that's entirely your agency relationship.

**What's the realistic upper limit of this service line?** We know of agencies running 30+ clients with a single ops person managing the AI side. Beyond that, hiring a junior specialist to monitor alerts and tune personalities makes sense. The work scales sub-linearly with client count, which is what makes the margin work.
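The refuse-and-escalate behavior described in the FAQ can be pictured as a small rule layer in front of the model. A hedged sketch: the pattern list and function are illustrative, not the platform's actual configuration.

```python
import re

# Illustrative escalation patterns: sensitive data and judgment calls the
# agent should hand to a human rather than answer. Assumed, not actual config.
ESCALATE_PATTERNS = [
    r"\bssn\b|social security",   # sensitive identifiers
    r"credit card|card number",   # payment data (handled via payment links instead)
    r"\brefund\b",                # judgment calls routed to staff
]

def should_escalate(message: str) -> bool:
    """Return True when a message matches a hard escalation rule."""
    msg = message.lower()
    return any(re.search(p, msg) for p in ESCALATE_PATTERNS)
```

In practice, per-client rules like these sit alongside the AI's own frustration and out-of-scope detection.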
#### When a GHL AI worker isn't the right fit

- Client is on a GHL plan that doesn't support Private Integration Tokens.
- Client receives fewer than ~20 inbound messages per month. Not enough volume for the ROI to land.
- Client's regulatory environment requires every outbound communication be human-signed.
- Client's conversations are predominantly voice calls. Voice AI is a different product category (see our voice-AI coverage once available).

Everything else is in scope.

Ready to add AI workers to your GHL agency? Create your free account and have your first AI live in 10 minutes. For the broader playbook on positioning AI workers as a service line, read how agencies use AI workers for recurring revenue. For the underlying technology, see our guide on the 6 capabilities a real AI agent has.

External references: GoHighLevel documentation · OpenClaw on GitHub · Anthropic Claude documentation.

---

### White-Label AI Platform for Agencies: The 2026 Deployment Guide

- URL: https://kyra.conversionsystem.com/blog/white-label-ai-platform-agencies
- Published: 2026-02-23
- Category: Agency Growth
- Read time: 14 min

Last updated: April 17, 2026

A white-label AI platform for agencies is a managed deployment layer on top of an open-source agent runtime (like OpenClaw) that lets agencies deploy isolated AI workers for each of their clients — under the agency's brand, with per-client data isolation — without building infrastructure. This guide covers when the white-label model works, how to structure pricing, and the 6-month path from first client to a stable 50-client book.

#### Key takeaways

- Agencies sell the AI worker under their own brand. Clients never see the underlying platform.
- Each client gets an isolated AI container — separate personality, knowledge base, memory, and data.
- Typical margin: 85 to 95 percent after platform and API costs. Retention is near-zero churn once the AI is delivering.
- First-client onboarding takes 30 to 60 minutes.
- Subsequent clients: under 15 minutes per deployment.
- Strongest verticals: dental, real estate, auto, med spas, cannabis, high-volume local service.

In 2026, the most profitable agencies aren't selling websites, ads, or even GHL setups. They're selling AI workers. And the ones who figured this out first are building significant recurring revenue on autopilot — because the AI worker retainer has economics no other agency service line can match. This guide is the complete playbook for building a white-label AI worker business on an OpenClaw-based platform — built specifically for agencies who want to resell AI without building anything from scratch.

#### Why AI Workers Are the Perfect Agency Product

Most agency revenue is project-based or tied to ad spend — both are unpredictable and client-churn-heavy. AI workers are different:

- Monthly recurring revenue: The AI runs 24/7 whether or not you do anything
- Near-zero churn: Clients don't cancel an AI that's booking their appointments
- High gross margins: Your cost to provide the service is $5–15/client/month in API fees
- Scalable: Going from 5 to 50 clients doesn't require hiring more staff
- Defensible: The AI learns the client's business over time — switching costs increase

#### The White-Label Model

With a white-label AI platform, you are the agency. Your clients never see the underlying software — they see an AI worker named whatever you have configured (Alex, Maya, Jordan — your choice). The AI is trained on their specific business, speaks their tone, and represents their brand. Your clients think you built this. You did not have to — the platform is the infrastructure; you are the relationship and the strategy.

#### Pricing Strategy

Positioning matters as much as price. Don't sell this as "AI" — sell it as an AI worker. Here's how to frame it:

"We're adding a full-time AI worker to your business. It responds to every customer inquiry in under 60 seconds, 24/7.
It books appointments, answers questions, updates your CRM, and escalates anything it can't handle. Most businesses see ROI in the first week."

Suggested pricing by tier:

| Package | Price | Includes |
| --- | --- | --- |
| AI Starter | $500/mo | 1 channel (SMS), basic personality, standard templates |
| AI Pro | $1,000/mo | All 7 channels, custom personality, CRM automation, escalation alerts |
| AI Enterprise | $2,000/mo | Everything in Pro + weekly performance reports, monthly strategy calls, priority support |

Your platform cost: $299/month for up to 10 clients (Pro plan). At $1,000/client on 10 clients = $10,000/month revenue, $9,701/month gross margin.

#### Client Onboarding Playbook

The onboarding flow is where most agencies fumble. Here's the process that works:

**Day 1: Kickoff Call (30 min)**
- Walk them through what the AI will do
- Collect: business name, AI name, pricing, FAQs, common objections, booking link
- Get their GHL Private Integration Token (show them exactly how to create it)

**Day 2: Configuration + Live Test**
- Set up the client's AI in the platform — takes ~15 minutes with the industry template
- Test it yourself: send 10 different test SMS messages
- Send a test to the client so they can see it live

**Day 3: Go Live**
- Flip the switch — the AI starts responding to real customer messages
- Monitor for the first 48 hours; expect a few edge cases to tune

**Week 1: First Performance Report**
- Share the performance report at /report/[clientId] — conversations handled, response time, resolution rate
- This is your proof of value — use it in retention conversations

#### Industries That Sell Best

Not all industries are equal.
Here's where you'll have the easiest sales:

- Dental practices — High urgency, appointment-driven, staff overwhelmed, new patient value $3K+
- Real estate agents — Every missed lead is a $10K+ lost commission
- Auto dealerships — High-volume, high-value, 24/7 customer inquiries
- Med spas — High-ticket treatments ($500–5K), strong urgency, lots of questions
- Cannabis dispensaries — Always-on business with compliance needs

#### How to Scale to $50K/Month

At $1,000/month average per client, you need 50 clients. Here's the path:

- Month 1–2: Land 5 clients from your existing network. Get them results. Get testimonials.
- Month 3–4: Use testimonials + the pitch deck at /pitch to close 5 more. Start outreach to cold prospects using the email templates.
- Month 5–6: Referral machine — happy clients refer other businesses. Offer 1 free month per referral.
- Month 7–12: Systemize. Hire a junior VA for onboarding. You focus on sales. Target: 50 clients.

The compounding advantage: every client you add at month 6 is still paying at month 18. The churn is nearly zero because the AI is delivering daily, measurable value.

#### The Pitch That Closes

Stop pitching AI. Pitch the outcome:

"Your business is missing 40% of inquiries after 6pm. That's revenue walking to your competitor. We'll put an AI worker on your phone line tonight. By Thursday morning, it will have handled 20+ conversations you would have missed. You'll see the report."

Then show them the live demo: kyra.conversionsystem.com/try/dental. Let them text it. Let them see a real AI reply in 10 seconds. Close rate goes up 3×.

#### Get Started

The platform is free to start — no credit card, no commitment. Add your first client, run the demo, see the AI live. If it does not work, you have not spent a dollar.
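The margin arithmetic in the pricing section above is easy to sanity-check (API fees excluded; the function name is illustrative):

```python
def gross_margin(clients: int, price_per_client: float, platform_cost: float):
    """Monthly revenue, gross margin, and margin percentage before API fees."""
    revenue = clients * price_per_client
    margin = revenue - platform_cost
    return revenue, margin, margin / revenue

# Pro-plan example from the pricing section: 10 clients at $1,000/mo
# against a $299/mo platform cost.
revenue, margin, pct = gross_margin(10, 1_000, 299)
```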
#### How white-label AI compares to other agency service lines

| Service line | Onboarding time per client | Monthly ops time per client | Typical margin | Churn profile |
| --- | --- | --- | --- | --- |
| Website build | 20 to 60 hours | 1 to 3 hours | 40 to 60% | Often one-time |
| Facebook ads | 6 to 10 hours | 4 to 10 hours | 30 to 50% | 4 to 8 months typical |
| SEO retainer | 8 to 20 hours | 6 to 15 hours | 30 to 60% | 6 to 12 months |
| White-label AI worker | 15 to 30 minutes | 10 to 30 minutes | 85 to 95% | Near-zero once live |

The combination — low onboarding, low ops, high margin, near-zero churn — is unusual. It's why the agencies investing in this category now are building defensible positions before the space gets crowded.

#### Data isolation is the feature sophisticated clients ask about

For regulated clients (dental, legal, medical, financial), "your data won't be mixed with anyone else's" is not a nice-to-have. It's the table-stakes question on every vendor evaluation call. The white-label deployment model addresses this directly: each client gets an isolated container with their own storage, their own AI personality, their own knowledge base, and their own memory. Nothing from Client A ever touches Client B. If regulated clients are part of your book, this is the line that sells the service.

#### How to handle the five most common sales objections

Most white-label AI worker sales stall on five objections. Here is how agencies with live client books actually answer them.

**"We already have a chatbot."** Most site chatbots are keyword-matching scripts. They fire a template when they detect a phrase. They cannot handle off-script questions, cannot book appointments, and cannot update the CRM after a conversation. Ask the prospect: does their current chatbot know the contact's pipeline stage before it replies? Does it update tags after every conversation? If not, it is not a chatbot replacement they are evaluating — it is a step change.

**"What if the AI says something wrong?"** This is the most common objection and the easiest to defuse.
Walk through the escalation layer: the AI is configured to refuse clinical questions, legal questions, and anything requiring a human judgment call. It hands off to staff for anything outside its brief. Then show them a conversation log from a live deployment. Seeing the AI correctly escalate a frustrated customer is worth more than any verbal explanation.

**"Is our data safe?"** Answer this by explaining per-client container isolation. Their data is not mixed with any other business's. They are not on shared infrastructure. If they are in a regulated vertical, add that the AI does not access or store protected health information — it handles the same intake and scheduling communications a receptionist would via text. For the HIPAA question specifically, point them to the HHS guidance on incidental disclosures during scheduling, which covers typical AI worker workflows.

**"We don't have the budget right now."** Walk through the unit economics. A dental practice where one missed patient is worth $2,000 to $5,000 in lifetime value pays for a full year of AI worker service with a single recovered booking. Ask how many texts went unanswered last month after 5pm. Most practice managers can recall two or three just from last week. The payback period is usually one to two weeks, not months.

**"We need to talk to IT / our compliance team first."** This is a buying signal, not a stall. Follow up with a one-page technical brief covering: what API access the Private Integration Token grants, where data is stored, how conversations are logged, and what the escalation rules are. Regulated clients who ask this question are close to signing — they just need to document due diligence. Agencies that have a technical brief ready close at 3x the rate of those who do not.

The pattern across all five: objections about AI are usually objections about risk. The answer is always specifics. Vague reassurances do not close deals.
Concrete escalation rules, actual conversation logs, unit economics with real numbers, and per-client isolation diagrams do.

#### Frequently asked questions

**What does "white-label" actually include?** Clients see your agency's brand on the dashboard (if exposed), on the AI worker's name and voice, and on any public-facing surfaces (widgets, embed codes). They never see the underlying platform brand. If a client discovers the underlying technology, it's because you chose to disclose it.

**Do I need my own infrastructure?** No. The platform handles hosting, scaling, updates, and failover. Your work is configuration and client management. If you want maximum control (regulated client, custom hosting requirement), you can optionally self-host the agent runtime on your own servers — but that's a later-stage choice, not a requirement to start.

**How do I position this in a sales conversation?** Lead with the outcome, not the technology. "Your business is missing 30 to 40 percent of inquiries after 6pm. We'll put an AI worker on your phone line tonight. By Thursday morning, it will have handled 20-plus conversations you would have missed. You'll see the report." Then show the live demo. Close rate goes up 3x compared to pitching "AI" abstractly.

**What happens when the AI gets something wrong?** Three layers of safety: explicit escalation rules (urgent keywords, frustrated tone), soft fallbacks ("I'm not sure, let me connect you with a team member"), and full audit trails. Every conversation is logged. If something goes wrong, you review the transcript, tighten the personality file, and ship the fix in under 10 minutes.

**Can I run this for clients who aren't on GoHighLevel?** Yes. GHL is the most common integration path, but the underlying runtime supports direct SMS, email, Slack, Discord, Matrix, Microsoft Teams, iMessage, and WhatsApp via their respective APIs. For clients on platforms like HubSpot, Pipedrive, or custom CRMs, webhook integrations cover most workflows.
**How do I price this for my agency's size?** If you're new: start at $500 per client per month. Use the first three clients to build case studies. If you're established: price based on client value. A dental practice where one lost patient is $2,000 easily supports $1,500 per month. Enterprise med spas and real estate teams can support $2,000-plus. Never underprice on retention economics this strong.

#### When white-label AI isn't the right model for you

- You have no existing client base and aren't set up to acquire new ones.
- You want zero ongoing management — even 10 minutes per client per month is too much.
- You're not comfortable troubleshooting AI personality issues when they come up (usually easy, but it's a skill).
- You plan to market the AI worker as your own proprietary technology (possible, but the marketing story is different from "we deploy AI").

For every other agency, the economics make this the single most interesting service line to add in 2026.

Create your free agency account →

Related reading: The 6 capabilities an AI agent has that a chatbot doesn't · The GHL AI worker complete guide · What OpenClaw is. External references: OpenClaw on GitHub · OpenClaw documentation · Anthropic Claude documentation · Model Context Protocol specification.

---

### How to Add an AI Worker to Every GHL Client (2026 Agency Guide)

- URL: https://kyra.conversionsystem.com/blog/ghl-ai-employee-agency
- Published: 2026-02-22
- Category: Agency Growth
- Read time: 10 min

Last updated: April 17, 2026

An AI worker for GHL is an autonomous AI agent that connects to a GoHighLevel sub-account via a Private Integration Token and handles inbound conversations across SMS, WhatsApp, Instagram, Facebook, Live Chat, email, and Google My Business — 24/7, in under 60 seconds per reply, without staff involvement. This guide explains what an AI worker actually does, how it plugs into GHL, how agencies typically structure the offering, and how to go live on your first client in under 15 minutes.
#### Key takeaways

- An AI worker is a full agent (not a chatbot) that reads conversations, executes tool calls, updates the CRM, and books appointments.
- It connects to any GHL sub-account via a Private Integration Token — no marketplace, no OAuth app, no waiting.
- Typical agency retainer pricing: $500 to $2,000/month per client depending on vertical and volume.
- Setup takes under 15 minutes per client once your agency is configured.
- All seven GHL conversation channels are covered: SMS, WhatsApp, Instagram, Facebook Messenger, Live Chat, email, Google My Business.

Most GHL agencies have the same problem. You charge a setup fee, maybe a retainer, and then the relationship goes quiet. The client stops replying. Revenue stagnates. Adding an AI worker to every client solves two problems at once: it gives your client measurable, ongoing value (so they stop churning), and it creates a new recurring revenue line for your agency that scales without hiring.

#### What is an AI worker?

An AI worker is an autonomous AI agent that:

- Responds to every inbound SMS in under 60 seconds — 24/7
- Books appointments by checking availability and confirming times
- Updates GHL CRM with tags and notes after every conversation
- Detects frustrated customers and escalates to your human team instantly
- Handles opt-outs, business hours, and multi-channel messages automatically

It's not a chatbot. It's not a canned response bot. It's a real AI that understands context, remembers conversation history, and operates like a trained employee — except it never sleeps, never takes a vacation, and never calls in sick.

#### Why GHL Agencies Are Perfectly Positioned

You already have the infrastructure. GHL gives you the CRM, the pipelines, the phone numbers, and the messaging channels. An AI worker plugs directly into your existing GHL setup using a Private Integration Token — no OAuth approval, no marketplace hurdles.
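Mechanically, the Private Integration Token is just a bearer credential on API calls into the sub-account. A sketch of what such a request could look like: the base URL matches GHL's public v2 API, but the endpoint path and `Version` header value shown here are assumptions; confirm both against the current GoHighLevel API documentation.

```python
def build_conversations_request(token: str, location_id: str) -> dict:
    """Assemble an authenticated request to list a sub-account's conversations.

    Endpoint path and Version header are illustrative assumptions, not
    verified against the live GHL API.
    """
    return {
        "url": "https://services.leadconnectorhq.com/conversations/search",  # assumed path
        "params": {"locationId": location_id},
        "headers": {
            "Authorization": f"Bearer {token}",  # the Private Integration Token
            "Version": "2021-04-15",             # assumed API version header
            "Accept": "application/json",
        },
    }

req = build_conversations_request("pit-xxxx", "LOC123")  # dummy credentials
```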
The moment your client adds their GHL token, the AI starts monitoring their conversations and responding automatically. Setup takes about 10 minutes. The AI goes live immediately.

#### The Pricing Model That Works

Most agencies bill their AI workers as a retainer:

- Local service businesses (dental, HVAC, fitness): $500–$1,000/month
- High-ticket service businesses (real estate, law, med spa): $1,000–$3,000/month
- Volume businesses (cannabis, auto, restaurant): $500–$2,000/month

The pitch is simple: "Your AI worker responds to every lead in 60 seconds, books appointments, and updates your CRM — while you sleep. If it books one extra appointment per week, it's paid for itself." For a dental practice at $150/cleaning, three extra bookings per week = $450/week = $1,800/month. You charge $800. They're ahead.

#### How to Get Your First Client Live

1. Sign up for Kyra — free account at kyra.conversionsystem.com
2. Add the client — pick an industry template (dental, real estate, auto, etc.)
3. Connect GHL — add the client's Private Integration Token (from their GHL Settings → Integrations)
4. Customize the personality — or click "Generate with AI" to auto-write the AI's persona in seconds
5. Go live — the AI starts responding within 60 seconds

The whole process takes under 10 minutes. The AI does the rest.

#### Scaling to 10+ Clients

Once you have your first AI worker live and your client sees results, scaling is straightforward. Every new client follows the same 5-step process. The AI personalities are different, the channels might vary, but the infrastructure is the same. At 10 clients charging $800/month each: $8,000/month in recurring revenue. At 20 clients: $16,000/month. The Pro plan handles 10 clients at $299/month — your gross margin is 96% before API costs.

#### The Competitive Moat

Once your client's AI worker is live and working, they won't want to turn it off. The AI builds up conversation history, learns the business's tone, and becomes genuinely useful over time.
Churn on a working AI worker is near zero. Every booking it makes, every lead it handles, every CRM update it logs — that's value your client can see. It's not abstract "automation." It's results they can count.

#### Why AI workers dramatically reduce agency client churn

The average agency loses 30 to 50 percent of its client base annually. Clients churn when they stop seeing measurable results — which happens with ads when ROAS drops, with SEO when rankings plateau, and with website work when the project ends. AI workers behave differently. Three structural factors keep churn near zero once the AI is live:

**Daily visible value.** The AI runs every day. Clients see conversation logs, response times, and booked appointments without waiting 90 days for a ranking report or an ad performance review.

**Compounding institutional knowledge.** The AI learns the business's vocabulary, pricing, objections, and typical conversation flows over months. Switching to a new system means retraining from scratch — a switching cost that compounds every month the AI runs.

**CRM dependency.** Once the AI is writing contact notes, updating pipeline stages, and tagging leads by conversation content, the client's GHL data depends on the AI staying active. Turning it off breaks the CRM automation they have built their operations around.

Compare that to an ad campaign or a monthly SEO report. An AI worker that books three appointments a week is not abstract value. It is a line item the client can point to on a Monday morning. That is why agencies running AI workers consistently see 12-month retention rates well above 80 percent — versus 30 to 50 percent on most other service lines. For the broader business model behind this service line, see our agency recurring revenue guide.

#### How AI workers compare to GHL workflow automations

GHL's native workflow automations are a powerful tool, but they're a different kind of tool.
Here's how to think about them:

| Capability | GHL Workflow Automation | AI Worker |
| --- | --- | --- |
| Reply to a generic inquiry | Matches keywords, sends templated reply | Reads context, writes an original reply |
| Handle off-script questions | Falls back to default or staff | Composes a contextual answer |
| Book an appointment | Via calendar-link trigger | Checks availability, offers times, confirms |
| Update pipeline stage | Rule-based on trigger events | Decides based on conversation content |
| Escalate to a human | Only if explicitly configured | Detects frustration or complexity automatically |

They work together. Workflows handle deterministic paths (opt-outs, appointment reminders, drip sequences). AI workers handle the open-ended conversations workflows can't model.

#### What to review in the first 30 days of an AI worker deployment

Most agencies go live with a new client and then check back in at the 30-day mark. That window is where the AI builds trust with the business — and where you catch the issues that cause churn before they become a problem. Four things to look at in the first 30 days:

**Response rate.** The AI should be replying to 95 percent or more of inbound messages. If you see gaps, check whether the GHL Private Integration Token has the right scopes, or whether any conversation channels are excluded from the polling config.

**Escalation frequency.** A healthy deployment escalates 5 to 15 percent of conversations. Escalation below 5 percent may mean the escalation rules are too narrow — the AI is answering things it should hand off. Escalation above 20 percent usually means the personality file is too conservative or the knowledge base is missing common answers.

**CRM tag accuracy.** Pull a sample of 20 GHL contacts and review the tags the AI wrote. Tags like "appointment-interest" or "price-question" should reflect what actually happened in the conversation. If the tags are consistently wrong, the AI is not reading the conversation outcome correctly — update the personality file's tagging instructions.
**Client-reported edge cases.** Ask the client to flag any conversation that surprised them — positive or negative. One or two odd replies in 200 conversations is normal. Patterns of odd replies in the same scenario point to a gap in the knowledge base.

The 30-day review is also the moment to lock in the retainer renewal. Bring a summary of conversations handled, appointments booked, and escalations managed. Most clients who see 3 to 8 booked appointments they would have missed renew without a negotiation. The report does the selling.

A practical tip: ask the client to text the AI themselves once a week, as if they were a customer. Owners who see their AI respond accurately and on-brand within 30 seconds become its biggest advocates internally. That internal advocacy matters when you want to expand the service or raise pricing at month six.

#### Frequently asked questions

**Do I need my client to be on a specific GHL plan?** Any GHL sub-account that supports Private Integration Tokens works. That's the standard SaaS and Pro plans at the time of writing. Check the official GoHighLevel documentation for current plan features if you need to confirm.

**How long does it take to onboard a new client?** Under 15 minutes per client once your agency account is set up. The steps: create the client in your dashboard, pick an industry template, paste the GHL Private Integration Token, customize the personality file, and go live. The AI starts responding to new inbound messages within 60 seconds of activation.

**What happens if the AI doesn't know an answer?** The agent is configured to escalate rather than hallucinate. If a customer asks something outside the knowledge base, the AI either asks a clarifying question or tags the conversation for human follow-up. Escalation rules are configurable per client — some agencies set hard rules (medical questions, legal questions, refund requests) that always route to a human.

**Which GHL channels does this cover?**
All seven conversation channels GHL supports: SMS, WhatsApp, Instagram DM, Facebook Messenger, Live Chat, email, and Google My Business. The AI sees everything in the unified GHL conversations inbox and responds through whichever channel the customer used.

**Can I white-label this for my agency?** Yes. The client never sees the underlying platform. The AI has whatever name you configure (Alex, Maya, Jordan). The dashboard is your branded portal. The conversations appear to come from the client's business. See our white-label deployment guide for the full setup.

**What if my client wants to take it over themselves someday?** That's a business decision you control. The AI workers live in your agency account. You can transfer ownership of a client container, migrate it out, or keep it locked to your agency as part of the retainer. Most agencies keep it locked — that's the moat.

#### When an AI worker isn't right for a client

Not every GHL client is a fit. Skip the offer if:

- They receive fewer than 5 inbound messages per week. The math doesn't work for them.
- Their business is highly regulated in ways that require every reply to be human-reviewed before sending.
- They have an in-house team that handles inbound within minutes and has excess capacity.
- They operate in a language the AI doesn't handle well. (Most major languages work; niche regional languages may not.)

For every other client, the math works.

Ready to add your first AI worker? Try a live demo or start your free agency account. For the underlying technology, see our guide on what OpenClaw is.

External references: GoHighLevel help center · OpenClaw on GitHub (the agent runtime powering AI workers) · Anthropic Claude documentation.
---

### AI for Dental Practices: The 2026 Guide to 24/7 Patient Response

- URL: https://kyra.conversionsystem.com/blog/ai-for-dental-practices
- Published: 2026-02-22
- Category: Industry Guide
- Read time: 11 min

Last updated: April 17, 2026

An AI worker for a dental practice is an autonomous AI agent that responds to inbound patient texts, calls, and chats in under 60 seconds — day or night — to answer common questions, verify insurance, book appointments, and route urgent situations to the on-call staff. This guide explains exactly what a dental AI handles, what it cannot and should not handle, how it integrates with GoHighLevel, and what practices typically see in the first 30 days.

#### Key takeaways

- Dental practices lose an estimated 30 to 40 percent of new patient inquiries to after-hours unavailability. An AI worker captures most of those.
- Deployment is roughly 15 minutes when using an industry template for dental.
- The AI handles scheduling, FAQs, and insurance inquiries. It does not access or store medical records.
- Integration with GoHighLevel is native: SMS, WhatsApp, email, and chat all flow through one inbox.
- Typical first-month outcomes: 3 to 8 new patient bookings recovered from after-hours inquiries, 100 percent response rate on new-patient texts.

Dental practices have a lead problem that most dentists don't realize: their phones go unanswered. A patient texts at 7pm asking about a cleaning. The office is closed. By morning, they've booked with another practice. That's $150 lost on a cleaning — potentially $2,000+ if they become a regular patient. An AI worker fixes this problem permanently.

#### What the Dental AI Worker Does

When a patient texts your practice's GHL number, the AI responds within 60 seconds — day or night. It:

- Answers questions about pricing, services, and insurance
- Books appointments by offering available time slots
- Handles insurance questions ("Do you take Delta Dental?"
— yes, with most major plans)
- Sends appointment confirmations and preparation instructions
- Follows up with no-shows
- Tags the contact in GHL (new patient, appointment requested, insurance verified, etc.)

It does all of this automatically, without staff involvement, 24 hours a day.

#### A Real Example

Here's an actual conversation a dental AI worker handles:

**Patient:** Hi, I need a cleaning. How much does it cost?
**Kyra AI:** Hi! 😊 A routine cleaning is $150, or fully covered with most insurance. We're accepting new patients — want to book?
**Patient:** Yes! Do you have anything this week?
**Kyra AI:** Tuesday at 2pm or Thursday at 10am. Which works? And which insurance do you have?
**Patient:** Tuesday works. I have Delta Dental.
**Kyra AI:** ✅ Reserved! Tuesday 2pm, Smile Dental. With Delta Dental you're fully covered. Confirmation text coming tomorrow morning!

This conversation happened at 9:30pm. Without the AI, that patient sends a text to a closed practice and books elsewhere in the morning. With the AI, they're booked before they go to sleep.

#### What About HIPAA?

The AI doesn't access or store patient medical records. It handles scheduling and basic inquiries — the same type of communication a front desk receptionist would have via text. However, your practice should review AI communications with your compliance officer to ensure alignment with your specific policies.

#### Integration With GoHighLevel

If your practice uses GoHighLevel (or you're a marketing agency that manages their GHL account), the AI worker integrates directly. It:

- Reads and responds to GHL SMS conversations
- Tags contacts automatically (new-patient, appointment-scheduled, insurance-verified)
- Moves contacts through your pipeline stages
- Writes notes after every conversation
- Escalates urgent situations immediately

#### What to Expect in the First Week

Most dental practices see results within 48 hours of going live. The AI starts catching inquiries that would have otherwise gone unanswered.
Typical outcomes in the first 30 days:

- 3–8 new patient appointments booked from after-hours inquiries
- Significant reduction in "quick question" calls during business hours
- 100% response rate on new patient inquiries

#### What the dental AI does NOT do

Being clear about boundaries builds trust with practices that are rightly cautious. The AI worker does not:

- Access the practice management system or patient medical records
- Offer clinical advice ("Is this toothache an emergency?" gets an escalation, not an opinion)
- Prescribe medications or interpret symptoms
- Handle billing disputes or insurance appeals
- Replace the emergency triage that a human dental team performs

Everything on that list stays with staff or the on-call dentist. The AI covers the scheduling and intake layer that eats hours of front-desk time every week.

#### A typical 2026 dental deployment

Here is what a deployment looks like for a mid-size practice (two dentists, ~400 active patients):

| Item | Detail |
| --- | --- |
| Channels active | SMS + website chat + Google Business Profile messaging |
| Response SLA | Under 60 seconds, 24/7 |
| Setup time | 15 minutes using the dental industry template |
| Escalation triggers | Emergency keywords, frustrated tone, insurance-appeal requests |
| CRM used | GoHighLevel (via Private Integration Token) |
| Monthly conversations handled | 150 to 400 depending on advertising volume |

#### What to look for in a dental AI worker

Not all AI workers handle dental workflows equally. Before committing to a deployment, evaluate any option against these five criteria:

1. **Native CRM integration — not a Zapier wrapper.** An AI worker that routes through a third-party automation layer introduces additional latency, failure points, and per-task costs. Direct API access to GoHighLevel is the standard for serious deployments. Check whether the integration is native or intermediated.
2. **Configurable escalation rules.** Dental practices deal with pain, anxiety, and occasionally urgent clinical situations.
The AI should hand off to staff the moment it detects emergency keywords ("tooth knocked out," "severe pain," "can't stop bleeding"), frustrated tone, or clinical questions. Hardcoded escalation lists are a red flag — you want rules you can tune per practice.
3. **Per-practice data isolation.** Patient inquiry data should live in an isolated container for that practice, not mixed with thousands of other businesses on shared infrastructure. For HIPAA-adjacent workflows, isolation is a minimum baseline, not a premium feature.
4. **Personality customization at the field level.** A pediatric practice needs different tone, vocabulary, and response patterns than an oral surgery group. Look for systems that let you configure tone, forbidden topics, booking logic, and escalation triggers independently per client — not just a single global setting.
5. **Full audit trails.** Every AI reply should be logged with a timestamp, the message received, the action taken, and any CRM updates made. Dental practices are not required to archive patient texts the way medical records are archived, but a complete audit trail protects the practice if a patient dispute arises over what was communicated.

A properly configured dental AI worker passes all five. A generic chatbot repurposed for dental typically fails on escalation rules and data isolation first — both of which matter most in regulated environments. The OpenClaw-based architecture described in our gateway guide addresses each of these points by design.

#### How to introduce an AI worker to your dental front desk team

Front desk staff who hear "AI is going to handle our patient texts" often interpret it as a threat to their jobs. That interpretation almost always kills the deployment before it starts. Getting the front desk team on board — genuinely on board, not just compliant — is the single most important non-technical factor in a successful dental AI rollout. Three things that consistently work:

1. **Frame it as coverage, not replacement.**
The AI handles the 9pm text, the Saturday morning inquiry, the "quick question" call during a hygiene appointment that the front desk cannot pick up. It does not replace the judgment call the receptionist makes when a patient walks in upset, the human warmth of a new-patient phone call, or the complex insurance negotiation that takes 20 minutes. Be specific about what the AI covers and what it hands off. The more precise the boundaries, the less threatened staff feel.
2. **Show them the escalation path first.** Before going live, walk the team through what happens when the AI flags a conversation. The notification goes to a phone or Slack. The staff member opens the conversation, reads what the AI said, and picks up from there. This is not a black box overriding their judgment — it is a first responder that gets the conversation started and then hands off. Most front desk teams become enthusiastic about this workflow once they see it removes the most draining part of their job: being on call for low-complexity inquiries at all hours.
3. **Let them test it themselves.** Give each team member the practice's number and ask them to text it as if they were a new patient. Watch the AI reply. Let them ask a question the AI might struggle with. Most staff go from skeptical to impressed in about five minutes when they see a natural conversation handled correctly. Practices that run this demo internally before going live with real patients report significantly smoother rollouts.

One practical note on team communication: tell the staff before you go live, not after the first patient mentions the AI in a call. Practices where staff found out from a patient interaction — "I texted you last night and got an instant reply!" — had a much harder time with internal adoption than practices where leadership introduced it proactively as a coverage tool.

The front desk team's trust in the AI directly affects its effectiveness.
When staff actively monitor escalations and tune the knowledge base when the AI misses something, the system improves rapidly. When they ignore escalations or resent the deployment, gaps compound. Internal buy-in is not a nice-to-have in dental AI deployments — it is a deployment requirement.

#### Frequently asked questions

**Is this HIPAA compliant?** The AI worker itself does not access or store protected health information (PHI). It handles the same kinds of interactions a front-desk receptionist handles via text: scheduling, directions, insurance plan inquiries, pricing. However, HIPAA compliance is a property of the whole workflow, not the AI in isolation. Practices should review the AI's operating scope with their compliance officer and ensure their consent forms cover SMS communication. The U.S. Department of Health and Human Services maintains guidance at hhs.gov/hipaa.

**Will patients know they're talking to an AI?** That's a choice each practice makes. Many practices disclose it explicitly in the first message ("Hi, I'm Alex, the practice's virtual assistant. I can help with scheduling, insurance, and directions."). Transparency tends to build more trust than trying to hide it, and it sets clean expectations about what the assistant can handle.

**What happens if the AI can't answer something?** It escalates. The conversation gets tagged in GHL for staff to follow up, and urgent keywords trigger an immediate notification to a designated team member's phone or Slack. The AI never pretends to know a medical answer it doesn't have.

**How much does this cost the practice?** If you are an agency deploying this for dental clients, typical retainers are $500 to $1,000 per month. The practice compares that to the cost of one lost patient (often $2,000 to $5,000 in lifetime value) and the math is immediate. If you are a practice shopping directly, most agencies will quote a 60-day pilot.

**Can it integrate with dental-specific practice management software?**
Native integrations with Dentrix, Eaglesoft, Open Dental, and similar are limited today. Most practices route the AI through GoHighLevel for conversation handling, then have staff transfer booked appointments into the practice management system manually. This is a multi-minute task per booking, not hours, and it stays inside HIPAA-compliant workflows.

**Does it work for specialist practices (orthodontics, endodontics, oral surgery)?** Yes, with personality and knowledge-base customization. The standard dental template is tuned for general practice. Specialist templates are available for orthodontics (braces and Invisalign intake), endodontics (root canal inquiries), and oral surgery (extraction scheduling). Customization happens in the agent's personality file.

#### When a dental AI worker isn't the right fit

A practice should skip the AI worker if:

- The practice has a full-time receptionist with excess capacity and no missed calls or after-hours voicemails.
- Patient volume is under 5 new inquiries per week total.
- State regulations require every patient communication to be reviewed by a licensed clinician before sending (rare but exists).
- The practice doesn't want to use SMS for patient communication at all.

Most general-practice dental offices fit the target profile cleanly.

Ready to see it in action? Try the live dental AI demo — type anything a patient would say. For the broader agency-deployment story, see our GHL AI worker agency guide or our primer on what an AI agent can actually do.

External references: HIPAA guidance from HHS · GoHighLevel documentation · OpenClaw documentation.
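The escalation triggers and audit-trail fields this guide calls for can be sketched as a small piece of logic. This is a hedged illustration, not Kyra's actual code: the keyword list and the logged fields come straight from the guide, while the function and class names are invented for the example.

```python
# Illustrative sketch of the escalation and audit-trail behavior described
# in the guide. Not Kyra's implementation; names and structure are assumed.
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Emergency phrases named in the guide. A real deployment would make this
# list tunable per practice rather than hardcoded.
EMERGENCY_KEYWORDS = ("tooth knocked out", "severe pain", "can't stop bleeding")

def should_escalate(message: str) -> bool:
    """Flag messages containing emergency keywords for human handoff."""
    text = message.lower()
    return any(kw in text for kw in EMERGENCY_KEYWORDS)

@dataclass
class AuditEntry:
    """One audit record: timestamp, message received, action, CRM updates."""
    message_received: str
    action_taken: str
    crm_updates: list = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def handle_inbound(message: str) -> AuditEntry:
    """Escalate emergencies; otherwise let the AI reply. Always log."""
    if should_escalate(message):
        return AuditEntry(message, "escalated_to_staff", ["tag:urgent"])
    return AuditEntry(message, "ai_replied", ["note:conversation_logged"])
```

Note that every path returns an `AuditEntry`: the audit trail is not an optional side channel but the return value of handling a message, which is what makes disputes over "what was communicated" resolvable later.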
---

### How Agencies Use AI Workers to Build Recurring Revenue (2026 Playbook)

- URL: https://kyra.conversionsystem.com/blog/agency-recurring-revenue-ai
- Published: 2026-02-22
- Category: Agency Growth
- Read time: 12 min

Last updated: April 17, 2026

An AI worker retainer is a recurring monthly service line agencies add on top of existing client work, where the agency deploys and manages an autonomous AI agent that handles inbound messages, books appointments, and updates the CRM for each client — typically priced between $500 and $2,000 per month per client. This playbook walks through how to structure the offering, how to land your first three clients, and how to scale the book without hiring.

#### Key takeaways

- AI workers break the linear-revenue trap most agencies hit — the marginal cost of adding a 15th worker is near zero.
- Pricing runs $500 to $2,000 per client per month. Typical gross margin after platform and API costs is 85 to 95 percent.
- The 90-day playbook: 14 days of foundation, 16 days to the first paid client, 60 days to three paying clients and a repeatable process.
- Best starter verticals: dental, real estate, auto dealerships, med spas, cannabis dispensaries. High lead value, high missed-call rates, steady volume.
- Retention is the moat. Churn on a working AI worker is near zero because the agent compounds value month over month.

Most digital marketing agencies have a ceiling problem. You can only take on so many clients. Every new client means more work, more management, more headaches. Revenue grows linearly. Costs grow almost as fast.

AI workers break this model. Here's why: the marginal cost of adding a 15th AI worker is almost zero. You configure a personality, connect GHL, and the AI does the rest. The infrastructure scales automatically. You don't hire more staff. You don't increase overhead. That's the opportunity.
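The margin claim above can be sanity-checked with a few lines of arithmetic. The Pro-plan price comes from the platform's pricing; the retainer and per-client API cost are illustrative assumptions, not quoted figures.

```python
# Quick sanity check of the gross-margin claim, using the $299/month Pro
# plan (up to 10 clients) and illustrative retainer and API-cost numbers.
def gross_margin(clients: int, retainer: float,
                 platform_cost: float, api_cost_per_client: float) -> float:
    """Return gross margin as a fraction of monthly recurring revenue."""
    revenue = clients * retainer
    costs = platform_cost + clients * api_cost_per_client
    return (revenue - costs) / revenue

# 10 clients at $1,000/month on the $299 Pro plan, ~$2/client API cost:
margin = gross_margin(clients=10, retainer=1000,
                      platform_cost=299, api_cost_per_client=2)
print(f"{margin:.1%}")  # → 96.8%
```

At these assumed numbers, total costs stay near $320 a month while revenue scales with client count, which is what produces a margin in the mid-to-high 90s.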
#### The Math

Let's run the numbers on a basic agency AI operation:

- Platform cost: $299/month (Pro plan, up to 10 AI workers)
- API cost: ~$1–3/client/month at moderate conversation volume
- Your price to clients: $800–$1,500/month per AI worker
- 10 clients at $1,000/month: $10,000 MRR
- Your costs: ~$320/month (platform + API)
- Gross margin: ~97%

At 10 clients you're at $10,000/month with essentially no marginal cost increase.

#### The 90-Day Playbook

**Days 1–14: Foundation**

1. Sign up for Kyra at kyra.conversionsystem.com
2. Set up a demo AI worker in your own agency's name (dental or real estate work great)
3. Get comfortable with the dashboard: adding clients, customizing personalities, viewing conversations
4. Use the Pitch Generator to create shareable demo links for 3 industries you know well

**Days 15–30: First Client**

1. Pick your easiest existing client — probably someone you talk to regularly who trusts you
2. Show them a live demo using their industry's pitch page
3. Offer a 30-day trial at $0 (or a reduced rate) to prove the value
4. Get them live on your platform, connect their GHL, customize their personality
5. Let the AI run for 2 weeks and review the results together

**Days 31–60: Three Paying Clients**

With one live success story, the sell becomes much easier. Now you have a real example: "My client John at ABC Dental had 5 appointments booked by the AI in the first week." That's all you need. Use the pitch pages and the live demo to show prospects. The demo does the heavy lifting — most people are convinced after 3 minutes of watching the AI respond.

**Days 61–90: Scale to $10K MRR**

By day 60, you should have 3–5 paying clients. The system is running itself.
Now you systematize:

- Use the Business in a Box templates for cold emails and LinkedIn outreach
- Set up the referral program — give existing clients a free month for every referral that converts
- Expand within existing clients: if you have a dental practice, ask if they have partner practices

#### Who to Target First

The best first prospects are businesses that:

- Receive high volumes of repetitive text inquiries (pricing, hours, availability)
- Have staff time wasted on simple Q&A
- Miss leads after hours
- Are already in GHL (or you can get them there)

Best industries for fast results: dental, real estate, auto dealerships, cannabis dispensaries, restaurants, med spas, and fitness studios.

#### The Retention Play

Here's the best part: AI worker churn is nearly zero. Once it's live and working, clients don't want to turn it off. The AI builds up institutional knowledge — it knows the business's tone, the common questions, the pipeline stages. Replacing it means starting over. Compare this to typical agency services where clients churn after 3–6 months. An AI worker that books appointments and handles leads creates ongoing, measurable value that compounds over time.

#### How to build a sustainable referral system

The 90-day playbook above gets you to three to five paying clients. Scaling beyond that typically comes from referrals, not cold outreach. A simple referral program that works: offer existing clients one free month for every client they introduce who signs a contract. The economics work because your marginal cost of adding a client is near zero — giving away an $800 month costs you roughly $15 in platform and API fees, not $800.

Three practical steps:

1. **Ask at the 60-day mark.** Once a client has seen two months of performance reports, their skepticism is gone. That is the right moment to ask: "Do you know anyone else who would want this?"
2. **Give them the demo link, not the pitch.** Most business owners know other business owners. A dental client knows other dental professionals.
Sending them a demo link lets the AI sell itself. You get a warm introduction; the AI does the convincing.
3. **Keep the referral program simple.** One rule: introduce a client who signs, get one free month. No tiers, no points, no complexity. Simple referral programs generate more referrals than tiered ones because the math is obvious to the referring client.

By month six, a referral system running alongside direct outreach should account for 30 to 50 percent of new client acquisitions. That is when the business starts compounding on its own momentum.

#### A real-world economic comparison

Here's how the unit economics actually look across a typical 10-client book compared to traditional agency services:

| Service line | Typical monthly price per client | Gross margin | Typical 12-month retention |
| --- | --- | --- | --- |
| Facebook ads management | $800 to $3,000 | 30 to 50 percent | 4 to 8 months |
| Website build + maintenance | $200 to $800 | 40 to 60 percent | Variable (often one-time) |
| SEO retainer | $1,000 to $5,000 | 30 to 60 percent | 6 to 12 months |
| AI worker retainer | $500 to $2,000 | 85 to 95 percent | 12+ months (near-zero churn) |

The AI worker retainer is the first new agency service line in a decade with margin and retention characteristics this strong. That's why agencies that move early establish a defensible position.

#### How to price AI workers by vertical

Flat pricing across all clients is the most common mistake agencies make in year one. The right price depends on the client's average customer value and how many conversations the AI handles per month. A business where one booking is worth $150 and a dental practice where one booking is worth $2,000 in lifetime value should not pay the same retainer. Here is how to structure pricing by vertical:

**Dental and medical spa ($750 to $1,500 per month).** High ticket, appointment-driven, and extremely time-sensitive — a patient who texts at 9pm and gets no reply books with the competitor by morning. The AI worker's value is immediate and measurable.
Use the 30-day booking report as your pricing anchor: if the AI books 4 additional cleanings at $150 each in month one, that is $600 in recovered revenue on a $750 retainer. Most dental clients see 6 to 10 recovered bookings in the first 30 days, which makes the math obvious.

**Real estate ($1,000 to $2,000 per month).** One missed lead is a $10,000 to $30,000 commission. Agents routinely receive texts at midnight, on weekends, and during showings when they cannot reply. The AI handles the initial qualification, books the showing, and updates the CRM — all before the agent checks their phone in the morning. Price based on the agent's average commission and close rate, not on the volume of conversations. A solo agent closing 3 deals per year can justify $1,500 per month if the AI recovers even half a deal.

**Auto dealerships ($1,000 to $1,500 per month).** High volume, high lead intent, and brutal response-time expectations. Studies consistently show that leads contacted within 5 minutes convert at 100× the rate of leads contacted after 30 minutes. Dealerships that route all lead sources (website, social, Google) into GHL and point the AI at the unified inbox see their speed-to-contact drop from hours to under 60 seconds. Price on volume: a dealership receiving 300 inbound messages per month should pay more than one receiving 50.

**Cannabis dispensaries ($500 to $1,000 per month).** Unique because they cannot advertise on most major platforms, so organic and direct-to-consumer messaging is critical. The AI handles menu questions, loyalty program inquiries, and "is my order ready" messages. Compliance requirements around age verification and product claims mean the knowledge base needs careful configuration — price slightly above the base rate to account for this additional setup work.

**Restaurants ($300 to $600 per month).** Lower average transaction value than most verticals, but high volume and high frequency.
A restaurant that receives 80 texts per week about reservations, hours, and takeout menus benefits from the AI clearing those without staff intervention — but the math on a $500 retainer requires the AI to meaningfully reduce front-of-house labor cost, not just generate new bookings. Pitch this as a labor-reduction tool rather than a lead-recovery tool.

One rule that holds across all verticals: never price the retainer at less than the value of one recovered transaction. If a dental cleaning is $150, your minimum retainer is $150. If a real estate commission is $10,000, your minimum retainer is well above $1,000. Pricing below the value of a single outcome sets the wrong frame from the first invoice.

#### Frequently asked questions

**Do I need technical skills to deploy AI workers?** No. The platform handles the infrastructure. You configure personality via a markdown file, connect the client's existing CRM with a token, and pick an industry template. If you can manage a GoHighLevel sub-account, you can deploy an AI worker.

**What if a client already has a chatbot on their site?** Most site chatbots are keyword-based scripts. An AI worker replaces them entirely. The client removes the old chatbot script from their website and installs the AI worker embed code. The new AI holds a real conversation with visitors instead of running a scripted Q-and-A.

**How do I price this for my first few clients?** Start at the low end of the range: first three clients at $500 per month to build case studies, then raise to $800 to $1,500 for new clients once you have results to show. Do not underprice long-term; the margin supports it and the value is real.

**How do I sell this to a client who's skeptical of AI?** Show, don't tell. Every agency account includes industry-specific demo pages (dental, real estate, auto, and others). Send the prospect the link and tell them to text anything a customer would text. Most skepticism disappears after a 3-minute live conversation with the AI.
**What are the operational costs I should plan for?** Platform subscription (typically $99 to $499 per month depending on client count), plus AI model API costs (roughly $1 to $5 per client per month at moderate conversation volume). Both scale cleanly with client count. There are no per-message or per-conversation fees beyond the model API cost.

**Can I offer this alongside my current services, or does it replace them?** Alongside. AI worker retainers sit on top of existing services (ads, SEO, GHL management). They often make the other services stickier because the client is getting measurable, ongoing value from the agency relationship.

#### When this business model isn't right for you

The AI worker retainer is a strong recurring-revenue play, but it's not universal. Skip this service line if:

- Your current clients are not in high-inbound-volume verticals (pure B2B consulting, for example, often doesn't fit).
- You have no interest in learning a new dashboard or maintaining a new service line.
- Your existing agency margin is already so high that adding this is noise.
- Your client base won't or can't pay monthly recurring fees.

For agencies serving local service businesses, e-commerce, real estate, or any high-volume consumer-facing vertical, this works.

Ready to start? Create your free agency account — no credit card required. For the deeper technical story on how AI workers differ from chatbots, read our 6 capabilities guide or our GHL AI worker complete guide.

External references: GoHighLevel documentation · OpenClaw documentation · OpenClaw on GitHub.

---

## Contact

- Email: angel@conversionsystem.com
- Website: https://conversionsystem.com