Building Maideo: Voice AI Recruiting for Home Services
How we shipped a voice agent that pre-qualifies care workers — architecture, pitfalls, the prompt that finally worked, and the production numbers.

The problem
Home-services agencies (childcare, elder care, cleaning) have a recruiting bottleneck. They get hundreds of candidate applications a month. Human recruiters spend 15 minutes per call asking the same five questions every time: availability, geography, certifications, prior experience, why-this-job.
That's hundreds of hours a month on a script.
The brief
Maideo asked us a simple question: can a voice agent handle the first call?
If yes, recruiters get pre-qualified leads, with structured data attached, and only call the candidates worth calling. If no, the agent fails gracefully and a human picks up.
We had eight weeks. Here's what we built.
Architecture overview
Candidate ─▶ phone number (Twilio inbound)
                    │
                    ▼
Twilio ─▶ ElevenLabs Conversational AI agent
         voice │            │ tool calls
               ▼            ▼
       LLM (GPT-4o)    Maideo backend (Fastify)
               │            │
               ▼            │
   Custom orchestration ───▶│
               │            ▼
               └─▶ Transcript + structured data ─▶ MongoDB
                            │
                            ▼
               Recruiter dashboard (Nuxt)
The voice layer is ElevenLabs Conversational AI (chosen for French voice quality — see our comparison). The orchestration is custom Node.js because we needed deep integration with Maideo's existing CRM.
The agent design
The candidate flow is five questions in 5–7 minutes:
- Availability — days, hours, weekends?
- Geography — postcode + transport?
- Certifications — CAP, BAFA, DEAP, none?
- Experience — has done X before, how long?
- Motivation — short open-ended
The agent has to:
- Sound natural (not interrogation)
- Handle interruptions and backtracks ("wait, I work weekends too")
- Recover from off-topic ("can I get more info about salary?")
- Detect end-of-call cues ("that's all, thanks")
- Output structured JSON for the recruiter
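The structured output requirement is the load-bearing one: everything downstream (dashboard, Slack notifications) consumes it. A minimal sketch of what that record could look like, with field names taken from the five questions above; the exact schema and the confidence threshold are assumptions, not the production values:

```javascript
// Illustrative shape of the structured record the recruiter receives.
// Field names mirror the five questions; the schema itself is an assumption.
const qualifiedCandidate = {
  fields: {
    availability:   { value: 'Mon-Fri evenings, weekends OK', confidence: 0.9 },
    geography:      { value: '75010 + metro', confidence: 0.95 },
    certifications: { value: 'none', confidence: 1.0 },
    experience:     { value: '2 years childcare', confidence: 0.85 },
    motivation:     { value: 'wants a job near home', confidence: 0.7 },
  },
  flags: [],
  status: 'qualified',
};

// A candidate is ready for the end-of-call summary once every field
// clears a confidence bar (0.6 here is illustrative).
const isComplete = (candidate, threshold = 0.6) =>
  Object.values(candidate.fields).every((f) => f.confidence >= threshold);

console.log(isComplete(qualifiedCandidate)); // true
```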
The prompt that finally worked
Three iterations of the system prompt. The third one is what runs in production:
You are an AI recruiter for Maideo, a French home-services agency. Your job is to chat with a candidate for 5–7 minutes to gather five pieces of information: availability, geography, certifications, experience, and motivation.
Your tone: warm, professional, lightly informal. Like a kind colleague at the front desk. Not robotic. Not overly cheerful.
Your structure: ask one question at a time. Wait for a complete answer. Don't repeat the question if the answer is partial — instead, follow up naturally ("OK and how many hours per week, would you say?").
Off-topic handling: if the candidate asks about salary, schedule, or job specifics, say "I'll let the recruiter cover that on the next call — they'll have more details" and gently steer back.
Recovery: if the candidate corrects themselves ("actually I do work weekends"), update the corresponding field via the updateAnswer tool and confirm warmly.
End cue: when you have all five fields with reasonable confidence, summarize back ("So just to confirm: Mon-Fri evenings, 10th arrondissement, no certifications yet, two years of childcare, looking for a job near home"), thank them, and call the endCall tool.
Hard rules:
- Never ask for personal data (date of birth, ID number) — the recruiter does that.
- Never make promises about hire decisions, salary, or start dates.
- If asked "are you a robot?", say yes warmly and keep going.
That last rule was the biggest win. Trying to hide the AI failed; being upfront about it built trust faster.
Tool calls — what the agent could do
We exposed three tools:
{
  updateAnswer: ({ field, value, confidence }) => {
    // field ∈ ['availability', 'geography', 'certifications', 'experience', 'motivation']
    // confidence ∈ [0, 1] — used to decide if we ask a clarification
    candidate.fields[field] = { value, confidence, updatedAt: now() }
  },
  flagForRecruiter: ({ reason }) => {
    // E.g. red flags ("I have a criminal record I want to discuss"),
    // or strong positives ("I'm a registered nurse and willing to work nights")
    candidate.flags.push({ reason, at: now() })
  },
  endCall: ({ summary }) => {
    candidate.status = 'qualified'
    candidate.summary = summary
    queueRecruiterNotification(candidate)
  },
}
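The confidence value on updateAnswer is what decides between accepting an answer and asking a natural follow-up. A minimal sketch of that gate; the 0.6 threshold and the wording are illustrative, not what runs in production:

```javascript
// Sketch: confidence-gated clarification. Threshold is an assumption.
function nextAction(field, { value, confidence }) {
  if (confidence >= 0.6) {
    return { type: 'accept', field, value };
  }
  return {
    type: 'clarify',
    field,
    // Re-ask naturally instead of repeating the question verbatim.
    prompt: `OK, just to be sure about your ${field}, can you say a bit more?`,
  };
}

console.log(nextAction('availability', { value: 'evenings?', confidence: 0.4 }).type);
// 'clarify'
```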
The flagForRecruiter tool was added in week 6 after we saw the agent steamroll past genuinely concerning answers (a candidate disclosing a violent prior — the agent thanked them and moved on). Now those get flagged and the recruiter is paged immediately.
What broke in production
Six weeks after launch:
Background noise. Kids screaming, traffic, dogs barking. The agent transcribes the noise as words, gets confused, asks the candidate to repeat. We added a noise gate at the audio layer (RNNoise) and tuned the silence-detection threshold.
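The denoising itself is RNNoise at the audio edge, but the silence-detection threshold is the part we tuned by hand. A toy version of that check on a PCM frame, to show what "tuning the threshold" means in practice; the 0.02 value is made up for illustration:

```javascript
// Illustrative energy-based silence check on a normalized PCM frame
// (samples in [-1, 1]). The real pipeline runs RNNoise first; the
// threshold here is an assumption, not the production value.
function isSilence(samples, threshold = 0.02) {
  const rms = Math.sqrt(
    samples.reduce((sum, s) => sum + s * s, 0) / samples.length
  );
  return rms < threshold;
}

console.log(isSilence(new Array(160).fill(0)));   // true
console.log(isSilence(new Array(160).fill(0.5))); // false
```

Raising the threshold makes the agent more tolerant of street noise; lowering it makes it quicker to detect that the candidate has stopped talking.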
Code-switching French/Arabic/Wolof. Candidates in immigrant neighborhoods naturally code-switch. Whisper handles this OK but the LLM sometimes thinks the candidate is non-fluent in French (when they actually are, just multilingual). We added a system prompt note: "candidates may code-switch between French, Arabic, and other languages — treat all as fluent French".
The hostile candidate. Twice we had candidates who were aggressive (not the agent's fault — bad day, frustration with another agency). The agent kept calmly answering — better than any of us would have. But we added an endCallEarly tool for these cases: if the candidate says "this is bullshit" or similar, the agent ends the call warmly and doesn't push.
The talkative candidate. Some candidates wanted to chat for 20+ minutes. The agent would let them. We added a soft cap at 8 minutes with a polite redirect.
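The soft cap is just a timestamp check before each agent turn. A minimal sketch, assuming the orchestrator tracks the call start time (the structure is illustrative; only the 8-minute figure comes from production):

```javascript
// Sketch of the 8-minute soft cap: once a turn starts past the cap,
// the agent is nudged to summarize and wrap up rather than cut off.
const SOFT_CAP_MS = 8 * 60 * 1000;

function shouldRedirect(callStartedAt, now = Date.now()) {
  return now - callStartedAt >= SOFT_CAP_MS;
}

const now = Date.now();
console.log(shouldRedirect(now - 9 * 60 * 1000, now)); // true
console.log(shouldRedirect(now - 2 * 60 * 1000, now)); // false
```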
The numbers
After three months in production:
- 62% of candidates complete the qualification call (up from 38% in week 1)
- 8 minutes average call length (down from 11 minutes in week 1)
- €0.42 per call all-in (voice + LLM + telephony)
- 15 minutes of recruiter time saved per qualified candidate
- 8.4× ROI on the agent vs human pre-qualification
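The ROI figure follows from the per-call cost and the recruiter time saved. A back-of-envelope check, assuming a loaded recruiter cost of roughly €14/hour (that hourly rate is our assumption; only the €0.42 and 15 minutes are production numbers):

```javascript
// Back-of-envelope ROI check. recruiterHourlyEUR is an assumed loaded
// cost; costPerCallEUR and minutesSaved come from the production numbers.
const costPerCallEUR = 0.42;
const minutesSaved = 15;
const recruiterHourlyEUR = 14; // assumption

const valuePerCallEUR = (minutesSaved / 60) * recruiterHourlyEUR; // €3.50
const roi = valuePerCallEUR / costPerCallEUR;

console.log(roi.toFixed(1)); // ≈ 8.3
```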
Recruiters get a Slack notification when a candidate qualifies, with the structured summary and any flags. They click into the dashboard, review, and decide whether to call the candidate themselves.
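The notification is built from the same structured record. A sketch of assembling the Slack message payload from the summary and flags; the function name, record shape, and block layout are assumptions for illustration:

```javascript
// Sketch: build a Slack Block Kit-style payload from a qualified
// candidate. Record shape and message layout are assumptions.
function buildSlackMessage(candidate) {
  return {
    text: `New qualified candidate: ${candidate.summary}`,
    blocks: [
      { type: 'section', text: { type: 'mrkdwn', text: candidate.summary } },
      // One extra section per recruiter flag, so red flags are visible
      // in the notification itself, not just the dashboard.
      ...candidate.flags.map((f) => ({
        type: 'section',
        text: { type: 'mrkdwn', text: `⚠️ ${f.reason}` },
      })),
    ],
  };
}

const msg = buildSlackMessage({
  summary: 'Mon-Fri evenings, 75010, 2 years childcare',
  flags: [{ reason: 'registered nurse, willing to work nights' }],
});
console.log(msg.blocks.length); // 2
```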
What we'd do differently
In hindsight, three things:
- Build the eval set with recruiters from week 1. We built it ourselves and we missed obvious cases that experienced recruiters spotted instantly.
- Ship the dashboard before the agent. We shipped them in parallel. The dashboard was rough at launch and the recruiters didn't trust the qualifications. Two more weeks of dashboard polish would have accelerated adoption.
- Plan the human-in-the-loop early. We treated the agent as autonomous. In practice, the most valuable mode is "agent does pre-qualification, human reviews, escalates if needed". Designing that flow upfront would have saved a refactor.
Stack summary
- Voice: ElevenLabs Conversational AI (FR voice)
- STT: inside the ElevenLabs pipeline (Whisper-equivalent quality on French)
- LLM: GPT-4o (we tested Claude Sonnet, comparable quality, slightly higher latency)
- Backend: Fastify, MongoDB, Redis, BullMQ
- Frontend: Nuxt 4 (recruiter dashboard)
- Telephony: Twilio (FR inbound numbers)
- Observability: PostHog (transcripts + replay), custom dashboard for daily cost/quality
- Audio cleanup: RNNoise at the Twilio edge
Closing thoughts
Voice AI for recruiting works in 2026 — but only if you treat it as one piece of a recruiter workflow, not a replacement. The agent does the boring 80%; the recruiter does the human 20%.
If you want to ship something similar — voice agent for screening, intake, scheduling, support — get in touch. We did the hard part once. We can do it for you faster.
Working with Ikki
Need help shipping this to production?
We design, ship, and operate AI systems for SMBs and enterprises. Voice agents, RAG, automation, web & mobile.
Autres articles
ElevenLabs vs Vapi vs Retell — Voice AI Platform Comparison 2026
Side-by-side comparison of the three leading voice AI platforms in 2026 — latency, languages, pricing, integrations, and what we ship in production at Ikki.
RAG Implementation Guide for SMBs (2026)
How to ship a Retrieval-Augmented Generation system that actually works for SMBs — chunking, embeddings, evaluation, and the mistakes that cost us six weeks.