Building Maideo: Voice AI Recruiting for Home Services
How we shipped a voice agent that pre-qualifies care workers — architecture, pitfalls, the prompt that finally worked, and the production numbers.

The problem
Home-services agencies (childcare, elder care, cleaning) have a recruiting bottleneck. They get hundreds of candidate applications a month. Human recruiters spend 15 minutes per call asking the same five questions every time: availability, geography, certifications, prior experience, why-this-job.
That's hundreds of hours a month on a script.
The brief
Maideo asked us a simple question: can a voice agent handle the first call?
If yes, recruiters get pre-qualified leads, with structured data attached, and only call the candidates worth calling. If no, the agent fails gracefully and a human picks up.
We had eight weeks. Here's what we built.
Architecture overview
Candidate ─▶ phone number (Twilio inbound)
                    │
                    ▼
Twilio ─▶ ElevenLabs Conversational AI agent
         voice │            │ tool calls
               ▼            ▼
       LLM (GPT-4o)    Maideo backend (Fastify)
               │            │
               ▼            │
   Custom orchestration ───▶│
               │            ▼
               └─▶ Transcript + structured data ─▶ MongoDB
                            │
                            ▼
               Recruiter dashboard (Nuxt)
The voice layer is ElevenLabs Conversational AI (chosen for French voice quality — see our comparison). The orchestration is custom Node.js because we needed deep integration with Maideo's existing CRM.
The agent design
The candidate flow is five questions in 5–7 minutes:
- Availability — days, hours, weekends?
- Geography — postcode + transport?
- Certifications — CAP, BAFA, DEAP, none?
- Experience — has done X before, how long?
- Motivation — short open-ended
The agent has to:
- Sound natural (not interrogation)
- Handle interruptions and backtracks ("wait, I work weekends too")
- Recover from off-topic ("can I get more info about salary?")
- Detect end-of-call cues ("that's all, thanks")
- Output structured JSON for the recruiter
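The structured output requirement is the load-bearing one: everything downstream (dashboard, Slack notifications) consumes it. A minimal sketch of what that record could look like, with field names taken from the five questions above; the exact schema and the confidence threshold are assumptions, not the production values:

```javascript
// Illustrative shape of the structured record the recruiter receives.
// Field names mirror the five questions; the schema itself is an assumption.
const qualifiedCandidate = {
  fields: {
    availability:   { value: 'Mon-Fri evenings, weekends OK', confidence: 0.9 },
    geography:      { value: '75010 + metro', confidence: 0.95 },
    certifications: { value: 'none', confidence: 1.0 },
    experience:     { value: '2 years childcare', confidence: 0.85 },
    motivation:     { value: 'wants a job near home', confidence: 0.7 },
  },
  flags: [],
  status: 'qualified',
};

// A candidate is ready for the end-of-call summary once every field
// clears a confidence bar (0.6 here is illustrative).
const isComplete = (candidate, threshold = 0.6) =>
  Object.values(candidate.fields).every((f) => f.confidence >= threshold);

console.log(isComplete(qualifiedCandidate)); // true
```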
The prompt that finally worked
Three iterations of the system prompt. The third one is what runs in production:
You are an AI recruiter for Maideo, a French home-services agency. Your job is to chat with a candidate for 5–7 minutes to gather five pieces of information: availability, geography, certifications, experience, and motivation.
Your tone: warm, professional, lightly informal. Like a kind colleague at the front desk. Not robotic. Not overly cheerful.
Your structure: ask one question at a time. Wait for a complete answer. Don't repeat the question if the answer is partial — instead, follow up naturally ("OK and how many hours per week, would you say?").
Off-topic handling: if the candidate asks about salary, schedule, or job specifics, say "I'll let the recruiter cover that on the next call — they'll have more details" and gently steer back.
Recovery: if the candidate corrects themselves ("actually I do work weekends"), update the corresponding field via the updateAnswer tool and confirm warmly.
End cue: when you have all five fields with reasonable confidence, summarize back ("So just to confirm: Mon-Fri evenings, 10th arrondissement, no certifications yet, two years of childcare, looking for a job near home"), thank them, and call the endCall tool.
Hard rules:
- Never ask for personal data (date of birth, ID number) — the recruiter does that.
- Never make promises about hire decisions, salary, or start dates.
- If asked "are you a robot?", say yes warmly and keep going.
That last rule was the biggest win. Trying to hide the AI failed; being upfront about it built trust faster.
Tool calls — what the agent could do
We exposed three tools:
{
  updateAnswer: ({ field, value, confidence }) => {
    // field ∈ ['availability', 'geography', 'certifications', 'experience', 'motivation']
    // confidence ∈ [0, 1] — used to decide if we ask a clarification
    candidate.fields[field] = { value, confidence, updatedAt: now() }
  },
  flagForRecruiter: ({ reason }) => {
    // E.g. red flags ("I have a criminal record I want to discuss"),
    // or strong positives ("I'm a registered nurse and willing to work nights")
    candidate.flags.push({ reason, at: now() })
  },
  endCall: ({ summary }) => {
    candidate.status = 'qualified'
    candidate.summary = summary
    queueRecruiterNotification(candidate)
  },
}
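The confidence value on updateAnswer is what decides between accepting an answer and asking a natural follow-up. A minimal sketch of that gate; the 0.6 threshold and the wording are illustrative, not what runs in production:

```javascript
// Sketch: confidence-gated clarification. Threshold is an assumption.
function nextAction(field, { value, confidence }) {
  if (confidence >= 0.6) {
    return { type: 'accept', field, value };
  }
  return {
    type: 'clarify',
    field,
    // Re-ask naturally instead of repeating the question verbatim.
    prompt: `OK, just to be sure about your ${field}, can you say a bit more?`,
  };
}

console.log(nextAction('availability', { value: 'evenings?', confidence: 0.4 }).type);
// 'clarify'
```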
The flagForRecruiter tool was added in week 6 after we saw the agent steamroll past genuinely concerning answers (a candidate disclosing a violent prior — the agent thanked them and moved on). Now those get flagged and the recruiter is paged immediately.
What broke in production
Six weeks after launch:
Background noise. Kids screaming, traffic, dogs barking. The agent transcribes the noise as words, gets confused, asks the candidate to repeat. We added a noise gate at the audio layer (RNNoise) and tuned the silence-detection threshold.
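The denoising itself is RNNoise at the audio edge, but the silence-detection threshold is the part we tuned by hand. A toy version of that check on a PCM frame, to show what "tuning the threshold" means in practice; the 0.02 value is made up for illustration:

```javascript
// Illustrative energy-based silence check on a normalized PCM frame
// (samples in [-1, 1]). The real pipeline runs RNNoise first; the
// threshold here is an assumption, not the production value.
function isSilence(samples, threshold = 0.02) {
  const rms = Math.sqrt(
    samples.reduce((sum, s) => sum + s * s, 0) / samples.length
  );
  return rms < threshold;
}

console.log(isSilence(new Array(160).fill(0)));   // true
console.log(isSilence(new Array(160).fill(0.5))); // false
```

Raising the threshold makes the agent more tolerant of street noise; lowering it makes it quicker to detect that the candidate has stopped talking.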
Code-switching French/Arabic/Wolof. Candidates in immigrant neighborhoods naturally code-switch. Whisper handles this OK but the LLM sometimes thinks the candidate is non-fluent in French (when they actually are, just multilingual). We added a system prompt note: "candidates may code-switch between French, Arabic, and other languages — treat all as fluent French".
The hostile candidate. Twice we had candidates who were aggressive (not the agent's fault — bad day, frustration with another agency). The agent kept calmly answering — better than any of us would have. But we added an endCallEarly tool for these cases: if the candidate says "this is bullshit" or similar, the agent ends the call warmly and doesn't push.
The talkative candidate. Some candidates wanted to chat for 20+ minutes. The agent would let them. We added a soft cap at 8 minutes with a polite redirect.
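The soft cap is just a timestamp check before each agent turn. A minimal sketch, assuming the orchestrator tracks the call start time (the structure is illustrative; only the 8-minute figure comes from production):

```javascript
// Sketch of the 8-minute soft cap: once a turn starts past the cap,
// the agent is nudged to summarize and wrap up rather than cut off.
const SOFT_CAP_MS = 8 * 60 * 1000;

function shouldRedirect(callStartedAt, now = Date.now()) {
  return now - callStartedAt >= SOFT_CAP_MS;
}

const now = Date.now();
console.log(shouldRedirect(now - 9 * 60 * 1000, now)); // true
console.log(shouldRedirect(now - 2 * 60 * 1000, now)); // false
```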
The numbers
After three months in production:
- 62% of candidates complete the qualification call (up from 38% in week 1)
- 8 minutes average call length (down from 11 minutes in week 1)
- €0.42 per call all-in (voice + LLM + telephony)
- 15 minutes of recruiter time saved per qualified candidate
- 8.4× ROI on the agent vs human pre-qualification
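The ROI figure follows from the per-call cost and the recruiter time saved. A back-of-envelope check, assuming a loaded recruiter cost of roughly €14/hour (that hourly rate is our assumption; only the €0.42 and 15 minutes are production numbers):

```javascript
// Back-of-envelope ROI check. recruiterHourlyEUR is an assumed loaded
// cost; costPerCallEUR and minutesSaved come from the production numbers.
const costPerCallEUR = 0.42;
const minutesSaved = 15;
const recruiterHourlyEUR = 14; // assumption

const valuePerCallEUR = (minutesSaved / 60) * recruiterHourlyEUR; // €3.50
const roi = valuePerCallEUR / costPerCallEUR;

console.log(roi.toFixed(1)); // ≈ 8.3
```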
Recruiters get a Slack notification when a candidate qualifies, with the structured summary and any flags. They click into the dashboard, review, and decide whether to call the candidate themselves.
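The notification is built from the same structured record. A sketch of assembling the Slack message payload from the summary and flags; the function name, record shape, and block layout are assumptions for illustration:

```javascript
// Sketch: build a Slack Block Kit-style payload from a qualified
// candidate. Record shape and message layout are assumptions.
function buildSlackMessage(candidate) {
  return {
    text: `New qualified candidate: ${candidate.summary}`,
    blocks: [
      { type: 'section', text: { type: 'mrkdwn', text: candidate.summary } },
      // One extra section per recruiter flag, so red flags are visible
      // in the notification itself, not just the dashboard.
      ...candidate.flags.map((f) => ({
        type: 'section',
        text: { type: 'mrkdwn', text: `⚠️ ${f.reason}` },
      })),
    ],
  };
}

const msg = buildSlackMessage({
  summary: 'Mon-Fri evenings, 75010, 2 years childcare',
  flags: [{ reason: 'registered nurse, willing to work nights' }],
});
console.log(msg.blocks.length); // 2
```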
What we'd do differently
In hindsight, three things:
- Build the eval set with recruiters from week 1. We built it ourselves and we missed obvious cases that experienced recruiters spotted instantly.
- Ship the dashboard before the agent. We shipped them in parallel. The dashboard was rough at launch and the recruiters didn't trust the qualifications. Two more weeks of dashboard polish would have accelerated adoption.
- Plan the human-in-the-loop early. We treated the agent as autonomous. In practice, the most valuable mode is "agent does pre-qualification, human reviews, escalates if needed". Designing that flow upfront would have saved a refactor.
Stack summary
- Voice: ElevenLabs Conversational AI (FR voice)
- STT: inside the ElevenLabs pipeline (Whisper-equivalent quality on French)
- LLM: GPT-4o (we tested Claude Sonnet, comparable quality, slightly higher latency)
- Backend: Fastify, MongoDB, Redis, BullMQ
- Frontend: Nuxt 4 (recruiter dashboard)
- Telephony: Twilio (FR inbound numbers)
- Observability: PostHog (transcripts + replay), custom dashboard for daily cost/quality
- Audio cleanup: RNNoise at the Twilio edge
Closing thoughts
Voice AI for recruiting works in 2026 — but only if you treat it as one piece of a recruiter workflow, not a replacement. The agent does the boring 80%; the recruiter does the human 20%.
If you want to ship something similar — voice agent for screening, intake, scheduling, support — get in touch. We did the hard part once. We can do it for you faster.
Working with Ikki
Need help shipping this to production?
We design, ship, and operate AI systems for SMBs and enterprises. Voice agents, RAG, automation, web & mobile.
Autres articles
ElevenLabs vs Vapi vs Retell — Voice AI Platform Comparison 2026
Side-by-side comparison of the three leading voice AI platforms in 2026 — latency, languages, pricing, integrations, and what we ship in production at Ikki.
RAG Implementation Guide for SMBs (2026)
How to ship a Retrieval-Augmented Generation system that actually works for SMBs — chunking, embeddings, evaluation, and the mistakes that cost us six weeks.