Lessons · 11 min read

Lessons from Shipping Seven AI Products

What we learned shipping voice agents, RAG platforms, fintech engines, civic AI, and immersive web — the patterns that worked, the ones that didn't, and the things nobody told us.

Frédéric Magnin
Founder & AI Engineer at Ikki

Why this post

Over the last 24 months we've shipped seven AI products to production. They span fintech, civic tech, home services SaaS, semantic video search, immersive sci-fi, autonomous trading, and event PWAs.

This post is what we wish we'd known before product 1. It's biased toward what surprised us — the patterns that worked across all seven, and the ones that broke when we tried to copy them blindly.

The seven

For context — this is what we shipped, in rough order:

  1. Codemachia — transmedia sci-fi universe with Three.js + generative AI
  2. Opportunix — RAG platform for SMBs analyzing public tenders
  3. Maideo — home-services SaaS with voice AI for recruiting
  4. Footfoot — semantic video search for football moments
  5. Democratie — civic AI for the 2026 Paris municipal elections
  6. The Boom — multi-tenant PWA for private events
  7. ikki.finance — autonomous quantitative trading engine

They look unrelated. They're not. The same ten lessons applied to all of them.

Lesson 1: the AI part is rarely the hard part

When we started, we assumed shipping AI products meant deep ML problems — model selection, fine-tuning, eval. In reality, the AI is 15-25% of the work. The rest is:

  • Auth, billing, multi-tenancy, RBAC
  • Data pipelines (ingest, normalize, version)
  • Observability (logs, traces, eval drift, cost monitoring)
  • UI/UX for a product whose behavior is non-deterministic
  • Customer onboarding and guardrails
  • Compliance (GDPR for EU, sector-specific for fintech and civic)

Most "AI startups" ship AI demos. To ship AI products, the surrounding 75% has to be there. We've watched competitors with much fancier models lose to teams with simpler models and better infrastructure.

Lesson 2: defaults matter more than capabilities

For Maideo's voice agent, we spent two weeks tuning the LLM prompt. We then realized 80% of the conversation quality came from one decision: the agent's first sentence.

Same on Opportunix: the chunking strategy mattered more than the embedding model.

Same on ikki.finance: the position sizing default mattered more than the prediction model.

The lesson: in AI products, defaults are the product. Picking sensible defaults and letting users override them is almost always better than asking them to configure 14 things up front.
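
To make that concrete, here's the shape this takes in our codebases. A minimal sketch with hypothetical names, not the actual Maideo config: every field carries an opinionated default, and callers override one thing at a time.

```ts
// Defaults-first configuration, minimal sketch (hypothetical names).
// Every field carries an opinionated default; callers override only what they must.
interface VoiceAgentConfig {
  greeting: string;       // the first sentence: the highest-leverage default of all
  maxTurnSeconds: number;
  bargeIn: boolean;       // may the caller interrupt the agent?
  language: string;
}

const DEFAULTS: VoiceAgentConfig = {
  greeting: "Hi, I'm calling about your application. Do you have two minutes?",
  maxTurnSeconds: 30,
  bargeIn: true,
  language: "fr-FR",
};

// Zero configuration up front; power users override one field at a time.
function createAgentConfig(overrides: Partial<VoiceAgentConfig> = {}): VoiceAgentConfig {
  return { ...DEFAULTS, ...overrides };
}

const config = createAgentConfig({ maxTurnSeconds: 45 });
```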

Lesson 3: build the eval before the feature

This was a hard lesson. On Opportunix, we shipped the RAG system, demoed it, and then had no way to know if it was getting better or worse over time. We added eval six weeks in. By then, we'd made changes we couldn't measure.

Now, on every AI product, we build the eval first:

  • A small, curated dataset (50-100 examples)
  • A scoring function (LLM-as-judge or rule-based)
  • A CI hook that runs eval on every PR

Without this, you're flying blind. Cost: 1-2 days. Value: every subsequent decision is informed.
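
For the curious, the whole harness fits in one file. Here's a minimal sketch (hypothetical names, inlined dataset, rule-based scorer); swap the scorer for an LLM-as-judge call where a string match won't cut it.

```ts
// Sketch of an eval-first harness (hypothetical names and scoring rule).
// It runs in CI on every PR and fails the build if the score regresses.
interface EvalCase {
  input: string;
  expected: string; // a key fact the answer must contain
}

// In practice: 50-100 curated examples in a versioned JSON file. Two inlined here.
const cases: EvalCase[] = [
  { input: "Deadline for tender #4821?", expected: "March 14" },
  { input: "Who is the contracting authority?", expected: "Ville de Lyon" },
];

// Rule-based scorer: 1 if the answer contains the expected fact, else 0.
function score(answer: string, expected: string): number {
  return answer.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

async function runEval(answer: (input: string) => Promise<string>): Promise<number> {
  let total = 0;
  for (const c of cases) total += score(await answer(c.input), c.expected);
  return total / cases.length;
}

// CI hook: exit non-zero below the baseline so the PR goes red.
declare function ragPipeline(input: string): Promise<string>; // the system under test
const BASELINE = 0.85;
runEval(ragPipeline).then((s) => process.exit(s < BASELINE ? 1 : 0));
```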

Lesson 4: production traffic teaches you everything

We had a synthetic test set for Maideo's voice agent. It scored 92%. We launched. The first day, real candidates broke it in ways we hadn't anticipated:

  • Background noise (kids, traffic, dogs)
  • Strong regional accents we hadn't included in the test set
  • Users who interrupted constantly
  • Users who answered questions before they were asked

The fix wasn't a better model. It was better production logging so we could see what was actually happening, then iterating on edge cases week by week. Six weeks after launch, the agent was at 96% completion rate. That gain came from real traffic, not synthetic eval.

Tooling we use now: PostHog for transcripts and session replay, custom dashboards for cost and latency drift, and manual review of 10 random calls per week.
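
The weekly sample is trivial to automate. A sketch, assuming a hypothetical `calls` collection where every call is logged with transcript, cost, and latency:

```ts
// Sketch: pull 10 random calls from the last week for manual review.
import { MongoClient } from "mongodb";

async function weeklyReviewSample() {
  const client = await MongoClient.connect(process.env.MONGO_URL!);
  const calls = client.db("maideo").collection("calls");
  const oneWeekAgo = new Date(Date.now() - 7 * 24 * 3600 * 1000);

  // $sample draws uniformly at random: cheap and unbiased.
  const sample = await calls
    .aggregate([
      { $match: { startedAt: { $gte: oneWeekAgo } } },
      { $sample: { size: 10 } },
      { $project: { transcript: 1, costUsd: 1, latencyMs: 1, outcome: 1 } },
    ])
    .toArray();

  await client.close();
  return sample;
}
```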

Lesson 5: the cost surprise is always voice

Of the seven products, ikki.finance makes the most LLM calls per minute (autonomous trading, lots of analysis). Maideo's voice agent makes roughly 1/100th as many, yet costs 3× more per active user.

Why? Voice. Voice synthesis is 10-100× more expensive than the LLM brain. We talked about this in our voice AI cost guide.

Lesson: model your unit economics carefully if voice is in the loop. The LLM is rarely the cost driver — voice and telephony are.
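
A back-of-envelope model makes the point. The prices below are illustrative placeholders, not our actual rates; plug in your providers' pricing.

```ts
// Back-of-envelope unit economics for one voice agent call.
// All prices are illustrative placeholders, not real provider rates.
const PRICES = {
  ttsPerMinute: 0.15,     // voice synthesis: usually the dominant line item
  sttPerMinute: 0.01,     // transcription
  telephonyPerMinute: 0.01,
  llmPerTurn: 0.002,      // a few thousand tokens per turn, cheap by comparison
};

function costPerCall(minutes: number, llmTurns: number): number {
  return (
    minutes * (PRICES.ttsPerMinute + PRICES.sttPerMinute + PRICES.telephonyPerMinute) +
    llmTurns * PRICES.llmPerTurn
  );
}

// A 5-minute call with 12 LLM turns: voice + telephony dwarf the LLM.
console.log(costPerCall(5, 12).toFixed(3)); // 0.874 total, of which the LLM is ~0.024
```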

Lesson 6: pick a stack and reuse it

For our first three products we picked technologies à la carte. Different frontend (React then Vue), different backend (Express then Fastify), different DBs.

The result: every onboarding ramp was three weeks. Code didn't transfer between projects. Bug fixes didn't compound.

From product 4 onwards we picked a stack and stuck to it: Nuxt 4 + Fastify + MongoDB + BullMQ + Redis + Vercel/DO. We wrote shared utilities (auth middleware, multi-tenancy, billing, observability hooks) once, copied them across products.
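
As one example of what those shared utilities look like, here's a minimal sketch of the multi-tenancy hook we register in every Fastify app. The header name and types are hypothetical; the real version resolves the tenant from the auth token.

```ts
// Minimal multi-tenancy hook for Fastify (hypothetical header and types).
import Fastify from "fastify";

declare module "fastify" {
  interface FastifyRequest {
    tenantId?: string;
  }
}

const app = Fastify();

// Runs before every handler: no route can forget to scope by tenant.
app.addHook("onRequest", async (req, reply) => {
  const tenantId = req.headers["x-tenant-id"];
  if (typeof tenantId !== "string" || tenantId.length === 0) {
    return reply.code(400).send({ error: "missing tenant" });
  }
  req.tenantId = tenantId;
});
```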

This is unfashionable advice. The right framework choice depends on the team. But the underlying principle — stop optimizing per-project, start optimizing across-projects — is the highest-leverage decision an agency can make.

Lesson 7: the AI feature is an easy lock-in story to tell, but it's almost never the moat

We've pitched clients "you'll have an AI moat." We were wrong every time.

The real moats we've seen build up:

  • Data ownership: a year of clean labeled data is harder to copy than a model
  • Workflow integration: once the agent is wired into your CRM, telephony, and back office, switching cost is high
  • Trust: SLAs and uptime track records compound

The AI part is replaceable. The everything-around-it isn't. We now sell the integration, not the model.

Lesson 8: ship behind a flag, always

Every AI feature we ship goes behind a flag (LaunchDarkly, GrowthBook, or homebrew). Reasons:

  • Bad output discovered in prod → flip the flag, no rollback
  • A/B test the AI version vs the rule-based fallback → measure the lift
  • Roll out to 1% of traffic for cost / quality monitoring before going wide
  • Sales conversations: "we shipped it last week, currently in beta with our top customers"

This single practice has saved us six times. It's table stakes.
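
The pattern costs a dozen lines. A sketch, with a hypothetical flag client standing in for LaunchDarkly, GrowthBook, or homebrew:

```ts
// Sketch: AI path behind a flag, with the rule-based fallback one branch away.
// Hypothetical dependencies; swap in your real flag client and implementations.
declare const flags: { isEnabled(key: string, userId: string): Promise<boolean> };
declare function llmSummarize(doc: string): Promise<string>;
declare function ruleBasedSummarize(doc: string): string;

async function summarize(doc: string, userId: string): Promise<string> {
  // Checked at request time, per user: flip the flag and bad output stops
  // immediately, no deploy. The same branch powers the A/B test and the 1% rollout.
  if (await flags.isEnabled("ai-summary", userId)) {
    return llmSummarize(doc);
  }
  return ruleBasedSummarize(doc);
}
```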

Lesson 9: dual-write the data layer

For four products, we use LLMs in the write path: an agent processes input, structures it, and writes to the DB. The temptation is to trust the LLM output.

Don't. We always dual-write: store both the raw input AND the LLM-structured output, with a version. When (not if) the prompt changes and the structure shifts, we can re-run on raw data without losing history.
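
In schema terms, the record looks something like this (field names are illustrative):

```ts
// Dual-write record shape (illustrative field names). The raw input is immutable;
// the structured view is versioned so it can be regenerated when the prompt changes.
interface ExtractionRecord {
  _id: string;
  rawInput: string;                     // exactly what came in, never mutated
  structured: Record<string, unknown>;  // the LLM's structured output
  promptVersion: string;                // e.g. "extract-v3": which prompt produced it
  extractedAt: Date;
}

// When prompt v4 ships: re-run over rawInput, write new `structured` and
// `promptVersion`. History survives, nothing is lost.
```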

This adds 10% to storage. It's worth it 100% of the time.

Lesson 10: the team you ship with matters more than the model

We've shipped products with junior teams, with senior teams, with mixed teams. The single biggest predictor of project success was not the model, framework, or budget — it was whether the team had someone who'd shipped before.

Specifically: someone who'd seen a system through launch, scale, and sunset. They know which corners are safe to cut and which ones break later. That person is the difference between a 6-week project and a 6-month one.

Hire for that. Invest in growing it.

Closing thoughts

If you're shipping your first AI product, the temptation is to spend 80% of your time on the AI. Resist it.

Spend 25% on the AI, 25% on the eval, 25% on the surrounding infrastructure, and 25% on listening to the first 100 real users. That's how products ship.

If you'd like help applying these lessons to your project — get in touch. We've made these mistakes so you don't have to.


Work with Ikki

Need help shipping this to production?

We design, ship, and operate AI systems for SMBs and enterprises. Voice agents, RAG, automation, web & mobile.
