Platforms · 12 min read

Multi-Tenancy Patterns for AI SaaS — Org Isolation, Quotas, Billing

Most SaaS products try to be multi-tenant from day one. Most get it wrong. Here are the patterns that actually ship: org isolation, per-org quotas, role-based collaboration, and the migration to schema-per-tenant when you outgrow a shared DB.

Ikki
Last verified · May 9, 2026

Why multi-tenancy is harder than it looks

The mental model most teams start with is "build the product for one user, then add org_id to every table". This works until it doesn't. The day a customer can see another customer's data is the day your enterprise sales pipeline dies.

Multi-tenancy in SaaS — and especially in AI SaaS, where artifacts (uploaded docs, generated outputs, vector embeddings, prompt logs) are huge and sensitive — has three failure modes:

  1. Leaky reads — a query forgets the org_id filter and returns data from another tenant.
  2. Leaky writes — a write operation lands the artifact under the wrong tenant's resource.
  3. Hidden cross-tenant coupling — a "shared resource" model that started innocent (a global tags table, a shared opportunity feed) ends up being the surface where two tenants can stomp on each other.

We've shipped several multi-tenant AI products. The patterns below are the ones that survived, plus the failures we won't repeat.

The three multi-tenancy modes

You'll pick one of three, and the choice shapes years of code. Each has a different trade-off:

Mode               | Isolation | Cost     | Migration cost              | When to pick
Shared DB + org_id | Logical   | Cheapest | Hardest later               | Default for SaaS up to ~hundreds of orgs
Schema-per-tenant  | Strong    | Moderate | Moderate                    | Compliance-driven, heavy data per tenant
DB-per-tenant      | Strongest | Highest  | Trivial (tenant has own DB) | Regulated industries, enterprise self-host

We start every new product on shared DB + org_id unless we know on day one we'll need stronger isolation (regulated finance, healthcare, government contracts). The migration to schema-per-tenant is real work but it's a problem we'd rather have than the over-engineered alternative.

The org isolation invariant

The single most important rule: every read and every write that touches tenant-owned data must be filtered by org_id. No exceptions, no "it's an internal endpoint", no "the controller already checked".

The reason: any single missed filter is a vulnerability. And humans miss filters. Code review catches some; the rest you catch with a structural pattern.

Three levels of defense:

Level 1 — A typed access helper

Don't query the model directly from controllers. Go through a helper that requires the user/org context:

// ❌ unsafe — easy to forget the filter
const job = await Job.findById(jobId)

// ✅ safe — context required, filter applied at one place
const job = await assertCanAccessJob(jobId, currentUser)

assertCanAccessJob looks up the job, verifies that the user is a member of the job's org (or a global admin), and either returns the job or throws a ForbiddenError. Every call site that wants to access a job goes through this function.

The trade-off: you can't easily do Job.find() from a controller anymore — you have to think about which org. That's exactly the friction you want.

Level 2 — Per-resource ACL functions

Every resource type (Job, Document, Embedding, Conversation) has its own assertCanAccess<Resource> function. They all share the same shape:

async function assertCanAccessDocument(
  documentId: string,
  user: AuthenticatedUser,
): Promise<Document> {
  const doc = await Document.findById(documentId).lean()
  if (!doc) throw new NotFoundError('document', documentId)

  // Global ADMIN/OWNER can access anything for support
  if (isGlobalAdmin(user)) return doc

  // Otherwise must be a member of the doc's org.
  // Compare as strings: two ObjectId instances are never === to each other.
  const isMember = user.organizations.some(
    o => String(o.organizationId) === String(doc.orgId),
  )
  // Surface this to the client as a 404 so the response does not confirm the document exists
  if (!isMember) throw new ForbiddenError('document', documentId)

  return doc
}

Three properties to enforce in every helper:

  • 404 vs 403: when the user isn't a member, the client should see a 404, not a 403, because a 403 confirms that the resource exists. If the helper throws a ForbiddenError internally, map it to a 404 response at the HTTP boundary. We use 404 unless the resource is intentionally public.
  • Global admin bypass for support: your support team needs to access tenant data sometimes. Build the bypass explicitly, log every access, and audit that log.
  • Tenant role check, not just membership: sometimes the helper should also check that the member has at least a certain role within the org (e.g. only ADMINs can access billing data).
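
The three properties can be combined in one generic check. The following is a minimal sketch rather than the helper used in production: every name in it (SketchUser, HttpError, supportAccessLog, assertCanAccess) is hypothetical, IDs are plain strings, and the resource is assumed to be already loaded.

```typescript
// All names here are hypothetical and exist only for this sketch.
type OrgRole = "MEMBER" | "ADMIN" | "OWNER";
const ROLE_RANK: Record<OrgRole, number> = { MEMBER: 1, ADMIN: 2, OWNER: 3 };

interface SketchUser {
  id: string;
  isGlobalAdmin: boolean;
  organizations: { organizationId: string; role: OrgRole }[];
}
interface TenantResource { id: string; orgId: string }

class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Records every support bypass; a real system would persist this somewhere durable.
const supportAccessLog: { userId: string; resourceId: string }[] = [];

function assertCanAccess<T extends TenantResource>(
  resource: T | null,
  user: SketchUser,
  minRole: OrgRole = "MEMBER",
): T {
  // A missing resource and a non-member get the same 404,
  // so a caller cannot probe which IDs exist in other tenants.
  if (!resource) throw new HttpError(404, "not found");

  if (user.isGlobalAdmin) {
    supportAccessLog.push({ userId: user.id, resourceId: resource.id });
    return resource;
  }

  const membership = user.organizations.find(
    (m) => m.organizationId === resource.orgId,
  );
  if (!membership) throw new HttpError(404, "not found");

  // A member already knows the resource exists, so a 403 leaks nothing here.
  if (ROLE_RANK[membership.role] < ROLE_RANK[minRole]) {
    throw new HttpError(403, "insufficient role");
  }
  return resource;
}
```

Returning 403 for an under-privileged member is a choice made for this sketch; returning 404 there as well is equally defensible.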

Level 3 — Don't share artifacts on resource models

The deepest failure mode is when a per-tenant artifact lives on a model that is shared across tenants. We've shipped this bug. It looked like this:

// Public-ish opportunity feed
const Opportunity = new Schema({
  publicReference: String,
  title: String,
  // ...
  // ⚠️ added later, scoped by mistake to the "current user's run"
  uploadedDocuments: [{ ... }],
  generatedDraft: { ... },
  qualityScore: Number,
})

The opportunity itself is public (a tender notice, a published listing). But the moment a user uploads docs for that opportunity, those docs become per-tenant. Putting them on the shared Opportunity document means tenant A's draft can be returned to tenant B if any code path reads Opportunity.generatedDraft without re-scoping.

The fix is to always model per-tenant artifacts on a per-tenant resource:

const Opportunity = new Schema({
  publicReference: String,
  title: String,
  // ... only public fields
})

const Job = new Schema({
  orgId: { type: ObjectId, required: true, index: true },
  opportunityId: ObjectId,
  // per-tenant artifacts live here, scoped to one org
  uploadedDocuments: [{ ... }],
  generatedDraft: { ... },
  qualityScore: Number,
})

Now the access path is Opportunity (public read) → Job (per-org, ACL-checked). The artifact-fetch always goes through the job, which has an orgId, which is verified.

This refactor is painful when you discover it three months in. Build the per-tenant model from day one, even if it feels like over-modeling.

Per-org quotas

AI SaaS quickly accumulates usage-based costs (LLM calls, voice minutes, storage). Per-org quotas keep costs predictable and prevent one runaway tenant from blowing the budget.

Two patterns we use:

Soft quotas — pre-call check, post-call decrement

For LLM calls and voice minutes:

async function consumeCredits(orgId: string, amount: number) {
  const result = await Org.findOneAndUpdate(
    { _id: orgId, credits: { $gte: amount } },
    { $inc: { credits: -amount } },
    { new: true },
  )
  if (!result) throw new InsufficientCreditsError({ orgId, amount })
  return result.credits
}

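The atomic $gte check + $inc is the key: the balance check and the decrement happen in a single findOneAndUpdate. With a separate read followed by a write, two concurrent requests can both pass the check and both decrement, ending up with a negative balance. With the single atomic update, the pattern is race-safe.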

For high-throughput counters (per-minute API rate limits, request bursts), Redis with INCR + EXPIRE is faster and cheaper. We use Redis for the rate-limit layer and MongoDB for the persisted credit balance.
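
A fixed-window limiter built on INCR + EXPIRE might look like the sketch below. It assumes a Redis client that exposes incr and expire as promises; the CounterStore interface, the key format, and the in-memory stand-in are all made up for illustration so the example runs without a Redis server.

```typescript
// Minimal slice of a Redis client; the two methods mirror the INCR and EXPIRE commands.
interface CounterStore {
  incr(key: string): Promise<number>; // returns the value after incrementing
  expire(key: string, seconds: number): Promise<unknown>;
}

async function allowRequest(
  store: CounterStore,
  orgId: string,
  limitPerMinute: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  // One counter per org per minute; the window number is part of the key.
  const windowId = Math.floor(nowMs / 60_000);
  const key = `ratelimit:${orgId}:${windowId}`;

  const count = await store.incr(key);
  // Only the first hit in a window sets the TTL. If the process dies between
  // INCR and EXPIRE the key lingers, which wastes memory but never over-limits.
  if (count === 1) await store.expire(key, 60);

  return count <= limitPerMinute;
}

// In-memory stand-in for tests and local runs (hypothetical, ignores TTLs).
class InMemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(_key: string, _seconds: number): Promise<boolean> {
    return true;
  }
}
```

A fixed window allows short bursts around the window boundary. If that matters, a sliding window or token bucket is the usual next step.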

Hard quotas — fail fast at the edge

Some quotas should kill the request before any work happens:

  • File-size limits on uploads (reject at the multipart parser)
  • Maximum prompt length (reject at the validator)
  • Maximum concurrent jobs per org (reject at the queue submission)

Hard quotas live in a middleware that runs before the controller. Soft quotas live in the service layer, around the actual cost-incurring action.
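
The core of such a middleware can be a pure function that the framework-specific wrapper calls before the controller. This is a sketch under assumptions: the type names, the limit fields, and the HTTP status codes are illustrative choices, not values from our codebase.

```typescript
// Hypothetical shapes for this sketch.
interface HardLimits {
  maxUploadBytes: number;
  maxPromptChars: number;
  maxConcurrentJobs: number;
}
interface RequestFacts {
  uploadBytes: number; // 0 when the request has no upload
  promptChars: number; // 0 when the request has no prompt
  runningJobs: number; // jobs currently running for this org
}
type Verdict = { ok: true } | { ok: false; status: number; reason: string };

function checkHardQuotas(facts: RequestFacts, limits: HardLimits): Verdict {
  // Cheapest checks first; nothing here touches the database or an LLM.
  if (facts.uploadBytes > limits.maxUploadBytes) {
    return { ok: false, status: 413, reason: "upload too large" };
  }
  if (facts.promptChars > limits.maxPromptChars) {
    return { ok: false, status: 400, reason: "prompt too long" };
  }
  // ">=" because accepting this request would start one more job.
  if (facts.runningJobs >= limits.maxConcurrentJobs) {
    return { ok: false, status: 429, reason: "too many concurrent jobs" };
  }
  return { ok: true };
}
```

Keeping the decision pure makes it trivial to unit test and lets the same limits be enforced at the multipart parser, the validator, and the queue submission.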

Roles within an org

Beyond "is this user a member of this org?", real SaaS needs role-based access control:

  • OWNER — billing, can promote others to OWNER, can delete the org
  • ADMIN — can invite/remove members, change roles (but not promote to OWNER), full data access
  • MEMBER — full data access, no admin powers
  • (Optional) VIEWER — read-only, for stakeholder access

Plus an orthogonal axis: global roles (ADMIN, OWNER on the entire SaaS, used by your support team). Don't conflate these. We use:

  • User.userRole — global SaaS role (USER / ADMIN / OWNER)
  • User.organizations[].role — org-level role (OWNER / ADMIN / MEMBER)

Most enforcement uses the org-level role. The global role is only used for the support bypass and the admin console.
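
Keeping the two axes apart in code can be as simple as never letting one answer questions about the other. A sketch, with hypothetical helper names and string IDs standing in for ObjectIds:

```typescript
type GlobalRole = "USER" | "ADMIN" | "OWNER";
type OrgRole = "OWNER" | "ADMIN" | "MEMBER";

interface SketchUser {
  userRole: GlobalRole; // global SaaS role
  organizations: { organizationId: string; role: OrgRole }[]; // org-level roles
}

// Org-level role of the user in one org, or null if they are not a member.
function orgRoleOf(user: SketchUser, orgId: string): OrgRole | null {
  const membership = user.organizations.find((o) => o.organizationId === orgId);
  return membership ? membership.role : null;
}

// Org-level decisions look only at the org-level role.
function canManageBilling(user: SketchUser, orgId: string): boolean {
  return orgRoleOf(user, orgId) === "OWNER";
}
function canManageMembers(user: SketchUser, orgId: string): boolean {
  const role = orgRoleOf(user, orgId);
  return role === "OWNER" || role === "ADMIN";
}

// Global decisions look only at the global role.
function canUseAdminConsole(user: SketchUser): boolean {
  return user.userRole === "ADMIN" || user.userRole === "OWNER";
}
```

With this split, a global OWNER with no membership cannot manage an org's billing through the normal path, and an org OWNER gets no access to the admin console. Support access goes through the explicit, logged bypass instead.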

The last-OWNER guard

A constraint that's easy to miss: when a user changes another user's role or removes them, you must prevent the operation that would leave the org with zero OWNERs.

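async function changeMemberRole(orgId, targetUserId, newRole, actor) {
  if (newRole !== 'OWNER') {
    // Count the OWNERs of this org. $elemMatch makes both conditions apply to
    // the same organizations[] entry; two separate dotted paths would also
    // count a MEMBER of this org who happens to be OWNER of another org.
    const ownerCount = await User.countDocuments({
      organizations: { $elemMatch: { organizationId: orgId, role: 'OWNER' } },
    })
    const target = await User.findById(targetUserId).lean()
    if (!target) throw new NotFoundError('user', targetUserId)
    const targetIsOwner = target.organizations.some(o =>
      o.organizationId.equals(orgId) && o.role === 'OWNER'
    )
    // Demoting the only remaining OWNER would lock the org out of itself
    if (targetIsOwner && ownerCount <= 1) {
      throw new LastOwnerError()
    }
  }
  // ... apply the change
}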

The same guard applies to member removal. It sounds trivial, but the bug it prevents (an org locked out of itself with no OWNER) is a common one for teams that didn't write the guard upfront.
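
One way to share the guard between role changes and removals is to compute the member list as it would look after the change and check that an OWNER remains. The sketch below is a pure function over an in-memory member list; the Member and MembershipChange types are assumptions made for the example.

```typescript
type OrgRole = "OWNER" | "ADMIN" | "MEMBER";
interface Member { userId: string; role: OrgRole }

type MembershipChange =
  | { kind: "remove"; userId: string }
  | { kind: "setRole"; userId: string; role: OrgRole };

// The member list as it would look if the change were applied.
function membersAfter(members: Member[], change: MembershipChange): Member[] {
  if (change.kind === "remove") {
    const removedId = change.userId;
    return members.filter((m) => m.userId !== removedId);
  }
  const { userId, role } = change;
  return members.map((m) => (m.userId === userId ? { ...m, role } : m));
}

// True when applying the change would leave the org with zero OWNERs.
function wouldLeaveOrgWithoutOwner(
  members: Member[],
  change: MembershipChange,
): boolean {
  return !membersAfter(members, change).some((m) => m.role === "OWNER");
}
```

This check alone does not close the race where two OWNERs demote each other at the same moment. The read and the write still need to happen in one transaction or one conditional update.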

Invitation flow

Inviting by email is the canonical entry point. The flow that ships:

  1. Admin clicks "invite", types email + role
  2. Backend creates an OrgInvitation { orgId, email, role, token (24 random bytes), invitedBy, expiresAt: now + 7 days, status: 'pending' }
  3. Email sent with link /signup?invite=<token>
  4. New user signs up; the signup flow claims the invitation atomically with findOneAndUpdate({ token, status: 'pending' }, { status: 'accepted' }) and pushes the user into org.members[]
  5. Existing user clicks the link; same atomic claim, just attaching the existing user to the new org

Three details that come back to bite you:

  • TTL on pending invitations — Mongo TTL index with partialFilterExpression: { status: 'pending' } so accepted invitations are kept for audit but pending ones auto-expire.
  • Public lookup endpoint — the signup page needs to fetch the org name + role from the token without auth. Make this endpoint explicitly public, gate it behind the token (which is a 24-byte random secret), and don't leak any other org info.
  • Resend webhook reliability — if Resend fails to send the email, roll back the invitation. Don't leave the user with an invitation token they never received and an admin's UI showing "invited".
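
The invitation record and the public lookup can be sketched as two small functions. Names such as buildInvitation and publicInvitationView are hypothetical, IDs are plain strings, and lower-casing the email is an assumption added here, not something the flow above requires. The index constant uses the expireAfterSeconds and partialFilterExpression options of MongoDB's createIndex.

```typescript
import { randomBytes } from "node:crypto";

type OrgRole = "OWNER" | "ADMIN" | "MEMBER";
interface OrgInvitation {
  orgId: string;
  email: string;
  role: OrgRole;
  token: string;
  invitedBy: string;
  expiresAt: Date;
  status: "pending" | "accepted";
}

const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

function buildInvitation(
  orgId: string,
  email: string,
  role: OrgRole,
  invitedBy: string,
  now: Date = new Date(),
): OrgInvitation {
  return {
    orgId,
    email: email.trim().toLowerCase(), // assumption: emails are compared case-insensitively
    role,
    token: randomBytes(24).toString("hex"), // 24 random bytes, 48 hex characters
    invitedBy,
    expiresAt: new Date(now.getTime() + INVITE_TTL_MS),
    status: "pending",
  };
}

// What the unauthenticated signup page may see: org name and role, nothing else.
function publicInvitationView(
  invitation: OrgInvitation,
  orgName: string,
  now: Date = new Date(),
): { orgName: string; role: OrgRole } | null {
  const expired = invitation.expiresAt.getTime() <= now.getTime();
  if (invitation.status !== "pending" || expired) return null;
  return { orgName, role: invitation.role };
}

// Index spec for the pending-only TTL: documents expire at expiresAt,
// but only while status is still 'pending'.
const pendingInviteTtlIndex = {
  keys: { expiresAt: 1 },
  options: {
    expireAfterSeconds: 0,
    partialFilterExpression: { status: "pending" },
  },
};
```

The explicit expiry check in publicInvitationView matters because MongoDB removes TTL-expired documents in a background pass, so an expired invitation can remain readable for a short while.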

Billing — three models that work

For AI SaaS in 2026, three billing models cover almost everything:

1. Per-seat (classic SaaS)

X € / month / user. Simple, predictable for the customer, easy to bill via Stripe Subscription. Works when the per-user value is roughly constant. Falls apart when usage varies wildly between users — heavy users subsidized by light users until the heavy users churn.

2. Usage-based (per call, per minute, per token)

Pay for what you consume. Right when usage and cost are highly correlated (voice minutes, LLM calls). Tricky to bill (Stripe Metered Billing or homebrew billing cycles). Customers hate uncertainty in their monthly bill — counter this with monthly caps and clear dashboards.

3. Per-action (per job, per dossier, per generated artifact)

X € per completed job, max N concurrent, refund on failure. Simple to communicate ("you pay only for successful outcomes"), aligns vendor and customer incentives, easy to explain to procurement. The right model for AI products where each "job" is a discrete unit of value (a generated proposal, a translated document, a processed image batch).

The pattern that pays back: idempotent refunds on failure.

async function refundJobIfFailed(jobId: string) {
  const job = await Job.findOneAndUpdate(
    { _id: jobId, status: 'failed', refundedAt: { $exists: false } },
    { $set: { refundedAt: new Date() } },
  )
  if (!job) return // already refunded or not failed
  await consumeCredits(job.orgId, -job.cost) // negative = refund
  await audit('refund_processed', { jobId, amount: job.cost })
}

The atomic update on refundedAt makes the function idempotent: replays don't double-refund. This is the kind of code that has to be right the first time — recovering from "we accidentally double-refunded 200 customers" is an entire afternoon you don't want.

Audit and activity logs

Once multiple users collaborate on the same resource, "who did what" matters. We build a per-tenant ActivityLog from day one:

const ActivityLog = new Schema({
  orgId: { type: ObjectId, required: true, index: true },
  resourceType: String, // 'job', 'document', 'member', 'invitation'
  resourceId: ObjectId,
  actorUserId: ObjectId,
  action: String, // 'created', 'updated', 'role_changed', 'deleted'
  metadata: Schema.Types.Mixed, // before/after, IP, user-agent
  createdAt: { type: Date, default: Date.now, index: true },
})

Two design choices that survive:

  • Append-only. Never update or delete an activity log entry. If you rebuild the log shape, write a migration to a v2 collection; don't mutate v1.
  • Per-tenant TTL is sometimes wanted (legal retention rules), but default to "keep forever". Storage is cheap. Lost audit context is expensive.
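
Append-only is easier to keep when the code offers no other option. A minimal sketch, assuming an in-memory store in place of the real collection and hypothetical names throughout: the wrapper exposes an append path and a per-org read path, and nothing that updates or deletes.

```typescript
interface ActivityEntry {
  orgId: string;
  resourceType: string;
  resourceId: string;
  actorUserId: string;
  action: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
}

class AppendOnlyActivityLog {
  private entries: ActivityEntry[] = [];

  // The only write path. createdAt is set here so callers cannot backdate entries.
  append(
    entry: Omit<ActivityEntry, "createdAt">,
    now: Date = new Date(),
  ): ActivityEntry {
    // Object.freeze is shallow: it protects the entry's own fields, not nested metadata.
    const stored = Object.freeze({ ...entry, createdAt: now });
    this.entries.push(stored);
    return stored;
  }

  // Reads are always scoped to one org, and return a copy of the list.
  forOrg(orgId: string): readonly ActivityEntry[] {
    return this.entries.filter((e) => e.orgId === orgId);
  }
}
```

In a real deployment the same idea would be backed by database permissions, for example a role that can insert into the collection but not update or delete.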

When shared DB breaks

The patterns above scale to roughly hundreds of orgs and millions of records per resource type. Past that, you start hitting:

  • Index sizes that don't fit in RAM
  • Backup/restore times measured in hours
  • Cross-tenant noisy-neighbor effects (one big tenant slows everyone)
  • Compliance asks: "Show me my data, only my data, prove it's separate"

The escape hatches, in order of cost:

1. Sharded shared DB. Same logical model, sharded by orgId (for example a MongoDB sharded cluster on Atlas, or Citus for Postgres). Buys you another order of magnitude on volume.

2. Schema-per-tenant. Each tenant gets their own MongoDB database (still on the same cluster). Migrations run per-tenant. Strong logical isolation. Heavier admin burden — every schema change is N migrations.

3. DB-per-tenant. The tenant gets their own cluster. Often required for self-hosted enterprise deals. Operationally expensive, but the only way to deliver "your data is in your VPC" credibly.
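
For options 2 and 3, every request has to be routed to the right database before any query runs. A sketch of that routing, under assumptions: org IDs are 24-character hex strings, the tenant_ prefix is invented for the example, and the open callback stands in for whatever the driver offers (with Mongoose it could wrap connection.useDb).

```typescript
const OBJECT_ID_HEX = /^[0-9a-f]{24}$/;

// Derive the database name from a validated org ID only. Rejecting anything
// else keeps a crafted ID from pointing at another tenant's database.
function tenantDbName(orgId: string): string {
  const normalized = orgId.toLowerCase();
  if (!OBJECT_ID_HEX.test(normalized)) {
    throw new Error(`invalid org id: ${orgId}`);
  }
  return `tenant_${normalized}`;
}

// Caches one handle per tenant so each database is opened once.
class TenantDbRegistry<Db> {
  private handles = new Map<string, Db>();
  private open: (dbName: string) => Db;

  constructor(open: (dbName: string) => Db) {
    this.open = open;
  }

  forOrg(orgId: string): Db {
    const name = tenantDbName(orgId);
    let handle = this.handles.get(name);
    if (handle === undefined) {
      handle = this.open(name);
      this.handles.set(name, handle);
    }
    return handle;
  }
}
```

The ACL helpers from earlier still apply on top of this. Routing decides which database a query hits, not whether the caller is allowed to run it.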

The decision usually maps to the business model. If you're SMB-focused with hundreds of orgs averaging 10MB each, stay on shared DB. If you're enterprise with five customers each running tens of GB of regulated data, schema- or DB-per-tenant is what they want to buy.

Closing thoughts

Multi-tenancy is a thing you have to get right early because the cost of fixing it later compounds. The patterns here are the ones we'd implement on day one of any new SaaS product, AI or not:

  • Shared DB + org_id to start
  • ACL helpers as the only path to per-tenant resources
  • Per-tenant artifacts on per-tenant models, never bolted onto shared resources
  • Two-axis roles: global SaaS + per-org
  • Last-OWNER guard from day one
  • Activity log from day one
  • Quota enforcement at the right layer (hard at the edge, soft in the service)

If you've shipped a SaaS that's drifting toward leaky multi-tenancy and you're not sure where the holes are, get in touch. A four-hour audit usually surfaces the top three risks before they bite.

