FastAPI vs Express: Which Backend Should You Choose for Your AI App?


February 6, 2026 14 min read Rajesh R Nair


Every few months this debate resurfaces in developer communities: Python or JavaScript for the backend? The question sharpens considerably when you're building an AI application, because both sides suddenly have genuine, non-trivial arguments. This isn't a framework preference discussion — it's a question about where your application's complexity actually lives, and which language's ecosystem handles that complexity with less friction.

The Decision Context

You're building an AI application. It calls LLMs, processes some data, and serves responses to a Next.js or React frontend. Before you pick a framework, answer two questions precisely.

First: where does your AI logic actually run? If the answer is "we call the Anthropic API and return the response," then your backend is essentially a proxy with auth and rate limiting — language choice barely matters. If the answer is "we run a fine-tuned model locally," "we preprocess documents with pandas before chunking them for RAG," or "we use a scikit-learn classifier to route queries," then language choice matters enormously.

Second: who is building and maintaining this? A team of JavaScript developers comfortable with TypeScript will be more productive in Express than in FastAPI, regardless of FastAPI's technical merits. The best framework is the one your team can debug at 2am during an outage.

FastAPI: Why Python Wins for AI Backends

FastAPI's advantage over Express isn't syntax or speed — it's the Python AI ecosystem that surrounds it. LangChain, LlamaIndex, Hugging Face Transformers, sentence-transformers, FAISS, ChromaDB, scikit-learn, pandas, NumPy — all of these are Python-first libraries. When you use them in a FastAPI endpoint, there's no serialisation layer, no subprocess, no bridge to manage. You call the library directly and return the result.

Here's what this looks like in practice. A RAG (Retrieval-Augmented Generation) pipeline might use sentence-transformers to embed a query, query a FAISS index to retrieve relevant document chunks, then pass those chunks to an LLM. In FastAPI:

from fastapi import FastAPI
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

app = FastAPI()
# Load the model and index once at startup; every request reuses them
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
index = faiss.read_index('document_index.faiss')

# A plain `def` (not `async def`): encode() and search() are CPU-bound, so
# FastAPI runs this handler in its thread pool instead of blocking the event loop
@app.post("/retrieve")
def retrieve_chunks(query: str, top_k: int = 5):
    embedding = model.encode([query])  # shape (1, 384) for this model
    distances, indices = index.search(np.asarray(embedding, dtype='float32'), top_k)
    return {"indices": indices[0].tolist(), "distances": distances[0].tolist()}

This runs the embedding model in-process. No HTTP call to a separate Python service, no JSON serialisation of embeddings, no separate container to orchestrate. You load the model once at startup and reuse it across requests.

FastAPI also has async support via Python's asyncio. An async def endpoint hands control back to the event loop each time it awaits I/O (an LLM API call, a database query), which lets uvicorn serve other requests concurrently; CPU-heavy code inside such an endpoint gets no such benefit. Pydantic, which FastAPI uses natively for request and response validation, also fits LLM structured output well: a Pydantic model describes the exact JSON schema you want, you can pass that schema to the LLM's structured output feature, and the same model validates what comes back.
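A minimal sketch of that round trip, assuming Pydantic v2: one model produces the JSON schema you would hand to an LLM's structured-output or tool-use feature, then validates the reply. The `TenderSummary` model and the sample reply string are hypothetical, and no LLM is called here.

```python
from pydantic import BaseModel, ValidationError


class TenderSummary(BaseModel):
    """Hypothetical shape we want the LLM to return."""
    title: str
    department: str
    estimated_value_inr: int
    deadline: str  # ISO date string, e.g. "2026-03-31"


# JSON schema to pass to the provider's structured-output / tool-use parameter
schema = TenderSummary.model_json_schema()

# Stand-in for the raw JSON text an LLM might return
llm_reply = (
    '{"title": "Supply of laptops", "department": "IT Mission",'
    ' "estimated_value_inr": 2500000, "deadline": "2026-03-31"}'
)

# Parse and validate in one step; malformed replies raise ValidationError
summary = TenderSummary.model_validate_json(llm_reply)
print(summary.estimated_value_inr)

try:
    TenderSummary.model_validate_json('{"title": "Missing fields"}')
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} errors")
```

In a real endpoint you would pass `schema` to the provider (for example as a tool's `input_schema` in the Anthropic API) and catch `ValidationError` to retry or fail cleanly.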

Express.js: Why JavaScript Wins for Product Backends

Express's advantages aren't about AI; they're about everything else in an application. Authentication with Passport.js or better-auth, payment integration with the Razorpay Node.js SDK (arguably more complete than its Python counterpart), email with Resend or Nodemailer, queues with BullMQ, and real-time features with Socket.IO all have mature, JavaScript-first tooling.

The Vercel AI SDK is the clearest example of where JavaScript wins for LLM-heavy apps. It handles streaming responses from multiple LLM providers, structured output with Zod schemas, and tool use with a unified API across Claude, GPT-4o, and Gemini. If your AI logic is primarily LLM API calls — not local model inference — this SDK is one of the best developer experiences available, and it's TypeScript-native.

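// Express route using Vercel AI SDK
import express from 'express';
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic'; // reads ANTHROPIC_API_KEY

const router = express.Router();
// req.body is only populated if the app mounts a JSON body parser,
// e.g. app.use(express.json()); app.use('/api', router);

router.post('/chat', async (req, res) => {
  const { messages } = req.body;
  try {
    const { text } = await generateText({
      model: anthropic('claude-sonnet-4-6'),
      system: 'You are a helpful assistant for Kerala tourism queries.',
      messages,
    });
    res.json({ response: text });
  } catch (err) {
    // Express 4 does not forward rejected promises to error handlers
    console.error(err);
    res.status(502).json({ error: 'LLM request failed' });
  }
});

export default router;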

For a full-stack team building with Next.js on the frontend, Express on the backend means one language, one type system (TypeScript), types shared between frontend and backend, and far less context switching. Monorepos with shared packages between Next.js and Express work cleanly in TypeScript.

Performance Comparison

Raw benchmarks for simple JSON responses on a single core show FastAPI at roughly 40,000 requests per second and Express at roughly 30,000, though such figures vary widely with hardware, server configuration and benchmark suite. Either way, they are ceilings your application will not approach in practice: LLM API calls take 1–10 seconds each, database queries add 5–50 milliseconds, and your bottleneck is almost never the framework's request-handling overhead.
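The arithmetic behind that claim is Little's law: the average number of requests in flight equals throughput multiplied by average latency. A back-of-envelope sketch, where the 100 requests per second and 3-second latency are illustrative assumptions rather than measurements:

```python
def in_flight(requests_per_second: float, avg_latency_s: float) -> float:
    """Little's law: L = lambda * W (average number of concurrent requests)."""
    return requests_per_second * avg_latency_s


# Illustrative load: 100 LLM-backed requests per second, 3 s each
rps = 100
latency = 3.0
concurrent = in_flight(rps, latency)
print(f"{concurrent:.0f} requests in flight")  # 300

# Framework overhead: 30,000 req/s of raw capacity is ~33 microseconds of
# handling per request, negligible next to a 3-second LLM call.
overhead_s = 1 / 30_000
print(f"overhead share of each request: {overhead_s / latency:.6%}")
```

What you actually provision for is several hundred open connections and enough upstream API quota, not raw request-parsing speed.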

The performance question that actually matters is concurrency under I/O wait, and both FastAPI's asyncio and Node.js's event loop handle it well. When 100 concurrent users each trigger an LLM call that takes 3 seconds, both interleave the waiting requests on an event loop without blocking, and neither is materially better here. Where the Python side can genuinely outperform Express is CPU-bound work: if you run model inference in-process, NumPy and the native extensions behind libraries like FAISS and sentence-transformers are faster than equivalent pure JavaScript. That advantage only applies to local model inference, not API calls, and even then the heavy work must run off the event loop (in a thread pool or a separate process) so it doesn't stall other requests.
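In either framework, blocking work run directly inside an async handler stalls every other request on that worker. In FastAPI you can declare such an endpoint with plain `def` (FastAPI then runs it in a thread pool) or hand the blocking call to a thread with `asyncio.to_thread`. A minimal, dependency-free sketch; `fake_encode` is a hypothetical stand-in that sleeps to mimic a native library call that releases the GIL, as many native extensions do while computing:

```python
import asyncio
import time


def fake_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a blocking embedding call (~0.2 s)."""
    time.sleep(0.2)
    return [float(len(text))]


async def handle_blocking(text: str) -> list[float]:
    # Anti-pattern: blocks the event loop for the full 0.2 s
    return fake_encode(text)


async def handle_offloaded(text: str) -> list[float]:
    # Runs the blocking call in a worker thread; the loop stays free
    return await asyncio.to_thread(fake_encode, text)


async def timed(handler, n: int = 3) -> float:
    """Fire n concurrent requests at a handler and return wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(f"query {i}") for i in range(n)))
    return time.perf_counter() - start


blocking_s = asyncio.run(timed(handle_blocking))
offloaded_s = asyncio.run(timed(handle_offloaded))
print(f"blocking: {blocking_s:.2f}s, offloaded: {offloaded_s:.2f}s")
```

The blocking version serialises the three requests; the offloaded one overlaps them. For pure-Python CPU-bound code, threads won't help because of the GIL, so a process pool or a separate worker service is the usual choice.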

The Real Differentiator: LLM Integration Complexity

Calling the Claude or OpenAI API in both languages takes roughly 10 lines and works equivalently well. The gap opens when you move beyond API calls.

Suppose you're building a document analysis tool. A user uploads a PDF of a Kerala government procurement tender. Your pipeline needs to extract text, split it into meaningful chunks (not just by character count — by semantic boundaries), generate embeddings, store them in a vector database, then answer questions about the document. In Python, every step has a mature, production-tested library: PyMuPDF for PDF extraction, LangChain's text splitters for semantic chunking, sentence-transformers for embeddings, and pgvector with SQLAlchemy for the vector store.
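LangChain's splitters are the mature option here. To illustrate the idea of splitting on structural boundaries rather than a fixed character count, this is a simplified, dependency-free sketch that packs whole paragraphs into chunks up to a size limit. It is not LangChain's algorithm, and `chunk_by_paragraph` and its `max_chars` default are illustrative assumptions.

```python
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole paragraphs (split on blank lines) into chunks <= max_chars.

    A paragraph longer than max_chars becomes its own oversized chunk rather
    than being cut mid-sentence; a real splitter would recurse on sentences.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = para  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks


tender = "Section 1: Scope.\n\nSection 2: Eligibility.\n\nSection 3: Deadlines."
print(chunk_by_paragraph(tender, max_chars=45))
```

With a 45-character limit, the first two sections share a chunk and the third starts a new one, so no section is split mid-sentence.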

Replicating this pipeline in Node.js requires either calling a Python microservice for the embedding generation, using JavaScript ports that may have different behaviour or missing features, or accepting that you'll spend significant time on plumbing that Python developers solve with two import statements. The JavaScript AI ecosystem has caught up substantially since 2023, but for document processing pipelines with local models, Python's ecosystem remains ahead.

For a startup building a pure LLM wrapper — a chatbot, a content generation tool, a document Q&A system using only external API calls with no local model inference — this gap is irrelevant. LangChain.js and LlamaIndex.TS are functional for these use cases.

Architecture for a Real AI App

The most pragmatic architecture for many Indian SaaS products combines both: Express for product features and FastAPI for AI inference. This isn't over-engineering — it's separation of concerns aligned with each language's strengths.

Express handles authentication (JWT issuance and validation), user management, subscription billing via Razorpay, file uploads to S3 or Supabase Storage, and webhook handling. These features have excellent Node.js SDKs, and a JavaScript developer can implement them quickly without context-switching to Python.

FastAPI handles document processing, embedding generation, vector search, LLM orchestration with LangChain, and any endpoints that touch local model inference. These are Python-native operations.

Your Next.js frontend calls Express for auth and CRUD operations, and calls FastAPI for AI-powered features. The two backends share a Postgres database and communicate via internal HTTP when needed — for example, Express calls FastAPI's /process-document endpoint after a file upload completes.

This architecture makes sense when your team includes both JavaScript and Python developers, or when you're willing to learn both. It adds operational complexity — two servers to deploy, two codebases to maintain, two CI pipelines. If you're a solo developer or a two-person team, a single FastAPI backend serving everything (including auth and billing, using Python's httpx for Razorpay API calls) is simpler to operate than two separate services.
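Razorpay's REST API is simple enough to call without its SDK. A minimal sketch of creating an order, assuming Razorpay's Orders endpoint (`POST https://api.razorpay.com/v1/orders`, HTTP Basic auth with your key ID and secret, amount in paise). It swaps in the standard library's `urllib.request` to build the request without sending it, so it runs offline; with httpx the equivalent call would be `httpx.post(url, json=payload, auth=(key_id, key_secret))`. The key values and `build_order_request` helper are placeholders.

```python
import base64
import json
import urllib.request

RAZORPAY_ORDERS_URL = "https://api.razorpay.com/v1/orders"


def build_order_request(key_id: str, key_secret: str, amount_inr: int,
                        receipt: str) -> urllib.request.Request:
    """Build (but do not send) a Razorpay create-order request."""
    payload = {
        "amount": amount_inr * 100,  # Razorpay expects the smallest unit: paise
        "currency": "INR",
        "receipt": receipt,
    }
    token = base64.b64encode(f"{key_id}:{key_secret}".encode()).decode()
    return urllib.request.Request(
        RAZORPAY_ORDERS_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_order_request("rzp_test_placeholder", "secret_placeholder",
                          amount_inr=999, receipt="invoice-0001")
print(req.get_method(), req.full_url, json.loads(req.data))
# To actually send it: urllib.request.urlopen(req) (needs real keys)
```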

Deployment Cost on Indian Cloud Infrastructure

Infrastructure choices depend on your team's capabilities and budget. Here are realistic monthly costs for an early-stage Indian AI startup on AWS ap-south-1 (Mumbai):

FastAPI inference server: an EC2 t3.medium (2 vCPU, 4 GB RAM) costs approximately ₹3,200/month on-demand. If you run local embedding models, a t3.large (2 vCPU, 8 GB RAM) at about ₹6,400/month gives safer memory headroom; note that t3 instances are burstable, so sustained CPU-heavy inference can exhaust their CPU credits. GPU inference on a g4dn.xlarge (one NVIDIA T4) costs far more, upwards of ₹30,000/month at on-demand rates, and is rarely necessary if you're using Hugging Face Inference Endpoints or external embedding APIs instead of local GPU inference.

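Express product backend: Railway's entry-level paid plan charges a small monthly fee plus usage, and a lightweight always-on Node.js service typically comes to about ₹800/month; services on that plan can scale up to 8 GB RAM, at a correspondingly higher usage cost. For a pre-revenue startup, the roughly $5 of monthly usage included with the plan may be sufficient. Render's paid Starter instance at $7/month (about ₹580) is another option.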

Database: Supabase on the Pro plan at $25/month (approximately ₹2,100) gives you Postgres with the pgvector extension for vector storage, auth, and storage — eliminating the need for separate vector database infrastructure at early stage. Neon Postgres at $19/month (₹1,600) is an alternative with serverless branching useful for staging environments.

Practical stack total for an Indian AI startup at early stage: ₹6,500–₹12,000/month, before LLM API costs, which scale with usage.
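The arithmetic behind that range, using only the monthly figures quoted above (the dictionary keys are just labels):

```python
# Monthly costs in INR, as quoted in this section (approximate)
lean_stack = {
    "FastAPI server (t3.medium)": 3_200,
    "Express backend (Railway)": 800,
    "Database (Supabase Pro)": 2_100,
}
roomier_stack = {
    "FastAPI server (t3.large)": 6_400,
    "Express backend (Railway)": 800,
    "Database (Supabase Pro)": 2_100,
}

for name, stack in [("lean", lean_stack), ("roomier", roomier_stack)]:
    print(f"{name}: ₹{sum(stack.values()):,}/month before LLM API costs")
```

These three line items sum to ₹6,100 and ₹9,300. The range above leaves room for costs this sum omits, such as EBS storage and data transfer on AWS and GST on the bills.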

Code Comparison: A RAG Endpoint in Both

The same endpoint — accept a question, query a vector database, return an LLM response — in both frameworks:

FastAPI (Python):

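import os
from contextlib import asynccontextmanager

import asyncpg
import numpy as np
import voyageai  # pip install voyageai
from anthropic import AsyncAnthropic
from fastapi import FastAPI
from pgvector.asyncpg import register_vector  # pip install pgvector
from pydantic import BaseModel

DATABASE_URL = os.environ["DATABASE_URL"]

# The Anthropic SDK has no embeddings endpoint; Voyage AI's client handles embeddings
anthropic_client = AsyncAnthropic()     # reads ANTHROPIC_API_KEY
voyage_client = voyageai.AsyncClient()  # reads VOYAGE_API_KEY

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One connection pool per process; register_vector teaches asyncpg the pgvector type
    app.state.pool = await asyncpg.create_pool(DATABASE_URL, init=register_vector)
    yield
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

class QuestionRequest(BaseModel):
    question: str
    collection_id: str

@app.post("/ask")
async def ask(req: QuestionRequest):
    # Embed the question (input_type="query" marks it as a retrieval query)
    embedding_resp = await voyage_client.embed(
        [req.question], model="voyage-3", input_type="query"
    )
    query_vector = np.array(embedding_resp.embeddings[0])

    # Query pgvector for the five nearest chunks by L2 distance (<->)
    async with app.state.pool.acquire() as conn:
        rows = await conn.fetch(
            """SELECT content FROM chunks
               WHERE collection_id = $1
               ORDER BY embedding <-> $2 LIMIT 5""",
            req.collection_id, query_vector,
        )

    context = "\n\n".join(r["content"] for r in rows)
    message = await anthropic_client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {req.question}"}],
    )
    return {"answer": message.content[0].text}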

Express + Vercel AI SDK (TypeScript):

import express from 'express';
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { createClient } from '@supabase/supabase-js';

const router = express.Router(); // mount after app.use(express.json()) so req.body is parsed
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

router.post('/ask', async (req, res) => {
  const { question, collectionId } = req.body;
  try {
    // Embed the question via an external API (e.g., Voyage AI)
    const embeddingRes = await fetch('https://api.voyageai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model: 'voyage-3', input: [question], input_type: 'query' }),
    });
    if (!embeddingRes.ok) throw new Error(`Voyage AI returned ${embeddingRes.status}`);
    const { data } = await embeddingRes.json();
    const queryVector: number[] = data[0].embedding;

    // Query pgvector through match_chunks, a Postgres function you define
    // yourself in Supabase (it is not built in)
    const { data: chunks, error } = await supabase.rpc('match_chunks', {
      query_embedding: queryVector,
      match_count: 5,
      collection_id: collectionId,
    });
    if (error) throw error;

    const context = (chunks as { content: string }[]).map((c) => c.content).join('\n\n');
    const { text } = await generateText({
      model: anthropic('claude-sonnet-4-6'),
      prompt: `Context:\n${context}\n\nQuestion: ${question}`,
    });

    res.json({ answer: text });
  } catch (err) {
    console.error(err);
    res.status(502).json({ error: 'Failed to answer question' });
  }
});

Both implementations are roughly equivalent in length, and in both, embedding the question is an HTTP call to Voyage AI, which adds latency and a potential failure point. The difference is what's available next: the FastAPI version can replace that call with a sentence-transformers model running in-process, as in the /retrieve example earlier, while the Express version would need a separate Python service or a JavaScript port. As your RAG pipeline grows more sophisticated, adding reranking, query expansion, or hybrid BM25 + vector search, the Python version pulls ahead in available tooling without additional service calls.
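Hybrid search needs a way to merge the keyword and vector result lists. One widely used technique is reciprocal rank fusion (RRF), which scores each document by summing 1/(k + rank) over every ranked list it appears in. A minimal sketch; k = 60 is a common default, and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs into one best-first list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1, so documents found by both retrievers rise.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_hits = ["chunk-7", "chunk-2", "chunk-9"]    # keyword search order
vector_hits = ["chunk-2", "chunk-4", "chunk-7"]  # pgvector order
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that BM25 scores and vector distances live on incompatible scales.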

Frequently Asked Questions

If I'm a JavaScript developer, should I learn Python just to use FastAPI?

If your AI app is primarily making LLM API calls — sending prompts to Claude, GPT-4o, or Gemini and returning responses — then JavaScript is completely adequate. The Vercel AI SDK, LangChain.js, and the official Anthropic and OpenAI Node.js SDKs cover this use case well. But if your app involves local model inference, pandas data preprocessing, scikit-learn classification, or working with FAISS for vector search, then yes — learning enough Python to write FastAPI endpoints is worth the investment. The Python AI ecosystem isn't just larger; it's where the tooling gets built first. JavaScript ports often lag by months and sometimes lack GPU acceleration.

Can FastAPI handle 10,000 concurrent users?

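Yes, with the right setup. FastAPI runs on uvicorn, an ASGI server that handles concurrency through Python's asyncio event loop. For 10,000 concurrent users making LLM API calls, you'd run several uvicorn worker processes, either under gunicorn with a Uvicorn worker class or via uvicorn's own --workers flag. Gunicorn's documentation suggests (2 × CPU cores) + 1 workers as a starting point, which is 9 on a 4-core machine; with async workers, one per core is often enough, so load-test before settling on a number. Async endpoints are essential here: async def routes release the event loop while awaiting LLM responses, allowing other requests to proceed without blocking. Beyond one machine, scale horizontally by adding instances behind a load balancer. On AWS ap-south-1, a c6i.xlarge (4 vCPU, 8 GB RAM) handles substantial production traffic at roughly ₹6,500/month; verify against current AWS pricing, since on-demand rates run higher than committed-use rates. Monitor event loop lag with a Prometheus metric rather than just CPU: a blocked event loop can be invisible in CPU metrics (a synchronous I/O call blocks the loop while the CPU sits idle) but still causes latency spikes.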
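A minimal sketch of measuring event loop lag with only the standard library: a background task sleeps for a fixed interval and records how late it wakes up. In production you would export these samples (for example their maximum) as a Prometheus gauge via a client library such as prometheus_client; that wiring is omitted here, and the interval and duration values are illustrative.

```python
import asyncio
import time


async def monitor_loop_lag(samples: list[float], interval: float = 0.05,
                           duration: float = 0.5) -> None:
    """Record how late each sleep(interval) wakes up: that delay is loop lag."""
    loop = asyncio.get_running_loop()
    end = loop.time() + duration
    while loop.time() < end:
        start = loop.time()
        await asyncio.sleep(interval)
        samples.append(loop.time() - start - interval)


async def main() -> list[float]:
    samples: list[float] = []
    monitor = asyncio.create_task(monitor_loop_lag(samples))
    await asyncio.sleep(0.1)  # let the monitor take a few healthy samples
    time.sleep(0.3)           # a blocking call inside async code stalls the loop
    await monitor
    return samples


lags = asyncio.run(main())
print(f"max lag: {max(lags) * 1000:.0f} ms")
```

The blocking `time.sleep` barely registers as CPU usage, yet the monitor reports a lag of roughly 250 to 300 ms, which is exactly the latency every other request on that worker would see.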

What's the best ORM for FastAPI when building an Indian SaaS?

SQLModel is the most ergonomic starting point for FastAPI: it comes from the same author and uses Pydantic models for both database schemas and API validation, so you don't define the same data structure twice. For more complex queries and full SQL control, SQLAlchemy 2.0 with asyncio support and the asyncpg driver for PostgreSQL is the production standard. For migrations, Alembic works with both; it autogenerates migration scripts from model changes, which you should review before applying. For an Indian SaaS handling GST invoicing, multi-tenancy across states, or complex reporting across financial periods, SQLAlchemy's flexibility with raw SQL through text() is worth the additional setup over SQLModel's simpler abstractions.

Rajesh R Nair

IT Consultant based in Trivandrum with 12+ years helping Indian businesses with technology, SEO, and digital delivery.