Kumar Avishek

Frontend Engineer


Scaling Beyond the Loading Spinner: How We Re-Engineered Our PDF Pipeline

Hey everyone, let me tell you about something that was driving us crazy at work. We built this platform where customers can get machine quotations. Pretty simple idea: they pick a machine, we calculate the price, make a nice PDF, and send it to them through email and WhatsApp.

At first, when we had just a handful of users, everything worked fine. Someone clicks "Send Quotation" and after a few seconds, they get their document. No big deal. But then usage started picking up. What used to take 5 seconds turned into 12-15 seconds of waiting. Users staring at loading screens. Our servers sweating under pressure. And if WhatsApp was having a bad day or storage was slow, the whole thing would just fail.

We knew we had to fix this, but not by making PDF generation 2x faster. That wouldn't solve the real problem. We needed to completely change how we handled it. Here's the full story of what broke, what we built instead, and all the steps we took to make it work.

The Problem Got Real

Let me paint the picture of what our old system looked like. Everything happened in one single API endpoint in our Next.js app. It was a "Synchronous Monolith" trapped inside a serverless function.

Here's the step-by-step flow that was killing us:

  1. User fills out the quotation form and hits "Send."
  2. Our API route pulls machine specs, images, and pricing rules from the DB.
  3. We calculate totals, taxes, and discounts on the fly.
  4. The Heavy Lift: We launch Puppeteer (a headless browser) to "print" the PDF.
  5. The Network Wait: We upload the file to Azure Blob Storage.
  6. The Third-Party Risk: We hit the Email API and then the WhatsApp API.
  7. Finally, after all that, we send back "Success!"
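
Condensed into code, that flow looked roughly like this. Every helper below is an illustrative stand-in for our real implementation, not actual production code:

```typescript
// Sketch of the old synchronous handler: one request does everything.
type Quote = { machineModel: string; total: number }

async function fetchMachineData(machineId: string): Promise<Quote> {
  // Steps 2-3: DB reads plus totals, taxes, discounts (stubbed here)
  return { machineModel: machineId, total: 125000 }
}

async function renderPdfWithPuppeteer(_quote: Quote): Promise<Uint8Array> {
  // Step 4: the heavy lift; headless Chrome "prints" the page (stubbed)
  return new Uint8Array([0x25, 0x50, 0x44, 0x46]) // "%PDF"
}

async function uploadToAzure(_pdf: Uint8Array): Promise<string> {
  // Step 5: the network wait (stubbed)
  return 'https://example.blob.core.windows.net/quotes/q1.pdf'
}

async function notify(_url: string): Promise<void> {
  // Step 6: the third-party risk; email + WhatsApp APIs (stubbed)
}

export async function sendQuotation(machineId: string) {
  const quote = await fetchMachineData(machineId)
  const pdf = await renderPdfWithPuppeteer(quote)
  const url = await uploadToAzure(pdf)
  await notify(url)
  return { success: true, url } // step 7, 10-15 seconds later
}
```

Every `await` here blocks the user's HTTP request; the response time is the sum of all six steps.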

Sounds logical, right? But here's what actually happened:

  • Users hated the wait. 15 seconds is an eternity on the web. On mobile, users would assume the app was broken, hit refresh, and start the entire process again, effectively DDoS-ing our own servers.
  • Servers were dying. PDF generation is CPU-bound. Launching a browser (Puppeteer) for every request meant our API servers would spike to 90%+ CPU usage instantly.
  • One failure killed everything. If the WhatsApp API was down for 30 seconds, the whole request would time out. The user would get an error even if the PDF was already generated. It was all or nothing.

The Fix: Don't Make Users Wait

The answer was simple but powerful: Stop doing heavy work in the API response. Inspired by how teams like Zerodha handle massive bursts of traffic, we moved to an Asynchronous Task Queue model.

The API's only job now is to validate the data and drop a "Job" into a Redis-backed queue. Think of it like a restaurant: the waiter (API) takes your order and hands a slip to the kitchen. The waiter doesn't stand there and cook your food; they immediately move on to the next table.

How We Actually Built It (Step by Step)

Step 1: Making the API "Instant"

We gutted our endpoint. Now, it fetches necessary data in parallel using Promise.all and immediately hands the work over to BullMQ.

// app/api/send-quotation/route.ts
import { NextResponse } from 'next/server'
import { quotationQueue } from '@/lib/queue' // our BullMQ queue instance

// Validate the request, enqueue the job, and respond immediately.
const job = await quotationQueue.add(
  'GENERATE_AND_STORE_PDF',
  {
    type: 'GENERATE_AND_STORE_PDF',
    data: pdfData,
    machineModel: pdfData.machine.model,
  },
  {
    attempts: 3, // retry up to 3 times on failure
    backoff: { type: 'exponential', delay: 1000 }, // 1s, 2s, 4s between retries
  },
)
 
return NextResponse.json({ success: true, jobId: job.id })
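
The "fetch in parallel" part looks roughly like this; the fetchers below are stubs standing in for our real DB queries:

```typescript
// Fire independent queries concurrently instead of one after another.
async function fetchMachine(id: string) {
  return { id, model: 'CNC-500' } // stub for a DB read
}

async function fetchPricingRules(id: string) {
  return { machineId: id, taxRate: 0.18 } // stub for a DB read
}

export async function loadPdfData(machineId: string) {
  // Both queries start immediately; total wait is the slowest one,
  // not the sum of both.
  const [machine, pricing] = await Promise.all([
    fetchMachine(machineId),
    fetchPricingRules(machineId),
  ])
  return { machine, pricing }
}
```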

Step 2: Swapping the Engine (Puppeteer to pdf-lib)

This was a major turning point. Just as Zerodha realized that headless Chrome was overkill for generating PDFs, we moved away from Puppeteer.

  • The Problem with Puppeteer: It launches a full Chromium browser. It uses ~200MB of RAM per PDF. It’s slow to start and hard to scale.
  • The Solution (pdf-lib): We switched to pdf-lib, a pure JS library. Instead of rendering HTML, we programmatically draw the document. It’s lightning-fast, uses 80% less memory, and doesn't require a browser to run.

Step 3: Reliable Workers & Chained Jobs

We built a background worker that listens to the Redis queue. We used a "Chained Job" strategy to ensure the system is unbreakable.

  1. PDF Worker: Generates the file and uploads it to Azure.
  2. Notification Pipeline: Once the PDF is safe in Azure, the worker adds two new jobs to the queue: SEND_EMAIL and SEND_WHATSAPP.

// lib/queue.ts — inside the worker's processor, switching on job type
case 'GENERATE_AND_STORE_PDF': {
  const pdfBuffer = await generatePDF(pdfData);
  const pdfUrl = await uploadPDF(pdfBuffer, fileName);
 
  // Chain the next steps as independent jobs so each can retry on its own
  await Promise.all([
    quotationQueue.add('SEND_EMAIL', { data: { ...pdfData, pdfUrl } }),
    quotationQueue.add('SEND_WHATSAPP', { data: { ...whatsappPayload } })
  ]);
 
  return { pdfUrl };
}

Why this matters: If the WhatsApp API has a hiccup, BullMQ will automatically retry only the WhatsApp job using an "exponential backoff" (waiting 1s, then 2s, then 4s). The user already has their confirmation, and the PDF is already stored. The system heals itself.
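
The 1s → 2s → 4s schedule is just the delay doubling on each retry. As a pure function (a sketch of the exponential strategy, not BullMQ's internal code):

```typescript
// Delay (in ms) before retry attempt n, with exponential backoff.
// With baseDelayMs = 1000: attempt 1 waits 1s, attempt 2 waits 2s,
// attempt 3 waits 4s — matching our { type: 'exponential', delay: 1000 } config.
function backoffDelay(attempt: number, baseDelayMs = 1000): number {
  return Math.pow(2, attempt - 1) * baseDelayMs
}
```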

What Changed (Real Numbers)

| Metric | Before (Monolith) | After (Queue + pdf-lib) |
| --- | --- | --- |
| API Response Time | 10-15 seconds | 150-250 milliseconds |
| Server CPU Usage | 90%+ spikes | Stable 25-35% |
| Max Concurrent Quotes | ~15 | 150+ |
| Delivery Success Rate | 82% | 99.8% |

The Big Lessons

  1. APIs should coordinate, not execute: Keep your API routes under 500ms. Move the "heavy lifting" to the background.
  2. Separate your concerns: Your PDF generation shouldn't care if your WhatsApp API is working. By breaking these into separate jobs, you make the whole system "Fault Tolerant."
  3. Choose the right tool: Puppeteer is great for complex web scraping, but for generating structured data documents, a lightweight library like pdf-lib is a much better engineering choice.

We went from a system that was brittle and slow to one that feels instant and unbreakable. If you're struggling with slow features, stop trying to make the code faster and start looking at your architecture. It’s a total game-changer.
