
Aug 8, 2025 | 4 Minute Read

What Happens When 1 Million Requests Hit Your APIs: A Real-World Guide To Performance Benchmarking

S. M. Tashrik Anam, Senior Software Engineer

Introduction

You never forget the first time your platform buckles under real traffic. Maybe it was during a product launch, a big event, or just an unexpected surge. Suddenly, response times spike, error logs fill up, and dashboards light up like a warning signal. That’s when most teams realize: scale isn’t about hope, it’s about preparation.

In a world where cloud platforms are expected to be fast, always-on, and infinitely scalable, performance isn’t a nice-to-have: it’s survival. This is the story of how the Axelerant team engineered their way through 1 million+ API requests using real-world user simulation, observability-first thinking, and multi-protocol testing. 

Why Performance Benchmarking Is A Strategic Necessity

Platforms operating at scale, particularly those delivering live data experiences, can't afford uncertainty. Whether you're supporting flash sales, high-volume marketplaces, or real-time dashboards, your system’s behavior under stress defines user trust. Performance benchmarking enables:

  • Predictable Scaling: Identify bottlenecks before they cause real-world outages.
  • Data-Driven Architecture Decisions: Validate auto-scaling, pooling strategies, and microservice performance.
  • Early Detection Of Technical Debt: Highlight areas needing refactoring or enhanced observability.

Platforms that scale without testing invite downtime. Performance benchmarking is the engineering equivalent of a financial stress test.

Designing A Realistic Load Testing Framework

Instead of defaulting to traditional browser-based testing, the engineering team adopted an API-first, protocol-aware strategy for simulating user workflows across both B2B (operators) and B2C (users) layers.

Why Browser Testing Wasn’t Enough:

  • Simulating thousands of browsers would require massive infrastructure and orchestration.
  • Real user behavior wasn't about clicks; it was about backend calls and real-time data exchanges.

Testing Stack:

  • Load Generation: Grafana k6 (chosen for its scripting flexibility, efficient concurrency, and cloud readiness; see the sketch after this list)
  • Real-Time Validation: WebSocket protocol support
  • Monitoring & Observability: AWS CloudWatch, Last9 (Grafana dashboards, Prometheus, OpenTelemetry)
  • Infrastructure: Kubernetes cluster (auto-scaling enabled, monitored under load)
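
To ground the stack above, here is a minimal k6 sketch of an HTTP user flow with latency and error-rate thresholds. The endpoints, payloads, and limits are illustrative placeholders rather than the project's actual configuration.

```javascript
// login-and-poll.js: minimal k6 sketch of an HTTP user flow.
// Endpoint paths, payloads, and thresholds are illustrative placeholders.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,          // concurrent virtual users
  duration: '15m',   // mirrors the 15-minute iteration windows
  thresholds: {
    http_req_failed: ['rate<0.01'],    // keep the error rate under 1%
    http_req_duration: ['p(99)<3000'], // flag P99 latency above 3 seconds
  },
};

export default function () {
  // 1. Authenticate (hypothetical endpoint and credentials).
  const login = http.post(
    'https://example.test/api/login',
    JSON.stringify({ username: `user_${__VU}`, password: 'secret' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(login, { 'login succeeded': (r) => r.status === 200 });

  // 2. Hit a market/data endpoint the way a real client would.
  const markets = http.get('https://example.test/api/markets');
  check(markets, { 'markets returned': (r) => r.status === 200 });

  sleep(1); // think time between iterations
}
```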

Dual Protocol Testing:

  • HTTP APIs were tested for user login, market interaction, and transaction actions.
  • WebSocket streams simulated sub-second updates from B2B control panels to B2C user interfaces.
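
One way to express this dual-protocol split in k6 is to run two scenarios against the same environment, one driving HTTP flows and one holding WebSocket connections open. The scenario names, executors, targets, and URLs below are assumptions for illustration, not the team's exact setup.

```javascript
// dual-protocol.js: sketch of exercising HTTP and WebSocket layers side by side.
import http from 'k6/http';
import ws from 'k6/ws';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    http_flows: {
      executor: 'ramping-vus',
      exec: 'httpFlows',
      startVUs: 0,
      stages: [
        { duration: '5m', target: 500 },
        { duration: '10m', target: 500 },
      ],
    },
    ws_streams: {
      executor: 'constant-vus',
      exec: 'wsStreams',
      vus: 200,
      duration: '15m',
    },
  },
};

export function httpFlows() {
  const res = http.get('https://example.test/api/markets'); // placeholder endpoint
  check(res, { 'HTTP 200': (r) => r.status === 200 });
  sleep(1);
}

export function wsStreams() {
  // Hold a persistent connection and confirm that updates keep flowing.
  const res = ws.connect('wss://example.test/stream', {}, (socket) => {
    socket.on('message', (msg) => check(msg, { 'non-empty update': (m) => m.length > 0 }));
    socket.setTimeout(() => socket.close(), 60000); // keep the socket open for 60 seconds
  });
  check(res, { 'WS handshake accepted (101)': (r) => r && r.status === 101 });
}
```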

Simulating Real-World Behavior At Scale

The test design wasn't just synthetic concurrency; it was a behavioral simulation.

B2B Workflow Simulation:

  • Multiple unique operator personas, each controlling multiple dynamic data segments.
  • Simulated multi-screen management of real-time data with rapid updates.

B2C User Simulation:

  • Thousands of user threads performing concurrent interactions.
  • Threads synchronized to simulate coordinated traffic peaks (e.g., match starts).

Polling & Timing:

  • High-frequency polling endpoints triggered multiple times a second per user.
  • Timing logic orchestrated thread behavior to ensure realism (e.g., the betting thread fired only after a market-open update arrived), as sketched below.

This level of nuance ensured that performance issues surfaced under authentic patterns, not just theoretical spikes.
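
As a rough sketch of how this behavioral shaping can be scripted in k6: operator personas come from a shared data file, B2C users ramp sharply at a simulated peak, and each user polls a few times per second. The file name, endpoints, counts, and timings are hypothetical.

```javascript
// behavioral-sim.js: sketch of persona-driven, time-coordinated load in k6.
import http from 'k6/http';
import { sleep } from 'k6';
import { SharedArray } from 'k6/data';

// Hypothetical persona file: a JSON array of operator records.
const operators = new SharedArray('operators', () => JSON.parse(open('./operators.json')));

export const options = {
  scenarios: {
    b2b_operators: {
      executor: 'constant-vus',
      exec: 'operatorFlow',
      vus: 20,          // one VU per operator persona (illustrative count)
      duration: '20m',
    },
    b2c_users: {
      executor: 'ramping-vus',
      exec: 'userFlow',
      startTime: '2m',  // users arrive after operators have opened their markets
      startVUs: 0,
      stages: [
        { duration: '30s', target: 3000 }, // sharp ramp simulating a coordinated "match start"
        { duration: '15m', target: 3000 },
      ],
    },
  },
};

export function operatorFlow() {
  const op = operators[(__VU - 1) % operators.length];
  // Each operator pushes rapid updates to the data segments they control.
  http.post(
    'https://example.test/api/operator/update', // placeholder endpoint
    JSON.stringify({ operatorId: op.id }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  sleep(0.5); // roughly two updates per second per operator
}

export function userFlow() {
  // High-frequency polling: several requests per second per simulated user.
  http.get('https://example.test/api/odds'); // placeholder endpoint
  sleep(0.25);
}
```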

Iterative Load Testing: Building Confidence Through Progressive Validation

The load testing was executed in four planned iterations, each escalating in intensity.

 

| Iteration | Duration | Requests Processed | Error Rate | Peak TPS |
|-----------|----------|--------------------|------------|----------|
| #1        | 15 min   | 268,000            | 0.012%     | 298      |
| #2        | 15 min   | 391,000            | 0.41%      | 434      |
| #3        | 20 min   | 207,000            | 0.92%      | 172      |
| #4        | 25 min   | 954,000            | 0.46%      | 636      |

Key Observations:

  • Error Spike At Iteration 3: A turning point, highlighting database and service-layer stress.
  • Endpoint Degradation: Specific POST and polling endpoints slowed down (P99 > 3s).
  • Auto-Scaling Validation: Kubernetes scaled services up to 20 replicas without service disruption.

The iteration model helped engineering teams focus efforts on specific problem areas and validate fixes incrementally.
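
If each iteration should carry explicit pass/fail criteria rather than relying only on after-the-fact dashboard review, k6 thresholds can encode them directly, and a ramping-arrival-rate scenario approximates escalating TPS targets like those in the table. The rates and limits below are illustrative, not the figures from these runs.

```javascript
// iteration-gate.js: sketch of encoding per-iteration acceptance criteria in k6.
import http from 'k6/http';

export const options = {
  scenarios: {
    escalating_tps: {
      executor: 'ramping-arrival-rate',
      startRate: 100,        // requests per second at the start
      timeUnit: '1s',
      preAllocatedVUs: 500,
      maxVUs: 2000,
      stages: [
        { duration: '5m', target: 300 }, // roughly iteration #1 territory
        { duration: '5m', target: 450 }, // iteration #2
        { duration: '5m', target: 650 }, // around the iteration #4 peak
      ],
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.01'],    // fail the run if errors exceed 1%
    http_req_duration: ['p(99)<3000'], // fail if P99 crosses the 3-second mark
  },
};

export default function () {
  http.get('https://example.test/api/markets'); // placeholder endpoint
}
```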

Deep Observability: From Metrics To Meaning

What made this load test stand apart was the layered observability stack and real-time diagnostics.

Infrastructure Layer:

  • Amazon RDS (database): Monitored CPU utilization (which rose from 15% to 42%), connection spikes (up to 4,480 connections), and query latencies.
  • ElastiCache: Gradual CPU rise with zero evictions, validating the caching strategy.
  • Kubernetes: Pod CPU/memory tracked for services like user auth, data streams, and control APIs.

Application Layer:

  • APM Traces: Surfaced bottlenecks in business logic for transaction submission and odds synchronization.
  • Slowest APIs: Identified and mapped to retry logic and exception handling blocks.

Real-Time Alerting:

  • Combined dashboards triggered alerts when latency crossed P95 thresholds.
  • Enabled response strategies to kick in automatically or via DevOps intervention.
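
The same P95 discipline can be mirrored inside the load scripts themselves, so a run fails (or aborts early) when a business-critical endpoint breaches the latency budget instead of waiting for a dashboard alert. The metric name, endpoint, and threshold value below are assumptions.

```javascript
// latency-guard.js: sketch of mirroring the P95 alert threshold inside k6.
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// Custom trend for one business-critical endpoint (name is illustrative).
const transactionLatency = new Trend('transaction_latency', true);

export const options = {
  thresholds: {
    transaction_latency: [
      // Abort the test early if P95 stays above 1.5s, roughly the point
      // at which the dashboards would page a human.
      { threshold: 'p(95)<1500', abortOnFail: true, delayAbortEval: '1m' },
    ],
  },
};

export default function () {
  const res = http.post('https://example.test/api/transactions', '{}', {
    headers: { 'Content-Type': 'application/json' },
  });
  transactionLatency.add(res.timings.duration);
}
```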

Testing Real-Time Messaging: Why WebSocket Matters

Modern digital platforms cannot afford latency. Real-time messaging, especially in mission-critical data flows, requires sub-second consistency.

Why WebSocket Testing Was Critical:

  • Regular HTTP polling introduced delays and bandwidth overhead.
  • WebSocket enabled instant propagation of backend changes to the UI.

Implementation:

  • WebSocket endpoints were identified in collaboration with the frontend and product teams.
  • Load scripts simulated thousands of persistent connections with constant message flow.
  • WebSocket response times and message integrity were validated under peak loads.
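
A hedged sketch of what that WebSocket validation can look like in k6: each virtual user holds a persistent connection, sends a timestamped ping-style message, and records how quickly an update comes back. The URL, message shape, and the assumption that the server echoes the timestamp are all placeholders.

```javascript
// ws-latency.js: sketch of validating WebSocket responsiveness under load.
import ws from 'k6/ws';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

const msgLatency = new Trend('ws_message_latency', true);

export const options = {
  vus: 1000,        // persistent connections held open concurrently
  duration: '10m',
  thresholds: { ws_message_latency: ['p(95)<1000'] }, // expect sub-second delivery
};

export default function () {
  const res = ws.connect('wss://example.test/stream', {}, (socket) => {
    socket.on('open', () => {
      // Send a timestamped message; assumes the server echoes it back.
      socket.send(JSON.stringify({ type: 'ping', sentAt: Date.now() }));
    });

    socket.on('message', (raw) => {
      let msg;
      try { msg = JSON.parse(raw); } catch (e) { return; } // ignore non-JSON frames
      if (msg.sentAt) msgLatency.add(Date.now() - msg.sentAt);
    });

    socket.setTimeout(() => socket.close(), 30000); // hold each connection for 30 seconds
  });

  check(res, { 'handshake succeeded (101)': (r) => r && r.status === 101 });
}
```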

Outcome:

  • WebSocket infrastructure held steady under full load.
  • Verified sub-second sync from backend changes to frontend display.

This demonstrated not just performance, but platform maturity.

What Engineering Leaders Should Take Away

Whether you're a CTO, engineering manager, or hands-on architect, performance benchmarking is no longer optional; it's foundational. Here’s what this engagement revealed that applies far beyond one cloud platform:

  1. Design for Realism, Not Just Load: It's tempting to hammer your homepage with traffic and call it a load test. But scale emerges in workflows, not endpoints. Simulate actual user journeys and patterns.
  2. Account for Real-Time Protocols: If your system has any real-time component (notifications, dashboards, or feeds), testing WebSocket performance is non-negotiable.
  3. Layer Your Observability: It’s not just about collecting logs or watching one dashboard. Combine infra metrics, APM traces, and live alerting to see the whole picture, especially when things go wrong.
  4. Make Iteration the Default: One test isn't enough. Build repeatable load test cycles into your engineering pipeline, and make space for teams to act on the findings.
  5. Scale Predictably, Not Theoretically: Kubernetes can scale. But will it scale your architecture effectively? Auto-scaling only works when you’ve accounted for connection pooling, warm starts, and service thresholds.
  6. Cross-Team Ownership is Key: Performance isn't just DevOps’ responsibility. QA, product, backend, and SREs must plan, simulate, and review performance as a shared responsibility.
  7. Remember the Human Cost of Downtime: Every second of delay is a lost conversion, a frustrated user, or a broken SLA. It's not just about requests per second, it's about people on the other side.

This isn’t just a checklist; it’s a shift in how modern teams build resilient, experience-first platforms.

Engineering The Future At Scale


There’s a moment in every high-growth platform where the technology hits a wall. The systems slow down. Users complain. Dashboards start showing numbers that used to belong only in hypothetical design docs. That’s not failure, that’s the edge of scale. And it’s where great engineering begins.

This story isn’t just about numbers. It’s about the rigor, the teamwork, and the foresight it takes to build systems that don’t just function under load but thrive under it. From designing load simulations with surgical precision to observing WebSocket streams in real-time, performance became a shared discipline.

At Axelerant, this performance benchmarking journey wasn’t a project. It was a culture in action, of engineering excellence, of asking the harder questions, and of designing for growth before growth hits.

So ask yourself: when the next million requests hit your APIs, will you be ready? Or will you be reactive?

 

About the Author

S. M. Tashrik Anam, Senior Software Engineer

Tashrik enjoys working on front-end development, particularly in enhancing performance and code quality in frameworks like Next.js and React. He is passionate about learning new technologies and sharing insights through mentoring and writing. Outside of coding, he finds balance through coffee breaks and refreshing walks.
