Harshit Singh

Jan 15, 2025 • 6 min read

BookMyShow Architecture and Why it failed during Coldplay

"Where the Sky was not full of stars but waiting queues" ✨

We all love Coldplay, and when they announced an India tour, we all wanted to attend, but some of us were not lucky enough to grab tickets. So what exactly happened when tickets went live, and why did the apps and website start logging users out? BookMyShow, a leading ticketing platform in India, faced significant issues during Coldplay’s concert ticket sales. The platform crashed under the weight of millions of users trying to purchase tickets simultaneously. This article explains how BookMyShow’s system works from a system design perspective, what went wrong during the Coldplay ticket sales, and how it can be improved in the future.

BookMyShow System Design: From User Click to Booking Confirmation

  1. User Interaction (Frontend):
    When a user visits BookMyShow, they interact with the platform through the website or mobile app. They search for an event, select their preferred seats, and proceed to checkout.

  2. Request to Backend (API Layer):
    Once the user clicks "Purchase," the frontend sends an API request to the backend. This includes event details, seat preferences, and user authentication.

  3. Load Balancer:
    The backend consists of multiple servers to distribute the load efficiently. A load balancer receives the incoming traffic and routes it to different application servers to ensure no single server is overwhelmed.

  4. Database Interaction (Backend):
    The request is then passed to the backend database, which manages real-time event data like seat availability, pricing, and booking status. The database must handle multiple read and write operations simultaneously (checking seat availability, reserving seats, updating availability).

  5. Caching Layer:
    To reduce load on the database, BookMyShow uses a caching layer. Frequently accessed data, like seat availability, is stored in memory (using Redis or Memcached) to speed up response times. Caching helps ensure users aren’t repeatedly querying the database for the same data.

  6. Session Management:
    During the booking process, session management ensures that user actions, like seat selection and payment details, are tracked. This ensures users don’t lose their selections while navigating through the checkout process.

  7. Payment Gateway Integration:
    Once the user confirms their seat and proceeds to payment, the system integrates with a third-party payment gateway (like Razorpay or Paytm) to process the transaction.

  8. Booking Confirmation:
    After the payment is successful, the backend updates the database with the user’s booking details, and the system confirms the booking to the user. This final response is sent back to the frontend.
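
The database and confirmation steps above can be sketched in a few lines. This is a minimal illustration, not BookMyShow's actual implementation: the `seats` table, its columns, and the functions `make_db`, `hold_seat`, and `finish_booking` are all hypothetical, and an in-memory SQLite database stands in for the real backend. The point it shows is that a seat should be claimed with a single conditional update, so that two users clicking the same seat at the same moment cannot both succeed.

```python
import sqlite3


def make_db(seat_ids):
    """Create a hypothetical in-memory seats table for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'available', user_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO seats (seat_id) VALUES (?)", [(s,) for s in seat_ids]
    )
    conn.commit()
    return conn


def hold_seat(conn, seat_id, user_id):
    """Claim a seat only if it is still available (one atomic UPDATE)."""
    cur = conn.execute(
        "UPDATE seats SET status = 'held', user_id = ? "
        "WHERE seat_id = ? AND status = 'available'",
        (user_id, seat_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means someone else got it


def finish_booking(conn, seat_id, user_id, payment_ok):
    """After the payment gateway responds, book the seat or release it."""
    new_status = "booked" if payment_ok else "available"
    new_owner = user_id if payment_ok else None
    cur = conn.execute(
        "UPDATE seats SET status = ?, user_id = ? "
        "WHERE seat_id = ? AND status = 'held' AND user_id = ?",
        (new_status, new_owner, seat_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1 and payment_ok
```

A failed payment releases the seat, so the next user can hold it. A production system would also need a timeout that releases holds when a user abandons checkout, which this sketch leaves out.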

Challenges and Failures During the Coldplay Sale

1. Traffic Overload:

What Happened? Coldplay’s ticket sales attracted millions of fans, causing BookMyShow’s servers to experience an unexpected surge in traffic. With 13 million users trying to access the platform at the same time, the infrastructure wasn’t able to scale quickly enough to handle the influx, leading to slowdowns and eventual crashes.

Why Did This Happen? While BookMyShow uses horizontal scaling (adding more servers as needed), this surge may have exceeded the platform's scaling capabilities. Even if servers were added, the load balancer, which is responsible for distributing traffic evenly, may not have been tuned to handle such a rapid influx.
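
To make the load balancer's role concrete, here is a minimal sketch of one common routing policy, least connections, where each new request goes to the server with the fewest requests in flight. Which policy BookMyShow actually uses is not known; the `Server` and `LeastConnectionsBalancer` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active: int = 0  # requests currently in flight on this server


class LeastConnectionsBalancer:
    def __init__(self, names):
        self.servers = [Server(n) for n in names]

    def acquire(self):
        """Pick the least-loaded server and count the new request against it."""
        server = min(self.servers, key=lambda s: s.active)
        server.active += 1
        return server

    def release(self, server):
        """Call when the request finishes so the server can take more work."""
        server.active -= 1
```

With three servers and six requests that have not finished yet, each server ends up with two. The policy only helps if newly added servers are registered with the balancer quickly, which is the failure mode described above.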

Possible Solutions:

  • Dynamic Scaling: Implementing auto-scaling can allow BookMyShow to dynamically adjust the number of servers in response to real-time traffic spikes.

  • Load Balancer Optimization: Ensuring that the load balancer can distribute traffic efficiently across all available servers, particularly during sudden surges, would help prevent overwhelming any single server.
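
As a rough illustration of dynamic scaling, the sketch below computes how many servers to run from the observed load per server. It follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); whether BookMyShow runs on Kubernetes is not known, and the function name, parameters, and limits here are assumptions.

```python
import math


def desired_servers(current, observed_rps_per_server, target_rps_per_server,
                    min_servers=2, max_servers=500):
    """Proportional scaling rule, clamped to a configured range."""
    raw = math.ceil(current * observed_rps_per_server / target_rps_per_server)
    return max(min_servers, min(max_servers, raw))
```

For example, 10 servers each seeing 900 requests per second against a target of 300 gives 30 servers. The `max_servers` clamp matters during a sale like this one: if the required number is above the limit, or new servers take minutes to boot, auto-scaling alone will not absorb the spike.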

2. Load Testing: Were the Servers Ready for This? 🧪

What Happened? Load testing is a critical part of any system’s preparation, especially for platforms that deal with high traffic. It's likely that BookMyShow didn’t simulate traffic spikes accurately enough to prepare for this massive event. Even if stress tests were conducted, they might not have replicated the real-world conditions that occurred when Coldplay tickets went live.

Why Did This Happen? While BookMyShow likely tested its system under normal or moderately high traffic, extreme traffic conditions (13 million simultaneous users) might have been outside the scope of their testing.

Possible Solutions:

  • Stress Testing: BookMyShow should perform stress tests where the system is pushed to its limits. This would expose vulnerabilities in the architecture and help identify potential points of failure.

  • Spike Testing: Running simulations of sudden traffic surges would help the platform anticipate how it handles extreme, short-lived traffic bursts during high-demand events.
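
A spike test starts from a load shape rather than a constant rate. The sketch below builds such a shape and runs it against a simple capacity model to show how a backlog forms. All numbers and the names `spike_profile` and `simulate_backlog` are hypothetical; a real test would replay the profile against a staging environment with a load-testing tool.

```python
def spike_profile(baseline_rps, peak_rps, warmup_s, spike_s, cooldown_s):
    """Requests per second for each second: calm, sudden spike, calm again."""
    return (
        [baseline_rps] * warmup_s
        + [peak_rps] * spike_s
        + [baseline_rps] * cooldown_s
    )


def simulate_backlog(profile, capacity_rps):
    """Track unserved requests per second for a system with fixed capacity."""
    backlog = 0
    history = []
    for offered in profile:
        # Requests beyond capacity pile up; spare capacity drains the pile.
        backlog = max(0, backlog + offered - capacity_rps)
        history.append(backlog)
    return history
```

Even in this toy model, a three-second spike at 1,000 requests per second against a capacity of 400 leaves a backlog that takes longer to drain than the spike itself lasted, which is why short bursts can cause long outages.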

3. Database Bottlenecks: Was the Backend Overwhelmed? 🗄️

What Happened?
The backend database likely struggled with millions of concurrent read/write operations, especially with users checking seat availability and making payments.

Possible Solutions:

  • Database Replication: Spread the database load across multiple servers to handle more requests concurrently.

  • Caching: Improve caching strategies to reduce the load on databases during high-demand periods by storing frequently queried data in memory.
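
One common way to use database replicas is read/write splitting: reads are spread across replicas while writes go to a single primary. The sketch below shows only the routing decision; the `ReplicatedRouter` class and the server names are hypothetical, and detecting reads by the `SELECT` prefix is a simplification.

```python
import itertools


class ReplicatedRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate through replicas in round-robin order, if there are any.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        """Return the name of the database server that should run this query."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # all writes, and reads when no replica exists
```

Replicas can lag behind the primary, so a seat shown as available on a replica may already be taken. Browsing can tolerate that, but the final reservation has to be checked on the primary.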

4. Caching Issues: Why Is Caching Critical? ⚡

What Happened? Caching helps reduce the load on databases by storing frequently accessed data, like available seats for an event. If caching wasn’t implemented effectively, or if cache data expired too early, the platform would have had to hit the database repeatedly, adding additional strain during high-demand periods.

Why Did This Happen? BookMyShow might have either failed to implement sufficient caching or had a caching strategy that wasn’t well-suited for this event. If the data was stale or expired, the platform would have had to re-fetch the same data repeatedly, which slowed down the entire system.
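
The effect of expiry on database load can be seen in a small cache-aside sketch. An in-process dictionary stands in for Redis or Memcached here, and the `TTLCache` class, its `clock` parameter, and the loader function are hypothetical.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        """Serve from cache while fresh; otherwise hit the database once."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        value = loader(key)  # this is the expensive database query
        self._store[key] = (value, now + self.ttl)
        return value
```

Every expiry turns the next request into a database query. With a very short TTL and millions of users, those misses add up quickly, which is the strain described above.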

Possible Solutions:

  • Distributed Caching: Tools like Redis or Memcached can be used to cache frequently queried data, reducing database load and speeding up response times.

  • Cache Refreshing: Ensuring that cached data is consistently updated and valid during high-demand events would improve performance and prevent outdated information from being served to users.
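
One refinement worth sketching is protection against a cache stampede, where many requests miss the cache at the same moment and all query the database for the same key. The sketch below lets only one caller per key do the load while the others wait for its result. It is an in-process illustration with hypothetical names (`SingleFlightCache`); a distributed cache would need a distributed lock or a similar mechanism.

```python
import threading


class SingleFlightCache:
    def __init__(self):
        self._values = {}
        self._locks = {}                 # one lock per key
        self._guard = threading.Lock()   # protects the lock table itself

    def get(self, key, loader):
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded it while we waited.
            if key not in self._values:
                self._values[key] = loader(key)
            return self._values[key]

    def invalidate(self, key):
        """Drop a key so the next get() reloads it, e.g. after a seat is sold."""
        self._values.pop(key, None)
```

If eight threads ask for the same event at once, the loader runs once and the other seven reuse its result.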

5. Queue Management: Could a Better System Prevent This? 📋

What Happened? BookMyShow uses a queue system to manage traffic during high-demand events, ensuring that users enter the platform in an orderly manner. However, with millions of people trying to buy tickets at once, the queue system may not have been able to handle such a massive influx, contributing to delays and crashes. Users were held in the queue until their turn came up, and while waiting they could not reload or close the tab.

Why Did This Happen? The queue management system might not have been scaled adequately for such large-scale events, or it could have been poorly configured to manage the sudden, intense demand.

Possible Solutions:

  • Dynamic Queue Management: A more robust queue system that can scale based on the number of users in line would prevent the platform from becoming overwhelmed.

  • Fair Access: Prioritizing users who are already in the queue can help ensure that everyone gets a fair shot at booking tickets without delays or crashes.
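
The queue ideas above can be sketched as a small waiting room that admits users in arrival order at a fixed rate. This is an assumption-laden illustration, not BookMyShow's queue: the `WaitingRoom` class and its methods are hypothetical, and everything lives in one process.

```python
from collections import deque


class WaitingRoom:
    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick  # how many users enter per tick
        self._queue = deque()
        self._members = set()

    def join(self, user_id):
        """Idempotent: joining again keeps the original place in line."""
        if user_id not in self._members:
            self._queue.append(user_id)
            self._members.add(user_id)
        return self.position(user_id)

    def position(self, user_id):
        """1-based place in line, or None if not waiting (linear scan)."""
        for index, waiting in enumerate(self._queue, start=1):
            if waiting == user_id:
                return index
        return None

    def admit(self):
        """Let the next batch through, strictly in arrival order."""
        admitted = []
        while self._queue and len(admitted) < self.admit_per_tick:
            user = self._queue.popleft()
            self._members.discard(user)
            admitted.append(user)
        return admitted
```

Because `join` is keyed by user rather than by browser tab, a user who reloads the page keeps their place. Raising `admit_per_tick` as backend capacity grows is one simple way to make the queue dynamic.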

Conclusion

The Coldplay ticket sale highlighted the vulnerabilities in BookMyShow’s architecture when faced with extreme traffic spikes. While the platform is generally well-designed for normal operations, issues with scaling, database performance, caching, and queue management led to the crash. By enhancing load balancing, conducting more realistic stress testing, improving database scalability, and refining caching and queue management systems, BookMyShow can avoid similar issues in the future and ensure smoother experiences for users during high-demand events.
