
Notifications are everywhere — from a welcome email when you sign up, to in-app messages reminding you about updates, to push notifications on your phone. Designing a reliable notification system is a critical part of any scalable application.
In this blog, we’ll break down the system design of a notification system using a simple diagram. By the end, you’ll understand how notifications are triggered, processed, and delivered efficiently.
Notifications improve user engagement and keep users informed about important events. For example:
A welcome email after signup.
A password reset email when you forget your password.
A push notification when a new friend request arrives.
An in-app notification when someone likes your post.
Instead of hardcoding notification logic inside each feature, we design a dedicated notification system that can handle millions of events without slowing down the features that trigger them.
Here’s the high-level view of our notification system:
Users interact with the application (login, signup, posting content).
Servers (behind a Load Balancer) capture these events.
Events are published to Amazon Simple Notification Service (SNS).
SNS fans each event out to the matching queues (Email Queue, In-App Queue, Push Notification Queue), typically Amazon SQS queues subscribed to the SNS topic.
Workers (EC2 instances) consume these queues and call the right downstream service:
Email Service (Amazon SES or a third-party email API)
Database for storing in-app notifications
Firebase Cloud Messaging (FCM) or another push service for mobile notifications
Notifications are delivered to users in real time or near real time.

Multiple users trigger different events: login, signup, posting, friend requests.
A Load Balancer distributes requests across servers to avoid overloading one machine.
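Load balancers support several ways of spreading requests, and round-robin is one of the simplest. Here is a minimal sketch of that strategy. It assumes three identical servers with the hypothetical names `app-1` to `app-3`:

```python
from itertools import cycle
from typing import List


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def pick(self) -> str:
        # Each call advances the rotation, so requests spread evenly across servers.
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Real load balancers also track server health and can use other policies, such as fewest active requests. This sketch covers only the rotation.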
Servers capture user actions (e.g., "User signed up").
Instead of sending emails directly, servers publish events to the SNS topic.
Example:
On signup → Send "Welcome Email" event.
On login → Send "In-App Notification" event.
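From the server's point of view, publishing is a fire-and-forget call. The sketch below assumes a JSON event with hypothetical fields (`event_type`, `user_id`, `payload`, `event_id`, `created_at`). It uses an in-memory `InMemoryTopic` in place of SNS. With boto3, the real call would be the SNS client's `publish(TopicArn=..., Message=...)`.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class NotificationEvent:
    event_type: str  # hypothetical naming scheme, e.g. "user.signup"
    user_id: str
    payload: Dict[str, str] = field(default_factory=dict)
    # A unique ID lets workers recognize a message that is delivered more than once.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class InMemoryTopic:
    """Stand-in for an SNS topic: it only records what was published."""

    def __init__(self) -> None:
        self.published: List[str] = []

    def publish(self, message: str) -> None:
        self.published.append(message)


def on_signup(topic: InMemoryTopic, user_id: str, email: str) -> None:
    # The server does not send the email; it only announces that a signup happened.
    event = NotificationEvent("user.signup", user_id, {"email": email})
    topic.publish(json.dumps(asdict(event)))


def on_login(topic: InMemoryTopic, user_id: str) -> None:
    event = NotificationEvent("user.login", user_id)
    topic.publish(json.dumps(asdict(event)))


topic = InMemoryTopic()
on_signup(topic, "u-42", "ada@example.com")
on_login(topic, "u-42")
print([json.loads(m)["event_type"] for m in topic.published])
# ['user.signup', 'user.login']
```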
SNS acts as the event distributor.
It takes in events and routes them to the correct queues (Email Queue, In-App Queue, Push Queue).
This design is event-driven, meaning servers don’t directly handle notifications — they just publish events.
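In AWS, fan-out is usually configured by subscribing each SQS queue to the SNS topic and attaching a subscription filter policy. The in-memory sketch below imitates that behavior with plain predicates. The event type names and routing rules are hypothetical:

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Event = dict


class FanOutTopic:
    """In-memory stand-in for an SNS topic with filtered queue subscriptions."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Callable[[Event], bool], Deque[Event]]] = []

    def subscribe(self, queue: Deque[Event], accepts: Callable[[Event], bool]) -> None:
        self._subscriptions.append((accepts, queue))

    def publish(self, event: Event) -> int:
        """Deliver a copy of the event to every matching queue; return how many matched."""
        matched = 0
        for accepts, queue in self._subscriptions:
            if accepts(event):
                queue.append(dict(event))
                matched += 1
        return matched


email_q: Deque[Event] = deque()
in_app_q: Deque[Event] = deque()
push_q: Deque[Event] = deque()

topic = FanOutTopic()
# Hypothetical routing rules, playing the role of SNS subscription filter policies.
topic.subscribe(email_q, lambda e: e["type"] in {"user.signup", "password.reset"})
topic.subscribe(in_app_q, lambda e: e["type"] in {"user.login", "post.liked", "friend.request"})
topic.subscribe(push_q, lambda e: e["type"] == "friend.request")

topic.publish({"type": "friend.request", "user_id": "u-7"})  # lands in in-app and push
topic.publish({"type": "user.signup", "user_id": "u-8"})     # lands in email only
print(len(email_q), len(in_app_q), len(push_q))  # 1 1 1
```

The publisher never learns which queues exist. That is what lets new channels be added without touching the servers.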
Each type of notification has its own queue:
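Email Queue → Holds email notifications waiting to be sent.
In-App Queue → Holds notifications to be stored for in-app delivery.
Push Notification Queue → Holds push notifications bound for mobile devices.
Queues decouple the servers from the workers. Each message stays in its queue until a worker processes and deletes it, so a slow or crashed worker delays notifications instead of losing them.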
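SQS-style queues avoid losing messages because a received message is only hidden for a visibility timeout. It is removed for good only when the worker deletes it. The toy `VisibilityQueue` below is a simplified model of that idea, not the SQS API. Time is passed in explicitly so the example is deterministic:

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Message:
    body: str
    visible_at: float  # time at which the message can be received again


class VisibilityQueue:
    """Toy queue loosely modeled on SQS: receiving hides a message, deleting removes it."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count()
        self._messages: Dict[int, _Message] = {}

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = _Message(body, visible_at=0.0)

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                # Hide rather than remove: if the worker crashes before deleting,
                # the message reappears once the timeout expires.
                msg.visible_at = now + self.visibility_timeout
                return msg_id, msg.body
        return None

    def delete(self, msg_id: int) -> None:
        # Only an explicit delete, sent after successful processing, removes a message.
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30.0)
q.send("welcome-email:u-42")
first = q.receive(now=0.0)    # worker A takes it, then crashes without deleting
hidden = q.receive(now=10.0)  # None: still invisible to other workers
again = q.receive(now=31.0)   # visible again, so worker B picks it up
q.delete(again[0])            # worker B succeeds and deletes it
print(first, hidden, again, q.receive(now=100.0))
```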
Workers continuously read from the queues.
They process messages and decide the right action:
Email Worker → Calls SES/Email API.
Push Worker → Calls Firebase/Push API.
In-App Worker → Stores notification in the database.
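At its core, a worker does three things: take a message, look up the handler for its channel, and call it. In this sketch the three handlers are hypothetical placeholders. They record what they would have done instead of calling SES, a database, or Firebase. `drain` processes a queue once. A production worker would poll continuously (for SQS, typically with long polling) and would delete each message only after its handler succeeds.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

sent_emails: List[str] = []
push_log: List[str] = []
in_app_db: Dict[str, List[str]] = {}


def send_email(msg: dict) -> None:
    # Placeholder for a call to SES or another email API.
    sent_emails.append(msg["user_id"])


def send_push(msg: dict) -> None:
    # Placeholder for a call to Firebase or another push service.
    push_log.append(msg["user_id"])


def store_in_app(msg: dict) -> None:
    # In-app notifications are rows that the client app fetches later.
    in_app_db.setdefault(msg["user_id"], []).append(msg["text"])


HANDLERS: Dict[str, Callable[[dict], None]] = {
    "email": send_email,
    "push": send_push,
    "in_app": store_in_app,
}


def drain(queue: Deque[dict], channel: str) -> int:
    """Process every message currently in one channel's queue; return how many."""
    handler = HANDLERS[channel]
    handled = 0
    while queue:
        # This toy version removes first; with SQS you would delete after success.
        handler(queue.popleft())
        handled += 1
    return handled


in_app_q = deque([{"user_id": "u-1", "text": "Ada liked your post"}])
email_q = deque([{"user_id": "u-2"}])
drain(in_app_q, "in_app")
drain(email_q, "email")
print(in_app_db, sent_emails)  # {'u-1': ['Ada liked your post']} ['u-2']
```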
👉 Extra Responsibilities of Workers:
Rate Limiting: Workers can enforce limits to avoid overwhelming email providers (e.g., max X emails per second).
Retries and DLQ (Dead Letter Queue):
If an email fails (e.g., the provider is down), the worker retries it up to a configured maximum number of times.
If it still fails, the event is moved to a Dead Letter Queue for later investigation.
User Preferences Check:
Before sending, workers check if the user has opted in for emails/notifications.
Example: If the user disabled push notifications, the event is ignored instead of being sent.
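The preference check is a lookup before the send. It happens before any provider call. A minimal sketch, assuming a hypothetical in-memory `PREFS` store and a default where users with no stored preferences have every channel enabled:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Preferences:
    # Channels the user has explicitly turned off, e.g. {"push"}.
    disabled_channels: Set[str] = field(default_factory=set)


# Hypothetical preference store; in practice this is a database or cache lookup.
PREFS: Dict[str, Preferences] = {
    "u-1": Preferences(disabled_channels={"push"}),
}


def should_send(user_id: str, channel: str) -> bool:
    # Assumption: users with no stored preferences have every channel enabled.
    prefs = PREFS.get(user_id, Preferences())
    return channel not in prefs.disabled_channels


def process(msg: dict, channel: str, deliver: Callable[[dict], None]) -> str:
    if not should_send(msg["user_id"], channel):
        # Opted out: acknowledge and discard; retrying would never succeed.
        return "dropped"
    deliver(msg)
    return "sent"


delivered = []
print(process({"user_id": "u-1"}, "push", delivered.append))   # dropped
print(process({"user_id": "u-1"}, "email", delivered.append))  # sent
print(process({"user_id": "u-2"}, "push", delivered.append))   # sent
```

A dropped event counts as successfully handled, not as a failure, so it never reaches the retry or DLQ path.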
Let’s walk through a Signup Event:
User signs up → The signup request reaches a server.
Server publishes "Signup Event" to SNS.
SNS routes it to the Email Queue.
An Email Worker picks up the event.
Worker checks user preferences → (Has the user disabled emails?)
If allowed → Call SES/Email Service.
If not allowed → Drop the event.
If SES fails → Retry up to 3 times.
If still failing → Move to Dead Letter Queue.
User receives a Welcome Email 🎉.
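The signup walkthrough maps almost line for line onto a single handler function. This is a minimal sketch with a few assumptions. A hypothetical `EmailServiceError` is raised by the email client. "Retry up to 3 times" is read as one initial attempt plus three retries. `flaky_sender` simulates a provider outage. With SQS, the queue itself can also enforce give-up-after-N behavior: a redrive policy with `maxReceiveCount` moves repeatedly failing messages to a dead-letter queue.

```python
from typing import Callable, Dict, List, Set

MAX_RETRIES = 3  # "retry up to 3 times": one initial attempt plus three retries


class EmailServiceError(Exception):
    """Raised by the email client when a send fails (hypothetical)."""


def handle_signup_event(
    event: Dict[str, str],
    email_disabled: Set[str],
    send_email: Callable[[Dict[str, str]], None],
    dead_letter_queue: List[Dict[str, str]],
) -> str:
    # Check preferences first: an opted-out user costs no provider call.
    if event["user_id"] in email_disabled:
        return "dropped"
    for attempt in range(1, MAX_RETRIES + 2):
        try:
            send_email(event)
            return f"sent on attempt {attempt}"
        except EmailServiceError:
            continue  # a real worker would wait here, e.g. with exponential backoff
    # All attempts failed: park the event for later investigation.
    dead_letter_queue.append(event)
    return "dead-lettered"


def flaky_sender(failures: int) -> Callable[[Dict[str, str]], None]:
    """Simulated email provider that fails `failures` times, then succeeds."""
    remaining = [failures]

    def send(event: Dict[str, str]) -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise EmailServiceError("provider unavailable")

    return send


dlq: List[Dict[str, str]] = []
print(handle_signup_event({"user_id": "u-1"}, set(), flaky_sender(2), dlq))   # sent on attempt 3
print(handle_signup_event({"user_id": "u-2"}, set(), flaky_sender(10), dlq))  # dead-lettered
print(handle_signup_event({"user_id": "u-3"}, {"u-3"}, flaky_sender(0), dlq)) # dropped
```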

Benefits of this design:
Scalability → Each component (servers, workers, queues) can scale independently.
Reliability → Even if the email service is down, events stay in the queue until retried.
Fault Tolerance with DLQ → Failed events are preserved for debugging instead of being lost.
Flexibility → Easy to add new notification channels (like SMS) by adding a new queue + worker.
Personalization → User preferences ensure notifications are respectful and relevant.
Rate Control → Workers prevent spamming or breaching provider rate limits.
Trade-offs:
Queue Delays → If workers can’t keep up, notifications may be delayed.
Complex Retry Logic → Designing retry + DLQ correctly adds extra complexity.
User Preference Storage → Requires fast database lookups to avoid slowing down processing.
Cost → Managed and third-party services (SES, Firebase) add to infrastructure costs.
More Moving Parts → Monitoring, logging, and debugging across SNS, queues, and workers can get tricky.
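Workers check preferences for every message, so a cache in front of the preference database is a common way to keep that lookup fast. The sketch assumes a short-TTL read-through cache. `load_prefs` is a hypothetical stand-in for the database query, and the clock is injectable so the example is deterministic. The trade-off is staleness: without `invalidate`, a user who just opted out could still receive notifications until the entry expires.

```python
import time
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Read-through cache: serve recent lookups from memory, otherwise call the loader."""

    def __init__(
        self,
        loader: Callable[[str], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get(self, key: str) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database round-trip
        value = self._loader(key)  # miss or expired: slow path
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this when a user updates preferences so the change applies immediately.
        self._entries.pop(key, None)


db_calls: List[str] = []


def load_prefs(user_id: str) -> Dict[str, bool]:
    db_calls.append(user_id)  # stands in for a real database query
    return {"email": True, "push": False}


fake_now = [0.0]
cache = TTLCache(load_prefs, ttl_seconds=60.0, clock=lambda: fake_now[0])
cache.get("u-1")
cache.get("u-1")   # served from memory
fake_now[0] = 61.0
cache.get("u-1")   # expired, so reloaded
print(len(db_calls))  # 2
```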
Real-world examples:
Facebook/Instagram → In-app + push notifications for likes, comments, friend requests.
E-commerce Platforms → Emails for order confirmations, shipping updates.
Banking Apps → Push notifications for transactions and fraud alerts.
SaaS Apps → Respecting user preferences (unsubscribe, email frequency) is expected, and for marketing email an unsubscribe option is often a legal requirement.
Key takeaways:
A notification system ensures reliable delivery of messages across multiple channels.
Using an event-driven architecture (SNS + queues) makes the system scalable and fault-tolerant.
Features like retry policies, DLQs, rate limiting, and user preferences make the system production-ready.
Adding more channels (like SMS or WhatsApp) is simple — just plug in a new queue + worker.
Q1. Why not send notifications directly from the server?
Because it tightly couples app logic with notification delivery, making it harder to scale or recover from failures.
Q2. What if the email service is down?
Workers retry a few times. If retries fail, the event is pushed to a Dead Letter Queue (DLQ) for debugging and reprocessing later.
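Reprocessing later usually means moving dead-lettered events back onto the main queue once the cause is fixed. Managed queues may offer this as a built-in redrive feature. The hand-rolled sketch below shows the idea. It assumes each event carries a hypothetical `error` field that separates retryable failures (a timeout) from permanent ones (an invalid address):

```python
from collections import deque
from typing import Callable, Deque


def redrive(
    dlq: Deque[dict],
    main_queue: Deque[dict],
    max_messages: int,
    is_retryable: Callable[[dict], bool],
) -> int:
    """Move up to `max_messages` retryable events from the DLQ back to the main queue."""
    moved = 0
    kept: Deque[dict] = deque()
    while dlq and moved < max_messages:
        event = dlq.popleft()
        if is_retryable(event):
            main_queue.append(event)
            moved += 1
        else:
            kept.append(event)  # e.g. a bad address: leave it for a human to inspect
    # Return skipped events to the front of the DLQ, preserving their order.
    dlq.extendleft(reversed(kept))
    return moved


dlq = deque([
    {"id": 1, "error": "timeout"},
    {"id": 2, "error": "invalid_address"},
    {"id": 3, "error": "timeout"},
])
main = deque()
moved = redrive(dlq, main, max_messages=10, is_retryable=lambda e: e["error"] == "timeout")
print(moved, [e["id"] for e in main], [e["id"] for e in dlq])  # 2 [1, 3] [2]
```

Capping each run with `max_messages` keeps a large backlog from flooding a provider that has only just recovered.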
Q3. How do we prevent spamming users with too many notifications?
Workers enforce rate limiting and check user preferences before sending.
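Rate limiting is often implemented with a token bucket. Each send spends a token, and tokens refill at a fixed rate up to a burst capacity. Below is a minimal single-process sketch with illustrative numbers. Keyed by user ID, the same class could cap how many notifications one user receives. A limit shared across many workers would need its state in a shared store rather than in memory.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Allows bursts of up to `capacity` sends, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller should leave the message queued and try later


fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: fake_now[0])
results: List[bool] = [bucket.try_acquire() for _ in range(3)]  # burst of 3 at t=0
fake_now[0] = 0.5  # half a second later, one token has refilled
results += [bucket.try_acquire(), bucket.try_acquire()]
print(results)  # [True, True, False, True, False]
```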
Q4. Can this system handle millions of notifications?
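Yes: add workers horizontally as queue depth grows (managed queues such as SQS scale on their own), keeping in mind that downstream providers’ rate limits cap how fast you can actually send.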
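You can estimate how many workers you need with back-of-the-envelope arithmetic. This sketch assumes the provider call dominates processing time. All numbers are illustrative, not measurements:

```python
import math


def workers_needed(
    peak_msgs_per_sec: float,
    secs_per_msg: float,
    concurrency_per_worker: int,
    headroom: float = 0.25,
) -> int:
    """Rough capacity estimate: workers required so the queue does not grow at peak."""
    # One worker with N concurrent sends, each taking t seconds, handles N / t msgs/sec.
    per_worker = concurrency_per_worker / secs_per_msg
    # Scale up by a safety margin for spikes, then round up to whole workers.
    return math.ceil(peak_msgs_per_sec / per_worker * (1 + headroom))


# Hypothetical numbers: 5,000 msgs/sec at peak, 250 ms per provider call,
# 25 concurrent calls per worker → 100 msgs/sec per worker → 50 workers, 63 with 25% headroom.
print(workers_needed(5_000, 0.25, 25))  # 63
```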
Q5. How do we add SMS or WhatsApp notifications?
Just add a new queue + worker and connect it to an SMS/WhatsApp provider API.
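If channels are registered as data rather than hardcoded as branches, adding SMS really does come down to one new queue plus one worker. Below is an in-memory sketch. `NotificationHub`, the event type names, and the SMS sender are hypothetical. In AWS, the equivalent would be a new SQS queue subscribed to the existing SNS topic, plus a worker that calls the SMS provider.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set


class NotificationHub:
    """Channels are registered as data, so adding one needs no change to existing code."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[dict]] = {}
        self._routes: Dict[str, Set[str]] = {}
        self._senders: Dict[str, Callable[[dict], None]] = {}

    def register_channel(
        self, name: str, event_types: Set[str], send: Callable[[dict], None]
    ) -> None:
        self._queues[name] = deque()   # the new queue
        self._routes[name] = event_types
        self._senders[name] = send     # the new worker's delivery call

    def publish(self, event: dict) -> List[str]:
        routed = [n for n, types in self._routes.items() if event["type"] in types]
        for name in routed:
            self._queues[name].append(event)
        return routed

    def run_workers_once(self) -> None:
        for name, queue in self._queues.items():
            while queue:
                self._senders[name](queue.popleft())


sent_emails: List[str] = []
sms_outbox: List[str] = []

hub = NotificationHub()
hub.register_channel("email", {"user.signup"}, lambda e: sent_emails.append(e["user_id"]))
# Adding SMS: one registration with a hypothetical SMS-provider call.
hub.register_channel("sms", {"otp.requested", "order.shipped"},
                     lambda e: sms_outbox.append(e["user_id"]))

routed = hub.publish({"type": "otp.requested", "user_id": "u-9"})
hub.run_workers_once()
print(routed, sms_outbox, sent_emails)  # ['sms'] ['u-9'] []
```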
👉 That’s the system design of a notification system — explained in simple terms with fault tolerance, retries, and user preferences.