Building an ML-Powered Notification Router on AWS: A Production Architecture Guide
Yadab Sutradhar
Software Engineer
Introduction
Sending notifications at the wrong time is like knocking on someone's door at 3 AM. Even if you have something important to say, timing matters. In this article, I'll walk you through building a production-grade, ML-powered notification routing engine that predicts the optimal send time for each user based on their historical engagement patterns.
What we'll build:
- Real-time ML inference system using Amazon SageMaker
- Event-driven architecture processing millions of events
- Automated ML training pipeline with feedback loops
- Infrastructure as Code using AWS CDK
- Cost-optimized serverless design (~$350/month for 1M+ events/day)
Tech Stack
GitHub Repository
Full source code available at: github.com/Yadab-Sd/smart-notification-routing-engine
The Problem: Notification Fatigue
The Business Challenge
Modern applications send billions of notifications daily. Email, SMS, push notifications, WhatsApp messages—the channels are endless. But here's the problem:
- Notifications sent at non-optimal times frequently go unread
- Notification fatigue is a measurable driver of user churn
- Engagement rates vary widely depending on send time
Traditional Approaches (and Why They Fail)
- "Send at 9 AM local time": ignores individual user behavior patterns and preferences
- "Send when the user was last active": a single last-seen timestamp is a weak predictor of future engagement
- "Batch and send at fixed intervals": misses each user's optimal window entirely
The Technical Challenge
Building a smart notification router requires solving several problems:
1. Real-time prediction: decision latency must be under 500 ms
2. Personalization: each user has unique engagement patterns
3. Scale: handle millions of events per day
4. Feedback loops: the model must improve with delivery outcomes
5. Cost optimization: keep AWS costs under $500/month for 1M+ events
Solution Architecture
High-Level Design
The system consists of three main flows:
1. Event Ingestion Flow (real-time)
2. Decision & Scheduling Flow (real-time)
3. ML Training Pipeline (batch, daily)
Complete System Architecture
Architecture Principles
Event-Driven & Decoupled
- Amazon Kinesis for event streaming
- Services communicate via events, not direct calls
- Enables independent scaling and deployment
Serverless-First
- Lambda for compute (auto-scaling, pay-per-use)
- DynamoDB for state (millisecond latency)
- No servers to manage
ML Feedback Loop
- Delivery outcomes feed back into training
- Model retrains daily on fresh data
- Continuous improvement without manual intervention
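To make the feedback loop concrete, here is a plain-Java sketch (illustrative, not code from the repo) of its labeling step: each NOTIFICATION_SENT record is joined against the set of notifications the user later engaged with, turning yesterday's delivery outcomes into labeled training rows. All field names are assumptions.

```java
import java.util.*;
import java.util.stream.*;

public class OutcomeLabeler {

    // Hypothetical shapes for a delivered notification and a training row
    public record SentRecord(String notificationId, String userId, int sendHour) {}
    public record LabeledRow(String userId, int sendHour, int engaged) {}

    // openedIds: notification ids the user opened/clicked within the window
    public static List<LabeledRow> label(List<SentRecord> sends, Set<String> openedIds) {
        return sends.stream()
                .map(s -> new LabeledRow(
                        s.userId(),
                        s.sendHour(),
                        openedIds.contains(s.notificationId()) ? 1 : 0))
                .collect(Collectors.toList());
    }
}
```

In the real pipeline this join would run in the daily batch job over the event store; the point is that the label comes from observed outcomes, so no manual annotation is needed.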
Core Components Deep Dive
1. Event Ingestion: Control Plane Lambda
Purpose: Ingest user events (page views, clicks, notifications sent) and stream to Kinesis.
// services/control-plane/src/main/java/Handler.java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayV2HTTPEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayV2HTTPResponse;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class Handler implements RequestHandler<APIGatewayV2HTTPEvent,
        APIGatewayV2HTTPResponse> {

    private final KinesisClient kinesis = KinesisClient.create();
    private final ObjectMapper json = new ObjectMapper();
    // Stream name injected via environment (variable name is illustrative)
    private final String streamName = System.getenv("EVENT_STREAM_NAME");

    @Override
    public APIGatewayV2HTTPResponse handleRequest(
            APIGatewayV2HTTPEvent event, Context context) {
        try {
            UserEvent userEvent = json.readValue(event.getBody(), UserEvent.class);

            // Validate and stream to Kinesis; partitioning by userId keeps
            // each user's events ordered within a shard
            PutRecordRequest putReq = PutRecordRequest.builder()
                    .streamName(streamName)
                    .partitionKey(userEvent.getUserId())
                    .data(SdkBytes.fromUtf8String(json.writeValueAsString(userEvent)))
                    .build();
            kinesis.putRecord(putReq);

            return buildResponse(200, "{\"status\":\"accepted\"}");
        } catch (JsonProcessingException e) {
            return buildResponse(400, "{\"status\":\"invalid_payload\"}");
        }
    }
    // buildResponse(int, String) constructs the APIGatewayV2HTTPResponse (omitted)
}

Key Design Decisions:
- ✓ Java 21 with SnapStart: cold starts reduced from ~2 s to ~200 ms
- ✓ Partition by userId: ensures ordered processing per user
- ✓ Async processing: the API responds immediately; heavy work happens downstream of Kinesis
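The UserEvent the handler deserializes isn't shown in the article, so here is a hypothetical minimal shape with the validation the ingestion step needs; field names are assumptions (Jackson would additionally need a no-args constructor or @JsonCreator, omitted here for brevity):

```java
public class UserEvent {
    private final String userId;
    private final String eventType;  // e.g. PAGE_VIEW, CLICK, NOTIFICATION_SENT
    private final long timestampMs;  // event time, epoch milliseconds

    public UserEvent(String userId, String eventType, long timestampMs) {
        this.userId = userId;
        this.eventType = eventType;
        this.timestampMs = timestampMs;
    }

    public String getUserId() { return userId; }
    public String getEventType() { return eventType; }
    public long getTimestampMs() { return timestampMs; }

    // Reject events the router cannot partition (no userId) or order (no time)
    public boolean isValid() {
        return userId != null && !userId.isBlank()
                && eventType != null && !eventType.isBlank()
                && timestampMs > 0;
    }
}
```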
ML Pipeline: From Raw Events to Predictions
Pipeline Architecture
The ML pipeline transforms raw JSONL events into predictions through automated feature engineering, XGBoost training, and real-time inference endpoints.
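To make the feature-engineering step concrete, here is a plain-Java sketch (not code from the repo) of the kind of per-hour aggregate the pipeline derives from raw events before training; names like the (hour, engaged) pair layout are illustrative:

```java
import java.util.*;

public class HourlyEngagement {

    // events: each int[] is {hourOfDay 0-23, engaged 0/1}
    public static double[] engagementRateByHour(List<int[]> events) {
        double[] engaged = new double[24];
        double[] sends = new double[24];
        for (int[] e : events) {
            sends[e[0]]++;
            engaged[e[0]] += e[1];
        }
        double[] rate = new double[24];
        for (int h = 0; h < 24; h++) {
            rate[h] = sends[h] == 0 ? 0.0 : engaged[h] / sends[h];
        }
        return rate;
    }

    // Greedy fallback when no model is available: the best historical hour
    public static int bestHour(double[] rate) {
        int best = 0;
        for (int h = 1; h < 24; h++) {
            if (rate[h] > rate[best]) best = h;
        }
        return best;
    }
}
```

In production this aggregation runs as a batch job over JSONL in S3 (the article's pipeline uses Glue); the in-memory version above just shows the shape of the computation.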
Critical Lessons Learned: 5 Bugs You Must Avoid
📝 Note: this section is still in progress. The complete breakdown of all 5 production bugs, with detailed explanations, code examples, and prevention strategies, is being written; the first three are documented below.
Bug #1: Feature Mismatch Between Training and Inference
Training used features [sends_count_hour, click_rate_7d] but inference sent [hour, dow, days_since_last_seen]
📉 The Result:
AUC score plummeted from 0.82 (validation) to 0.51 (production)
💡 Lesson: Always version your feature schemas and validate at deployment time.
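A minimal sketch of the deployment-time guard this lesson calls for: fail fast if the ordered feature list the inference path sends differs from the one the model was trained on. In practice the training schema would be stored alongside the model artifact; the class and method names here are illustrative, not from the repo.

```java
import java.util.List;

public class FeatureSchemaGuard {

    // Order matters for XGBoost: compare the full ordered lists, not sets
    public static void validate(List<String> trainingSchema, List<String> inferenceSchema) {
        if (!trainingSchema.equals(inferenceSchema)) {
            throw new IllegalStateException(
                    "Feature schema mismatch: trained on " + trainingSchema
                    + " but inference sends " + inferenceSchema);
        }
    }
}
```

Run this check in the deployment pipeline (or at endpoint startup) so a mismatch like the one above aborts the rollout instead of silently degrading AUC to coin-flip territory.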
Bug #2: Format Mismatch (CSV vs Parquet)
The Glue ETL job wrote CSV output, but the training job expected Parquet
💡 Lesson: Use SageMaker built-in algorithms where possible; the built-in XGBoost container handles CSV natively.
Bug #3: Wrong Event Types in ETL
Filtered for PLAY_MOVIE instead of NOTIFICATION_SENT
💡 Lesson: Validate data pipelines with unit tests.
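A unit-testable version of the filter from Bug #3 might look like the sketch below: making the event type an explicit parameter, instead of a constant buried in the ETL script, lets a test catch a PLAY_MOVIE / NOTIFICATION_SENT mix-up before deployment. The record shape is illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

public class EtlFilter {

    public record RawEvent(String userId, String eventType) {}

    // Keep only events of the type the downstream training job expects
    public static List<RawEvent> keepType(List<RawEvent> events, String wantedType) {
        return events.stream()
                .filter(e -> wantedType.equals(e.eventType()))
                .collect(Collectors.toList());
    }
}
```

A test then asserts on a tiny fixture that filtering for NOTIFICATION_SENT returns exactly the notification rows, which would have flagged the wrong constant immediately.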
Performance & Cost Analysis
Performance Metrics
Monthly cost: ~$350 for 1M+ events/day
Conclusion
Building a production ML system is 10% modeling and 90% engineering. The hard parts are building reliable pipelines, ensuring feature consistency, and creating feedback loops.