Revenue Operations: ML Lead Scoring
A revenue operations system that turned a $3M annual problem into $1M+ in annual savings—reframing lead scoring from classification to expected value optimization, with change management built in.
- Business Case First: $3.65M annual cost validated before writing ML code—ROI modeling drives architecture decisions
- Expected Value Optimization: Threshold tuning maximizes business value, not accuracy—balancing revenue capture vs. customer churn
- Change Management: Safeguard parameter maintains 90% of sales to ensure stakeholder adoption
- Production System: FastAPI + Streamlit deployment ready for Mailchimp/Salesforce integration
The Strategic Context
This project started with a number: $3 million.
That’s the annual cost of aggressive email marketing for a company with 100,000 subscribers. Every sales email blast triggers roughly 500 unsubscribes, and at five blasts per month that’s 2,500 lost subscribers. At a 5% conversion rate and a $2,000 customer lifetime value, those unsubscribes represent 125 lost potential customers per month—$250,000 in unrealized revenue. Every month.
The instinct is to fix the messaging. Write better emails. Segment by engagement. But the problem isn’t creative—it’s structural. Without a model of subscriber value, every targeting decision is a guess. And guesses at this scale cost millions.
From Classification to Revenue Operations
Most lead scoring projects frame this as classification: predict who will buy, target them, done. But a probability score isn’t a business decision.
Should you target someone with a 40% purchase probability? 60%? The answer depends on what you lose when you’re wrong. Target a likely buyer who unsubscribes anyway—that’s churn you caused. Skip a likely buyer to “protect” them—that’s revenue you left on the table.
This project reframes lead scoring as expected value optimization: find the targeting threshold that maximizes revenue captured minus customer value destroyed. Not accuracy. Not precision-recall. Business value with explicit assumptions.
The ROI Model (Before the ML)
Before writing model code, the business case needed validation. The cost simulation framework models:
| Parameter | Value | Source |
|---|---|---|
| Email list size | 100,000 | Current state |
| Unsubscribe rate | 0.5% per email | Historical data |
| Sales emails/month | 5 | Marketing cadence |
| Customer Lifetime Value | $2,000 | Finance estimate |
| Conversion rate (if nurtured) | 5% | Historical cohort |
These parameters flow into a 12-month cost projection that accounts for list growth (3.5%/month compounding). The output: $3.65M in annual lost potential revenue under current operations.
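The projection is straightforward compounding arithmetic. A minimal sketch of the calculation (the function and parameter names here are illustrative, not the `email_lead_scoring` package's actual API):

```python
def lost_revenue_projection(
    list_size=100_000,       # current subscriber count
    unsub_rate=0.005,        # unsubscribe rate per sales email
    emails_per_month=5,      # marketing cadence
    conversion_rate=0.05,    # share of churned subscribers who would have converted
    customer_value=2_000,    # lifetime value per converted customer
    monthly_growth=0.035,    # list growth per month, compounding
    months=12,
):
    """Project lost potential revenue from unsubscribe churn over `months`."""
    total = 0.0
    size = list_size
    for _ in range(months):
        unsubs = size * unsub_rate * emails_per_month       # churned this month
        total += unsubs * conversion_rate * customer_value  # lost potential revenue
        size *= 1 + monthly_growth                          # list keeps growing
    return total

print(f"${lost_revenue_projection():,.0f}")  # roughly $3.65M over 12 months
```

Month one works out to 100,000 × 0.5% × 5 × 5% × $2,000 = $250,000; compounding list growth over twelve months brings the total to about $3.65M, matching the figure above.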
The project goal isn’t “build a classifier.” It’s reduce that $3.65M by optimizing the revenue-churn tradeoff.
The Change Management Reality
Here’s what most ML tutorials skip: business leaders react badly to recommendations that tank short-term revenue, even when the math is sound.
The threshold optimization can find the mathematically optimal targeting cutoff. But if that cutoff drops monthly sales by 40% in month one—even with long-term gains—it won’t survive the first executive review.
The system includes a safeguard parameter: don’t recommend thresholds that reduce monthly sales below X% of maximum. Default: 90%. This is a change management feature disguised as a model parameter.
At the 90% safeguard level, the optimized threshold delivers:
| Metric | Value |
|---|---|
| Expected Value | $315,236/month |
| Expected Savings | $88,105/month |
| Monthly Sales Maintained | $227,131 (91% of max) |
| Customers Saved | 55/month |
That’s $1M+ in annual savings while maintaining sales volume that stakeholders can accept. The safeguard trades mathematical optimality for organizational adoptability.
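Mechanically, the safeguard is just a constrained argmax over the simulation table. A toy sketch (the table values are invented; in the real system this table comes from the threshold simulation):

```python
# Each row: (threshold, expected_value, projected_monthly_sales) — toy numbers.
thresh_optim_table = [
    (0.20, 290_000, 245_000),
    (0.35, 315_000, 228_000),
    (0.50, 340_000, 190_000),  # highest EV, but sales drop too far
    (0.65, 300_000, 150_000),
]

def pick_threshold(table, safe_guard=0.90):
    """Maximize expected value subject to keeping sales above the safeguard."""
    max_sales = max(sales for _, _, sales in table)
    feasible = [row for row in table if row[2] >= safe_guard * max_sales]
    return max(feasible, key=lambda row: row[1])  # constrained argmax on EV

print(pick_threshold(thresh_optim_table))  # → (0.35, 315000, 228000)
```

The unconstrained optimum (threshold 0.50) is excluded because it cuts sales below 90% of maximum; the safeguard selects the best threshold among the acceptable ones.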
Pipeline Architecture
The system follows a deliberate progression from data to deployment:
Data Layer → Feature engineering from subscriber behavior, transaction history, and engagement patterns. The custom email_lead_scoring package handles database queries, feature pipelines, and data transformations consistently across exploration and production.
Modeling Layer → AutoML comparison using PyCaret and H2O. Rather than picking an algorithm based on intuition, the pipeline benchmarks 19+ algorithms (XGBoost, CatBoost, LightGBM, Random Forest, Gradient Boosting, etc.) with consistent preprocessing and evaluation. MLflow tracks every experiment—hyperparameters, metrics, artifacts—making model selection auditable.
Optimization Layer → Threshold optimization that translates predictions into targeting decisions. The system simulates expected value across 100 threshold points, accounting for:
- Revenue from targeting subscribers who purchase
- Cost of churning subscribers who wouldn’t have purchased anyway
- Cost of missing sales by not targeting likely buyers
- Safeguards to prevent aggressive thresholds that tank short-term revenue
Deployment Layer → FastAPI REST API plus a Streamlit dashboard:
- `GET /get_email_subscribers` — Retrieve processed subscriber data
- `POST /predict` — Score leads using the production model
- `POST /calculate_lead_strategy` — Full optimization pipeline with configurable parameters
- Streamlit dashboard for business users to run scenarios without API calls
Why This Architecture
The separation matters. The modeling layer answers “who is likely to buy?” The optimization layer answers “who should we target?” These are different questions with different stakeholders.
A data scientist cares about AUC and precision-recall curves. A marketing director cares about “if I target the top 30%, how much revenue do I capture and how many subscribers do I lose?” The optimization layer bridges this gap—it translates model outputs into business recommendations with explicit assumptions.
Business Optimization
The threshold optimization methodology deserves detail because it’s where this project differs from typical ML tutorials.
The Expected Value Framework
For any targeting threshold, the system calculates:
| Component | Calculation |
|---|---|
| Revenue Captured | Sales from targeting subscribers who purchase |
| Churn Cost Avoided | Value preserved by NOT targeting unlikely buyers |
| Missed Revenue | Sales lost by not targeting some buyers |
| Churn Cost Incurred | Value lost from targeting and churning non-buyers |
Expected Value = Revenue Captured + Churn Cost Avoided - Missed Revenue - Churn Cost Incurred
The optimal threshold maximizes this value, not classification accuracy.
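For intuition, here is the expected value computed at a single threshold for a toy scored list (the probabilities, outcomes, and churn cost are invented; the real system sweeps 100 thresholds over model scores):

```python
# (purchase_probability, actually_buys) for a handful of subscribers — toy data.
subscribers = [(0.92, True), (0.75, True), (0.60, False),
               (0.30, True), (0.15, False), (0.05, False)]

CUSTOMER_VALUE = 2_000  # lifetime value of a sale
CHURN_COST = 100        # assumed value destroyed per targeted non-buyer

def expected_value(threshold):
    """EV = revenue captured + churn avoided - missed revenue - churn incurred."""
    revenue_captured = churn_incurred = missed_revenue = churn_avoided = 0
    for prob, buys in subscribers:
        targeted = prob >= threshold
        if targeted and buys:
            revenue_captured += CUSTOMER_VALUE   # sale from a targeted buyer
        elif targeted and not buys:
            churn_incurred += CHURN_COST         # we churned a non-buyer
        elif not targeted and buys:
            missed_revenue += CUSTOMER_VALUE     # buyer we failed to target
        else:
            churn_avoided += CHURN_COST          # non-buyer we left alone
    return revenue_captured + churn_avoided - missed_revenue - churn_incurred

print(expected_value(0.5), expected_value(0.25))  # 2100 6100
```

With these toy numbers the churn cost is small relative to customer value, so the lower threshold wins; shift those assumptions and the optimum moves, which is exactly why the parameters stay configurable.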
Configurable Assumptions
The optimization accepts business parameters rather than hardcoding them:
- `email_list_size` — Scale of the subscriber base
- `unsub_rate_per_sales_email` — Churn rate from aggressive targeting
- `avg_customer_value` — Lifetime value of a retained subscriber
- `customer_conversion_rate` — Baseline conversion probability
- `monthly_sales_reduction_safe_guard` — Minimum acceptable revenue (prevents over-optimization)
This lets business users explore scenarios: “What if we’re more conservative about churn? What if customer value is higher than we assumed?”
The Safety Guard
One insight from the course this project is based on: business leaders react badly to recommendations that tank short-term revenue, even if the math says it’s optimal long-term. The monthly_sales_reduction_safe_guard parameter ensures the recommended threshold won’t drop monthly sales below a specified percentage of maximum—a pragmatic concession to organizational reality.
From Model to Product
The deployment layer transforms a Jupyter notebook exercise into something a marketing team can use.
FastAPI Backend
The REST API exposes the full pipeline:
POST /calculate_lead_strategy
Parameters:
- monthly_sales_reduction_safe_guard: float (default 0.9)
- email_list_size: int (default 100000)
- unsub_rate_per_sales_email: float (default 0.005)
- avg_sales_per_month: float (default 250000)
- customer_conversion_rate: float (default 0.05)
- avg_customer_value: float (default 2000)
Returns:
- lead_strategy: DataFrame with subscriber scores and Hot/Cold classification
- expected_value: Optimal threshold and projected value
- thresh_optim_table: Full simulation results for visualization
The API accepts subscriber data as JSON, runs the scoring model, performs threshold optimization with the specified parameters, and returns actionable output—including a downloadable CSV of the targeting strategy.
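A hypothetical response body, to make the contract concrete (the field names echo the metrics and Hot/Cold classification described above, but the exact schema here is illustrative, not the API's documented output):

```json
{
  "expected_value": {
    "threshold_optimized": 0.35,
    "expected_monthly_value": 315236,
    "expected_monthly_savings": 88105,
    "monthly_sales_maintained": 227131
  },
  "lead_strategy": [
    {"subscriber_id": 1001, "score": 0.85, "category": "Hot"},
    {"subscriber_id": 1002, "score": 0.15, "category": "Cold"}
  ],
  "thresh_optim_table": [
    {"threshold": 0.00, "expected_value": 250000, "monthly_sales": 250000}
  ]
}
```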
Streamlit Dashboard
For users who don’t want to call APIs, the Streamlit app provides:
- File upload for subscriber data (CSV)
- Sliders for business assumptions (monthly sales, safety guard percentage)
- One-click analysis execution
- Interactive expected value plot showing the optimization curve
- Download button for the lead strategy output
The dashboard calls the FastAPI backend, so the logic stays centralized. This separation means the API can serve other integrations (CRM webhooks, batch jobs) without duplicating business logic.
Change Management & Adoption
The hardest part of ML implementation isn’t the model. It’s getting people to use it.
This system addresses what I call the “politics of AI”—the friction between departmental incentives. Marketing wants to email everyone (more touches = more sales). Finance wants to protect customer lifetime value (fewer touches = less churn). The optimization framework doesn’t pick a side; it makes the tradeoff explicit and lets stakeholders negotiate with data instead of opinions.
From Black Box to Glass Box
SHAP (SHapley Additive exPlanations) values transform the model from “trust me” to “here’s why.”
When a sales lead asks why a subscriber scored 0.85, the system provides feature importance: “High engagement with pricing page. Attended two webinars. Opened 8 of last 10 emails.” When a subscriber scores 0.15: “No webinar attendance. Unsubscribed from newsletter segment. Last open 45 days ago.”
This explainability serves three purposes:
- Debugging: When predictions seem wrong, SHAP reveals which features drove the score
- Trust: Stakeholders accept recommendations they can understand
- Learning: Feature importance guides future data collection—if webinar attendance predicts conversion, invest in webinars
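Computing real SHAP values requires the `shap` library, but the idea is easy to see in the linear case, where (with independent features) a feature's SHAP value reduces to its weight times its deviation from the feature mean: contribution = w · (x − mean(x)). A self-contained sketch with invented features, weights, and means:

```python
# Toy linear scoring model: score = base + sum(w_i * x_i). All numbers invented.
weights = {"pricing_page_visits": 0.05, "webinars_attended": 0.15, "recent_opens": 0.02}
feature_means = {"pricing_page_visits": 2.0, "webinars_attended": 0.5, "recent_opens": 4.0}
base_score = 0.30  # score of the "average" subscriber

def explain(features):
    """Per-feature contributions; for a linear model this equals the SHAP value."""
    contribs = {name: weights[name] * (features[name] - feature_means[name])
                for name in weights}
    score = base_score + sum(contribs.values())
    return score, contribs

score, contribs = explain(
    {"pricing_page_visits": 5, "webinars_attended": 2, "recent_opens": 8})
print(f"score: {score:.3f}")
for name, c in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {c:+.3f}")  # largest contribution first
```

The output reads exactly like the explanations above: the score, then the features ranked by how much they pushed it up or down—here webinar attendance dominates.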
The Dashboard as Collaboration Tool
The Streamlit interface isn’t just a delivery mechanism—it’s designed for the “Strategic Persona” (Marketing Director, RevOps Lead) to stay in the loop.
The “what-if” scenario capability matters here. When a marketing director adjusts the slider from 90% to 80% sales safeguard and sees expected savings jump from $88K to $120K/month, they’re not just receiving a recommendation—they’re participating in the decision. The tradeoff becomes visceral.
This is human-in-the-loop for strategy, not just for edge cases.
The Training Bridge
Adoption doesn’t happen at deployment. It happens through structured capability building.
The 30-60-90 Training Plan provides the progression:
| Phase | Focus | This System |
|---|---|---|
| Days 1-30 | Awareness | What the system does, how scores are calculated, basic dashboard navigation |
| Days 31-60 | Application | Running scenarios, interpreting SHAP explanations, adjusting safeguard parameters |
| Days 61-90 | Mastery | Connecting outputs to campaign strategy, identifying when to override, training others |
The system is designed to support this progression. Early users get clear scores and recommendations. Intermediate users get scenario tools. Advanced users get the explainability layer to challenge and refine.
Technical Foundation
Model Selection
The PyCaret AutoML comparison evaluated 19+ algorithms:
- Gradient Boosting: XGBoost, CatBoost, LightGBM
- Tree-based: Random Forest, Extra Trees, Decision Tree
- Linear: Logistic Regression, Ridge Classifier
- Other: SVM, KNN, Naive Bayes, Quadratic Discriminant Analysis
The production model is a blended ensemble of top performers, tracked and versioned in MLflow. The selection is defensible: “We tested 19 algorithms and this ensemble performed best on holdout data.”
Package Architecture
The email_lead_scoring/ package organizes reusable functions:
- `database.py` — Data loading and SQL queries
- `cost_calculations.py` — ROI modeling and cost simulation
- `modeling.py` — Model scoring functions
- `lead_strategy.py` — Threshold optimization logic
- `exploratory.py` — EDA utilities
This structure means the same code runs in notebooks, API endpoints, and scheduled jobs without duplication. When the model improves, one update propagates everywhere.
Related Content
- Source: Based on DS4B 201-P from Business Science University, extended with production deployment and adoption scaffolding