Revenue Operations: ML Lead Scoring

Strategy & Implementation · 2025 · 11 min read

Transformed a $3M annual problem into $1M+ savings through expected value optimization—not just a prediction model, but a business decision system

Python · PyCaret · H2O AutoML · XGBoost · CatBoost · LightGBM · scikit-learn · MLflow · SHAP · FastAPI · Streamlit · Plotly · SQLite · pandas
TL;DR

Revenue operations system that transformed a $3M annual problem into $1M+ savings—reframing lead scoring from classification to expected value optimization with change management built in.

  • Business Case First: $3.65M annual cost validated before writing ML code—ROI modeling drives architecture decisions
  • Expected Value Optimization: Threshold tuning maximizes business value, not accuracy—balancing revenue capture vs. customer churn
  • Change Management: Safeguard parameter maintains 90% of sales to ensure stakeholder adoption
  • Production System: FastAPI + Streamlit deployment ready for Mailchimp/Salesforce integration

The Strategic Context

This project started with a number: $3 million.

That’s the annual cost of aggressive email marketing for a company with 100,000 subscribers. Every sales email blast—five per month—triggers 500 unsubscribes. At a 5% conversion rate and $2,000 customer lifetime value, that’s 125 lost potential customers per month. $250,000 in unrealized revenue. Every month.

The instinct is to fix the messaging. Write better emails. Segment by engagement. But the problem isn’t creative—it’s structural. Without a model of subscriber value, every targeting decision is a guess. And guesses at this scale cost millions.

From Classification to Revenue Operations

Most lead scoring projects frame this as classification: predict who will buy, target them, done. But a probability score isn’t a business decision.

Should you target someone with a 40% purchase probability? 60%? The answer depends on what you lose when you’re wrong. Target a likely buyer who unsubscribes anyway—that’s churn you caused. Skip a likely buyer to “protect” them—that’s revenue you left on the table.

This project reframes lead scoring as expected value optimization: find the targeting threshold that maximizes revenue captured minus customer value destroyed. Not accuracy. Not precision-recall. Business value with explicit assumptions.

The ROI Model (Before the ML)

Before writing model code, the business case needed validation. The cost simulation framework models:

| Parameter | Value | Source |
| --- | --- | --- |
| Email list size | 100,000 | Current state |
| Unsubscribe rate | 0.5% per email | Historical data |
| Sales emails/month | 5 | Marketing cadence |
| Customer Lifetime Value | $2,000 | Finance estimate |
| Conversion rate (if nurtured) | 5% | Historical cohort |

These parameters flow into a 12-month cost projection that accounts for list growth (3.5%/month compounding). The output: $3.65M in annual lost potential revenue under current operations.
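The projection above can be sketched in a few lines of Python. The function and parameter names are illustrative, not the project's actual `cost_calculations` API:

```python
def simulate_lost_revenue(
    list_size=100_000,      # email list size
    unsub_rate=0.005,       # unsubscribe rate per sales email
    emails_per_month=5,     # marketing cadence
    conversion_rate=0.05,   # conversion rate if nurtured
    customer_value=2_000,   # customer lifetime value
    monthly_growth=0.035,   # list growth per month, compounding
    months=12,
):
    """Cumulative lost potential revenue under current operations."""
    total = 0.0
    for month in range(months):
        size = list_size * (1 + monthly_growth) ** month
        unsubs = size * unsub_rate * emails_per_month   # churned this month
        total += unsubs * conversion_rate * customer_value
    return total

print(f"${simulate_lost_revenue():,.0f}")  # ≈ $3.65M annually
```

With growth set to zero the same model reproduces the headline $3M figure ($250K × 12); the 3.5%/month compounding is what pushes it to $3.65M.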

The project goal isn’t “build a classifier.” It’s “reduce that $3.65M by optimizing the revenue-churn tradeoff.”

The Change Management Reality

Here’s what most ML tutorials skip: business leaders react badly to recommendations that tank short-term revenue, even when the math is sound.

The threshold optimization can find the mathematically optimal targeting cutoff. But if that cutoff drops monthly sales by 40% in month one—even with long-term gains—it won’t survive the first executive review.

The system includes a safeguard parameter: don’t recommend thresholds that reduce monthly sales below X% of maximum. Default: 90%. This is a change management feature disguised as a model parameter.

At the 90% safeguard level, the optimized threshold delivers:

| Metric | Value |
| --- | --- |
| Expected Value | $315,236/month |
| Expected Savings | $88,105/month |
| Monthly Sales Maintained | $227,131 (91% of max) |
| Customers Saved | 55/month |

That’s $1M+ in annual savings while maintaining sales volume that stakeholders can accept. The safeguard trades mathematical optimality for organizational adoptability.


Pipeline Architecture

The system follows a deliberate progression from data to deployment:

Data Layer → Feature engineering from subscriber behavior, transaction history, and engagement patterns. The custom email_lead_scoring package handles database queries, feature pipelines, and data transformations consistently across exploration and production.

Modeling Layer → AutoML comparison using PyCaret and H2O. Rather than picking an algorithm based on intuition, the pipeline benchmarks 19+ algorithms (XGBoost, CatBoost, LightGBM, Random Forest, Gradient Boosting, etc.) with consistent preprocessing and evaluation. MLflow tracks every experiment—hyperparameters, metrics, artifacts—making model selection auditable.

Optimization Layer → Threshold optimization that translates predictions into targeting decisions. The system simulates expected value across 100 threshold points, accounting for:

  • Revenue from targeting subscribers who purchase
  • Cost of churning subscribers who wouldn’t have purchased anyway
  • Cost of missing sales by not targeting likely buyers
  • Safeguards to prevent aggressive thresholds that tank short-term revenue

Deployment Layer → FastAPI REST API with three core endpoints, plus a Streamlit front end:

  • GET /get_email_subscribers — Retrieve processed subscriber data
  • POST /predict — Score leads using the production model
  • POST /calculate_lead_strategy — Full optimization pipeline with configurable parameters
  • Streamlit dashboard for business users to run scenarios without API calls

Why This Architecture

The separation matters. The modeling layer answers “who is likely to buy?” The optimization layer answers “who should we target?” These are different questions with different stakeholders.

A data scientist cares about AUC and precision-recall curves. A marketing director cares about “if I target the top 30%, how much revenue do I capture and how many subscribers do I lose?” The optimization layer bridges this gap—it translates model outputs into business recommendations with explicit assumptions.


Business Optimization

The threshold optimization methodology deserves detail because it’s where this project differs from typical ML tutorials.

The Expected Value Framework

For any targeting threshold, the system calculates:

| Component | Calculation |
| --- | --- |
| Revenue Captured | Sales from targeting subscribers who purchase |
| Churn Cost Avoided | Value preserved by NOT targeting unlikely buyers |
| Missed Revenue | Sales lost by not targeting some buyers |
| Churn Cost Incurred | Value lost from targeting and churning non-buyers |

Expected Value = Revenue Captured + Churn Cost Avoided - Missed Revenue - Churn Cost Incurred

The optimal threshold maximizes this value, not classification accuracy.
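The mechanics can be sketched as follows. This is a deliberately simplified stand-in for the package’s lead_strategy logic: the cost weights `churn_cost_ratio` and `missed_rev_ratio` are made-up illustrative parameters, not the project’s calibrated values.

```python
def expected_value(threshold, probs, customer_value=2_000,
                   churn_cost_ratio=0.5, missed_rev_ratio=0.2):
    """Toy expected value of one targeting threshold.

    probs: model purchase probabilities, one per subscriber.
    Cost weights are illustrative, not the project's calibrated values.
    """
    ev = 0.0
    for p in probs:
        if p >= threshold:
            ev += p * customer_value                           # revenue captured
            ev -= (1 - p) * churn_cost_ratio * customer_value  # churn cost incurred
        else:
            ev -= p * missed_rev_ratio * customer_value        # missed revenue
    return ev

def optimal_threshold(probs, n_points=100, safeguard=0.90):
    """Sweep thresholds; keep only those preserving `safeguard` of max sales."""
    grid = [i / n_points for i in range(n_points + 1)]
    sales = lambda t: sum(p for p in probs if p >= t)  # expected purchases proxy
    max_sales = max(sales(t) for t in grid)
    feasible = [t for t in grid if sales(t) >= safeguard * max_sales]
    return max(feasible, key=lambda t: expected_value(t, probs))
```

With the safeguard at 0.90, the sweep discards aggressive cutoffs even when their raw expected value is higher, mirroring the monthly_sales_reduction_safe_guard behavior: the constrained optimum sits at a lower threshold than the unconstrained one.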

Configurable Assumptions

The optimization accepts business parameters rather than hardcoding them:

  • email_list_size — Scale of the subscriber base
  • unsub_rate_per_sales_email — Churn rate from aggressive targeting
  • avg_customer_value — Lifetime value of a retained subscriber
  • customer_conversion_rate — Baseline conversion probability
  • monthly_sales_reduction_safe_guard — Minimum acceptable revenue (prevents over-optimization)

This lets business users explore scenarios: “What if we’re more conservative about churn? What if customer value is higher than we assumed?”

The Safety Guard

One insight from the course this project is based on bears repeating: mathematically optimal recommendations fail if they tank short-term revenue. The monthly_sales_reduction_safe_guard parameter ensures the recommended threshold won’t drop monthly sales below a specified percentage of maximum—a pragmatic concession to organizational reality.


From Model to Product

The deployment layer transforms a Jupyter notebook exercise into something a marketing team can use.

FastAPI Backend

The REST API exposes the full pipeline:

POST /calculate_lead_strategy
Parameters:
  - monthly_sales_reduction_safe_guard: float (default 0.9)
  - email_list_size: int (default 100000)
  - unsub_rate_per_sales_email: float (default 0.005)
  - avg_sales_per_month: float (default 250000)
  - customer_conversion_rate: float (default 0.05)
  - avg_customer_value: float (default 2000)

Returns:
  - lead_strategy: DataFrame with subscriber scores and Hot/Cold classification
  - expected_value: Optimal threshold and projected value
  - thresh_optim_table: Full simulation results for visualization

The API accepts subscriber data as JSON, runs the scoring model, performs threshold optimization with the specified parameters, and returns actionable output—including a downloadable CSV of the targeting strategy.
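For illustration, a client could call the strategy endpoint like this. The localhost URL, port, and exact field spellings are assumptions; adjust them to match the deployed API.

```python
import json
import urllib.request

# Business assumptions mirroring the parameter list above
payload = {
    "monthly_sales_reduction_safe_guard": 0.90,  # keep >= 90% of monthly sales
    "email_list_size": 100_000,
    "unsub_rate_per_sales_email": 0.005,
    "avg_sales_per_month": 250_000,
    "customer_conversion_rate": 0.05,
    "avg_customer_value": 2_000,
}

req = urllib.request.Request(
    "http://localhost:8000/calculate_lead_strategy",  # assumed local deployment
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = json.load(urllib.request.urlopen(req))  # run against a live server
```

Because the endpoint takes all business assumptions as parameters, the same call pattern serves CRM webhooks or scheduled batch jobs without touching the model code.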

Streamlit Dashboard

For users who don’t want to call APIs, the Streamlit app provides:

  • File upload for subscriber data (CSV)
  • Sliders for business assumptions (monthly sales, safety guard percentage)
  • One-click analysis execution
  • Interactive expected value plot showing the optimization curve
  • Download button for the lead strategy output

The dashboard calls the FastAPI backend, so the logic stays centralized. This separation means the API can serve other integrations (CRM webhooks, batch jobs) without duplicating business logic.


Change Management & Adoption

The hardest part of ML implementation isn’t the model. It’s getting people to use it.

This system addresses what I call the “politics of AI”—the friction between departmental incentives. Marketing wants to email everyone (more touches = more sales). Finance wants to protect customer lifetime value (fewer touches = less churn). The optimization framework doesn’t pick a side; it makes the tradeoff explicit and lets stakeholders negotiate with data instead of opinions.

From Black Box to Glass Box

SHAP (SHapley Additive exPlanations) values transform the model from “trust me” to “here’s why.”

When a sales lead asks why a subscriber scored 0.85, the system provides feature importance: “High engagement with pricing page. Attended two webinars. Opened 8 of last 10 emails.” When a subscriber scores 0.15: “No webinar attendance. Unsubscribed from newsletter segment. Last open 45 days ago.”

This explainability serves three purposes:

  • Debugging: When predictions seem wrong, SHAP reveals which features drove the score
  • Trust: Stakeholders accept recommendations they can understand
  • Learning: Feature importance guides future data collection—if webinar attendance predicts conversion, invest in webinars

The Dashboard as Collaboration Tool

The Streamlit interface isn’t just a delivery mechanism—it’s designed for the “Strategic Persona” (Marketing Director, RevOps Lead) to stay in the loop.

The “what-if” scenario capability matters here. When a marketing director adjusts the slider from 90% to 80% sales safeguard and sees expected savings jump from $88K to $120K/month, they’re not just receiving a recommendation—they’re participating in the decision. The tradeoff becomes visceral.

This is human-in-the-loop for strategy, not just for edge cases.

The Training Bridge

Adoption doesn’t happen at deployment. It happens through structured capability building.

The 30-60-90 Training Plan provides the progression:

| Phase | Focus | This System |
| --- | --- | --- |
| Days 1-30 | Awareness | What the system does, how scores are calculated, basic dashboard navigation |
| Days 31-60 | Application | Running scenarios, interpreting SHAP explanations, adjusting safeguard parameters |
| Days 61-90 | Mastery | Connecting outputs to campaign strategy, identifying when to override, training others |

The system is designed to support this progression. Early users get clear scores and recommendations. Intermediate users get scenario tools. Advanced users get the explainability layer to challenge and refine.


Technical Foundation

Model Selection

The PyCaret AutoML comparison evaluated 19+ algorithms:

  • Gradient Boosting: XGBoost, CatBoost, LightGBM
  • Tree-based: Random Forest, Extra Trees, Decision Tree
  • Linear: Logistic Regression, Ridge Classifier
  • Other: SVM, KNN, Naive Bayes, Quadratic Discriminant Analysis

The production model is a blended ensemble of top performers, tracked and versioned in MLflow. The selection is defensible: “We tested 19 algorithms and this ensemble performed best on holdout data.”

Package Architecture

The email_lead_scoring/ package organizes reusable functions:

  • database.py — Data loading and SQL queries
  • cost_calculations.py — ROI modeling and cost simulation
  • modeling.py — Model scoring functions
  • lead_strategy.py — Threshold optimization logic
  • exploratory.py — EDA utilities

This structure means the same code runs in notebooks, API endpoints, and scheduled jobs without duplication. When the model improves, one update propagates everywhere.


  • Source: Based on DS4B 201-P from Business Science University, extended with production deployment and adoption scaffolding