Machine learning · Activation layer

Switch-o-meter: shipping UW's first production ML model

Partners had always sold on instinct, knocking on every door. The question behind Switch-o-meter was simple: could we give them a data signal for which households were most likely to switch provider, so they spent their time on the doors most worth knocking on? Answering it meant shipping the first machine-learning model UW had ever put in a live, partner-facing app.

Role: Head of Product, Partner Network

Team: A small enrichment squad (data science, data engineering, product)

Part of: UW Partner Network · Phase 2, Activation

~78% model accuracy at separating likely converters (AUC ≈ 0.74)

~21k prospects used to train the first conversion-prediction model

4 → 1 consumer-data providers evaluated, then one selected

1st production ML model in a live, partner-facing app at UW

The short version

UW partners ran a relationship-driven, largely intuitive sale. The activation problem was that even motivated partners struggled to know where to spend their time. Switch-o-meter was the answer: a machine-learning model that scored each prospect on how likely they were to become a UW customer, surfaced inside the partner app alongside the insights behind the score.

We built it in three phases. A proof of concept tested four consumer-data providers and built a repeatable enrichment pipeline. A prototype trained the first model on roughly 21,000 prospects (around 78% accuracy) and demoed it live on real prospects, with clear separation between the likely and the unlikely. A production design wired enrichment and scoring straight into the partner app.

The headline: for the first time at UW, an ML model went into a live, partner-facing product. The harder-won lesson: data quality, not data volume, was the real lever, and ML needs more patience than a results-hungry organisation expects.

01 · The bet

Give an intuitive sale a data layer

Data enrichment means supplementing what you already know about a prospect with external data, so teams and partners can make smarter decisions. The product bet was that richer, better-timed insight would move the numbers that mattered at the activation stage. These were the targets we set against the wider business goal.

~20% more prospects qualified, if partners capture more accurate data upfront on prospects

~10% more prospects reaching a first appointment, if partners have enriched insight before an appointment

~15% higher conversion, qualified vs non-qualified, if partners have enriched insight throughout the sales journey

02 · How we built it

Proof of concept, prototype, production

We split the first enrichment concept into three phases over roughly five months, de-risking the data and the model before committing to a production build.

03 · Inside the model

What actually predicted a switch

Trained on roughly 21,000 prospects, the model reached around 78% accuracy at telling likely converters from unlikely ones. More interesting than the score was the why: a handful of attribute categories did most of the work, each with a human-readable story behind it.

~78% overall accuracy

~0.74 AUC

~70–75% precision & recall
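For readers less familiar with these four measures, here is a minimal sketch of how each is computed from true outcomes and model scores. The labels, scores and the 0.5 threshold are made up for illustration; they are not Switch-o-meter data, and the threshold the team used is not stated here.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) when scores >= threshold are predicted as converters."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all prospects classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Of the prospects flagged as likely converters, the share that converted."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Of the prospects that converted, the share the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random converter outscores a random non-converter (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: 1 = converted, 0 = did not convert.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(labels, scores)
print(accuracy(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn), auc(labels, scores))
```

Accuracy, precision and recall all depend on where the threshold sits, while AUC measures how well the scores rank converters above non-converters across every threshold. That is why the two kinds of figure can differ for the same model.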

How we kept it interpretable

Accuracy alone is not enough for a model that partners will act on. We built a dashboard to show the impact of each feature on the model, the relationships in the input data, and the demographic representation of the training set, so the model's behaviour could be explained rather than just trusted. That mattered as much for partner adoption as for governance: a partner is far more likely to act on a score they can understand.
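The write-up does not say which technique sat behind the feature-impact view. One common, model-agnostic option is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below assumes that approach; the toy model and the feature names (`years_at_address`, `noise`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def accuracy(model: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Share of rows where the model's prediction matches the label."""
    correct = sum(1 for r, y in zip(rows, labels) if model(r) == y)
    return correct / len(rows)


def permutation_importance(
    model: Callable[[Row], int],
    rows: Sequence[Row],
    labels: Sequence[int],
    features: Sequence[str],
    repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean accuracy drop when each feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance: Dict[str, float] = {}
    for feature in features:
        drops: List[float] = []
        for _ in range(repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this one feature replaced.
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importance[feature] = sum(drops) / repeats
    return importance


# Hypothetical toy data: the stand-in model uses one feature and ignores the other.
rows = [
    {"years_at_address": float(y), "noise": float(n)}
    for y, n in [(1, 5), (2, 3), (0, 9), (1, 1), (6, 4), (8, 2), (10, 7), (4, 6)]
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_model(row: Row) -> int:
    """Stand-in classifier, not the real model: recent movers are flagged as likely switchers."""
    return 1 if row["years_at_address"] < 3 else 0


importance = permutation_importance(toy_model, rows, labels, ["years_at_address", "noise"])
print(importance)
```

A feature the model ignores shows an importance of zero, while a feature it relies on shows a clear drop. That gives a partner-readable ranking of what drives a score.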

04 · Proof it worked

Ten live prospects, one clear line

The moment the model earned trust was a live demo: ten real prospects, scored on conversion likelihood. The separation did the talking.

Seven clear targets, three to deprioritise. The signal partners had never had, turning "knock on every door" into "start with these seven".

05 · Adapting

When the market broke the model

We built and refined Switch-o-meter in a stable energy market. Then the market stopped being stable, and the model met the one thing every production system fears: a world that no longer looked like its training data.

As wholesale energy prices spiked, a wave of UK suppliers that had not hedged their exposure collapsed, and their customers had to go somewhere. A large influx arrived from failed providers such as Bulb, and they looked nothing like UW's typical customer. The model had learned one kind of customer and was suddenly scoring another, so its accuracy slipped. Not because the model was wrong, but because the ground had moved under it.

We adapted quickly, retraining on the new reality until prediction was back to the level partners needed. The harder part was not the retraining; it was the conversation. An exec team new to machine learning saw a model that had been working start to wobble, and a retraining cycle that took time. That gap between "it broke" and "it's fixed" created real pressure, and clearer expectation-setting about how models behave when the world moves would have absorbed much of it.

In the end the episode became one of the better arguments for the whole approach. A model you can monitor, retrain and adapt is a capability, not a one-off artefact. The shock was an unplanned stress test, and the system came through it.
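How the shift was detected is not described here. As an illustration of what monitoring for this kind of change can look like, the sketch below computes a Population Stability Index (PSI), one widely used drift measure, between a training sample and a live sample of a single feature. The values and cut points are invented, and treating a PSI above roughly 0.25 as a significant shift is a commonly quoted rule of thumb rather than anything taken from this project.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_shares(values: Sequence[float], cuts: Sequence[float], floor: float = 1e-6) -> List[float]:
    """Share of values in each bin defined by sorted cut points (len(cuts) + 1 bins).

    Shares are floored at a small positive number so the log in PSI stays defined.
    """
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    total = len(values)
    return [max(c / total, floor) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], cuts: Sequence[float]) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e_shares = bin_shares(expected, cuts)
    a_shares = bin_shares(actual, cuts)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Invented values for one hypothetical feature, before and after a market shock.
training_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
live_sample = [7, 8, 9, 10, 11, 12, 8, 9, 5, 2]
cuts = [3.5, 6.5]

drift = psi(training_sample, live_sample, cuts)
print(f"PSI = {drift:.3f}", "-> investigate / consider retraining" if drift > 0.25 else "-> stable")
```

A check like this, run per feature on a schedule, turns "the model feels off" into a number that can trigger a retraining conversation before accuracy visibly slips.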

06 · How it works

From gut feel to a ranked pipeline of prospects

The point was to turn a gut-feel sale into a system. A partner captures a prospect; the app enriches it, scores it, and ranks it against every other prospect, floating the highest-propensity people to the top so partners spend their time where it pays off. Under the hood, the production design closed that loop without anyone touching a spreadsheet.

New prospect (captured in the app) → Event source (captures the prospect) → Enrichment service (external data added) → Prediction model (on a managed ML platform) → Partner app (prospects scored & ranked)
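The flow above can be sketched as three plain functions: enrich, score, rank. This is a simplified illustration, not the production design. In production these were separate services with the model hosted on a managed ML platform, while here the enrichment lookup, the field names and the stand-in scoring rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, float]


@dataclass
class Prospect:
    prospect_id: str
    postcode: str
    attributes: Attributes = field(default_factory=dict)
    score: Optional[float] = None


def enrich(prospect: Prospect, lookup: Dict[str, Attributes]) -> Prospect:
    """Add external attributes; leave the prospect unchanged if the provider has no record."""
    prospect.attributes.update(lookup.get(prospect.postcode, {}))
    return prospect


def score(prospect: Prospect, model: Callable[[Attributes], float]) -> Prospect:
    """Attach a conversion-likelihood score from the supplied model."""
    prospect.score = model(prospect.attributes)
    return prospect


def rank(prospects: List[Prospect]) -> List[Prospect]:
    """Order prospects so the highest scores come first."""
    return sorted(prospects, key=lambda p: p.score if p.score is not None else 0.0, reverse=True)


def handle_new_prospects(
    prospects: List[Prospect],
    lookup: Dict[str, Attributes],
    model: Callable[[Attributes], float],
) -> List[Prospect]:
    """Enrich, score and rank a batch of newly captured prospects."""
    return rank([score(enrich(p, lookup), model) for p in prospects])


# Hypothetical stand-ins for the enrichment provider and the model.
fake_lookup = {"AB1": {"household_size": 4.0}, "CD2": {"household_size": 1.0}}


def fake_model(attrs: Attributes) -> float:
    """Placeholder scoring rule for illustration only."""
    return min(1.0, attrs.get("household_size", 0.0) / 5.0)


ranked = handle_new_prospects(
    [Prospect("p1", "CD2"), Prospect("p2", "AB1"), Prospect("p3", "ZZ9")],
    fake_lookup,
    fake_model,
)
for p in ranked:
    print(p.prospect_id, p.score)
```

Keeping enrichment and scoring as separate steps means a missing provider record degrades the score rather than blocking the prospect, which matters when the external data source is a third-party dependency.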

Limitations and risks we named up front

Manual pipeline: the early deployment was hands-on; automating end to end was the next priority.
Single data source for v1: the first model leaned on one provider; adding more would lift accuracy.
Third-party dependency: external data APIs can go stale or down, so usage and freshness needed monitoring.
Platform maturity: the managed ML platform was new and still maturing, so we tracked it closely.

Monitoring used off-the-shelf tooling to track latency, throughput, drift and availability of the model and its endpoints.

07 · What I learned

The model was the easy part

◆ Data quality beats data volume

One provider returned far more attributes than another, but accuracy tracked completeness, not count. The most complete data file produced the most accurate model. We held the focus on information quality rather than sheer volume.
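"Completeness" is not defined precisely here. One plausible reading is the share of cells in a provider's file that actually hold a value, and the sketch below assumes that definition. The two sample files and their column names are invented to show how a wide file can still be the less complete one.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[object]]


def completeness(records: Sequence[Record], columns: Sequence[str]) -> float:
    """Share of (record, column) cells that hold a non-empty value."""
    total = len(records) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for c in columns if r.get(c) not in (None, ""))
    return filled / total


# Invented example: a wide but sparse file versus a narrow but fully populated one.
wide_columns = ["a", "b", "c", "d"]
wide_file: List[Record] = [
    {"a": 1, "b": None, "c": None, "d": ""},
    {"a": None, "b": 2, "c": None, "d": None},
]

narrow_columns = ["a", "b"]
narrow_file: List[Record] = [
    {"a": 1, "b": 2},
    {"a": 3, "b": 4},
]

print("wide:", len(wide_columns), "attributes,", completeness(wide_file, wide_columns))
print("narrow:", len(narrow_columns), "attributes,", completeness(narrow_file, narrow_columns))
```

Comparing providers on a measure like this, rather than on attribute count, puts the evaluation on the axis that tracked accuracy.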

◆ A model is never finished

Shipping is the start, not the end. The market shock proved it: a live model needs monitoring and a retraining path built in from day one, because the world will eventually stop looking like the training data. Treat the model as a standing system, not a deliverable.

For the first time at UW, a machine-learning model went live in a partner-facing product. The capability mattered more than the metric.

◆ Governance is not a later step

Data privacy ran alongside the build from day one. On a partner-facing model touching consumer data, that is not optional, and starting the conversation early kept the work moving rather than stalling it at the end.

◆ Interpretability drives adoption

Partners act on scores they understand. Building the feature-impact view was as much about adoption as governance: the "why" behind a score is what turns a number into a decision.

08 · So what

The model was the start. The platform was the point.

Switch-o-meter was the activation layer of the wider UW Partner Network rebuild, and the piece I am proudest of: not because the model was elaborate, but because getting a model into a live, partner-facing product in a business that had never done it before is mostly a job of pipelines, providers, privacy and patience. The data science is real, but the harder work sits around it.

The most valuable output was not the prediction. Building that first ML service gave UW its first MLOps infrastructure: the pipelines, deployment and monitoring that every production model needs. Once that existed, the second model was a different conversation, and the question shifted from "can we" to "what next".

First MLOps infrastructure (built for Switch-o-meter) → Customer service (NLP on call transcripts) · Home services (churn modelling) · And beyond (more AI initiatives)

That platform became the foundation for later AI initiatives across the business: an NLP analysis of call-centre transcripts to sharpen customer service, churn modelling inside the home-services team, and others since. The lasting return on one production model was the capability to keep building, and that is what I most want to leave behind in any organisation: the means to make the next one.