Partners had always sold on instinct, knocking on every door. The question behind Switch-o-meter was simple: could we give them a data signal for which households were most likely to switch provider, so they spent their time on the doors most worth knocking on? Answering it meant shipping the first machine-learning model UW had ever put in a live, partner-facing app.
UW partners ran a relationship-driven, largely intuitive sale. The activation problem was that even motivated partners struggled to know where to spend their time. Switch-o-meter was the answer: a machine-learning model that scored each prospect on how likely they were to become a UW customer, surfaced inside the partner app alongside the insights behind the score.
We built it in three phases. A proof of concept tested four consumer-data providers and built a repeatable enrichment pipeline. A prototype trained the first model on roughly 21,000 prospects (around 78% accuracy) and demoed it live on real prospects, with clear separation between the likely and the unlikely. A production design wired enrichment and scoring straight into the partner app.
The headline: for the first time at UW, an ML model went into a live, partner-facing product. The harder-won lesson: data quality, not data volume, was the real lever, and ML needs more patience than a results-hungry organisation expects.
Data enrichment means supplementing what you already know about a prospect with external data, so teams and partners can make smarter decisions. The product bet was that richer, better-timed insight would move the numbers that mattered at the activation stage. These were the targets we set against the wider business goal:
more prospects qualified
more prospects reaching a first appointment
higher conversion, qualified vs non-qualified
We split the first enrichment concept into three phases over roughly five months, de-risking the data and the model before committing to a production build.
Trained on roughly 21,000 prospects, the model reached around 78% accuracy at telling likely converters from unlikely ones. More interesting than the score was the why: a handful of attribute categories did most of the work, each with a human-readable story behind it.
Accuracy alone is not enough for a model that partners will act on. We built a dashboard to show the impact of each feature on the model, the relationships in the input data, and the demographic representation of the training set, so the model's behaviour could be explained rather than just trusted. That mattered as much for partner adoption as for governance: a partner is far more likely to act on a score they can understand.
The moment the model earned trust was a live demo: ten real prospects, scored on conversion likelihood. The separation did the talking.
Seven clear targets, three to deprioritise. The signal partners had never had, turning "knock on every door" into "start with these seven".
We built and refined Switch-o-meter in a stable energy market. Then the market stopped being stable, and the model met the one thing every production system fears: a world that no longer looked like its training data.
As wholesale energy prices spiked, a wave of UK suppliers that had not hedged their exposure collapsed, and their customers had to go somewhere. A large influx arrived from failed providers such as Bulb, and they looked nothing like UW's typical customer. The model had learned one kind of customer and was suddenly scoring another, so its accuracy slipped. Not because the model was wrong, but because the ground had moved under it.
We adapted quickly, retraining on the new reality until prediction was back to the level partners needed. The harder part was not the retraining; it was the conversation. An exec team new to machine learning saw a model that had been working start to wobble, and a retraining cycle that took time. That gap between "it broke" and "it's fixed" created real pressure, and clearer expectation-setting about how models behave when the world moves would have absorbed much of it.
In the end the episode became one of the better arguments for the whole approach. A model you can monitor, retrain and adapt is a capability, not a one-off artefact. The shock was an unplanned stress test, and the system came through it.
The point was to turn a gut-feel sale into a system. A partner captures a prospect; the app enriches it, scores it, and ranks it against every other prospect, floating the highest-propensity people to the top so partners spend their time where it pays off. Under the hood, the production design closed that loop without anyone touching a spreadsheet.
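The capture, enrich, score, rank loop is easy to picture in a few lines. What follows is a minimal sketch, not the production code: `Prospect`, `rank_prospects`, `fake_enrich` and `fake_score` are hypothetical names, and the two stand-in functions take the place of the real data-provider lookup and the trained model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Prospect:
    name: str
    attributes: dict = field(default_factory=dict)


def rank_prospects(
    prospects: list[Prospect],
    enrich: Callable[[Prospect], dict],
    score: Callable[[Prospect], float],
) -> list[tuple[float, Prospect]]:
    """Enrich each prospect, score it, and return (score, prospect) pairs, highest score first."""
    scored = []
    for p in prospects:
        # Merge the externally sourced attributes into what the partner captured.
        enriched = Prospect(p.name, {**p.attributes, **enrich(p)})
        scored.append((score(enriched), enriched))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-ins: a real system would call a data provider and a trained model.
def fake_enrich(p: Prospect) -> dict:
    return {"signal": len(p.name)}


def fake_score(p: Prospect) -> float:
    return min(1.0, p.attributes["signal"] / 10)


ranked = rank_prospects(
    [Prospect("Ann"), Prospect("Bartholomew"), Prospect("Carlos")],
    fake_enrich,
    fake_score,
)
for s, p in ranked:
    print(f"{p.name}: {s:.1f}")
```

Passing the enrichment and scoring steps in as functions keeps the ranking logic independent of any one provider or model version, which matters once retraining becomes routine.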
Monitoring used off-the-shelf tooling to track latency, throughput, drift and availability of the model and its endpoints.
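To make "drift" concrete: one common way to quantify it is the population stability index (PSI), which compares how a feature or score was distributed at training time with how it is distributed now. The sketch below is an illustration of that idea under our own assumptions, not a description of the tooling we used; the function name `psi`, the bin count and the sample data are all hypothetical.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population stability index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: a baseline spread over [0, 1) and a recent sample squeezed into [0.5, 1).
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

print(psi(baseline, baseline))  # identical distributions give zero
print(psi(baseline, shifted))   # a shifted population gives a clearly positive value
```

How large a value should trigger retraining is a judgment call for each model; the point is only that the shift is measurable before accuracy visibly slips.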
One provider returned far more attributes than another, but accuracy tracked completeness, not count. The most complete data file produced the most accurate model. We held the focus on information quality rather than sheer volume.
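The difference between attribute count and completeness is simple to measure. The sketch below assumes one plausible definition, the share of attribute slots that are actually populated; the function name, field names and records are made up for illustration and are not our provider data.

```python
def completeness(records: list[dict], fields: list[str]) -> float:
    """Share of (record, field) slots that hold a real value rather than None or an empty string."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


# Hypothetical provider A: four attributes per prospect, mostly empty.
provider_a = [
    {"tenure": 5, "income_band": None, "household_size": None, "property_type": ""},
    {"tenure": None, "income_band": "B", "household_size": None, "property_type": None},
]
# Hypothetical provider B: only two attributes, but always populated.
provider_b = [
    {"tenure": 5, "property_type": "flat"},
    {"tenure": 2, "property_type": "house"},
]

fields_a = ["tenure", "income_band", "household_size", "property_type"]
fields_b = ["tenure", "property_type"]

print(len(fields_a), completeness(provider_a, fields_a))
print(len(fields_b), completeness(provider_b, fields_b))
```

In this toy case the provider with twice as many attributes fills only a quarter of them, which is the pattern that made completeness the more useful thing to compare.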
Shipping is the start, not the end. The market shock proved it: a live model needs monitoring and a retraining path built in from day one, because the world will eventually stop looking like the training data. Treat the model as a standing system, not a deliverable.
Data privacy ran alongside the build from day one. On a partner-facing model touching consumer data, that is not optional, and starting the conversation early kept the work moving rather than stalling it at the end.
Partners act on scores they understand. Building the feature-impact view was as much about adoption as governance: the "why" behind a score is what turns a number into a decision.
Switch-o-meter was the activation layer of the wider UW Partner Network rebuild, and the piece I am proudest of: not because the model was elaborate, but because getting a model into a live, partner-facing product in a business that had never done it before is mostly a job of pipelines, providers, privacy and patience. The data science is real, but the harder work sits around it.
The most valuable output was not the prediction. Building that first ML service gave UW its first MLOps infrastructure: the pipelines, deployment and monitoring that every production model needs. Once that existed, the second model was a different conversation, and the question shifted from "can we" to "what next".
That platform became the foundation for later AI initiatives across the business: an NLP analysis of call-centre transcripts to sharpen customer service, churn modelling inside the home-services team, and others since. The lasting return on one production model was the capability to keep building, and that is what I most want to leave behind in any organisation: the means to make the next one.