Why Predictive Signal Infrastructure Goes Far Beyond the Model
Eran Friendinger

As predictive LTV (pLTV) optimization becomes the standard for paid media campaigns across Google, Meta, and TikTok, data scientists are increasingly asked to build LTV models that predict user value. The first-party data is there, the technology is mature, and the modeling work itself — feature engineering, training, validation — is well within reach for in-house teams.

Then you push the model to an ad platform, and it starts behaving in ways your offline evaluation never predicted. The platform learns from your signals differently than you expected, cohort quality drifts, and budget quietly follows it in the wrong direction. By the time reporting catches up weeks later, you’ve already spent a good portion of your budget on the wrong users.

Nothing about the data pipelines, validation curves, or offline metrics can prepare you for what production actually demands — a system that learns alongside the auction in real time, not behind it.

For most teams building this in-house, that means three to five senior hires, twelve months to first results, and a permanent operational commitment. This post breaks down what that actually involves, layer by layer, and what to weigh before you commit.

Building the Model

As counterintuitive as it sounds, the LTV model itself is only about 20% of the total effort. The work that actually determines whether your paid media performs happens after the model is built, and most internal models aren't designed for any of it.

Why Existing LTV Models Don’t Work for Paid Media

Standard LTV models built for forecasting or segmentation are optimized to minimize average error across your user base. That's the right objective for planning, where directional accuracy balances out. In ad auctions, those errors don't cancel out, which will quietly push spend in the wrong direction before it shows up in reporting.

A model built for paid media activation needs to behave differently:

  • Optimize for separation, not average accuracy. The job is to reliably distinguish high-value users from low-value ones.
  • Penalize errors asymmetrically, focusing on local error rates rather than global accuracy. The priority is reducing errors in auctions where bidding traps exist. Global accuracy can mask consistent misbidding in the cohorts that matter most.
  • Predict early enough to influence auctions. Platforms have roughly a 7-day optimization window, but for most businesses those first 7 days tell you almost nothing about future customer value. Predictions need to fire within hours or days of acquisition.
  • Retrain after activation. Continuous retraining keeps predictions calibrated to who you're actually acquiring, but modeling alone doesn't capture business changes. Funnel shifts, pricing updates, and evolving priorities like target segments or OKRs need to be baked into your signal policy directly.
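
The first two principles can be made concrete with a toy illustration (hypothetical numbers, not from any real campaign): a model with a lower average error can still be worse for activation if it misranks a single whale, because auctions act on ordering, not on average accuracy.

```python
# Hypothetical LTVs: mostly low-value users plus three high-value "whales".
true_ltv = [10, 12, 11, 9, 200, 180, 13, 8, 220, 10]

# Model A: tiny errors everywhere, but one whale (index 4) is ranked
# below an ordinary user — the kind of local failure auctions punish.
pred_a = [11, 13, 12, 10, 14, 181, 15, 9, 221, 11]
# Model B: larger errors across the board, but every whale stays on top.
pred_b = [30, 35, 33, 28, 150, 140, 36, 25, 160, 31]

def mae(true, pred):
    """Mean absolute error — the 'planning' objective."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def top_k_capture(true, pred, k=3):
    """Share of the true top-k users the model also ranks in its top k —
    a simple proxy for the separation that bidding actually needs."""
    top_true = set(sorted(range(len(true)), key=lambda i: -true[i])[:k])
    top_pred = set(sorted(range(len(pred)), key=lambda i: -pred[i])[:k])
    return len(top_true & top_pred) / k
```

Here Model A wins on average error yet loses on top-k capture, which is the metric an auction effectively optimizes against.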

Our Approach at Voyantis

Every model is engineered for activation, not just analysis:

  • Growth Context. We model your business — conversion events, value definitions, and constraints — so predictions are calibrated to your economics.
  • Data Breadth. Models ingest the full breadth of fully anonymized, first-party data — preferences, transaction patterns, engagement frequency, session depth — with positive and negative signals weighed together. We can incorporate the outputs of your internal model as an additional input signal, building on the foundation you’ve already established.
  • Automatic Retraining. Because bidding changes who shows up, models retrain continuously as your business evolves, so predictions stay calibrated to who you're actually acquiring.

Engineering the Signal

Once you have predictions, you need to get them into ad platforms in a way those platforms can actually learn from. That discipline — signal engineering — is the layer that sits between your predictions and the ad platforms.

Predictions aren't conversions

Platforms were built to learn from deterministic events — actions that actually happened, like first purchases, trial starts, or credit card approvals. Predictions are probabilistic, evolve constantly, and represent behavior that hasn't occurred yet. Translating them into something a platform can sustainably optimize against requires decisions that are different for every network, every ad product, and every campaign objective. You're not sending the same signal to Google and Meta. 
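
As a rough sketch of what "per-network translation" can look like, the snippet below encodes a predicted LTV into a platform-specific conversion value. The caps, floors, and buckets are invented for illustration, not real platform requirements or Voyantis's actual encoding.

```python
def encode_signal(pred_ltv, platform):
    """Hypothetical per-platform encoding of a predicted LTV into a
    conversion value a platform can optimize against. All thresholds
    here are illustrative assumptions."""
    if platform == "google":
        # Example: cap outliers so whales don't distort learning,
        # and zero out users below an illustrative value floor.
        capped = min(pred_ltv, 500.0)
        return round(capped, 2) if capped >= 20.0 else 0.0
    if platform == "meta":
        # Example: coarse value buckets instead of raw predictions,
        # trading precision for a signal that's easier to learn from.
        buckets = [0, 25, 50, 100, 250, 500]
        return max(b for b in buckets if b <= pred_ltv)
    raise ValueError(f"no encoding defined for {platform}")
```

The same user would thus produce different signals on different networks — which is exactly why a single "send the prediction everywhere" pipeline breaks down.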

Platform behavior changes without notice

The moment predictions start influencing auctions, the platform changes which users it bids on and wins for you. A bias you never knew existed — invisible in backtesting and harmless in isolation — can shift large portions of budget into the wrong segments within two weeks. We've seen modest overvaluation of low-value segments do exactly that.

For example, if your model inadvertently over-correlates predicted value with device type or time of day, the network will exploit that pattern to serve you cheaper inventory that matches those features, not higher-value users — and this will not show up when you evaluate model accuracy. A small, localized error profile can turn into a hurricane.
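
One minimal form of the debiasing this implies is mean-centering predictions within each proxy group (device type, daypart) so no group carries a systematic offset the network can exploit. This is a simplified sketch of the idea, not a production debiasing method.

```python
from collections import defaultdict

def debias_by_group(preds, groups):
    """Remove per-group offsets (e.g., by device type) from predictions so
    average predicted value is equal across groups. A minimal mean-centering
    sketch; real debiasing handles many correlated features at once."""
    overall = sum(preds) / len(preds)
    totals, counts = defaultdict(float), defaultdict(int)
    for p, g in zip(preds, groups):
        totals[g] += p
        counts[g] += 1
    group_mean = {g: totals[g] / counts[g] for g in totals}
    # Shift each prediction so its group's mean matches the overall mean.
    return [p - group_mean[g] + overall for p, g in zip(preds, groups)]
```

After centering, a network can no longer buy "cheap iOS traffic at 9 a.m." purely because the model's offset made that slice look systematically more valuable.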

Compounding that, each network's learning logic — value caps, floor thresholds, attribution windows, signal batching behavior, how conversion values are weighted in auction decisions — shifts regularly, often without documentation. Teams that have to react to unannounced platform updates spend weeks diagnosing performance degradation that looks like normal variance before finding the cause.

Our Approach at Voyantis

Signal engineering is fully autonomous, informed by thousands of experiments across every major ad product:

  • Channel-specific encoding. We translate business outcomes into platform-native value formats, with each platform spoken to in its own language.
  • Debiasing. We strip predictions of hidden correlations before sending, or the platform will find and exploit them. Models and signals are updated as your user mix changes, so campaigns stay calibrated to your current customer reality.
  • Observability. From ingestion to platform response, issues surface before they reach your campaigns, not after they've already cost you.
  • Signal Refinement. As users engage with your product and your confidence in their value increases during that initial conversion window, we send updated signals back to the platform. This gives the algorithm a more accurate picture of user quality over time, not just at the moment of acquisition.

For Shippo, getting predictions into Google meant filtering out merchants unlikely to scale, capping outliers so high-volume whales wouldn't distort the algorithm, and weighting the value gap between subscription tiers so Google would pursue power shippers more aggressively.
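
The three moves in that example — filtering, capping, tier weighting — compose naturally. The sketch below is in the spirit of the Shippo example but with invented thresholds and tier names; it is not their actual configuration.

```python
def shaped_value(pred_monthly_volume, tier, cap=1000.0):
    """Illustrative signal shaping: drop merchants unlikely to scale,
    cap outliers so whales don't distort the algorithm, and widen the
    value gap between subscription tiers. All numbers are assumptions."""
    TIER_WEIGHT = {"starter": 1.0, "professional": 3.0, "premier": 8.0}
    if pred_monthly_volume < 10:
        return None                      # unlikely to scale: send no signal
    capped = min(pred_monthly_volume, cap)   # tame high-volume outliers
    return capped * TIER_WEIGHT[tier]        # make higher tiers worth chasing
```

The weighting step is the interesting one: the raw value gap between tiers may be too small for the platform to prioritize, so the signal exaggerates it deliberately.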

Validating at Scale

There's no sandbox for this. Every experiment runs on live spend, and each testing cycle means real budget allocated to an unproven configuration. Without prior experience with your platform and campaign type, teams typically spend six to twelve months of experimentation budget learning what a team with cross-campaign pattern recognition could compress into one or two cycles. 

Bear in mind that every model update or signal change also triggers a new round of live-budget testing.

In testing, plan for three iterations with each cycle taking one to two months: neutral or negative uplift in the first pass because configuration is rarely right, parity and early understanding of what to adjust in the second, and meaningful lift in the third. Naive audience splits contaminate easily — users bleed between segments, remarketing pools overlap, and results blur — so geo-split designs or structured holdout regions are typically more reliable. 
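
A simple way to get the stability that geo-splits provide is deterministic assignment: hash each market into an arm so the split is reproducible across runs and tools. This is a minimal sketch of the assignment step only; real geo experiments also match markets on size and seasonality first.

```python
import hashlib

def assign_geo(market_id, treatment_share=0.5, salt="pltv-test-1"):
    """Deterministically assign a geographic market to a test arm.
    The same market always lands in the same arm for a given salt,
    avoiding the segment bleed of user-level splits. `salt` and the
    50/50 default are illustrative choices."""
    h = hashlib.sha256(f"{salt}:{market_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 100 < treatment_share * 100 else "control"
```

Changing the salt reshuffles the split for a new experiment without touching any stored assignment table.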

Our Approach at Voyantis

Every implementation starts with a structured path from validation to scale:

  • Shadow mode first. Signals are sent to the platform before they're used for bidding — so you're proving correlation before you're spending against it.
  • Templated test design. Bidding structures, spend parity rules, measurement alignment, and sensitivity thresholds are drawn from frameworks refined across hundreds of prior campaigns.
  • Clear pilot-to-scale criteria. Each test has predefined thresholds for when to iterate, when to scale, and when to adjust.

Upside structured their rollout across 10 of their 210 geographic markets, with predefined criteria for when to expand and a clear path to full budget before they committed to it.

Running a System That Never Stops

Keeping the system running is a permanent commitment, and most of the work is invisible until something breaks. For example:

  • Match rates drop because the platform changed its identity graph.
  • A data pipeline delays predictions past the attribution window.
  • A model retraining resets the network's learned bidding patterns.

Without dedicated infrastructure, someone on your team owns this permanently and must understand the full stack well enough to diagnose it quickly. That means pulling your best machine learning engineers away from their product work.

Our Approach at Voyantis

The system gets smarter over time, continuously learning from incoming data on bidding performance, shifts in your user mix, and conversion attribution:

  • Continuous monitoring. Every layer — from data pipelines, to model predictions, to signal delivery, to campaign performance — is tracked automatically. Deviations surface before they reach the ad platform.
  • Adaptive systems. When activity looks unusual, automated responses kick in immediately. If primary signals degrade, fallback models activate to ensure continuity without manual intervention.
  • Calibration that preserves learning. When models retrain, automated pipelines validate new models against existing ones before deployment — so the network's learning isn't reset and campaigns don't skip a beat.
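
A deployment gate of this kind can be sketched as a champion/challenger check: the retrained model ships only if its value scale and ranking stay close to the live model's. The metric names and thresholds below are hypothetical, chosen to illustrate the shape of the check.

```python
def should_deploy(challenger, champion,
                  max_value_shift=0.10, min_rank_corr=0.90):
    """Hypothetical pre-deployment gate. A retrained model is deployed
    only if (a) its mean predicted value hasn't drifted far from the
    champion's, and (b) it ranks users similarly — so the network's
    learned bidding patterns aren't reset by the swap."""
    value_shift = abs(challenger["mean_value"] - champion["mean_value"]) \
        / champion["mean_value"]
    rank_corr = challenger["rank_corr_vs_champion"]
    return value_shift <= max_value_shift and rank_corr >= min_rank_corr
```

A challenger that fails the gate isn't necessarily worse — it may just be different enough that deploying it would force the platform back into a learning phase.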

What the Full Build Actually Requires

We built this system before we productized it — years of turning expertise into infrastructure so your team doesn't have to. The pattern we've seen across hundreds of implementations is consistent: eight to twelve months to see business impact on a single channel if you build in-house, and six to ten months for each additional channel.

How the Two Paths Compare

Your team's advantage is your product, your customers, and your data. The activation layer is infrastructure that compounds with experience across hundreds of implementations. It doesn't get better by being rebuilt from scratch at each company.

If you're working through this decision, talk to our team. We've helped companies like Miro, Semrush, Lennar, MoneyLion, Opendoor, and inDrive turn this from a multi-year build into a competitive advantage.