
Six Questions to Ask Before Building Predictive Signal Infrastructure In-House

April 7, 2026

Eran Friendinger
Co-Founder & Chief Product Officer
LTV Modeling
Data Science
Value-based Bidding

TL;DR: The hardest part of building predictive signal infrastructure is getting it into production. Auction design, feedback loops, platform encoding, experiment integrity, model drift, and ownership all look different once the auction is responding to your signals. The six questions in this piece are the ones to work through before you get there.

Anyone who has worked in paid media knows the platforms don't sit still. Google, Meta, and TikTok are continuously shifting their own learning logic, signal processing, and attribution behavior, often without formal documentation or advance notice. The scope of what needs to be monitored, maintained, and adapted grows after launch rather than stabilizing. 

Building predictive signal infrastructure means building something that has to survive and adapt to all of that. What's hardest to anticipate is what happens once the system is live and the auction starts responding to your signals. 

In practice, these are the questions that come up without fail after launch, the ones teams wish they'd asked before the system was running and the cost of learning was measured in real budget. The answers will look different for every team, but if you're planning to build, you should be able to speak to each layer.

1. Is your model built for auctions or for analysis?

You may already have an LTV model, built for forecasting, segmentation, or budget planning. These models are optimized for average prediction error across your user base, which is the right objective for planning, where directional accuracy is enough and errors cancel out.

In a live auction, those same errors multiply. Ad platforms need a signal that separates high-value users from low-value ones early enough to shape learning. Most operate on a 7-day attribution window, and extracting a reliable signal from that early behavioral data requires a model explicitly designed for it, not adapted from a different use case. 

A model with better aggregate accuracy can actually drive worse auction performance than a less accurate one, because what the platform responds to is the pattern of errors, not their average. If a model's residual errors cluster, the platform treats that structure as signal regardless of what the MAPE says. Random errors are invisible to the auction, while systematic ones become targeting logic.

Overvaluing a low-quality user and undervaluing a high-quality one carry different costs. A loss function that treats them as equivalent will misbid at scale, allocating significant budget toward the wrong users before it shows up in performance. A model that predicts $48 for a user worth $200 and $52 for a user worth $5 has excellent aggregate accuracy. The platform sees two nearly identical users and defaults to buying the cheapest impression that converts. 
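One way to encode that asymmetry is a loss function that penalizes undervaluing a high-value user more heavily than overvaluing a low-value one. A minimal sketch (the function name and weights are illustrative, not a prescribed implementation):

```python
import numpy as np

def asymmetric_loss(y_true, y_pred, under_weight=3.0, over_weight=1.0):
    """Weighted squared error: undervaluing (err > 0) costs more than
    overvaluing (err < 0). The 3:1 ratio here is purely illustrative."""
    err = y_true - y_pred
    weights = np.where(err > 0, under_weight, over_weight)
    return float(np.mean(weights * err**2))

# The two users from the text: $48 predicted for a $200 user, $52 for a $5 user.
# Aggregate error looks small, but the asymmetric loss surfaces the costly
# miss on the high-value user.
y_true = np.array([200.0, 5.0])
y_pred = np.array([48.0, 52.0])
print(asymmetric_loss(y_true, y_pred))
```

The point of the weighting is that the training objective, not a post-hoc report, is what decides which mistakes the model learns to avoid.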

2. Can you detect and break feedback loops before the auction exploits them?

Google, Meta, and TikTok are built to extract and act on every pattern in the signals you send. If your predictions correlate with a proxy variable, like device type, time of day, or geographic density, the platform will find it and optimize toward it, pushing spend toward cheap inventory that matches those features rather than users who are actually valuable.

What makes this hard is that you're not optimizing against a static dataset. The moment you send a value signal, you change what the network sends back. If your model overvalues a segment — say, Android users acquired after 10pm — the network learns those users carry high conversion value at low CPM, floods your campaigns with them, and the distribution of incoming users shifts away from what your model was trained on. You can't immediately correct for it, because LTV labels take months to mature, and you're validating against cohorts acquired under a different signal regime. 

The gap between who your model expects and who the platform is actually sending widens while you wait. A correlation invisible in backtesting can shift 20-30% of your budget within two weeks. By the time it shows up in your CAC and cohort quality, the spend has already moved.
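Because LTV labels lag by months, one lightweight guard is to compare the feature distribution of incoming users against the population the model was trained on, long before labels mature. A standard metric for this is the population stability index (PSI); the sketch below, including the 0.2 rule-of-thumb threshold, is illustrative rather than anything prescribed by the platforms:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between the training-time distribution
    of a feature and the live distribution. Rule of thumb: > 0.2 signals
    meaningful drift worth investigating."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # avoid log(0) on empty bins
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)  # population the model saw
live_feature = rng.normal(0.5, 1.0, 10_000)   # population the auction now sends
print(psi(train_feature, live_feature))       # drift shows up before labels mature
```

Run per feature on each day's acquired cohort, a check like this can catch the auction reshaping your traffic mix weeks before CAC or cohort quality move.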

3. Do you have a dedicated signal strategy for each platform?

Ad platforms were built for deterministic signals — events that actually happened, with clear timestamps. Predictions are probabilistic, represent behavior that hasn't occurred yet, and evolve as users engage. Encoding matters as much as accuracy. Many in-house implementations underinvest in the translation layer (signal engineering) because the constraints are platform-specific and only become fully legible in practice.

For example:

  • Google accepts cumulative value updates, so if you predict $10 at hour one and $15 at hour six, you send the $5 delta. Meta requires event-level values with its own deduplication logic. 
  • Meta and TikTok need signals delivered within roughly an hour of the user event. Google is more forgiving at around three days, but even so, signals arriving after 25 hours lose an estimated 40-50% of their bidding effectiveness.

A prediction that is technically correct but encoded in the wrong schema teaches the network the wrong lesson, and that's harder to recover from than sending nothing at all. Even a correctly encoded signal can fail silently if the platform can't match it back to a user in its system.

4. Do you have a tested experiment design framework?

Every experiment runs on live spend. Naive audience splits contaminate easily, with users overlapping remarketing pools, cohorts crossing, and results blurring in ways that are hard to untangle after the fact. Geo-split designs or structured holdout regions are more reliable, but they require upfront planning and predefined success criteria rather than post-hoc interpretation of results.

Starting in shadow mode — sending predicted values as a reporting-only signal before using them for bidding — lets you validate that the pipeline works and that predictions correlate with actual outcomes before any budget is at risk. It also isolates failure modes: if shadow mode looks healthy but the live test shows no lift, the problem is bid strategy configuration, not the model. Without that staging, all failures look like the model didn't work.
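In shadow mode, what matters is whether predictions order users correctly, not whether the dollar values match, so a rank correlation is a natural check. A minimal sketch on a hypothetical cohort (this simple ranking does not handle ties):

```python
import numpy as np

def rank_correlation(predicted, observed):
    """Spearman-style rank correlation: do predictions put users in the
    same order as observed revenue? (Ties are not handled in this sketch.)"""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(len(x))
        return r
    return float(np.corrcoef(ranks(np.asarray(predicted)),
                             ranks(np.asarray(observed)))[0, 1])

# Hypothetical shadow-mode cohort: predicted values vs revenue observed so far
predicted = [12.0, 3.5, 40.0, 0.5, 8.0]
observed = [9.0, 2.0, 55.0, 0.0, 11.0]
print(rank_correlation(predicted, observed))
```

A correlation that holds in shadow mode but a live test that shows no lift points the investigation at bid configuration and signal delivery rather than the model.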

Launch also opens a validation gap. Once the system goes live, there's a 30-60 day window before real LTV data exists to confirm whether predictions are working. Having early indicators in place — early conversion rates, predicted vs. observed distributions, match rate stability — can give you confidence the system is working before the real numbers arrive.
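The simplest of those indicators to automate is match rate stability. A toy alert on the share of sent signals the platform matched back to a user, with a baseline and tolerance that are purely illustrative:

```python
def match_rate_alert(matched: int, sent: int,
                     baseline: float = 0.85, tolerance: float = 0.05):
    """Flag when the share of signals matched back to a user drops
    meaningfully below its historical baseline (numbers illustrative)."""
    rate = matched / sent
    return rate < baseline - tolerance, rate

alert, rate = match_rate_alert(matched=7_400, sent=10_000)
print(alert, rate)  # a drop to 0.74 against a 0.85 baseline fires the alert
```

A falling match rate means correctly computed predictions are silently failing to influence bidding, which is exactly the kind of problem the 30-60 day label gap would otherwise hide.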

5. Is your model designed to adapt without disrupting platform learning?

Bidding changes who you acquire. A model trained on last quarter's users will gradually fall out of step with the users arriving now, and predictions calibrated to the wrong population misdirect spend quietly in ways that look like normal variance until the gap is already significant.

There are two ways to close that gap: recalibration and retraining. Retraining — data collection, validation, deployment — takes weeks, and a significant enough model change can look like a new campaign to the platform, resetting the bidding intelligence it's built up over months. Recalibration — adjusting thresholds, segment multipliers, and value caps — can happen in near real time without disrupting what the platform has already learned.
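The recalibration path can be as thin as a serve-time layer over a frozen model. The structure below is a sketch of that idea; the field names, segment labels, and numbers are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Calibration:
    """Serve-time adjustments layered over a frozen model, so values can
    be corrected without retraining or resetting platform learning."""
    segment_multipliers: dict[str, float] = field(default_factory=dict)
    value_cap: float = 500.0  # protects the auction from outlier predictions

    def adjust(self, raw_prediction: float, segment: str) -> float:
        scaled = raw_prediction * self.segment_multipliers.get(segment, 1.0)
        return min(scaled, self.value_cap)

# Damp a segment the model is known to overvalue (e.g. the late-night
# Android example from earlier), without touching the model itself.
cal = Calibration(segment_multipliers={"android_late_night": 0.7})
print(cal.adjust(100.0, "android_late_night"))  # scaled down
print(cal.adjust(900.0, "ios_day"))             # capped
```

Because only the multipliers and caps change, the value distribution the platform sees shifts gently rather than discontinuously, which is the property that keeps its accumulated bidding intelligence intact.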

LTV labels mature over months, so the actuals you're validating against today reflect users acquired under a different signal regime. Retraining on those labels without accounting for how your signal shaped the traffic mix risks baking the bias deeper into the next model.

6. Do you have someone dedicated to owning this system end-to-end?

Each network's learning logic — value caps, floor thresholds, attribution windows, signal batching behavior — shifts regularly. Finding the real cause of a performance change requires someone paying close enough attention to notice when platform behavior has changed, not just when campaign metrics have moved.

This system sits at the intersection of data science, engineering, and growth — teams that rarely share a roadmap. A product change that invalidates input features, an engineering migration that adds pipeline latency, or even a campaign restructure that breaks segment mapping can degrade signal quality without anyone realizing it was their responsibility to flag.

The person responsible needs to maintain enough context across platforms, teams, and model updates to catch problems within days, not weeks. As AI accelerates platform development, that's a responsibility that grows over time, not one that gets easier to manage.

Decide deliberately, or the default decides for you

Strong data science teams build impressive things, and the modeling work here is within reach. Where these systems tend to come unstuck is in production. Either the auction finds something in your signals to exploit, or it shifts in ways your system isn't built to follow.

Neither failure mode is obvious in advance, but both have known patterns, and both require sustained investment to manage. The engineers best positioned to run it are often the ones best positioned to build your product. Whether that's the right use of your best people is worth asking deliberately, because the default is also a decision.

→ If you're working through this decision and want to go deeper on what the build actually involves, layer by layer, we've written a detailed breakdown of the full stack.