How to really do onchain attribution

The Spindl approach to the hardest problem in marketing

Antonio García Martínez, Rohan Meringenti, Kunal Modi, and Bin Zhou

Dec 18, 2024

Root system of *Myosotis decumbens*, from the Wageningen University collection.

"For the want of a nail the shoe was lost;
For the want of a shoe the horse was lost;
For the want of a horse the battle was lost;
For the failure of battle the kingdom was lost;
And all for the want of a horseshoe nail."
-James Baldwin, ‘The Horseshoe Nails’ (1912)

What is Attribution?

Attribution is the problem of finding the causal “reason” why a particular event happened in the world. As business logic, the goal is to pick a winner in a competitive attention marketplace, which is also the main input into the marketplace’s business model. In short, who gets paid for driving this user (and their associated revenue), and how much do they get paid? It’s the central question in any marketing system; it’s also the hardest to answer.

To quote some numbers: as of this writing, Spindl has just crossed a billion events measured (both offchain and onchain), a total growing by roughly 2-5 million events per day [1]. Our biggest, longest-running ‘user’ (more on what that means in our ‘Identity’ piece next week) is on the DEX aggregator Hashflow and has somehow generated 105,954 events in 595 active days [2]. Vertex, a perps DEX, has alone generated over 100 million tracked events (!).

Why do we bother to track all these upstream events? Isn’t just looking at where the user clicked from good enough? Nope. Not even remotely.

Unlike session tracking via URL parameters in traditional analytics platforms—something often conflated with real attribution, but in practice quite different—marketing attribution is sticky. If a user is activated by an upstream app or publisher, that user will stay attributed to that source, no matter how many times (or from where) they come back to the advertised service.

In Web2, touchpoints and conversion data are well understood: typically users visit a website/mobile app (triggering well-measured events like clicks or app installs), and then make a purchase. There are entire billion-dollar companies—AppsFlyer, Branch, the attribution systems of Facebook and Google—whose only goal in life is ingesting that massive event volume and running attribution logic over the firehose. Dueling methodologies like ‘last touch’ or ‘viewthrough’ define who wins (and loses) in the attribution faceoff, and much of what ads players do is gaming attribution methodology to their advantage.

In Web3, we see a variety of interesting new patterns emerge based on the open nature of the blockchain:

Conversion events often don’t happen on your website or mobile app at all. For instance, selling an NFT or buying a token on a DEX are both often conversion goals for companies, even though those actions don’t happen on an experience controlled by the app. It might not even be humans doing the actions. Without onchain measurement, you’ll miss most of the action.
Web3 has new forms of user acquisition campaigns, like quests (where users are paid to try out a protocol) and NFT airdrops (where users get a free asset or token sent directly to their wallet address). These are novel mechanisms that would be utterly missed, or misattributed, using standard Web2 ads technology.
Web3 events can be used to drive Web2 events. For instance, we see many games use token airdrops and rewards to encourage mobile app installs: the onchain action is the upstream action that drove something downstream, either offchain or onchain.
dApps can use other dApps as acquisition funnels for their app. For instance, we see many “conquesting” campaigns where dApps will give rebates to users who are active on competing platforms. The open transaction record of the blockchain makes all sorts of targeting and incentives possible that are mostly impossible in Web2.
Everyone is both advertiser and publisher, both downstream of someone and upstream of someone else. As an anecdotal example, Safe (a Spindl client) uses us to measure their upstream channels, while also doing co-marketed offers to Morpho (another Spindl client), and each side wants to know what’s happening upstream and downstream. Plus there are Morpho pool runners (like Gauntlet) also running their own marketing funnel that runs through Morpho (and elsewhere). The one-way Web2 funnel is a nested set of layers in Web3, both complicating and easing the measurement challenge. We haven’t encountered a cycle in a marketing funnel yet, but it’ll happen at some point.

How does onchain attribution work in practice?

Attribution is the process of distilling a user’s journey to a weighted set of user touchpoints that caused a ‘conversion’: religious overtones aside, this is simply some terminal event the marketer paid to make happen. A touchpoint is any event (a visit to your website from Twitter, an NFT airdrop you sent a wallet, an onchain quest completion) that leads to a conversion. Note that these aren’t mutually exclusive: an event can be both a touchpoint on the way to a conversion and a revenue-generating event itself. Again, the usual ‘funnel’ is weirdly recursive in Web3 [3].

Let’s look at the following example of a typical user journey. Our goal is to determine which event on the left was responsible for the rightmost conversion event.

Blue boxes are touchpoints, green boxes are conversion events.

The key thing to understand about attribution is that while time runs forward, attribution runs backward: a conversion event triggers rearward logic over already-seen events.

It’s a basic observation, but bears repeating: you don’t ‘know’ if a user is going to convert when they see an ad impression, click on a link, or start browsing your app. All you can do is log events going forward, and then, once a conversion event happens, cobble a likely user funnel together using the Web2/Web3 identity graph [4] that ties the relevant events together.

The first step in that rearward look is a “Proof of Life”: Is this an already active user (a ‘live’ user) that converted, or are they new or potentially resurrected?

Attribution logic with a 45 day churn window (no activity for 45 days or more reclassifies the user as new).

dApps can configure their ‘Churn Window’ (the number of days within which a user must have done a meaningful activity on the dApp to be considered active). Once a user is ‘active’, they are sticky to their attribution channel. Only when they churn out are they eligible to be (re)attributed to a different source.

In this case, the user didn’t have a meaningful activity in the past 45 days, so they are considered churned. The touchpoints in question (just clicking in from Discord, Twitter, and Google) aren’t serious enough to be ‘proof of life’ active-user events on the protocol.
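The “Proof of Life” check can be sketched in a few lines of Python. This is a minimal illustration rather than Spindl’s implementation: the function name `is_live_user` and the idea of passing in a pre-filtered list of ‘meaningful’ activity timestamps are assumptions made for the example; only the 45 day window comes from the text above.

```python
from datetime import datetime, timedelta

# 45 days matches the example above; the article says each dApp configures its own.
CHURN_WINDOW = timedelta(days=45)


def is_live_user(meaningful_event_times, conversion_time, churn_window=CHURN_WINDOW):
    """Return True if any meaningful activity happened inside the churn window
    that ends at the conversion. Casual touchpoints (e.g. a click in from
    Discord) are assumed to have been filtered out by the caller."""
    window_start = conversion_time - churn_window
    return any(window_start <= t < conversion_time for t in meaningful_event_times)


conversion = datetime(2024, 12, 1)
# Last meaningful activity was 61 days earlier, so the user counts as churned.
print(is_live_user([datetime(2024, 10, 1)], conversion))   # False
# Activity 21 days earlier keeps the user 'live' and sticky to their channel.
print(is_live_user([datetime(2024, 11, 10)], conversion))  # True
```

A churned (or brand new) user falls through to the lookback step; a live user keeps whatever channel they were already attributed to.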

Next, we check the “Lookback window” - what activity did they do recently that might have triggered their resurrection?

The lookback window retrieves all touchpoints in the timeframe and chooses a winner. By default in Web3, most dApps opt for a ‘first touch’ model; we also support ‘last touch’, and the choice here is almost folkloric (it does change results, but in non-linear and non-obvious ways).

In this case, ‘first touch’ picks Twitter as the winner for the blockchain transaction attribution credit.
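As a rough sketch of the lookback step, the snippet below picks a winner under either model. The function name `pick_winner`, the 7 day lookback default, and the specific timestamps are assumptions for illustration; in particular, the Discord click is placed outside the window here so that ‘first touch’ lands on Twitter as in the example.

```python
from datetime import datetime, timedelta


def pick_winner(touchpoints, conversion_time, lookback=timedelta(days=7), model="first_touch"):
    """touchpoints is a list of (timestamp, channel) pairs.
    Returns the winning channel, or None if nothing falls inside the lookback window."""
    window_start = conversion_time - lookback
    in_window = sorted(tp for tp in touchpoints if window_start <= tp[0] <= conversion_time)
    if not in_window:
        return None
    if model == "first_touch":
        return in_window[0][1]   # earliest touchpoint in the window
    if model == "last_touch":
        return in_window[-1][1]  # latest touchpoint before the conversion
    raise ValueError(f"unknown attribution model: {model}")


conversion = datetime(2024, 12, 10, 12, 0)
journey = [
    (datetime(2024, 12, 1, 9, 0), "discord"),   # 9 days before: outside the window
    (datetime(2024, 12, 5, 9, 0), "twitter"),
    (datetime(2024, 12, 8, 9, 0), "google"),
]
print(pick_winner(journey, conversion))                      # twitter
print(pick_winner(journey, conversion, model="last_touch"))  # google
```

The same journey yields a different winner under each model, which is why the choice of model matters even when the underlying events are identical.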

Building Attribution

At Spindl, we’ve iterated through a few different approaches to running attribution to deal with increasing scale, configuration, and new methodologies.

We started with a “stateless” model, where we simply queried the journey of a converting user on demand. On the scale of tens of thousands of conversion events a day, this was a fairly cheap and easy way for us to calculate attribution, run backfills, and experiment with different models in close to realtime.

-- 45 Day Proof of Life: latest conversion inside the churn window
SELECT attribution_channel FROM events WHERE identity = ? AND is_conversion = true AND time > NOW() - INTERVAL 45 DAYS ORDER BY time DESC LIMIT 1;

-- First Touch Lookback: earliest touchpoint inside the 7 day lookback window
SELECT session_channel FROM events WHERE identity = ? AND is_touchpoint = true AND time > NOW() - INTERVAL 7 DAYS ORDER BY time ASC LIMIT 1;

Here we have an oversimplified set of queries for defining a first-touch stateless attribution model. When an event arrives:

Identity - Determine an identity for the event based on its traits
First Query - If this is an existing user, copy over the attribution channel from the prior event if it falls within the churn window
Second Query - Look through the attribution/lookback window to determine the winner
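The steps above can be strung together into a small runnable sketch. This uses an in-memory SQLite database purely for illustration; the table layout, the `attribute` and `record` helpers, and the choice to compute window cutoffs in Python (SQLite has no `NOW() - INTERVAL` syntax) are all assumptions, and identity resolution is taken as already done.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema mirroring the columns used in the queries above.
SCHEMA = """
CREATE TABLE events (
    identity TEXT, time TEXT,
    is_conversion INTEGER, is_touchpoint INTEGER,
    attribution_channel TEXT, session_channel TEXT
)"""


def record(conn, identity, when, *, conversion=False, touchpoint=False,
           attribution_channel=None, session_channel=None):
    """Insert one event; timestamps are stored as ISO strings so they sort correctly."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (identity, when.isoformat(), int(conversion), int(touchpoint),
         attribution_channel, session_channel),
    )


def attribute(conn, identity, event_time, churn_days=45, lookback_days=7):
    """Stateless first-touch attribution for a conversion at event_time."""
    now = event_time.isoformat()
    churn_cutoff = (event_time - timedelta(days=churn_days)).isoformat()
    lookback_cutoff = (event_time - timedelta(days=lookback_days)).isoformat()

    # First query: proof of life. A prior conversion inside the churn window
    # means the user is still live, so the existing channel is copied over.
    row = conn.execute(
        "SELECT attribution_channel FROM events "
        "WHERE identity = ? AND is_conversion = 1 AND time > ? AND time < ? "
        "ORDER BY time DESC LIMIT 1",
        (identity, churn_cutoff, now),
    ).fetchone()
    if row is not None:
        return row[0]

    # Second query: first-touch lookback over recent touchpoints.
    row = conn.execute(
        "SELECT session_channel FROM events "
        "WHERE identity = ? AND is_touchpoint = 1 AND time > ? AND time < ? "
        "ORDER BY time ASC LIMIT 1",
        (identity, lookback_cutoff, now),
    ).fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record(conn, "wallet_a", datetime(2024, 12, 5, 9), touchpoint=True, session_channel="twitter")
record(conn, "wallet_a", datetime(2024, 12, 8, 9), touchpoint=True, session_channel="google")
print(attribute(conn, "wallet_a", datetime(2024, 12, 10, 12)))  # twitter
```

Because each call only reads from the event log, the same conversion can be re-run at any time and will produce the same answer as long as the log itself has not changed.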

All of this is done in semi-real time, at the cadence of new conversion events, with no need for intermediate caching or pre-categorization. By single-streaming this entire process, attribution results are completely deterministic, easy to debug, and replayable. (You can always replay events and determine what event is the “smoking gun” which led to an attribution winner).

After an initial 30 days of running attribution, the identity graph [5] tying events together generally becomes a lot more stable, as we have a clearer idea of existing users. As more and more of our clients have gotten to this stage, and client event volume has spiked (both offchain and onchain), we’ve shifted to a more stateful methodology.

We can think of stateful as keeping an up-to-date mapping of the user view of the world, by storing a summary of all events for a user in a series of quick lookup tables.

In the below example, we can see how an incoming event updates the Last Seen and Churn Window for a given user.
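To make the stateful idea concrete, here is a minimal sketch of a per-user summary record and the update applied when an event arrives. The `UserState` fields and the `apply_event` helper are hypothetical names chosen for this example, not Spindl’s actual lookup tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UserState:
    """Summary of one user's history, standing in for a quick lookup table row."""
    last_seen: Optional[datetime] = None
    churns_at: Optional[datetime] = None          # end of the current churn window
    attribution_channel: Optional[str] = None


def apply_event(state, event_time, is_proof_of_life, winning_channel=None,
                churn_window=timedelta(days=45)):
    """Update the user's state in place for one incoming event and return it."""
    # The user is churned (or new) if no churn window is open at event time.
    churned = state.churns_at is None or event_time >= state.churns_at
    if state.last_seen is None or event_time > state.last_seen:
        state.last_seen = event_time
    if is_proof_of_life:
        if churned:
            # Only churned or new users are eligible for (re)attribution.
            state.attribution_channel = winning_channel
        new_expiry = event_time + churn_window
        if state.churns_at is None or new_expiry > state.churns_at:
            state.churns_at = new_expiry  # a live user's window is extended
    return state


state = UserState()
apply_event(state, datetime(2024, 12, 1), True, winning_channel="twitter")
apply_event(state, datetime(2024, 12, 20), True, winning_channel="google")
print(state.attribution_channel)  # twitter: still live, so attribution is sticky
```

Since each `UserState` depends only on that user’s own events, many such records can be updated independently of one another.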

By breaking down the history of events through a user-centric lens, we enable parallelization, as each of these user stories can independently run attribution. Of course, a user’s full event history keyed on an ever-shifting, growing identity graph is not very space efficient and can get very expensive, so we must get a little clever about how we store this data and what we keep.

Today, our attribution methodology runs on a hybrid workflow, where we statefully store important touchpoints relevant for each user, such as the latest Proof of Life (POL) events and Attributable Events valid within certain time windows. Because our data pipelines run as batch processes, we use stateless queries to organize data within the batch, and combine stored state with real-time querying when necessary. As identities merge or evolve, the stateful storage updates to reflect the latest user interactions, similar to how a cache would operate.

In practice, our pipeline evolved into its current form to tackle some interesting challenges specific to Web3:

Data Timeliness - Attribution needs strong guarantees on data timeliness, as both order and timestamps matter. We cannot run attribution on a conversion event unless all previous “Proof of Life” events and touchpoints are already ingested. This makes streaming solutions particularly prone to errors, and requires strong guarantees and run-time checks around data availability.
Late Arriving Data - Simultaneously, late arriving data is the norm. Data providers across blockchains have different SLAs and latencies, and many third-party data providers don’t have real-time reporting capabilities. While we do our best to ensure data comes in strict time ordering, we have a variety of flows that handle late arriving touchpoint data. Since we cannot edit any actions we’ve already paid out on, this is a delicate process with many guardrails.
Hotspots - Identities (i.e. what we consider a bundle of relevant events from the same user) have even greater variance, and we have access to even more historical data on the blockchain. The “Proof of Life” data for an MEV bot transacting thousands of times a day is exponentially more expensive to process than a user doing their first transaction on a new wallet.
Merges - Since identity is so dynamic in Web3 (onchain wallet identity is fairly stable, but offchain behavior is very flaky), merging identity (and attribution) requires novel solutions in data modeling to ensure stability and correctness.
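One common way to express the data-availability check described under Data Timeliness is a per-source watermark: attribution for a conversion runs only once every upstream source has been ingested past the conversion’s timestamp. The sketch below assumes that approach; the article does not say how Spindl implements its run-time checks, and the names `is_ready`, `split_batch`, and the source labels are hypothetical.

```python
from datetime import datetime


def is_ready(conversion_time, source_watermarks):
    """source_watermarks maps a data source name to the latest timestamp that
    source has fully ingested. A conversion is ready only when every source
    has caught up to (or passed) the conversion time."""
    return bool(source_watermarks) and all(
        watermark >= conversion_time for watermark in source_watermarks.values()
    )


def split_batch(conversions, source_watermarks):
    """Split (conversion_id, conversion_time) pairs into those safe to attribute
    now and those deferred to a later batch."""
    ready, deferred = [], []
    for conversion_id, conversion_time in conversions:
        bucket = ready if is_ready(conversion_time, source_watermarks) else deferred
        bucket.append(conversion_id)
    return ready, deferred


watermarks = {
    "web_events": datetime(2024, 12, 10, 12, 0),
    "chain_indexer": datetime(2024, 12, 10, 11, 0),  # a slower provider
}
batch = [
    ("c1", datetime(2024, 12, 10, 10, 30)),
    ("c2", datetime(2024, 12, 10, 11, 30)),
]
print(split_batch(batch, watermarks))  # (['c1'], ['c2'])
```

Deferring rather than guessing trades some latency for not having to revise an attribution after the fact, which matters when results feed payouts.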

When we started Spindl, the ambition wasn’t to simply reproduce the mechanics of Web2 ad tech onchain, but to build features utterly impossible in Web2. For example, the holy grail of marketers, a ‘multi-touch’ attribution model that correctly credits (and rewards) the multi-causal ways users find products, is entirely possible in onchain advertising. The payment rails for one-to-many payments are there, and with public and transparent data, anyone can verify that your fancy fractional model is sane. Crypto is speedrunning ad tech, from ineffective airdrops (and equally ineffective CPM advertising) to purely performance-based advertising with decentralized attribution, leapfrogging 20+ years of Web2 ad tech in a couple of years. Real onchain advertising is finally here.
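For a sense of what a fractional model looks like, here is a sketch of the simplest one, linear multi-touch, where every touchpoint in the window earns an equal share of the payout. The linear weighting and the `linear_multi_touch` name are assumptions for illustration; the article does not specify which fractional model would be used.

```python
from collections import Counter
from fractions import Fraction


def linear_multi_touch(channels, payout):
    """Split payout equally across touchpoints; a channel that appears more
    than once earns one share per appearance. Fractions keep the split exact."""
    if not channels:
        return {}
    share = Fraction(payout) / len(channels)
    credit = Counter()
    for channel in channels:
        credit[channel] += share
    return dict(credit)


credits = linear_multi_touch(["twitter", "google", "twitter", "layer3"], 100)
for channel, amount in credits.items():
    print(channel, amount)
# The shares always add back up to the full payout.
print(sum(credits.values()) == 100)  # True
```

Because the inputs are public events and the arithmetic is deterministic, anyone holding the same touchpoint list can recompute the split and check it.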

If you’re a publisher or advertiser looking to work with Spindl, hit us up.


[1] The biggest attributed channels (across all our clients) in order:
1. Twitter
2. Google
3. Linktree (!)
4. Layer3
5. Collab.land (Discord channels, part of the Spindl network)
6. YouTube

[2] Crypto ‘users’ are often very obviously bots, but if they generate real onchain transactions, nobody seems to care (and this is before the looming AI agent era we’re entering). As one of our clients said in a meeting where we worriedly explained we measured more conversion events than page views (meaning clearly these weren’t humans): “bots are people too”. Money-spending bots, to be clear.

[3] Some events can even be self-attributing, using slightly naive ‘whatever touch’ attribution that simply writes a URL param onchain (e.g. Gains Network writing an optional “Referral Code” field from their app site). This is more or less the default for most ‘onchain attribution’ systems, but its shortcomings become apparent very quickly once a multi-channel strategy is pursued (e.g. aggregators overwriting the referral tag for a user that a KOL originally referred, cutting off the KOL’s ongoing referral payment).

[4] This ‘identity’ graph is essentially grouping offchain touchpoints like browsers and mobile devices to wallets, and keeping that many-to-many join warm over time. The notion of global identity across identifiers was a huge issue in Web2, and will be just as important in Web3, particularly with wallet infra like Privy and Dynamic spinning up a wallet per app.

[5] In our next chapter of this Spindl Engineering series, we’ll go into the unsung hero of attribution, identity, where we’ll answer questions such as: how can we know that four different user sessions (originating from different apps) and two onchain transactions all corresponded to a single human user?
