Data Clean Rooms

What is a Clean Room?

“Clean room” has become a buzzword over the past few years, yet I still run into a lot of confusion on the subject. At a basic level, clean rooms enable the privacy-safe co-mingling of 1st- and 3rd-party data. They achieve this by masking both data sets behind anonymized IDs, which in turn produces anonymized profiles that cannot be re-identified.

Further complicating the topic, there are broadly two “kinds” of clean rooms.

Activation clean rooms, built for buy-side audience segmentation, activation & measurement use cases.
Publisher clean rooms, built for sell-side applications: enabling publishers to expose impression logs to advertisers for measurement.

Well-known publisher clean rooms include Facebook’s Advanced Analytics and Google’s Ads Data Hub. But we’re focusing on activation clean rooms today.

Old Clean Room Architecture

Years ago, while at HBO, I built my first clean room. Its main function was building more advanced audiences for acquisition campaigns, incorporating 3rd-party data to fill in the behavioral holes of our 1st-party data. It was built with a traditional architecture:

Run our 1st-party data through a 3rd-party identity provider (LiveRamp, Experian, Neustar, etc.) to anonymize it behind a 3rd-party ID that could not be re-identified.
Purchase a 3rd-party household data set and run it through the same identity provider, anonymizing it behind the same 3rd-party IDs.
Ingest both anonymized data sets into a new environment, where queries could join the data sets on the anonymous ID and garner more comprehensive consumer insights.
To activate audience segments, push the segments’ 3rd-party IDs back to the identity provider, which translates them to identifiers and pushes them down to ad platforms for activation.
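The first three steps above can be sketched in a few lines of Python. This is a toy illustration only: real identity providers resolve people and households through proprietary identity graphs, not a simple keyed hash, and the key, field names and sample records here are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret held only by the identity provider (an assumption for
# this sketch; real providers use identity graphs, not a single keyed hash).
PROVIDER_KEY = b"provider-secret"


def anonymize(records, key=PROVIDER_KEY):
    """Swap each record's email for a keyed-hash ID and drop the email."""
    out = {}
    for rec in records:
        email = rec["email"].strip().lower()  # normalize before hashing
        anon_id = hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()
        out[anon_id] = {k: v for k, v in rec.items() if k != "email"}
    return out


def join_on_anon_id(left, right):
    """Join two anonymized data sets on the IDs they have in common."""
    return {i: {**left[i], **right[i]} for i in left.keys() & right.keys()}


# Made-up sample data standing in for the 1st- and 3rd-party data sets.
first_party = [
    {"email": "ann@example.com", "plan": "monthly"},
    {"email": "bob@example.com", "plan": "annual"},
]
third_party = [
    {"email": "Ann@Example.com ", "household_income": "100k+"},
    {"email": "cat@example.com", "household_income": "50-100k"},
]

anon_first = anonymize(first_party)   # step 1
anon_third = anonymize(third_party)   # step 2
joined = join_on_anon_id(anon_first, anon_third)  # step 3
```

Note that both data sets had to be handed over in full, email addresses included, before anything could be joined. That hand-off is where much of the cost and risk in this architecture comes from.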

This was advanced at the time, but over the years several limiting challenges with this type of clean room architecture became apparent:

Converting large data sets to anonymous IDs, then co-mingling in a new environment, is clunky and takes time. At best, we could get to a monthly cadence.
Having to transport, convert and co-mingle 3rd-party data sets limited how many data sets we could realistically apply to the clean room.
Audience activation was clunky (nowhere near real-time) and expensive (3rd-party identity providers typically charge a premium for “onboarding” their IDs to activate).
Privacy: while safeguards were put in place to limit leakage, you are still physically moving PII around in order to convert it to anonymized IDs. That transport is a liability.

Where Clean Room Architecture is Going

In the past couple of years, an assortment of companies and products has tried to make clean rooms easier to implement and easier to integrate into the tech stack. One in particular, Snowflake DCR (Data Clean Room), has allowed us to think about clean room architecture dramatically differently.

DCR allows different companies to connect data shares to a single DCR instance and join disparate data sets, either on common IDs shared by both data sets or by incorporating 3rd-party identity providers’ apps directly within DCR.

All of this ID stitching is done “underneath” DCR, inaccessible to those running the queries on top.

The simplest way to think about it: instead of having to convert whole data sets and then co-mingle them in new anonymized-ID environments, DCR allows querying across disparate data sets without needing to transport, co-mingle or anonymize anything.
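To make the “query across” idea concrete, here is a rough sketch using SQLite as a stand-in. It is not Snowflake’s API. In a real clean room each party’s table stays in its own account and is connected through a data share, whereas here both tables sit in one in-memory database purely for illustration. The table names, sample rows and the minimum group size are assumptions of mine. The point is that the analyst only gets to run a pre-approved aggregate query: the join key is used inside the query but never appears in the output.

```python
import sqlite3

# Hypothetical privacy threshold: groups smaller than this are suppressed.
MIN_GROUP_SIZE = 2


def build_demo_db():
    """Create two stand-in tables; in practice each would remain with its owner."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE first_party (email TEXT, plan TEXT);
        CREATE TABLE third_party (email TEXT, segment TEXT);
        INSERT INTO first_party VALUES
            ('ann@example.com', 'monthly'),
            ('bob@example.com', 'monthly'),
            ('cat@example.com', 'annual');
        INSERT INTO third_party VALUES
            ('ann@example.com', 'sports'),
            ('bob@example.com', 'sports'),
            ('cat@example.com', 'drama');
        """
    )
    return conn


def segment_counts_by_plan(conn, min_group_size=MIN_GROUP_SIZE):
    """Pre-approved query: aggregate counts only, join key never selected."""
    return conn.execute(
        """
        SELECT f.plan, t.segment, COUNT(*) AS n
        FROM first_party AS f
        JOIN third_party AS t ON f.email = t.email
        GROUP BY f.plan, t.segment
        HAVING COUNT(*) >= ?
        ORDER BY f.plan, t.segment
        """,
        (min_group_size,),
    ).fetchall()


conn = build_demo_db()
result = segment_counts_by_plan(conn)
```

With the sample rows above, the single annual/drama subscriber falls below the threshold and is left out of `result`, while the monthly/sports group is returned as a count. Nothing was exported, converted or re-keyed to get there.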

Flexibility and speed are the two most obvious benefits, but there are many others I’ve discovered as I re-architect our clean room. I’ll save those for deeper dives on future days.

But one of the most compelling benefits is that adding new 3rd-party data sets to enrich our 1st-party knowledge is so much easier when we don’t have to transport anything and can simply query across.

We suddenly have the opportunity to sync in thousands, if not hundreds of thousands, of new 3rd-party data attributes, and making sense of that plethora of newly available data points has us thinking about AI and chatbot applications. (Again, I’ll save the deep dive there for a future day.)

This concept of querying across disparate data sets, instead of co-mingling them behind anonymized IDs, is changing how we think of clean rooms. I suspect it will be responsible for shifting clean rooms from an esoteric concept that only huge companies can dabble in to a common component of small and medium-sized businesses operating their own tech stacks.
